Could I ask you to review and test this? https://github.com/mmoayyed/cas/compare/master...mmoayyed:cas:redis-fix
On Thu, Nov 10, 2022 at 1:03 PM Misagh <[email protected]> wrote: > > Thank you Jérôme. I'll take a look. > > -- Misagh > > On Wed, Nov 9, 2022, 2:39 PM Jérôme LELEU <[email protected]> wrote: >> >> Hi, >> >> I have made a Redis performance test between v6.5.9 and v7.0.0-RC1 and >> figures are very relevant. >> >> In both cases, I have a CAS server using a Redis ticket registry (Docker >> image) and a custom authentication handler to validate any username starting >> with "jleleu". >> I have a script which performs a certain number of logins (without any >> service) for jleleuRANDOMVALUE. >> I have overriden the RedisTickerRegistry class to add time counting in v6.5: >> >> private static AtomicLong getTime = new AtomicLong(); >> private static AtomicInteger nbGet = new AtomicInteger(); >> >> @Override >> public Ticket getTicket(final String ticketId, final Predicate<Ticket> >> predicate) { >> val t0 = System.currentTimeMillis(); >> try { >> val redisKey = getTicketRedisKey(encodeTicketId(ticketId)); >> val t = this.client.boundValueOps(redisKey).get(); >> if (t != null) { >> val result = decodeTicket(t); >> if (predicate.test(result)) { >> return result; >> } >> LOGGER.trace("The condition enforced by [{}] cannot successfully >> accept/test the ticket id [{}]", ticketId, >> predicate.getClass().getSimpleName()); >> return null; >> } >> } catch (final Exception e) { >> LOGGER.error("Failed fetching [{}]", ticketId); >> LoggingUtils.error(LOGGER, e); >> } finally { >> val t1 = System.currentTimeMillis(); >> val time = t1 - t0; >> val t = getTime.addAndGet(time); >> val n = nbGet.incrementAndGet(); >> LOGGER.info("### GET time: {} ms | Average time: {} ms", time, t / >> n); >> } >> return null; >> } >> >> >> And in v7: >> >> @Override >> public Ticket getTicket(final String ticketId, final Predicate<Ticket> >> predicate) { >> val t0 = System.currentTimeMillis(); >> try { >> val redisKey = >> RedisCompositeKey.builder().id(encodeTicketId(ticketId)).build().toKeyPattern(); >> return getKeysStream(redisKey) >> .map(key -> redisTemplate.boundValueOps(key).get()) >> .filter(Objects::nonNull) >> .map(this::decodeTicket) >> .filter(predicate) >> .findFirst() >> .orElse(null); >> } catch (final Exception e) { >> LOGGER.error("Failed fetching [{}]", ticketId); >> LoggingUtils.error(LOGGER, e); >> } finally { >> val t1 = System.currentTimeMillis(); >> val time = t1 - t0; >> val t = getTime.addAndGet(time); >> val n = nbGet.incrementAndGet(); >> LOGGER.info("### GET time: {} ms | Average time: {} ms", time, t / >> n); >> } >> return null; >> } >> >> >> Then I perform 1 000 and 10 000 logins (with my script) and check my logs to >> see the average time: >> >> v6.5.9: >> 1000 logins -> Average time: 3 ms >> 10000 logins -> Average time: 3 ms >> >> v7.0.0-RC1: >> 1000 logins -> Average time: 22 ms >> 10000 logins -> Average time: 195 ms >> >> So indeed, I notice a big performance issue. >> >> Do you need more information? >> >> Thanks. >> Best regards, >> Jérôme >> >> >> Le mar. 8 nov. 2022 à 07:56, Jérôme LELEU <[email protected]> a écrit : >>> >>> Hi, >>> >>> Yes, double indexing is harder than simple indexing as the second operation >>> may fail and you should revert the first one (transactional aspect). >>> If we did that for all tickets, we would double the size of the keys, but >>> not the size of the database though. >>> And maybe we should have two different databases for better performance. >>> >>> I will make a test to check the problem. >>> >>> Thanks. >>> Best regards, >>> Jérôme >>> >>> >>> Le lun. 7 nov. 2022 à 18:30, Misagh <[email protected]> a écrit : >>>> >>>> > Our main problem was the full SCAN of tickets to check if a ticket is a >>>> > TGT and if the principal is the right one ("count/get SSO sessions"). >>>> > For this one, I would have created a "double key indexing", big words >>>> > for a simple thing ;-) >>>> > On 6.5.x, we stored tickets this way: key=CAS_TICKET:ticketId => >>>> > VALUE=serialized ticket >>>> > I propose for the "add ticket" operation to check if it is a TGT: in >>>> > that case, I would add to Redis: key=CAS_TICKET_USER:ticketId:userId => >>>> > VALUE=nothing. >>>> > This way, we would "only" SCAN keys of TGT sessions to find the right >>>> > user (CAS_TICKET_USER:*:userId) and we would retrieve all the TGT >>>> > identifiers for a given user. >>>> > Then, a multi GET on these identifiers would find the SSO sessions of >>>> > the user. >>>> >>>> That's quite clever. But it's not without complications. These are not >>>> strictly blockers, but we should likely take these into account: >>>> >>>> Doing the double-indexing for a TGT would also imply that the same >>>> thing would/could be done for OIDC codes, access tokens and refresh >>>> tokens. For example, think of operations where you'd want to execute >>>> "get me all the access tokens issued to user X", or "all the refresh >>>> tokens issued to user Y", etc. This would mean that the registry >>>> would have to somehow be tied to modules that present those extra >>>> ticket types though I imagine this can be somewhat solved with the >>>> ticket catalog concept. And of course, the registry size sort of >>>> grows. 10 TGTs for unique users would actually mean 20 entries, not to >>>> mention for every update/remove operation you'd be issuing double >>>> queries. So it's double the index, double the number of operations. At >>>> scale, I am not so sure this would actually be all that better, but I >>>> have not run any conclusive tests. >>>> >>>> I would also be interested to see an actual test that showcases the >>>> slowness. For example, I ran a test against a basic local redis >>>> cluster, 7.0.5. My test, added 1000 TGTs to the registry, then fetched >>>> them all, and then looped through the resultset asking the registry >>>> for each ticket that was fetched again. This operation completed in >>>> approx 13 seconds for non-encrypted tickets, and 14seconds encrypted >>>> tickets. Then, I reverted the pattern back to what 6.5 used to do, and >>>> I ran the same test, and I more or less saw the same execution time. >>>> Double checked to make sure there are no obvious mistakes. Is this >>>> also what you see? and if so, can you share some sort of a test that >>>> actually shows or demonstrates the problem? -- You received this message because you are subscribed to the Google Groups "CAS Developer" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/a/apereo.org/d/msgid/cas-dev/CAGSBKkd9v9ss_BCoQwH6X%2B9OrtPWyuDjNaZJy3SusMMD578gGA%40mail.gmail.com.
