Could I ask you to review and test this?
https://github.com/mmoayyed/cas/compare/master...mmoayyed:cas:redis-fix

On Thu, Nov 10, 2022 at 1:03 PM Misagh <[email protected]> wrote:
>
> Thank you Jérôme. I'll take a look.
>
> -- Misagh
>
> On Wed, Nov 9, 2022, 2:39 PM Jérôme LELEU <[email protected]> wrote:
>>
>> Hi,
>>
>> I have made a Redis performance test between v6.5.9 and v7.0.0-RC1 and 
>> figures are very relevant.
>>
>> In both cases, I have a CAS server using a Redis ticket registry (Docker 
>> image) and a custom authentication handler to validate any username starting 
>> with "jleleu".
>> I have a script which performs a certain number of logins (without any 
>> service) for jleleuRANDOMVALUE.
>> I have overriden the RedisTickerRegistry class to add time counting in v6.5:
>>
>> private static AtomicLong getTime = new AtomicLong();
>> private static AtomicInteger nbGet = new AtomicInteger();
>>
>> @Override
>> public Ticket getTicket(final String ticketId, final Predicate<Ticket> 
>> predicate) {
>>     val t0 = System.currentTimeMillis();
>>     try {
>>         val redisKey = getTicketRedisKey(encodeTicketId(ticketId));
>>         val t = this.client.boundValueOps(redisKey).get();
>>         if (t != null) {
>>             val result = decodeTicket(t);
>>             if (predicate.test(result)) {
>>                 return result;
>>             }
>>             LOGGER.trace("The condition enforced by [{}] cannot successfully 
>> accept/test the ticket id [{}]", ticketId,
>>                     predicate.getClass().getSimpleName());
>>             return null;
>>         }
>>     } catch (final Exception e) {
>>         LOGGER.error("Failed fetching [{}]", ticketId);
>>         LoggingUtils.error(LOGGER, e);
>>     } finally {
>>         val t1 = System.currentTimeMillis();
>>         val time = t1 - t0;
>>         val t = getTime.addAndGet(time);
>>         val n = nbGet.incrementAndGet();
>>         LOGGER.info("### GET time: {} ms | Average time: {} ms", time, t / 
>> n);
>>     }
>>     return null;
>> }
>>
>>
>> And in v7:
>>
>> @Override
>> public Ticket getTicket(final String ticketId, final Predicate<Ticket> 
>> predicate) {
>>     val t0 = System.currentTimeMillis();
>>     try {
>>         val redisKey = 
>> RedisCompositeKey.builder().id(encodeTicketId(ticketId)).build().toKeyPattern();
>>         return getKeysStream(redisKey)
>>                 .map(key -> redisTemplate.boundValueOps(key).get())
>>                 .filter(Objects::nonNull)
>>                 .map(this::decodeTicket)
>>                 .filter(predicate)
>>                 .findFirst()
>>                 .orElse(null);
>>     } catch (final Exception e) {
>>         LOGGER.error("Failed fetching [{}]", ticketId);
>>         LoggingUtils.error(LOGGER, e);
>>     } finally {
>>         val t1 = System.currentTimeMillis();
>>         val time = t1 - t0;
>>         val t = getTime.addAndGet(time);
>>         val n = nbGet.incrementAndGet();
>>         LOGGER.info("### GET time: {} ms | Average time: {} ms", time, t / 
>> n);
>>     }
>>     return null;
>> }
>>
>>
>> Then I perform 1 000 and 10 000 logins (with my script) and check my logs to 
>> see the average time:
>>
>> v6.5.9:
>> 1000 logins -> Average time: 3 ms
>> 10000 logins -> Average time: 3 ms
>>
>> v7.0.0-RC1:
>> 1000 logins -> Average time: 22 ms
>> 10000 logins -> Average time: 195 ms
>>
>> So indeed, I notice a big performance issue.
>>
>> Do you need more information?
>>
>> Thanks.
>> Best regards,
>> Jérôme
>>
>>
>> Le mar. 8 nov. 2022 à 07:56, Jérôme LELEU <[email protected]> a écrit :
>>>
>>> Hi,
>>>
>>> Yes, double indexing is harder than simple indexing as the second operation 
>>> may fail and you should revert the first one (transactional aspect).
>>> If we did that for all tickets, we would double the size of the keys, but 
>>> not the size of the database though.
>>> And maybe we should have two different databases for better performance.
>>>
>>> I will make a test to check the problem.
>>>
>>> Thanks.
>>> Best regards,
>>> Jérôme
>>>
>>>
>>> Le lun. 7 nov. 2022 à 18:30, Misagh <[email protected]> a écrit :
>>>>
>>>> > Our main problem was the full SCAN of tickets to check if a ticket is a 
>>>> > TGT and if the principal is the right one ("count/get SSO sessions").
>>>> > For this one, I would have created a "double key indexing", big words 
>>>> > for a simple thing ;-)
>>>> > On 6.5.x, we stored tickets this way: key=CAS_TICKET:ticketId => 
>>>> > VALUE=serialized ticket
>>>> > I propose for the "add ticket" operation to check if it is a TGT: in 
>>>> > that case, I would add to Redis: key=CAS_TICKET_USER:ticketId:userId => 
>>>> > VALUE=nothing.
>>>> > This way, we would "only" SCAN keys of TGT sessions to find the right 
>>>> > user (CAS_TICKET_USER:*:userId) and we would retrieve all the TGT 
>>>> > identifiers for a given user.
>>>> > Then, a multi GET on these identifiers would find the SSO sessions of 
>>>> > the user.
>>>>
>>>> That's quite clever. But it's not without complications. These are not
>>>> strictly blockers, but we should likely take these into account:
>>>>
>>>> Doing the double-indexing for a TGT would also imply that the same
>>>> thing would/could be done for OIDC codes, access tokens and refresh
>>>> tokens. For example, think of operations where you'd want to execute
>>>> "get me all the access tokens issued to user X", or "all the refresh
>>>> tokens issued to user Y", etc.  This would mean that the registry
>>>> would have to somehow be tied to modules that present those extra
>>>> ticket types though I imagine this can be somewhat solved with the
>>>> ticket catalog concept. And of course, the registry size sort of
>>>> grows. 10 TGTs for unique users would actually mean 20 entries, not to
>>>> mention for every update/remove operation you'd be issuing double
>>>> queries. So it's double the index, double the number of operations. At
>>>> scale, I am not so sure this would actually be all that better, but I
>>>> have not run any conclusive tests.
>>>>
>>>> I would also be interested to see an actual test that showcases the
>>>> slowness. For example, I ran a test against a basic local redis
>>>> cluster, 7.0.5. My test, added 1000 TGTs to the registry, then fetched
>>>> them all, and then looped through the resultset asking the registry
>>>> for each ticket that was fetched again. This operation completed in
>>>> approx 13 seconds for non-encrypted tickets, and 14seconds encrypted
>>>> tickets. Then, I reverted the pattern back to what 6.5 used to do, and
>>>> I ran the same test, and I more or less saw the same execution time.
>>>> Double checked to make sure there are no obvious mistakes. Is this
>>>> also what you see? and if so, can you share some sort of a test that
>>>> actually shows or demonstrates the problem?

-- 
You received this message because you are subscribed to the Google Groups "CAS 
Developer" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/apereo.org/d/msgid/cas-dev/CAGSBKkd9v9ss_BCoQwH6X%2B9OrtPWyuDjNaZJy3SusMMD578gGA%40mail.gmail.com.

Reply via email to