Re: [cas-dev] Re: [cas-user] very slow ticket delivery on CAS 6.6 & redis ticket registry

Misagh Thu, 10 Nov 2022 08:27:32 -0800

Could I ask you to review and test this?
https://github.com/mmoayyed/cas/compare/master...mmoayyed:cas:redis-fix


On Thu, Nov 10, 2022 at 1:03 PM Misagh <misagh.moay...@gmail.com> wrote:
>
> Thank you Jérôme. I'll take a look.
>
> -- Misagh
>
> On Wed, Nov 9, 2022, 2:39 PM Jérôme LELEU <lel...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have made a Redis performance test between v6.5.9 and v7.0.0-RC1 and 
>> figures are very relevant.
>>
>> In both cases, I have a CAS server using a Redis ticket registry (Docker 
>> image) and a custom authentication handler to validate any username starting 
>> with "jleleu".
>> I have a script which performs a certain number of logins (without any 
>> service) for jleleuRANDOMVALUE.
>> I have overriden the RedisTickerRegistry class to add time counting in v6.5:
>>
>> private static AtomicLong getTime = new AtomicLong();
>> private static AtomicInteger nbGet = new AtomicInteger();
>>
>> @Override
>> public Ticket getTicket(final String ticketId, final Predicate<Ticket> 
>> predicate) {
>>     val t0 = System.currentTimeMillis();
>>     try {
>>         val redisKey = getTicketRedisKey(encodeTicketId(ticketId));
>>         val t = this.client.boundValueOps(redisKey).get();
>>         if (t != null) {
>>             val result = decodeTicket(t);
>>             if (predicate.test(result)) {
>>                 return result;
>>             }
>>             LOGGER.trace("The condition enforced by [{}] cannot successfully 
>> accept/test the ticket id [{}]", ticketId,
>>                     predicate.getClass().getSimpleName());
>>             return null;
>>         }
>>     } catch (final Exception e) {
>>         LOGGER.error("Failed fetching [{}]", ticketId);
>>         LoggingUtils.error(LOGGER, e);
>>     } finally {
>>         val t1 = System.currentTimeMillis();
>>         val time = t1 - t0;
>>         val t = getTime.addAndGet(time);
>>         val n = nbGet.incrementAndGet();
>>         LOGGER.info("### GET time: {} ms | Average time: {} ms", time, t / 
>> n);
>>     }
>>     return null;
>> }
>>
>>
>> And in v7:
>>
>> @Override
>> public Ticket getTicket(final String ticketId, final Predicate<Ticket> 
>> predicate) {
>>     val t0 = System.currentTimeMillis();
>>     try {
>>         val redisKey = 
>> RedisCompositeKey.builder().id(encodeTicketId(ticketId)).build().toKeyPattern();
>>         return getKeysStream(redisKey)
>>                 .map(key -> redisTemplate.boundValueOps(key).get())
>>                 .filter(Objects::nonNull)
>>                 .map(this::decodeTicket)
>>                 .filter(predicate)
>>                 .findFirst()
>>                 .orElse(null);
>>     } catch (final Exception e) {
>>         LOGGER.error("Failed fetching [{}]", ticketId);
>>         LoggingUtils.error(LOGGER, e);
>>     } finally {
>>         val t1 = System.currentTimeMillis();
>>         val time = t1 - t0;
>>         val t = getTime.addAndGet(time);
>>         val n = nbGet.incrementAndGet();
>>         LOGGER.info("### GET time: {} ms | Average time: {} ms", time, t / 
>> n);
>>     }
>>     return null;
>> }
>>
>>
>> Then I perform 1 000 and 10 000 logins (with my script) and check my logs to 
>> see the average time:
>>
>> v6.5.9:
>> 1000 logins -> Average time: 3 ms
>> 10000 logins -> Average time: 3 ms
>>
>> v7.0.0-RC1:
>> 1000 logins -> Average time: 22 ms
>> 10000 logins -> Average time: 195 ms
>>
>> So indeed, I notice a big performance issue.
>>
>> Do you need more information?
>>
>> Thanks.
>> Best regards,
>> Jérôme
>>
>>
>> Le mar. 8 nov. 2022 à 07:56, Jérôme LELEU <lel...@gmail.com> a écrit :
>>>
>>> Hi,
>>>
>>> Yes, double indexing is harder than simple indexing as the second operation 
>>> may fail and you should revert the first one (transactional aspect).
>>> If we did that for all tickets, we would double the size of the keys, but 
>>> not the size of the database though.
>>> And maybe we should have two different databases for better performance.
>>>
>>> I will make a test to check the problem.
>>>
>>> Thanks.
>>> Best regards,
>>> Jérôme
>>>
>>>
>>> Le lun. 7 nov. 2022 à 18:30, Misagh <misagh.moay...@gmail.com> a écrit :
>>>>
>>>> > Our main problem was the full SCAN of tickets to check if a ticket is a 
>>>> > TGT and if the principal is the right one ("count/get SSO sessions").
>>>> > For this one, I would have created a "double key indexing", big words 
>>>> > for a simple thing ;-)
>>>> > On 6.5.x, we stored tickets this way: key=CAS_TICKET:ticketId => 
>>>> > VALUE=serialized ticket
>>>> > I propose for the "add ticket" operation to check if it is a TGT: in 
>>>> > that case, I would add to Redis: key=CAS_TICKET_USER:ticketId:userId => 
>>>> > VALUE=nothing.
>>>> > This way, we would "only" SCAN keys of TGT sessions to find the right 
>>>> > user (CAS_TICKET_USER:*:userId) and we would retrieve all the TGT 
>>>> > identifiers for a given user.
>>>> > Then, a multi GET on these identifiers would find the SSO sessions of 
>>>> > the user.
>>>>
>>>> That's quite clever. But it's not without complications. These are not
>>>> strictly blockers, but we should likely take these into account:
>>>>
>>>> Doing the double-indexing for a TGT would also imply that the same
>>>> thing would/could be done for OIDC codes, access tokens and refresh
>>>> tokens. For example, think of operations where you'd want to execute
>>>> "get me all the access tokens issued to user X", or "all the refresh
>>>> tokens issued to user Y", etc.  This would mean that the registry
>>>> would have to somehow be tied to modules that present those extra
>>>> ticket types though I imagine this can be somewhat solved with the
>>>> ticket catalog concept. And of course, the registry size sort of
>>>> grows. 10 TGTs for unique users would actually mean 20 entries, not to
>>>> mention for every update/remove operation you'd be issuing double
>>>> queries. So it's double the index, double the number of operations. At
>>>> scale, I am not so sure this would actually be all that better, but I
>>>> have not run any conclusive tests.
>>>>
>>>> I would also be interested to see an actual test that showcases the
>>>> slowness. For example, I ran a test against a basic local redis
>>>> cluster, 7.0.5. My test, added 1000 TGTs to the registry, then fetched
>>>> them all, and then looped through the resultset asking the registry
>>>> for each ticket that was fetched again. This operation completed in
>>>> approx 13 seconds for non-encrypted tickets, and 14seconds encrypted
>>>> tickets. Then, I reverted the pattern back to what 6.5 used to do, and
>>>> I ran the same test, and I more or less saw the same execution time.
>>>> Double checked to make sure there are no obvious mistakes. Is this
>>>> also what you see? and if so, can you share some sort of a test that
>>>> actually shows or demonstrates the problem?

-- 
You received this message because you are subscribed to the Google Groups "CAS 
Developer" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to cas-dev+unsubscr...@apereo.org.
To view this discussion on the web visit 
https://groups.google.com/a/apereo.org/d/msgid/cas-dev/CAGSBKkd9v9ss_BCoQwH6X%2B9OrtPWyuDjNaZJy3SusMMD578gGA%40mail.gmail.com.

Re: [cas-dev] Re: [cas-user] very slow ticket delivery on CAS 6.6 & redis ticket registry

Reply via email to