Thank you Jérôme. I'll take a look.

-- Misagh

On Wed, Nov 9, 2022, 2:39 PM Jérôme LELEU <[email protected]> wrote:

> Hi,
>
> I have run a Redis performance test comparing v6.5.9 and v7.0.0-RC1, and
> the figures are quite telling.
>
> In both cases, I have a CAS server using a Redis ticket registry (Docker
> image) and a custom authentication handler that validates any username
> starting with "jleleu".
> I have a script that performs a given number of logins (without any
> service) as *jleleuRANDOMVALUE*.
> I have overridden the *RedisTicketRegistry* class to add time counting in
> v6.5:
>
> private static AtomicLong getTime = new AtomicLong();
> private static AtomicInteger nbGet = new AtomicInteger();
>
> @Override
> public Ticket getTicket(final String ticketId, final Predicate<Ticket> predicate) {
>     val t0 = System.currentTimeMillis();
>     try {
>         val redisKey = getTicketRedisKey(encodeTicketId(ticketId));
>         val t = this.client.boundValueOps(redisKey).get();
>         if (t != null) {
>             val result = decodeTicket(t);
>             if (predicate.test(result)) {
>                 return result;
>             }
>             LOGGER.trace("The condition enforced by [{}] cannot successfully accept/test the ticket id [{}]",
>                     predicate.getClass().getSimpleName(), ticketId);
>             return null;
>         }
>     } catch (final Exception e) {
>         LOGGER.error("Failed fetching [{}]", ticketId);
>         LoggingUtils.error(LOGGER, e);
>     } finally {
>         val t1 = System.currentTimeMillis();
>         val time = t1 - t0;
>         val t = getTime.addAndGet(time);
>         val n = nbGet.incrementAndGet();
>         LOGGER.info("### GET time: {} ms | Average time: {} ms", time, t / n);
>     }
>     return null;
> }
>
>
> And in v7:
>
> @Override
> public Ticket getTicket(final String ticketId, final Predicate<Ticket> predicate) {
>     val t0 = System.currentTimeMillis();
>     try {
>         val redisKey = RedisCompositeKey.builder().id(encodeTicketId(ticketId)).build().toKeyPattern();
>         return getKeysStream(redisKey)
>                 .map(key -> redisTemplate.boundValueOps(key).get())
>                 .filter(Objects::nonNull)
>                 .map(this::decodeTicket)
>                 .filter(predicate)
>                 .findFirst()
>                 .orElse(null);
>     } catch (final Exception e) {
>         LOGGER.error("Failed fetching [{}]", ticketId);
>         LoggingUtils.error(LOGGER, e);
>     } finally {
>         val t1 = System.currentTimeMillis();
>         val time = t1 - t0;
>         val t = getTime.addAndGet(time);
>         val n = nbGet.incrementAndGet();
>         LOGGER.info("### GET time: {} ms | Average time: {} ms", time, t / n);
>     }
>     return null;
> }
>
>
> Then I perform 1,000 and 10,000 logins (with my script) and check my logs
> for the average time:
>
> v6.5.9:
> 1000 logins -> Average time: 3 ms
> 10000 logins -> Average time: 3 ms
>
> v7.0.0-RC1:
> 1000 logins -> Average time: 22 ms
> 10000 logins -> Average time: 195 ms
>
> So indeed, I notice a big performance issue.
>
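The v7 snippet above resolves a key *pattern* via getKeysStream rather than reading one fully-known key, which would explain lookup time growing with registry size if that stream is backed by a Redis SCAN. A minimal sketch of why the two lookups diverge, simulating the keyspace with an in-memory map (LookupSketch and all names here are illustrative stand-ins, not CAS or Redis APIs):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: an in-memory stand-in for the Redis keyspace. A direct GET
// on a fully-known key (the 6.5 pattern) is a single read, while resolving a
// key pattern (as a SCAN-backed key stream would) has to examine every key, so
// its cost grows with the number of tickets in the registry.
public class LookupSketch {
    private final Map<String, String> keyspace = new ConcurrentHashMap<>();

    void put(String key, String value) {
        keyspace.put(key, value);
    }

    // 6.5-style lookup: the full key is known in advance, one direct read.
    String directGet(String key) {
        return keyspace.get(key);
    }

    // 7.0-style lookup: only a pattern is known, so every key is tested.
    String scanGet(String pattern) {
        return keyspace.entrySet().stream()
                .filter(e -> e.getKey().matches(pattern))
                .map(Map.Entry::getValue)
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        LookupSketch sketch = new LookupSketch();
        for (int i = 0; i < 10_000; i++) {
            sketch.put("CAS_TICKET:TGT-" + i, "ticket-" + i);
        }
        // One map read versus a walk over all 10,000 keys:
        System.out.println(sketch.directGet("CAS_TICKET:TGT-42")); // ticket-42
        System.out.println(sketch.scanGet("CAS_TICKET:TGT-42"));   // ticket-42
    }
}
```

Both calls return the same ticket; only the amount of work differs, which matches the reported averages staying flat in 6.5 but climbing with login count in 7.0.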
> Do you need more information?
>
> Thanks.
> Best regards,
> Jérôme
>
>
On Tue, Nov 8, 2022 at 07:56, Jérôme LELEU <[email protected]> wrote:
>
>> Hi,
>>
>> Yes, double indexing is harder than simple indexing, as the second
>> operation may fail, in which case you would have to revert the first one
>> (a transactional concern).
>> If we did that for all tickets, we would double the number of keys, but
>> not the size of the database.
>> And maybe we should use two different databases for better performance.
>>
>> I will make a test to check the problem.
>>
>> Thanks.
>> Best regards,
>> Jérôme
>>
>>
>> On Mon, Nov 7, 2022 at 18:30, Misagh <[email protected]> wrote:
>>
>>> > Our main problem was the full SCAN of tickets to check if a ticket is
>>> a TGT and if the principal is the right one ("count/get SSO sessions").
>>> > For this one, I would have created a "double key indexing", big words
>>> for a simple thing ;-)
>>> > On 6.5.x, we stored tickets this way: key=CAS_TICKET:ticketId =>
>>> VALUE=serialized ticket
>>> > I propose for the "add ticket" operation to check if it is a TGT: in
>>> that case, I would add to Redis: key=CAS_TICKET_USER:ticketId:userId =>
>>> VALUE=nothing.
>>> > This way, we would "only" SCAN keys of TGT sessions to find the right
>>> user (CAS_TICKET_USER:*:userId) and we would retrieve all the TGT
>>> identifiers for a given user.
>>> > Then, a multi GET on these identifiers would find the SSO sessions of
>>> the user.
>>>
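The double-key indexing quoted above can be sketched with a map standing in for Redis. The key prefixes (CAS_TICKET, CAS_TICKET_USER) follow the proposal; the class and method names here are illustrative, not the CAS API, and a real implementation would also need the two-operation revert-on-failure handling discussed in the thread:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the proposed double-key indexing, with a map standing in for Redis.
public class DoubleIndexSketch {
    private final Map<String, String> store = new LinkedHashMap<>();

    // "add ticket" for a TGT: store the ticket itself, plus a value-less
    // secondary key that encodes both the TGT id and the user id.
    void addTgt(String tgtId, String userId, String serializedTicket) {
        store.put("CAS_TICKET:" + tgtId, serializedTicket);
        store.put("CAS_TICKET_USER:" + tgtId + ":" + userId, "");
    }

    // "count/get SSO sessions": scan only the secondary index keys matching
    // CAS_TICKET_USER:*:userId, then multi-GET the tickets by their ids.
    // (Assumes user ids contain no ':' separator.)
    List<String> sessionsFor(String userId) {
        String suffix = ":" + userId;
        List<String> tgtIds = store.keySet().stream()
                .filter(k -> k.startsWith("CAS_TICKET_USER:") && k.endsWith(suffix))
                .map(k -> k.substring("CAS_TICKET_USER:".length(), k.length() - suffix.length()))
                .collect(Collectors.toList());
        return tgtIds.stream()
                .map(id -> store.get("CAS_TICKET:" + id))
                .collect(Collectors.toList());
    }

    // "remove ticket": both keys must go; in Redis these are two operations,
    // which is the transactional concern raised in the thread.
    boolean removeTgt(String tgtId, String userId) {
        boolean ticketRemoved = store.remove("CAS_TICKET:" + tgtId) != null;
        boolean indexRemoved = store.remove("CAS_TICKET_USER:" + tgtId + ":" + userId) != null;
        return ticketRemoved && indexRemoved;
    }
}
```

The session lookup now scans only CAS_TICKET_USER:* index keys instead of every stored ticket, at the cost of a second key per TGT and a second write/delete per add/remove.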
>>> That's quite clever, but it's not without complications. These are not
>>> strictly blockers, but we should take them into account:
>>>
>>> Doing the double-indexing for a TGT would also imply that the same
>>> thing would/could be done for OIDC codes, access tokens and refresh
>>> tokens. For example, think of operations where you'd want to execute
>>> "get me all the access tokens issued to user X", or "all the refresh
>>> tokens issued to user Y", etc.  This would mean that the registry
>>> would have to somehow be tied to modules that present those extra
>>> ticket types though I imagine this can be somewhat solved with the
>>> ticket catalog concept. And of course, the registry size sort of
>>> grows. 10 TGTs for unique users would actually mean 20 entries, not to
>>> mention for every update/remove operation you'd be issuing double
>>> queries. So it's double the index, double the number of operations. At
>>> scale, I am not so sure this would actually be all that better, but I
>>> have not run any conclusive tests.
>>>
>>> I would also be interested to see an actual test that showcases the
>>> slowness. For example, I ran a test against a basic local Redis
>>> cluster, 7.0.5. My test added 1000 TGTs to the registry, then fetched
>>> them all, and then looped through the result set asking the registry
>>> again for each ticket that was fetched. This operation completed in
>>> approximately 13 seconds for non-encrypted tickets, and 14 seconds for
>>> encrypted tickets. Then I reverted the pattern back to what 6.5 used
>>> to do, ran the same test, and saw more or less the same execution
>>> time. I double-checked to make sure there are no obvious mistakes. Is
>>> this also what you see? And if so, can you share some sort of test
>>> that actually demonstrates the problem?
>>>
>>
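The benchmark described in the reply above might be reconstructed roughly as follows. SimpleRegistry is a simplified stand-in for a ticket registry interface, not the actual CAS API, and the in-memory implementation only demonstrates the harness; the real test would point at a Redis-backed registry:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified stand-in for a ticket registry, not the actual CAS interface.
interface SimpleRegistry {
    void addTicket(String id, String ticket);
    Collection<String> getTickets();
    String getTicket(String id);
}

public class RegistryBenchmark {
    // Add `count` TGTs, fetch them all, then ask the registry for each one
    // again; return elapsed wall-clock milliseconds.
    public static long run(SimpleRegistry registry, int count) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            registry.addTicket("TGT-" + i, "ticket-" + i);
        }
        Collection<String> all = registry.getTickets();
        for (String ticket : all) {
            String id = ticket.replace("ticket-", "TGT-");
            Objects.requireNonNull(registry.getTicket(id));
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        Map<String, String> backing = new HashMap<>();
        SimpleRegistry inMemory = new SimpleRegistry() {
            public void addTicket(String id, String ticket) { backing.put(id, ticket); }
            public Collection<String> getTickets() { return new ArrayList<>(backing.values()); }
            public String getTicket(String id) { return backing.get(id); }
        };
        System.out.println("elapsed ms: " + run(inMemory, 1000));
    }
}
```

Running the same harness once against a 6.5-style registry and once against a 7.0-style one would isolate the lookup-pattern change from everything else in the login flow.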

-- 
You received this message because you are subscribed to the Google Groups "CAS 
Developer" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/apereo.org/d/msgid/cas-dev/CAGSBKkdLcoetNrEki68DCgNqL3K0BJ_CcWc8F6_fRK%3DDNmPv-g%40mail.gmail.com.
