Hi,

I have run new tests.

With the new implementation, I have experienced Redis crashes, but I'm not
sure this is meaningful.
In any case, I have upgraded to Redis v7 with 500 MB of memory.

I have launched my previous scenario again (10 000 logins).
CAS v6.5: Average time: 2 ms
CAS v7.0.0 fix REDIS: Average time: 0 ms

Things are now blazing fast with the new implementation, but I see you have
added an in-memory cache, so this is expected on a single node.
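
To illustrate why a node-local cache would produce these numbers, here is a
minimal sketch of the idea (all names are hypothetical, and a plain
ConcurrentHashMap stands in for both the real Redis client and the cache;
this is not the actual CAS v7 code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a node-local cache in front of a Redis-backed
// ticket store. The "redis" map stands in for the real Redis client.
class CachedTicketStore {
    private final Map<String, String> redis = new ConcurrentHashMap<>();
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    final AtomicInteger redisReads = new AtomicInteger();

    void addTicket(final String id, final String serializedTicket) {
        redis.put(id, serializedTicket); // write-through to the shared store
        localCache.put(id, serializedTicket); // and to the node-local cache
    }

    String getTicket(final String id) {
        // Node-local lookup first: no network round trip on a cache hit.
        return localCache.computeIfAbsent(id, key -> {
            redisReads.incrementAndGet(); // count reads that reach "Redis"
            return redis.get(key);
        });
    }

    public static void main(final String[] args) {
        var store = new CachedTicketStore();
        store.addTicket("TGT-1", "serialized-ticket");
        for (int i = 0; i < 10_000; i++) {
            store.getTicket("TGT-1"); // every hit served from the local cache
        }
        System.out.println("redis reads: " + store.redisReads.get());
        // prints: redis reads: 0
    }
}
```

On a single node, repeated reads never leave the JVM, which would explain a
0 ms average.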

So I have created a two-node scenario with 10 000 login?service + service
ticket validation flows, each call (GET /login, POST /login, GET
/serviceValidate) being performed on a different node than the previous
one (round robin).
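
As a rough model of what this scenario exercises (hypothetical code, not the
actual CAS implementation): with round-robin dispatch each node keeps its own
local cache, so a ticket written or cached on node 1 still costs node 2 at
least one trip to the shared store:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: two nodes sharing one Redis store, each with its own
// local cache, receiving calls round-robin. HashMaps stand in for Redis.
class RoundRobinModel {
    static final Map<String, String> redis = new HashMap<>(); // shared store

    static class Node {
        final Map<String, String> localCache = new HashMap<>();
        int redisReads;

        String getTicket(final String id) {
            final String cached = localCache.get(id);
            if (cached != null) {
                return cached; // local hit, no round trip to the store
            }
            redisReads++; // cache miss: read from the shared store
            final String value = redis.get(id);
            if (value != null) {
                localCache.put(id, value);
            }
            return value;
        }
    }

    public static void main(final String[] args) {
        final Node[] nodes = { new Node(), new Node() };
        redis.put("ST-1", "service-ticket");
        for (int call = 0; call < 3; call++) {
            nodes[call % 2].getTicket("ST-1"); // round-robin dispatch
        }
        // Each node pays exactly one read; only repeated lookups on the
        // same node are served from its own cache.
        System.out.println(nodes[0].redisReads + " " + nodes[1].redisReads);
        // prints: 1 1
    }
}
```

This is why the cross-node numbers are the interesting ones: they measure the
registry itself rather than the cache.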

CAS v6.5:
Average time node 1: 1 ms
Average time node 2: 1 ms

CAS v7.0.0 fix REDIS:
Average time node 1: 2 ms
Average time node 2: 2 ms

While it still performs slightly better on CAS v6.5, it now performs very
well on CAS v7 as well.

Did you change something else in addition to the cache?

Thanks.
Best regards,
Jérôme


On Thu, Nov 10, 2022 at 5:47 PM Jérôme LELEU <lel...@gmail.com> wrote:

> Sure. I will test that on Monday (tomorrow is a day off in France :-))
>
> On Thu, Nov 10, 2022 at 5:27 PM Misagh <misagh.moay...@gmail.com> wrote:
>
>> Could I ask you to review and test this?
>> https://github.com/mmoayyed/cas/compare/master...mmoayyed:cas:redis-fix
>>
>> On Thu, Nov 10, 2022 at 1:03 PM Misagh <misagh.moay...@gmail.com> wrote:
>> >
>> > Thank you Jérôme. I'll take a look.
>> >
>> > -- Misagh
>> >
>> > On Wed, Nov 9, 2022, 2:39 PM Jérôme LELEU <lel...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have run a Redis performance test between v6.5.9 and v7.0.0-RC1, and
>> the figures are quite telling.
>> >>
>> >> In both cases, I have a CAS server using a Redis ticket registry
>> (Docker image) and a custom authentication handler to validate any username
>> starting with "jleleu".
>> >> I have a script which performs a certain number of logins (without any
>> service) for jleleuRANDOMVALUE.
>> >> I have overridden the RedisTicketRegistry class to add timing
>> instrumentation in v6.5:
>> >>
>> >> private static AtomicLong getTime = new AtomicLong();
>> >> private static AtomicInteger nbGet = new AtomicInteger();
>> >>
>> >> @Override
>> >> public Ticket getTicket(final String ticketId, final Predicate<Ticket>
>> predicate) {
>> >>     val t0 = System.currentTimeMillis();
>> >>     try {
>> >>         val redisKey = getTicketRedisKey(encodeTicketId(ticketId));
>> >>         val t = this.client.boundValueOps(redisKey).get();
>> >>         if (t != null) {
>> >>             val result = decodeTicket(t);
>> >>             if (predicate.test(result)) {
>> >>                 return result;
>> >>             }
>> >>             LOGGER.trace("The condition enforced by [{}] cannot
>> >>                     successfully accept/test the ticket id [{}]",
>> >>                     predicate.getClass().getSimpleName(), ticketId);
>> >>             return null;
>> >>         }
>> >>     } catch (final Exception e) {
>> >>         LOGGER.error("Failed fetching [{}]", ticketId);
>> >>         LoggingUtils.error(LOGGER, e);
>> >>     } finally {
>> >>         val t1 = System.currentTimeMillis();
>> >>         val time = t1 - t0;
>> >>         val t = getTime.addAndGet(time);
>> >>         val n = nbGet.incrementAndGet();
>> >>         LOGGER.info("### GET time: {} ms | Average time: {} ms",
>> >>                 time, t / n);
>> >>     }
>> >>     return null;
>> >> }
>> >>
>> >>
>> >> And in v7:
>> >>
>> >> @Override
>> >> public Ticket getTicket(final String ticketId, final Predicate<Ticket>
>> predicate) {
>> >>     val t0 = System.currentTimeMillis();
>> >>     try {
>> >>         val redisKey = RedisCompositeKey.builder()
>> >>                 .id(encodeTicketId(ticketId)).build().toKeyPattern();
>> >>         return getKeysStream(redisKey)
>> >>                 .map(key -> redisTemplate.boundValueOps(key).get())
>> >>                 .filter(Objects::nonNull)
>> >>                 .map(this::decodeTicket)
>> >>                 .filter(predicate)
>> >>                 .findFirst()
>> >>                 .orElse(null);
>> >>     } catch (final Exception e) {
>> >>         LOGGER.error("Failed fetching [{}]", ticketId);
>> >>         LoggingUtils.error(LOGGER, e);
>> >>     } finally {
>> >>         val t1 = System.currentTimeMillis();
>> >>         val time = t1 - t0;
>> >>         val t = getTime.addAndGet(time);
>> >>         val n = nbGet.incrementAndGet();
>> >>         LOGGER.info("### GET time: {} ms | Average time: {} ms",
>> >>                 time, t / n);
>> >>     }
>> >>     return null;
>> >> }
>> >>
>> >>
>> >> Then I perform 1 000 and 10 000 logins (with my script) and check my
>> logs to see the average time:
>> >>
>> >> v6.5.9:
>> >> 1000 logins -> Average time: 3 ms
>> >> 10000 logins -> Average time: 3 ms
>> >>
>> >> v7.0.0-RC1:
>> >> 1000 logins -> Average time: 22 ms
>> >> 10000 logins -> Average time: 195 ms
>> >>
>> >> So indeed, I notice a big performance issue.
>> >>
>> >> Do you need more information?
>> >>
>> >> Thanks.
>> >> Best regards,
>> >> Jérôme
>> >>
>> >>
>> >> On Tue, Nov 8, 2022 at 7:56 AM Jérôme LELEU <lel...@gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Yes, double indexing is harder than simple indexing, as the second
>> operation may fail and you would then have to revert the first one (a
>> transactional concern).
>> >>> If we did that for all tickets, we would double the number of keys,
>> though not the size of the database.
>> >>> And maybe we should have two different databases for better
>> performance.
>> >>>
>> >>> I will make a test to check the problem.
>> >>>
>> >>> Thanks.
>> >>> Best regards,
>> >>> Jérôme
>> >>>
>> >>>
>> >>> On Mon, Nov 7, 2022 at 6:30 PM Misagh <misagh.moay...@gmail.com>
>> wrote:
>> >>>>
>> >>>> > Our main problem was the full SCAN of tickets to check if a ticket
>> is a TGT and if the principal is the right one ("count/get SSO sessions").
>> >>>> > For this one, I would have created a "double key indexing", big
>> words for a simple thing ;-)
>> >>>> > On 6.5.x, we stored tickets this way: key=CAS_TICKET:ticketId =>
>> VALUE=serialized ticket
>> >>>> > I propose for the "add ticket" operation to check if it is a TGT:
>> in that case, I would add to Redis: key=CAS_TICKET_USER:ticketId:userId =>
>> VALUE=nothing.
>> >>>> > This way, we would "only" SCAN keys of TGT sessions to find the
>> right user (CAS_TICKET_USER:*:userId) and we would retrieve all the TGT
>> identifiers for a given user.
>> >>>> > Then, a multi GET on these identifiers would find the SSO sessions
>> of the user.
>> >>>>
>> >>>> That's quite clever. But it's not without complications. These are
>> >>>> not strictly blockers, but we should likely take them into account:
>> >>>>
>> >>>> Doing the double-indexing for a TGT would also imply that the same
>> >>>> thing would/could be done for OIDC codes, access tokens and refresh
>> >>>> tokens. For example, think of operations where you'd want to execute
>> >>>> "get me all the access tokens issued to user X", or "all the refresh
>> >>>> tokens issued to user Y", etc. This would mean that the registry
>> >>>> would have to somehow be tied to modules that present those extra
>> >>>> ticket types, though I imagine this can be somewhat solved with the
>> >>>> ticket catalog concept. And of course, the registry size grows:
>> >>>> 10 TGTs for unique users would actually mean 20 entries, not to
>> >>>> mention that every update/remove operation would issue double
>> >>>> queries. So it's double the index, double the number of operations.
>> >>>> At scale, I am not so sure this would actually be all that better,
>> >>>> but I have not run any conclusive tests.
>> >>>>
>> >>>> I would also be interested to see an actual test that showcases the
>> >>>> slowness. For example, I ran a test against a basic local Redis
>> >>>> cluster, 7.0.5. My test added 1000 TGTs to the registry, then
>> >>>> fetched them all, and then looped through the result set asking the
>> >>>> registry again for each ticket that was fetched. This operation
>> >>>> completed in approx 13 seconds for non-encrypted tickets, and 14
>> >>>> seconds for encrypted tickets. Then, I reverted the pattern back to
>> >>>> what 6.5 used to do, ran the same test, and more or less saw the
>> >>>> same execution time. I double-checked to make sure there are no
>> >>>> obvious mistakes. Is this also what you see? And if so, can you
>> >>>> share some sort of test that actually demonstrates the problem?

-- 
You received this message because you are subscribed to the Google Groups "CAS 
Developer" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to cas-dev+unsubscr...@apereo.org.
To view this discussion on the web visit 
https://groups.google.com/a/apereo.org/d/msgid/cas-dev/CAP279LyH1ZhdNd1vJXm_kMnzaJiTAhcAr1Ab%2BXRk0T_G1QmqKg%40mail.gmail.com.
