Hi folks, this is the discussion about "ARTEMIS-2399 Fix performance degradation when there are a lot of subscribers".
First, apologies for not clarifying our thoughts earlier. As noted in the Environment section, page-max-cache-size is set to 1, meaning at most one page is allowed in the softValueCache. We also tested with the default page-max-cache-size of 5; it just takes longer to see the performance degradation, because at the start the cursor positions of the 100 subscribers are close together and all reads hit the softValueCache. After some time the cursor positions drift apart. Once they span more than 5 pages, some pages are read back and forth. This can be confirmed by the trace log "adding pageCache pageNr=xxx into cursor = test-topic" in PageCursorProviderImpl, where some pages are read many times for the same subscriber. From that point on, performance starts to degrade. So we set page-max-cache-size to 1 here only to make the test faster; it does not change the final result.

Entries in the softValueCache are evicted when memory is really low, and also when the map size reaches its capacity (default 5). In most cases the subscribers are tailing readers served by the softValueCache (no need to touch the disk), so we need to keep it. But when some subscribers fall behind, they need to read pages that are no longer in the softValueCache. After looking through the code, we found that in most situations one depage round follows at most MAX_SCHEDULED_RUNNERS delivery rounds, which means at most MAX_DELIVERIES_IN_LOOP * MAX_SCHEDULED_RUNNERS messages are depaged next. If you set the QueueImpl logger to debug level, you will see logs like "Queue Memory Size after depage on queue=sub4 is 53478769 with maxSize = 52428800. Depaged 68 messages, pendingDelivery=1002, intermediateMessageReferences= 23162, queueDelivering=0". So, in order to depage fewer than 2000 messages, a subscriber has to read a whole page, which is unnecessary and wasteful. In our test, where one page (50MB) contains ~40000 messages, a subscriber may read the same page 40000/2000 = 20 times (if the softValueCache entry has been evicted) before it finishes delivering it. This drastically slows down the process and puts a heavy load on the disk.

So we added the PageIndexCacheImpl and read one message at a time rather than reading all the messages of a page. This way, each subscriber reads each page only once to finish delivering it. Having said that, the softValueCache is still used for tailing reads. If it is evicted, it is not reloaded, to prevent the issue illustrated above; the pageIndexCache is used instead. Regarding implementation details, we noted that before delivering a page, a pageCursorInfo is constructed, which needs to read the whole page. We can take this opportunity to construct the pageIndexCache as well. It's very simple to code.
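To make this concrete, here is a minimal sketch of the idea in Java (the class and method names below are made up for illustration, they are not the actual code in the PR): while the page is scanned once to build the pageCursorInfo, we record the byte offset of every message; a lagging subscriber can then read message N by seeking straight to its offset instead of reloading the whole 50MB page.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-page index cache: offsets are recorded during the initial
// full scan of the page (the one that builds the pageCursorInfo), and later a
// single message is read by seeking to its recorded byte offset.
public class PageIndexCacheSketch {

   private final List<Long> offsets = new ArrayList<>();

   // Called once per message during the initial full scan of the page.
   public void addOffset(long fileOffset) {
      offsets.add(fileOffset);
   }

   // Reads a single message record by its ordinal number within the page.
   public byte[] readMessage(RandomAccessFile pageFile, int messageNumber) throws IOException {
      long start = offsets.get(messageNumber);
      long end = messageNumber + 1 < offsets.size()
            ? offsets.get(messageNumber + 1)
            : pageFile.length();
      byte[] record = new byte[(int) (end - start)];
      pageFile.seek(start);
      pageFile.readFully(record);
      return record;
   }
}

In this sketch the only extra memory cost is one offset per message, and it is filled in during a scan of the page that already happens anyway.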
We also thought of building an offset index file, and our concerns stemmed from the following:

1. When do we write and sync the index file? Would that have performance implications?
2. If we have an index file, we can construct the pageCursorInfo from it (no need to read the whole page as before), but we would need to write the total message count into it first. It seems a little weird to put that into the index file.
3. After a hard crash, a recovery mechanism would be needed to recover the page and page index files, e.g. truncating them to the valid size. So how do we know which files need to be sanity checked?
4. A variant of the binary search algorithm may be needed, see https://github.com/apache/kafka/blob/70ddd8af71938b4f5f6d1bb3df6243ef13359bcf/core/src/main/scala/kafka/log/AbstractIndex.scala .
5. Unlike Kafka, where the user fetches lots of messages at once and the broker only needs to look up the start offset in the index file once, Artemis delivers messages one by one, which means we would have to look up the index every time we deliver a message. Although the index file is probably in the OS page cache, there is still a chance of missing the cache.
6. Compatibility with old files.

To sum up, Kafka uses an mmapped index file and we use an index cache. Both are designed to find the physical file position from an offset (Kafka) or a message number (Artemis). We prefer the index cache because it is easier to understand and maintain; a rough sketch of the resulting read path is at the end of this message.

We also tested the single-subscriber case with the same setup.

The original:
consumer tps (11000 msg/s) and latency: [image: orig_single_subscriber.png]
producer tps (30000 msg/s) and latency: [image: orig_single_producer.png]

The PR:
consumer tps (14000 msg/s) and latency: [image: pr_single_consumer.png]
producer tps (30000 msg/s) and latency: [image: pr_single_producer.png]

The results are similar, and even a little better with the PR, in the single-subscriber case. We used our internal test platform; I think JMeter could also be used to test it.
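And here is the rough read-path sketch mentioned above, reusing PageIndexCacheSketch from the previous snippet (again, all names are placeholders rather than the actual Artemis classes): tailing subscribers are still served from the soft-value cache of fully loaded pages, while a message on an evicted page is read through the per-page offset index, one record at a time.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative read path only: the real code paths live in the paging cursor
// classes; this just shows the fast path / slow path split described above.
public class PagedReadPathSketch {

   // Whole pages, softly referenced so they can be dropped under memory pressure.
   private final Map<Long, SoftReference<byte[][]>> softValueCache = new ConcurrentHashMap<>();
   // Per-page offset indexes, assumed to be populated when the pageCursorInfo is built.
   private final Map<Long, PageIndexCacheSketch> pageIndexCaches = new ConcurrentHashMap<>();

   public byte[] readMessage(long pageNr, int messageNumber, RandomAccessFile pageFile) throws IOException {
      SoftReference<byte[][]> ref = softValueCache.get(pageNr);
      byte[][] wholePage = ref == null ? null : ref.get();
      if (wholePage != null) {
         // Fast path: the whole page is still cached, typical for tailing subscribers.
         return wholePage[messageNumber];
      }
      // Slow path: the page was evicted. Do not reload the whole page;
      // use the per-page offset index to read just this one record.
      PageIndexCacheSketch index = pageIndexCaches.get(pageNr);
      return index.readMessage(pageFile, messageNumber);
   }
}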
