Sorry, I missed the PageReference part.

The lifecycle of a PageReference is: depage (into intermediateMessageReferences)
-> deliver (into messageReferences) -> waiting for ack (in deliveringRefs) ->
removed. Every queue creates its own PageReferences, which means the offset
info is about 100 times larger than a shared page index cache (one copy per
subscriber instead of one shared copy). If we keep ~51MB of PageReferences in
memory per queue, then, as I said in the PR, "For multiple subscribers to the
same address, just one executor is responsible for delivering, which means at
any given moment only one queue is delivering. Thus a queue may be stalled for
a long time. We get queueMemorySize messages into memory, and when we deliver
them after a long time, we probably need to query the message and read the
page file again." In the end, one message may be read twice: first we read the
page and create the PageReference; second we re-query the message after its
reference is removed.
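
As a back-of-the-envelope comparison (only the 100 subscribers and the ~51MB
per queue come from this thread; the rest is plain arithmetic, not Artemis
code):

class RefFootprintSketch {
    public static void main(String[] args) {
        long perQueueRefBytes = 51L * 1024 * 1024;       // ~51 MB of PageReferences per queue
        int queues = 100;                                // 100 subscriber queues on the address
        long totalRefBytes = perQueueRefBytes * queues;  // ~5 GB held across all queues
        System.out.println(totalRefBytes + " bytes of PageReference state in total");
        // A shared page index cache keeps one (message number -> file offset) table
        // per page, shared by all queues, so it does not multiply by the queue count.
    }
}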

With the shared page index cache design, each message only needs to be read
from the file once.
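
To make that concrete, here is a minimal sketch of the shared-index idea. The
names are made up for illustration; this is not the actual PageIndexCacheImpl:

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// One index per page, shared by all subscriber queues on the address.
// It stores only (message number -> file position), never the message bodies.
final class PageIndexSketch {
    private final long[] positions;              // positions[i] = file offset of message i

    PageIndexSketch(long[] positions) {
        this.positions = positions;              // built once while the page is scanned
    }

    long positionOf(int messageNumber) {
        return positions[messageNumber];         // O(1) lookup, no page re-read
    }
}

final class PageIndexCacheSketch {
    // pageId -> index; every queue delivering from that page shares one entry
    private final ConcurrentHashMap<Long, PageIndexSketch> indexes = new ConcurrentHashMap<>();

    PageIndexSketch indexFor(long pageId, Function<Long, PageIndexSketch> build) {
        return indexes.computeIfAbsent(pageId, build);   // each page is scanned at most once
    }
}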

Michael Pearce <[email protected]> wrote on Thu, Jun 27, 2019 at 3:03 PM:

> Hi
>
> First of all, I think this is an excellent effort and could be a potentially
> massive positive change.
>
> Before making a change of this scale, I do think we need to ensure we have
> sufficient benchmarks for a number of scenarios, not just one use case, and
> the benchmark tool used needs to be openly available so that others can
> verify the measurements and check them on their own setups.
>
> Some additional scenarios I would want/need covered are:
>
> PageCache set to 5, with all consumers keeping up but lagging enough to be
> reading from the same 1st page cache; latency and throughput need to be
> measured for all of them.
> PageCache set to 5, with all consumers but one keeping up (again lagging
> enough to be reading from the same 1st page cache) while the remaining one
> falls off the end and causes page cache swapping; measure the latency and
> throughput of those keeping up in the 1st page cache, ignoring the one that
> fell behind.
>
> Regarding the solution, some alternative approaches to discuss:
>
> In your scenario, if I understand correctly, each subscriber effectively has
> its own queue (a 1-to-1 mapping), not a shared one.
> You mention Kafka and say multiple consumers don't read serially on the
> address, and that is true, but per-queue processing of messages (dispatch)
> is still serial even with multiple consumers sharing a queue.
>
> What about keeping the existing mechanism but having a queue hold a
> reference to the page cache that the queue is currently on, kept from GC
> (i.e. not a soft reference)? That would mean the page cache isn't being
> swapped around when you have queues (in your case subscribers) switching
> page caches back and forth, avoiding the constant re-read issue.
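>
> A rough sketch of what that could look like (made-up names, just to
> illustrate the strong reference, not actual Artemis classes):
>
> // The queue pins the page cache it is currently consuming with a strong
> // field, so the soft-value cache cannot drop that page mid-delivery;
> // moving to the next page releases the previous one for GC.
> final class QueuePinSketch<P> {
>     private volatile P currentPageCache;        // strong reference, not a SoftReference
>
>     void moveTo(P nextPageCache) {
>         this.currentPageCache = nextPageCache;  // old page becomes collectable again
>     }
>
>     P current() {
>         return currentPageCache;
>     }
> }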
>
> Also, I think Franz had an excellent idea: do away with the page cache in
> its current form entirely, ensure the offset is kept with the reference, and
> rely on the OS cache to keep hot blocks/data.
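>
> Roughly, that approach could look like this (hypothetical sketch with
> made-up names, not existing Artemis code):
>
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.channels.FileChannel;
>
> // The reference keeps only (file offset, size) instead of the decoded
> // message; bytes are read on demand and the OS page cache keeps hot blocks.
> final class OffsetRefSketch {
>     final long fileOffset;
>     final int size;
>
>     OffsetRefSketch(long fileOffset, int size) {
>         this.fileOffset = fileOffset;
>         this.size = size;
>     }
>
>     ByteBuffer read(FileChannel pageFile) throws IOException {
>         ByteBuffer buf = ByteBuffer.allocate(size);
>         pageFile.read(buf, fileOffset);   // positional read; a real impl would loop until full
>         buf.flip();
>         return buf;
>     }
> }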
>
> Best
> Michael
>
>
>
> On Thu, 27 Jun 2019 at 05:13, yw yw <[email protected]> wrote:
>
> > Hi, folks
> >
> > This is the discussion about "ARTEMIS-2399 Fix performance degradation
> > when there are a lot of subscribers".
> >
> > First, apologies that I didn't clarify our thoughts.
> >
> > As noted in the Environment section, page-max-cache-size is set to 1,
> > meaning at most one page is allowed in the softValueCache. We have tested
> > with the default page-max-cache-size of 5; it just takes some time to see
> > the performance degradation, because at the start the cursor positions of
> > the 100 subscribers are similar and all message reads hit the
> > softValueCache. After some time the cursor positions diverge. Once these
> > positions are spread over more than 5 pages, some pages get read back and
> > forth. This can be seen in the trace log "adding pageCache pageNr=xxx into
> > cursor = test-topic" in PageCursorProviderImpl, where some pages are read
> > many times for the same subscriber. From that point on, the performance
> > starts to degrade. So we set page-max-cache-size to 1 here just to make
> > the test run faster; it doesn't change the final result.
> >
> > The softValueCache entries are dropped when memory is really low or when
> > the map size reaches its capacity (default 5). In most cases the
> > subscribers are doing tailing reads, which are served by the
> > softValueCache (no need to touch disk), so we need to keep it. But when
> > some subscribers fall behind, they need to read pages that are not in the
> > softValueCache. After looking at the code, we found that in most
> > situations one depage round follows at most MAX_SCHEDULED_RUNNERS deliver
> > rounds, which means at most MAX_DELIVERIES_IN_LOOP * MAX_SCHEDULED_RUNNERS
> > messages are depaged next. If you set the QueueImpl logger to debug level,
> > you will see logs like "Queue Memory Size after depage on queue=sub4 is
> > 53478769 with maxSize = 52428800. Depaged 68 messages,
> > pendingDelivery=1002, intermediateMessageReferences=23162,
> > queueDelivering=0". So in order to depage fewer than 2000 messages, each
> > subscriber has to read a whole page, which is unnecessary and wasteful.
> > In our test, where one page (50MB) contains ~40000 messages, one
> > subscriber may read the page 40000/2000 = 20 times if the softValueCache
> > is evicted before it finishes delivering it. This drastically slows down
> > the process and burdens the disk. So we added the PageIndexCacheImpl and
> > read one message at a time rather than reading all messages of the page.
> > In this way, each page is read only once per subscriber to finish
> > delivering it.
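> >
> > Using the figures above, the difference is roughly (rough arithmetic
> > only, not Artemis code):
> >
> > class DepageMathSketch {
> >     public static void main(String[] args) {
> >         int messagesPerPage = 40_000;                       // one ~50 MB page
> >         int depageBatch = 2_000;                            // ~MAX_DELIVERIES_IN_LOOP * MAX_SCHEDULED_RUNNERS
> >         System.out.println(messagesPerPage / depageBatch);  // 20 depage rounds per page
> >         // Without an index: once the soft cache is evicted, every round re-reads
> >         // the whole page -> about 20 * 50 MB = ~1 GB read per subscriber per page.
> >         // With the index cache: the page is scanned once to build the index and
> >         // each message is then read at its recorded offset -> one pass per page.
> >     }
> > }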
> >
> > That said, the softValueCache is still used for tailing reads. If it is
> > evicted, it won't be reloaded, to prevent the issue illustrated above;
> > the pageIndexCache is used instead.
> >
> > Regarding implementation details, we noted that before delivering a page,
> > a pageCursorInfo is constructed, which requires reading the whole page. We
> > can take this opportunity to construct the pageIndexCache; it is very
> > simple to code. We also thought about building an offset index file, but
> > had some concerns, listed below:
> >
> >    1. When to write and sync the index file? Would it have performance
> >    implications?
> >    2. If we have an index file, we can construct the pageCursorInfo from
> >    it (no need to read the page as before), but we first need to write the
> >    total message number into it. It seems a little weird to put this into
> >    the index file.
> >    3. After a hard crash, a recovery mechanism would be needed to recover
> >    the page and page index files, e.g. truncating them to the valid size.
> >    How do we know which files need to be sanity checked?
> >    4. A variant binary-search algorithm may be needed (see the sketch
> >    after this list), as in
> >    https://github.com/apache/kafka/blob/70ddd8af71938b4f5f6d1bb3df6243ef13359bcf/core/src/main/scala/kafka/log/AbstractIndex.scala
> >    5. Unlike Kafka, where the user fetches lots of messages at once and
> >    the broker only needs to look up the start offset in the index file
> >    once, Artemis delivers messages one by one, which means we have to look
> >    up the index every time we deliver a message. Although the index file
> >    is probably in the page cache, there is still a chance we miss the
> >    cache.
> >    6. Compatibility with old files.
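> >
> > For point 4, the kind of lookup an index file would need is roughly the
> > following (a sketch only, not actual Kafka or Artemis code; it assumes the
> > index entries are sorted by message number):
> >
> > final class IndexLookupSketch {
> >     // Find the largest entry whose message number is <= target, then read
> >     // forward in the page file from that entry's file position.
> >     static long floorPosition(int[] messageNumbers, long[] positions, int target) {
> >         int lo = 0, hi = messageNumbers.length - 1, best = 0;
> >         while (lo <= hi) {
> >             int mid = (lo + hi) >>> 1;
> >             if (messageNumbers[mid] <= target) {
> >                 best = mid;          // candidate "floor" entry
> >                 lo = mid + 1;
> >             } else {
> >                 hi = mid - 1;
> >             }
> >         }
> >         return positions[best];
> >     }
> > }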
> >
> > To sum up: Kafka uses an mmapped index file and we use an index cache.
> > Both are designed to find the physical file position from an offset
> > (Kafka) or a message number (Artemis). We prefer the index cache because
> > it is easier to understand and maintain.
> >
> > We also tested the single-subscriber case with the same setup.
> > Original:
> > consumer tps (11000 msg/s) and latency:
> > [image: orig_single_subscriber.png]
> > producer tps (30000 msg/s) and latency:
> > [image: orig_single_producer.png]
> > The PR:
> > consumer tps (14000 msg/s) and latency:
> > [image: pr_single_consumer.png]
> > producer tps (30000 msg/s) and latency:
> > [image: pr_single_producer.png]
> > The results are similar, and even a little better with the PR, in the
> > single-subscriber case.
> >
> > We used our internal test platform; I think JMeter could also be used to
> > test against it.
> >
>
