I think some of that is down to configuration. You could configure paging to 
have much smaller page files but hold many more of them. That way the 
reference sizes would be far smaller and pages dropping in and out of the 
cache would be less costly. E.g. if you expect 100 pages to be read, make the 
cache hold 100, but make the page sizes smaller so the overhead is far less.
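
Roughly what I mean, sketched with the AddressSettings API (setter names from 
memory, so treat them as assumptions and check them against broker.xml's 
page-size-bytes / page-max-cache-size / max-size-bytes):

    import org.apache.activemq.artemis.core.settings.impl.AddressSettings;

    public class SmallPagesSettings {
        public static AddressSettings smallPages() {
            // Instead of a few 50MB page files, hold many small ones: a lagging
            // subscriber then only re-reads a small file when its page is dropped.
            return new AddressSettings()
                    .setMaxSizeBytes(50L * 1024 * 1024)   // when the address starts paging
                    .setPageSizeBytes(1024 * 1024)        // 1MB page files instead of 50MB
                    .setPageCacheMaxSize(100);            // keep up to ~100 (small) pages cached
        }
    }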
On Thu, Jun 27, 2019 at 11:10 AM +0100, "yw yw" <[email protected]> wrote:
"At last for one message we maybe read twice: first we read page and create
pagereference; second we requery message after its reference is removed.  "

I just realized that was wrong. One message may be read many times. Think of
this: when messages #1~#2000 are delivered, we need to depage messages #2001~#4000,
reading the whole page; when messages #2001~#4000 are delivered, we need to depage
#4001~#6000, reading the page again, etc.

One message may even be read three times if we don't depage again until the
depaged messages are delivered. For example, say we have 3 pages p1, p2, p3 and a
message m1 at the top part of p2. In our case (max-size-bytes=51MB, a little bigger
than the page size), the first depage round reads the bottom half of p1 and the top
part of p2; the second depage round reads the bottom half of p2 and the top part of p3.
Therefore p2 is read twice, and m1 may be read three times if it is re-queried.
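
To make the arithmetic concrete, here is a tiny self-contained sketch (the sizes
are illustrative and the rounds start from the beginning of p1, but it shows the
same effect: p2 is opened in two different depage rounds):

    public class DepageRounds {
        public static void main(String[] args) {
            long pageSize = 50L * 1024 * 1024;      // one page file
            long maxSizeBytes = 51L * 1024 * 1024;  // a little bigger than a page
            long totalBytes = 3 * pageSize;         // p1, p2, p3
            long pos = 0;
            int round = 1;
            while (pos < totalBytes) {
                long end = Math.min(pos + maxSizeBytes, totalBytes);
                long firstPage = pos / pageSize + 1;
                long lastPage = (end - 1) / pageSize + 1;
                System.out.println("depage round " + round++ + " reads p" + firstPage + "..p" + lastPage);
                pos = end;
            }
        }
    }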

To be honest, I don't know how to fix the problem above with the
decentralized approach. The point is not whether we rely on the OS cache, it's that
we do it the wrong way: we shouldn't read a whole page (50MB) just for ~2000
messages. Also, there is no need to keep 51MB of PagedReferenceImpl objects in memory
per queue. When 100 queues occupy 5100MB of memory, the message references are very
likely to be removed.
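
What we would prefer is to seek straight to a single message using its recorded
offset, roughly like this (the class and layout here are just an illustration, not
the actual PR code):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // offsets[i] = file position of message #i inside one page file
    final class PageIndexCache {
        private final long[] offsets;

        PageIndexCache(long[] offsets) { this.offsets = offsets; }

        byte[] readMessage(RandomAccessFile page, int messageNr) throws IOException {
            long start = offsets[messageNr];
            long end = messageNr + 1 < offsets.length ? offsets[messageNr + 1] : page.length();
            byte[] encoded = new byte[(int) (end - start)];
            page.seek(start);
            page.readFully(encoded);   // read just this message, not the whole 50MB page
            return encoded;
        }
    }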


On Thu, Jun 27, 2019 at 5:05 PM, Francesco Nigro wrote:

> >
> >  which means the offset info is 100 times larger compared to the shared
> > page index cache.
>
>
> I would check with the JOL plugin for exact numbers...
> I see with it that we would have an increase of 4 bytes for each
> PagedReferenceImpl, fully decentralized vs
> a centralized approach (the cache). In the economy of a fully loaded
> broker, if we care about scaling we need to understand whether the memory tradeoff
> is important enough
> to choose one of the 2 approaches.
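>
> For example, something along these lines with jol-core on the classpath
> (just a sketch; the stand-in class is hypothetical, point JOL at the real
> PagedReferenceImpl for exact numbers, which also depend on VM flags such as
> compressed oops):
>
>     import org.openjdk.jol.info.ClassLayout;
>
>     public class RefLayout {
>         // stand-in carrying the extra offset field being discussed
>         static class RefWithOffset { Object message; int fileOffset; }
>
>         public static void main(String[] args) {
>             // prints header + per-field offsets/sizes of the object layout
>             System.out.println(ClassLayout.parseClass(RefWithOffset.class).toPrintable());
>         }
>     }
>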
> My point is that paging could be made totally based on the OS page cache,
> without GC getting in the middle, deleting any previous mechanism of page
> caching... simplifying the process as it is.
> Using a 2-level cache with such a centralized approach can work, but it will add
> a level of complexity that IMO could be avoided...
> What do you think the benefit of the decentralized solution could be,
> compared with the one proposed in the PR?
>
>
> On Thu, Jun 27, 2019 at 10:41, yw yw wrote:
>
> > Sorry, I missed the PageReference part.
> >
> > The lifecycle of a PageReference is: depage (in intermediateMessageReferences)
> > -> deliver (in messageReferences) -> waiting for ack (in deliveringRefs) ->
> > removed. Every queue creates its own PageReferences, which means the
> > offset info is 100 times larger compared to the shared page index cache.
> > If we keep 51MB of PageReferences in memory, as I said in the PR, "For
> > multiple subscribers to the same address, just one executor is responsible
> > for delivering, which means at the same moment only one queue is delivering.
> > Thus a queue may be stalled for a long time. We get queueMemorySize worth of
> > messages into memory, and when we deliver these after a long time, we
> > probably need to query the message and read the page file again.". In the end,
> > for one message we may read twice: first we read the page and create the
> > PageReference; second we re-query the message after its reference is removed.
> >
> > With the shared page index cache design, each message needs to be read
> > from the file only once.
> >
> > On Thu, Jun 27, 2019 at 3:03 PM, Michael Pearce wrote:
> >
> > > Hi
> > >
> > > First of all I think this is an excellent effort, and could be a
> > > potentially massive positive change.
> > >
> > > Before making any change on such a scale, I do think we need to ensure we
> > > have sufficient benchmarks on a number of scenarios, not just one use case,
> > > and the benchmark tool used needs to be available openly so that others
> > > can verify the measurements and check them on their own setups.
> > >
> > > Some additional scenarios I would want/need covered are:
> > >
> > > - PageCache set to 5, with all consumers keeping up but lagging enough to
> > >   be reading from the same first page cache; latency and throughput need
> > >   to be measured for all of them.
> > > - PageCache set to 5, with all consumers but one keeping up (lagging enough
> > >   to be reading from the same first page cache) while the remaining one
> > >   falls off the end, causing page cache swapping; measure latency and
> > >   throughput of those keeping up in the first page cache, disregarding the
> > >   one that is behind.
> > >
> > > Regarding the solution, some alternative approaches to discuss:
> > >
> > > In your scenario, if I understand correctly, each subscriber effectively
> > > has its own queue (1-to-1 mapping), not shared.
> > > You mention Kafka and say multiple consumers don't read serially on the
> > > address, and this is true, but per-queue processing through messages
> > > (dispatch) is still serial even with multiple shared consumers on a queue.
> > >
> > > What about keeping the existing mechanism, but having a queue hold a
> > > reference to the page cache that the queue is currently on, kept from GC
> > > (i.e. not soft)? That would mean the page cache isn't being swapped around
> > > when you have queues (in your case subscribers) swapping page caches back
> > > and forth, avoiding the constant re-read issue.
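> > >
> > > A rough sketch of that idea (type names here are illustrative, not the
> > > real broker classes):
> > >
> > >     // Each queue/cursor pins the cache of the page it is currently on with a
> > >     // strong reference, so GC cannot drop it mid-delivery; every other page
> > >     // can still sit behind soft references as today.
> > >     interface PageCache { long getPageNr(); }
> > >     interface PageCacheLookup { PageCache lookup(long pageNr); }
> > >
> > >     final class CursorPagePin {
> > >         private PageCache pinned;   // strong ref while the cursor is on this page
> > >
> > >         PageCache moveTo(long pageNr, PageCacheLookup provider) {
> > >             if (pinned == null || pinned.getPageNr() != pageNr) {
> > >                 pinned = provider.lookup(pageNr);   // load or reuse the softly-cached page
> > >             }
> > >             return pinned;
> > >         }
> > >     }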
> > >
> > > Also, I think Franz had an excellent idea: do away with the page cache in
> > > its current form entirely, ensure the offset is kept with the reference,
> > > and rely on OS caching to keep hot blocks/data.
> > >
> > > Best
> > > Michael
> > >
> > >
> > >
> > > On Thu, 27 Jun 2019 at 05:13, yw yw  wrote:
> > >
> > > > Hi, folks
> > > >
> > > > This is the discussion about "ARTEMIS-2399 Fix performance degradation
> > > > when there are a lot of subscribers".
> > > >
> > > > First, apologies that I didn't clarify our thoughts.
> > > >
> > > > As noted in the Environment section, page-max-cache-size is set to 1,
> > > > meaning at most one page is allowed in the softValueCache. We have tested
> > > > with the default page-max-cache-size, which is 5; it takes some time to
> > > > see the performance degradation, since at the start the cursor positions
> > > > of the 100 subscribers are similar and all the reads hit the
> > > > softValueCache. But after some time the cursor positions diverge. When
> > > > these positions are spread over more than 5 pages, some pages get read
> > > > back and forth. This can be seen from the trace log "adding pageCache
> > > > pageNr=xxx into cursor = test-topic" in PageCursorProviderImpl, where
> > > > some pages are read many times for the same subscriber. From that point
> > > > on, the performance starts to degrade. So we set page-max-cache-size to 1
> > > > here just to make the test run faster; it doesn't change the final
> > > > result.
> > > >
> > > > The softValueCache is cleared when memory is really low or when the map
> > > > size reaches its capacity (default 5). In most cases the subscribers are
> > > > doing tailing reads, which are served by the softValueCache (no need to
> > > > touch the disk), so we need to keep it. But when some subscribers fall
> > > > behind, they need to read pages that are not in the softValueCache. After
> > > > looking at the code, we found that in most situations one depage round
> > > > follows at most MAX_SCHEDULED_RUNNERS deliver rounds, which is to say at
> > > > most MAX_DELIVERIES_IN_LOOP * MAX_SCHEDULED_RUNNERS messages are depaged
> > > > next. If you set the QueueImpl logger to debug level, you will see logs
> > > > like "Queue Memory Size after depage on queue=sub4 is 53478769 with
> > > > maxSize = 52428800. Depaged 68 messages, pendingDelivery=1002,
> > > > intermediateMessageReferences=23162, queueDelivering=0". In order to
> > > > depage fewer than 2000 messages, each subscriber has to read a whole
> > > > page, which is unnecessary and wasteful. In our test, where one page
> > > > (50MB) contains ~40000 messages, one subscriber may read a page
> > > > 40000/2000=20 times to finish delivering it if the softValueCache is
> > > > evicted. This has drastically slowed down the process and burdened the
> > > > disk. So we add the PageIndexCacheImpl and read one message at a time
> > > > rather than reading all the messages of the page. In this way, each page
> > > > is read only once per subscriber over the course of delivering it.
> > > >
> > > > Having said that, the softValueCache is used for tailing reads. If it's
> > > > evicted, it won't be reloaded, to prevent the issue illustrated above;
> > > > the pageIndexCache is used instead.
> > > >
> > > > Regarding implementation details, we noted that before delivering a
> > > > page, a PageCursorInfo is constructed, which needs to read the whole
> > > > page. We can take this opportunity to construct the pageIndexCache; it's
> > > > very simple to code. We also thought of building an offset index file,
> > > > but some concerns stemmed from the following:
> > > >
> > > >    1. When to write and sync the index file? Would it have performance
> > > >    implications?
> > > >    2. If we have an index file, we can construct the PageCursorInfo
> > > >    through it (no need to read the page as before), but we would need to
> > > >    write the total message number into it first. It seems a little weird
> > > >    putting that into the index file.
> > > >    3. If we experience a hard crash, a recovery mechanism would be needed
> > > >    to recover the page and page index files, e.g. truncating them to the
> > > >    valid size. So how do we know which files need to be sanity checked?
> > > >    4. A variant of binary search may be needed, see
> > > >    https://github.com/apache/kafka/blob/70ddd8af71938b4f5f6d1bb3df6243ef13359bcf/core/src/main/scala/kafka/log/AbstractIndex.scala
> > > >    .
> > > >    5. Unlike Kafka, where the user fetches lots of messages at once and
> > > >    the broker just needs to look up the start offset in the index file
> > > >    once, Artemis delivers messages one by one, which means we have to
> > > >    look up the index every time we deliver a message. Although the index
> > > >    file is probably in the OS page cache, there are still chances we miss
> > > >    the cache.
> > > >    6. Compatibility with old files.
> > > >
> > > > To sum up, Kafka uses an mmapped index file and we use an index cache.
> > > > Both are designed to find the physical file position from an offset
> > > > (Kafka) or a message number (Artemis). We prefer the index cache because
> > > > it's easy to understand and maintain.
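> > > >
> > > > For reference, the variant binary search mentioned in point 4 boils down
> > > > to a floor lookup over sorted index entries, roughly like this
> > > > (illustrative only, not the PR code):
> > > >
> > > >     final class OffsetIndex {
> > > >         // entries[i] = {firstMessageNr, filePosition}, sorted by firstMessageNr;
> > > >         // returns the position of the last entry whose firstMessageNr <= target
> > > >         // (assumes targetMessageNr >= entries[0][0])
> > > >         static long floorPosition(long[][] entries, long targetMessageNr) {
> > > >             int lo = 0, hi = entries.length - 1, best = 0;
> > > >             while (lo <= hi) {
> > > >                 int mid = (lo + hi) >>> 1;
> > > >                 if (entries[mid][0] <= targetMessageNr) { best = mid; lo = mid + 1; }
> > > >                 else { hi = mid - 1; }
> > > >             }
> > > >             return entries[best][1];
> > > >         }
> > > >     }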
> > > >
> > > > We also tested the one subscriber case with the same setup.
> > > > The original:
> > > > consumer tps (11000 msg/s) and latency:
> > > > [image: orig_single_subscriber.png]
> > > > producer tps (30000 msg/s) and latency:
> > > > [image: orig_single_producer.png]
> > > > The PR:
> > > > consumer tps (14000 msg/s) and latency:
> > > > [image: pr_single_consumer.png]
> > > > producer tps (30000 msg/s) and latency:
> > > > [image: pr_single_producer.png]
> > > > It showed the results are similar, and even a little better, in the
> > > > single subscriber case.
> > > >
> > > > We used our internal test platform, and I think JMeter can also be used
> > > > to test against it.
> > > >
> > >
> >
>