It shouldn’t need to read it unless it is a rollback and it’s being redelivered. Those would be exceptional cases.
On Mon, Jul 22, 2019 at 5:46 AM yw yw <[email protected]> wrote:

Yes, the reader per queue would not close the page file until moving to the next page. However, there are cases where files might be opened constantly:
1. Paged transactions. Suppose the current cursor position is at page2 and the paged transactions are at page1; when the transactions are committed, page1 might be opened constantly to read the messages.
2. Scheduled or rolled-back messages. The positions of these messages might fall behind the current cursor position, leading to page files being opened constantly.

<[email protected]> wrote on Fri, Jul 19, 2019 at 3:45 PM:

Surely you keep the file open, else you incur the perf penalty of having to open the file constantly. It would be faster to have the reader hold the file open and have one per queue, avoiding the constant opening and closing of a file and all the overhead of that at the OS level.

On Fri, Jul 19, 2019 at 8:30 AM +0100, "yw yw" <[email protected]> wrote:

> But the real problem here will be the number of open files. Each Page will have an open file, which will keep a lot of open files on the system. Correct?

Not sure I made it clear enough. My thought is: since PageCursorInfo decides whether an entire page is consumed based on numberOfMessages, and PageSubscriptionImpl decides whether to move to the next page based on the current cursor page position and numberOfMessages, we store a map of (pageNr, numberOfMessages) in PageCursorProviderImpl after a page is evicted. From numberOfMessages, each PageSubscriptionImpl can build its PageCursorInfo, and if the current cursor page position is within the range of the current page, a PageReader can be built to help read messages. So there are really no open page files in PageCursorProviderImpl. Without this map, each PageSubscriptionImpl would have to read the page file first to get numberOfMessages and then build the PageCursorInfo/PageReader.

I agree with putting the PageReader into PageSubscriptionImpl, just not sure about the specific implementation details :)
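[As a minimal sketch of the map described above (hypothetical names, not actual Artemis code): the provider retains only a per-page message count after eviction, letting each subscription build its cursor state without rescanning the page file.]

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of the (pageNr -> numberOfMessages) map described
    // above; names are illustrative, not Artemis APIs.
    final class PageCountCache {
        private final Map<Long, Integer> messagesPerPage = new ConcurrentHashMap<>();

        // Called by the provider when a page's cache is evicted.
        void onPageEvicted(long pageNr, int numberOfMessages) {
            messagesPerPage.put(pageNr, numberOfMessages);
        }

        // A subscription asks for the count to build its PageCursorInfo;
        // -1 means the page file must be scanned once to recover it.
        int numberOfMessages(long pageNr) {
            return messagesPerPage.getOrDefault(pageNr, -1);
        }

        // Once the page is fully consumed and deleted, the entry can go.
        void onPageDeleted(long pageNr) {
            messagesPerPage.remove(pageNr);
        }
    }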
On Fri, Jul 19, 2019 at 2:10 PM, … wrote:

+1 for having one per queue. Definitely a better idea than having to hold a cache.

On Fri, Jul 19, 2019 at 4:37 AM +0100, "Clebert Suconic" <[email protected]> wrote:

But the real problem here will be the number of open files. Each Page will have an open file, which will keep a lot of open files on the system. Correct?

I believe the impact of moving the files to the Subscription wouldn't be that much, and we would fix the problem. We wouldn't need a cache at all, as we would just keep the file we need at the current cursor.

On Tue, Jul 16, 2019 at 10:40 PM yw yw wrote:

I did consider the case where all pages are instantiated as PageReaders. That's really a problem.

The pro of the PR is that every page is read only once to build a PageReader shared by all the queues. The con of the PR is that many PageReaders are probably instantiated if consumers make slow/no progress in several queues while making fast progress in others (I think that's the only cause leading to the corner case, right?). This means too many open files and too much memory.

The pro of duplicated PageReaders is that there is a fixed number of PageReaders, matching the number of queues, at any one time. The con is that each queue has to read the page once to build its own PageReader if the page cache is evicted. I'm not sure how this will affect performance.

The point is that we need the number of messages in the page, which is used by PageCursorInfo and PageSubscription::internalGetNext, so we have to read the page file. How about we cache only the number of messages in each page instead of the PageReader, and build a PageReader in each queue? If we hit the corner case, only the (pageNr, numberOfMessages) pair data is permanently in memory, which I assume is smaller than the complete PageCursorInfo data. This way we achieve the performance gain at a small price.

Clebert Suconic wrote on Tue, Jul 16, 2019 at 10:18 PM:

I just came back after a well-deserved 2-week break and I was looking at this, and I can say it's well done. Nice job! It's a lot simpler!

However, there's one question now, which is probably a further improvement: shouldn't the pageReader be instantiated at the PageSubscription? That means, if there's no page cache, in case of the page being evicted, the Subscription would then create a new Page/PageReader pair and dispose of it when it's done (meaning, when it has moved to a different page).

As you are solving the case with many subscriptions, wouldn't you hit a corner case where all Pages are instantiated as PageReaders? I feel it would be better to eventually duplicate a PageReader and close it when done. Or did you already consider that possibility and still think it's best to keep this cache of PageReaders?
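[A minimal sketch of the per-subscription lifecycle Clebert suggests, under assumed names (PageFiles and openPage are stand-ins, not Artemis APIs): the subscription lazily opens a reader for its current page and disposes of it only when the cursor moves to a different page.]

    import java.io.Closeable;
    import java.io.IOException;

    // Hypothetical per-subscription reader lifecycle: at most one open page
    // file per queue, closed when the cursor moves to the next page.
    final class SubscriptionPageReader implements Closeable {
        interface PageFiles {                     // stand-in for the paging store
            Closeable openPage(long pageNr) throws IOException;
        }

        private final PageFiles store;
        private long currentPageNr = -1;
        private Closeable currentReader;

        SubscriptionPageReader(PageFiles store) { this.store = store; }

        // Called on each read; reopens only when the subscription's cursor
        // has moved to a different page.
        Closeable readerFor(long pageNr) throws IOException {
            if (pageNr != currentPageNr) {
                close();                          // dispose the previous Page/PageReader pair
                currentReader = store.openPage(pageNr);
                currentPageNr = pageNr;
            }
            return currentReader;
        }

        @Override
        public void close() throws IOException {
            if (currentReader != null) {
                currentReader.close();
                currentReader = null;
            }
        }
    }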
On Sat, Jul 13, 2019 at 12:15 AM, … wrote:

Could a squashed PR be sent?

On Fri, Jul 12, 2019 at 2:23 PM +0100, "yw yw" wrote:

Hi,

I have finished work on the new implementation (not yet tests and configuration) as suggested by franz.

I put the fileOffset in the PagePosition and added a new class, PageReader, which is a wrapper around the page that implements the PageCache interface. The PageReader class is used to read the page file if the cache is evicted. For details, see
https://github.com/wy96f/activemq-artemis/commit/3f388c2324738f01f53ce806b813220d28d40987

I ran some tests, with the results below:
1. Running with a 51MB page size and 1 page cache in the case of 100 multicast queues: https://filebin.net/wnyan7d2n1qgfsvg
2. Running with a 5MB page size and 100 page caches in the case of 100 multicast queues: https://filebin.net/re0989vz7ib1c5mc
3. Running with a 51MB page size and 1 page cache in the case of 1 queue: https://filebin.net/3qndct7f11qckrus

The results seem good, similar to the implementation in the PR. Most importantly, the index cache data is removed, so there is no worry about extra overhead :)
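[To illustrate why storing the fileOffset in the PagePosition removes the need to rescan an evicted page, a hedged sketch of the read path, assuming a simple length-prefixed record layout (the real Artemis page format and PageReader differ):]

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Hypothetical "read one message at its recorded offset" path. The real
    // PageReader implements the PageCache interface; this shows only the seek.
    final class OffsetPageReader implements AutoCloseable {
        private final RandomAccessFile file;

        OffsetPageReader(String pageFile) throws IOException {
            this.file = new RandomAccessFile(pageFile, "r");
        }

        // fileOffset comes from the PagePosition stored at depage time, so
        // only this record is read, not the whole 50MB page.
        byte[] readMessageAt(long fileOffset) throws IOException {
            file.seek(fileOffset);
            int length = file.readInt();          // assumed length-prefixed record
            byte[] record = new byte[length];
            file.readFully(record);
            return record;
        }

        @Override
        public void close() throws IOException { file.close(); }
    }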
yw yw wrote on Thu, Jul 4, 2019 at 5:38 PM:

Hi, michael,

Thanks for the advice. For the current PR, we can use two arrays, where one records the message number and the other the corresponding offset, to optimize the memory usage. For franz's approach, we will also work on an early prototype implementation. After that, we will run some basic tests in different scenarios.

On Tue, Jul 2, 2019 at 7:08 AM, … wrote:

The point, though, is that an extra index cache layer is needed. Its overhead means the total paged capacity will be more limited, as that overhead isn't just an extra int per reference. E.g. in the PR the current impl isn't very memory-optimised: could an int array be used, or at worst an open-addressing primitive int-int hashmap? This is why I really prefer franz's approach.

Also, whatever we do, we need the new behaviour to be configurable, so that any use case we haven't thought about won't be impacted. The change should not be a surprise; it should be something you toggle on.
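[A sketch of the two-array idea above (illustrative names only): parallel primitive arrays pair each message number with its file offset, avoiding the per-entry boxing and node overhead of a Map<Integer, Long>.]

    // Hypothetical two-array index: entry i pairs messageNrs[i] with
    // fileOffsets[i]. Roughly 12 bytes per entry vs. ~32+ for boxed map entries.
    final class TwoArrayIndex {
        private final int[] messageNrs;
        private final long[] fileOffsets;
        private int size;

        TwoArrayIndex(int capacity) {
            this.messageNrs = new int[capacity];
            this.fileOffsets = new long[capacity];
        }

        // Entries must be appended in increasing messageNr order.
        void add(int messageNr, long fileOffset) {
            messageNrs[size] = messageNr;
            fileOffsets[size] = fileOffset;
            size++;
        }

        // Linear scan kept simple here; a binary search also works because
        // entries are sorted by messageNr. Returns -1 if not indexed.
        long offsetOf(int messageNr) {
            for (int i = 0; i < size; i++) {
                if (messageNrs[i] == messageNr) return fileOffsets[i];
            }
            return -1;
        }
    }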
On Mon, Jul 1, 2019 at 1:01 PM +0100, "yw yw" wrote:

Hi,
We ran a test against your configuration (5MB page size / 100 page caches / 10MB max size).
The current code: 7000 msg/s sent and 18000 msg/s received.
The PR code: 8200 msg/s sent and 16000 msg/s received.
Like you said, the current code's performance improves by using a much smaller page file and holding many more of them.

I'm not sure what implications a smaller page file would have: producer performance may drop since file switching is more frequent, and the number of file handles would increase?

While our consumer in the test just echoes, doing nothing after receiving a message, a consumer in the real world may be busy doing business. This means references and page caches reside in memory longer and may be evicted more easily while producers are sending all the time.

Since we don't know how many subscribers there are, this is not a scalable approach. We can't reduce the page file size without limit to fit the number of subscribers. The code should accommodate all kinds of configurations; we adjust configuration as a trade-off when needed, not as a workaround, IMO. In our company, ~200 queues (60% owned by a few addresses) are deployed on the broker. We can't set all of them to e.g. 100 page caches (too much memory), and neither can we set different sizes per address pattern (hard for operations). In the multi-tenant cluster we prefer availability, so to avoid memory exhaustion we set pageSize to 30MB, max cache size to 1, and max size to 31MB. It's running well in one of our clusters now :)

On Sat, Jun 29, 2019 at 2:35 AM, … wrote:

I think some of that is down to configuration. You could configure paging to have much smaller page files but hold many more of them. That way the reference sizes will be far smaller and pages will drop in and out less. E.g. if you expect 100 being read, make it 100, but make the page sizes smaller so the overhead is far less.

On Thu, Jun 27, 2019 at 11:10 AM +0100, "yw yw" wrote:

> At last for one message we maybe read twice: first we read page and create pagereference; second we requery message after its reference is removed.

I just realized this was wrong: one message may be read many times. Think of this: when messages #1~#2000 are delivered, we need to depage #2001~#4000, reading the whole page; when #2001~#4000 are delivered, we need to depage #4001~#6000, reading the page again, and so on.

One message may be read three times if we don't depage until all messages are delivered. For example, say we have 3 pages p1, p2, p3 and a message m1 in the top part of p2. In our case (max-size-bytes=51MB, a little bigger than the page size), the first depage round reads the bottom half of p1 and the top part of p2; the second depage round reads the bottom half of p2 and the top part of p3. Therefore p2 is read twice, and m1 may be read three times if re-queried.

To be honest, I don't know how to fix the problem above with the decentralized approach. The point is not whether we rely on the OS cache; it's that we do it the wrong way: we shouldn't read a whole page (50MB) just for ~2000 messages. Also, there is no need to keep 51MB of PagedReferenceImpl in memory. When 100 queues occupy 5100MB of memory, the message references are very likely to be removed.
Francesco Nigro wrote on Thu, Jun 27, 2019 at 5:05 PM:

> which means the offset info is 100 times larger compared to the shared page index cache.

I would check with the JOL plugin for exact numbers. I see with it that we would have an increase of 4 bytes for each PagedReferenceImpl, totally decentralized, vs a centralized approach (the cache). In the economy of a fully loaded broker, if we care about scaling, we need to understand whether the memory trade-off is important enough to choose one of the 2 approaches. My point is that paging could be based entirely on the OS page cache, with GC getting in the middle, deleting any previous mechanism of page caching and simplifying the process as it is. Using a 2-level cache with such a centralized approach can work, but it will add a level of complexity that IMO could be saved. What do you think the benefit of the decentralized solution would be compared with the one proposed in the PR?

On Thu, Jun 27, 2019 at 10:41 AM, yw yw wrote:

Sorry, I missed the PageReference part.

The lifecycle of a PageReference is: depage (in intermediateMessageReferences) -> deliver (in messageReferences) -> waiting for ack (in deliveringRefs) -> removed. Every queue creates its own PageReference, which means the offset info is 100 times larger compared to the shared page index cache. If we keep 51MB of pageReference data in memory, then as I said in the PR: "For multiple subscribers to the same address, just one executor is responsible for delivering, which means at any given moment only one queue is delivering. Thus a queue may be stalled for a long time. We get queueMemorySize messages into memory, and when we deliver them after a long time, we probably need to query the message and read the page file again." So for one message we may read twice: first we read the page and create the PageReference; second we re-query the message after its reference is removed.

For the shared page index cache design, each message needs to be read from file only once.

Michael Pearce wrote on Thu, Jun 27, 2019 at 3:03 PM:

Hi,

First of all, I think this is an excellent effort, and it could be a potentially massive positive change.

Before making any change on such a scale, I do think we need sufficient benchmarks for a number of scenarios, not just one use case, and the benchmark tool used needs to be openly available so that others can verify the measurements and check them on their own setups. Some additional scenarios I would want/need covered:

- PageCache set to 5, and all consumers keeping up, but lagging enough to be reading from the same 1st page cache; latency and throughput need to be measured for all.
- PageCache set to 5, and all consumers but one keeping up, lagging enough to be reading from the same 1st page cache, but the one falling off the end, causing the page cache swapping; measure latency and throughput of those keeping up in the 1st page cache, not caring about the one.

Regarding a solution, some alternative approaches to discuss.

In your scenario, if I understand correctly, each subscriber effectively has its own queue (a 1-to-1 mapping), not shared. You mention Kafka and say multiple consumers don't read serially on the address, and this is true, but per-queue processing of messages (dispatch) is still serial, even with multiple shared consumers on a queue.

What about keeping the existing mechanism but having a queue hold a reference to the page cache the queue is currently on, kept from GC (e.g. not soft)? That way the page cache isn't swapped around when you have queues (in your case, subscribers) swapping page caches back and forth, avoiding the constant re-read issue.

Also, I think Franz had an excellent idea: do away with the page cache in its current form entirely, ensure the offset is kept with the reference, and rely on OS caching to keep hot blocks/data.

Best,
Michael
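[A loose illustration of the pinning idea above, assuming a soft-value cache map (hypothetical types; the real Artemis classes differ): entries stay softly referenced, but each queue keeps one strong reference to the page cache it is currently on, so only unused page caches can be collected.]

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical "pin the current page cache": entries are soft so memory
    // pressure can evict them, but a queue's strong reference to the page it
    // is on keeps that one cache alive and avoids the constant re-read.
    final class PinnedPageCaches<P> {
        private final Map<Long, SoftReference<P>> caches = new ConcurrentHashMap<>();

        void put(long pageNr, P pageCache) {
            caches.put(pageNr, new SoftReference<>(pageCache));
        }

        P get(long pageNr) {
            SoftReference<P> ref = caches.get(pageNr);
            return ref == null ? null : ref.get();    // null once GC'd
        }
    }

    final class QueueCursor<P> {
        private P pinned;                 // strong ref: survives GC while in use

        // Re-reading from disk when the cache entry is gone is omitted here.
        void moveTo(long pageNr, PinnedPageCaches<P> caches) {
            pinned = caches.get(pageNr);
        }

        P currentPage() { return pinned; }
    }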
On Thu, 27 Jun 2019 at 05:13, yw yw wrote:

Hi, folks,

This is the discussion about "ARTEMIS-2399 Fix performance degradation when there are a lot of subscribers".

First, apologies that I didn't clarify our thoughts.

As noted in the Environment part, page-max-cache-size is set to 1, meaning at most one page is allowed in the softValueCache. We have tested with the default page-max-cache-size of 5; it takes some time to see the performance degradation, since at the start the cursor positions of the 100 subscribers are similar and all message reads hit the softValueCache. But after some time the cursor positions diverge. When these positions span more than 5 pages, some pages get read back and forth. This can be seen in the trace log "adding pageCache pageNr=xxx into cursor = test-topic" in PageCursorProviderImpl, where some pages are read many times for the same subscriber. From that point on, performance starts to degrade. So we set page-max-cache-size to 1 here just to make the test run faster; it doesn't change the final result.

Entries in the softValueCache are removed if memory is really low, or when the map size reaches capacity (default 5). In most cases the subscribers are doing tailing reads, which are served by the softValueCache (no need to touch the disk), so we need to keep it. But when some subscribers fall behind, they need to read pages that are not in the softValueCache.
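[For context, a minimal sketch of a capacity-bounded soft-value cache in the spirit of the softValueCache described above (a model only; the actual implementation lives in Artemis's paging code and differs in detail): values are held through SoftReferences, so they survive until memory runs low or the capacity bound pushes them out.]

    import java.lang.ref.SoftReference;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch: the LRU bound models page-max-cache-size, the SoftReference
    // models eviction under memory pressure. Not the actual Artemis code.
    final class SoftValuePageCache<P> {
        private final Map<Long, SoftReference<P>> cache;

        SoftValuePageCache(int maxCachedPages) {
            // access-order LinkedHashMap evicts the eldest entry past capacity
            this.cache = new LinkedHashMap<Long, SoftReference<P>>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, SoftReference<P>> e) {
                    return size() > maxCachedPages;
                }
            };
        }

        synchronized void put(long pageNr, P page) {
            cache.put(pageNr, new SoftReference<>(page));
        }

        // null means the page was never cached, was evicted by the capacity
        // bound, or had its soft reference cleared by the GC.
        synchronized P get(long pageNr) {
            SoftReference<P> ref = cache.get(pageNr);
            return ref == null ? null : ref.get();
        }
    }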
After looking at the code, we found that one depage round follows at most MAX_SCHEDULED_RUNNERS deliver rounds in most situations; that is to say, at most MAX_DELIVERIES_IN_LOOP * MAX_SCHEDULED_RUNNERS messages will be depaged next. If you set the QueueImpl logger to debug level, you will see logs like "Queue Memory Size after depage on queue=sub4 is 53478769 with maxSize = 52428800. Depaged 68 messages, pendingDelivery=1002, intermediateMessageReferences=23162, queueDelivering=0". So to depage fewer than 2000 messages, each subscriber has to read a whole page, which is unnecessary and wasteful. In our test, where one page (50MB) contains ~40000 messages, one subscriber may read the page 40000/2000 = 20 times to finish delivering it if the softValueCache is evicted. This has drastically slowed down the process and burdened the disk. So we added the PageIndexCacheImpl and read one message at a time rather than all the messages of a page. This way, for each subscriber, each page is read only once to finish delivering it.

Having said that, the softValueCache is used for tailing reads. If it's evicted, it won't be reloaded, to prevent the issue illustrated above. Instead, the pageIndexCache is used.

Regarding implementation details: we noted that before delivering a page, a pageCursorInfo is constructed, which needs to read the whole page. We can take this opportunity to construct the pageIndexCache; it's very simple to code.
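[A sketch of building such an index during the single sequential scan that pageCursorInfo construction already performs (a length-prefixed record layout is assumed; names are illustrative, not Artemis's page format):]

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical one-pass index build: while scanning the page file once,
    // record each message's starting offset so later reads can seek directly
    // to a single message instead of re-reading the whole page.
    final class PageIndexBuilder {
        static long[] buildOffsets(String pageFile) throws IOException {
            List<Long> offsets = new ArrayList<>();
            try (RandomAccessFile file = new RandomAccessFile(pageFile, "r")) {
                long pos = 0, end = file.length();
                while (pos < end) {
                    offsets.add(pos);                 // offset of message #offsets.size()
                    file.seek(pos);
                    int length = file.readInt();      // assumed length-prefixed record
                    pos += Integer.BYTES + length;    // skip the body without reading it
                }
            }
            long[] index = new long[offsets.size()];
            for (int i = 0; i < index.length; i++) index[i] = offsets.get(i);
            return index;                             // index[n] = file offset of message n
        }
    }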
We also thought of building an offset index file, but had some concerns:

1. When to write and sync the index file? Would it have performance implications?
2. If we have an index file, we can construct the pageCursorInfo through it (no need to read the page as before), but we would need to write the total message number into it first. It seems a little weird putting this into the index file.
3. After a hard crash, a recovery mechanism would be needed to recover the page and page index files, e.g. truncating them to the valid size. So how do we know which files need to be sanity-checked?
4. A variant binary search algorithm may be needed; see
https://github.com/apache/kafka/blob/70ddd8af71938b4f5f6d1bb3df6243ef13359bcf/core/src/main/scala/kafka/log/AbstractIndex.scala
5. Unlike Kafka, where the user fetches lots of messages at once and the broker only needs to look up the start offset in the index file once, Artemis delivers messages one by one, which means we would have to look up the index every time we deliver a message. Although the index file would probably be in the OS page cache, there are still chances we miss the cache.
6. Compatibility with old files.

To sum up, Kafka uses an mmapped index file and we use an index cache. Both are designed to find the physical file position from an offset (Kafka) or a message number (Artemis). We prefer the index cache because it's easy to understand and maintain.
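[For the variant binary search in point 4: the essential operation over a Kafka-style sparse index is a floor lookup, i.e. the largest indexed entry at or below the target. A generic sketch (not Kafka's or Artemis's actual code):]

    import java.util.Arrays;

    // Hypothetical floor lookup over a sparse index: entries[i] holds the
    // message number of the i-th indexed record. Returns the slot of the
    // largest entry <= target, or -1 if target precedes the first entry.
    final class SparseIndexLookup {
        static int floorSlot(long[] entries, long target) {
            int lo = 0, hi = entries.length - 1, result = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;        // overflow-safe midpoint
                if (entries[mid] <= target) {
                    result = mid;                 // candidate; keep looking right
                    lo = mid + 1;
                } else {
                    hi = mid - 1;                 // too big; look left
                }
            }
            return result;
        }

        public static void main(String[] args) {
            long[] indexedMessageNrs = {0, 2000, 4000, 6000};
            // message #4321 starts somewhere after the entry for #4000
            System.out.println(Arrays.toString(indexedMessageNrs)
                + " floor(4321) -> slot " + floorSlot(indexedMessageNrs, 4321));
        }
    }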
We also tested the one-subscriber case with the same setup.
The original:
consumer tps (11000 msg/s) and latency: [image: orig_single_subscriber.png]
producer tps (30000 msg/s) and latency: [image: orig_single_producer.png]
The PR:
consumer tps (14000 msg/s) and latency: [image: pr_single_consumer.png]
producer tps (30000 msg/s) and latency: [image: pr_single_producer.png]
The results are similar, and even a little better in the single-subscriber case.

We used our internal test platform, and I think jmeter can also be used to test it.
