+1 for having one per queue. Definitely a better idea than having to hold a cache.

On Fri, Jul 19, 2019 at 4:37 AM +0100, "Clebert Suconic"
<[email protected]> wrote:

But the real problem here will be the number of open files. Each Page
will have an open file, which will keep a lot of files open on the
system. Correct?

I believe the impact of moving the files to the Subscription wouldn't
be that much, and it would fix the problem. We wouldn't need a cache at
all, as we would just keep the file we need at the current cursor.
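
For illustration, roughly what I mean (just a sketch; these aren't the
real Artemis classes, names are made up):

import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: a minimal reader over one page file.
class PageReaderSketch implements Closeable {
    private final RandomAccessFile file;

    PageReaderSketch(String pageFile) throws IOException {
        this.file = new RandomAccessFile(pageFile, "r");
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}

// Sketch: the subscription owns at most one open Page/PageReader pair,
// the one its cursor is currently on, and disposes of it when moving on.
class PageSubscriptionSketch implements Closeable {
    private long currentPageId = -1;
    private PageReaderSketch currentReader;

    PageReaderSketch readerFor(long pageId) throws IOException {
        if (pageId != currentPageId) {
            close();                                  // drop the page we left
            currentReader = new PageReaderSketch(pageId + ".page");
            currentPageId = pageId;
        }
        return currentReader;
    }

    @Override
    public void close() throws IOException {
        if (currentReader != null) {
            currentReader.close();                    // one open file at most
            currentReader = null;
        }
    }
}

That way the broker holds at most one open file per subscription, not
one per page.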

On Tue, Jul 16, 2019 at 10:40 PM yw yw  wrote:
>
> I did consider the case where all pages are instantiated as PageReaders.
> That really is a problem.
>
> The pro of the PR approach is that every page is read only once to build
> a PageReader, which is then shared by all the queues. The con is that
> many PageReaders may be instantiated if consumers make slow or no
> progress in several queues while being fast in others (I think that's
> the only cause of the corner case, right?). This means too many open
> files and too much memory.
>
> The pro of duplicated PageReaders is that there is a fixed number of
> PageReaders at any given time, one per queue.
> The con is that each queue has to read the page once to build its own
> PageReader if the page cache is evicted. I'm not sure how this will
> affect performance.
>
> The point is that we need the number of messages in the page, which is
> used by PageCursorInfo and PageSubscription::internalGetNext, so we have
> to read the page file. How about we only cache the number of messages in
> each page instead of the PageReader, and build the PageReader in each
> queue? When we hit the corner case, only the (page, message count) pair
> data is permanently in memory, which I assume is smaller than the
> complete PageCursorInfo data. This way we achieve the performance gain
> at a small price.
>
> Clebert Suconic  wrote on Tue, Jul 16, 2019 at 10:18 PM:
>
> > I just came back after a well-deserved two-week break and I was
> > looking at this, and I can say it's well done. Nice job! It's a lot
> > simpler!
> >
> > However, there's one question now, which is probably a further
> > improvement: shouldn't the PageReader be instantiated at the
> > PageSubscription?
> >
> > That means, if there's no page cache (in case the page has been
> > evicted), the Subscription would then create a new Page/PageReader
> > pair and dispose of it when it's done (meaning, when it has moved to
> > a different page).
> >
> > As you are solving the case with many subscriptions, wouldn't you hit
> > a corner case where all Pages are instantiated as PageReaders?
> >
> >
> > I feel like it would be better to eventually duplicate a PageReader
> > and close it when done.
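> >
> > For illustration, something like this (a sketch; hypothetical names,
> > not the actual classes):
> >
> > import java.io.Closeable;
> > import java.io.IOException;
> > import java.io.RandomAccessFile;
> >
> > // Sketch: a reader that can be duplicated per queue. Each duplicate
> > // gets its own file handle and position, and is closed when that
> > // queue is done with the page, so no long-lived cache is needed.
> > class DuplicablePageReader implements Closeable {
> >     private final String pageFile;
> >     private final RandomAccessFile file;
> >
> >     DuplicablePageReader(String pageFile) throws IOException {
> >         this.pageFile = pageFile;
> >         this.file = new RandomAccessFile(pageFile, "r");
> >     }
> >
> >     DuplicablePageReader duplicate() throws IOException {
> >         return new DuplicablePageReader(pageFile);   // independent handle
> >     }
> >
> >     @Override
> >     public void close() throws IOException {
> >         file.close();
> >     }
> > }
> >
> > // Usage per queue:
> > //   try (DuplicablePageReader r = shared.duplicate()) { ... deliver ... }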
> >
> >
> > Or did you already consider that possibility and still think it's best
> > to keep this cache of PageReaders?
> >
> > On Sat, Jul 13, 2019 at 12:15 AM  wrote:
> > >
> > > Could a squashed PR be sent?
> > >
> > > On Fri, Jul 12, 2019 at 2:23 PM +0100, "yw yw"  wrote:
> > >
> > > Hi,
> > >
> > > I have finished work on the new implementation (tests and
> > > configuration not done yet), as suggested by Franz.
> > >
> > > I put the fileOffset in the PagePosition and added a new class,
> > > PageReader, which is a wrapper around the page that implements the
> > > PageCache interface. The PageReader class is used to read the page
> > > file if the cache is evicted. For details, see
> > >
> > https://github.com/wy96f/activemq-artemis/commit/3f388c2324738f01f53ce806b813220d28d40987
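> > >
> > > To make the idea concrete, roughly (a sketch only, not the actual
> > > code in the commit; the record layout is assumed to be
> > > length-prefixed):
> > >
> > > import java.io.IOException;
> > > import java.io.RandomAccessFile;
> > >
> > > // Sketch: the position carries the byte offset of the message inside
> > > // the page file.
> > > record PagePositionSketch(long pageId, int messageNr, long fileOffset) {}
> > >
> > > // Sketch: reads a single message at the recorded offset, so an
> > > // evicted cache does not force re-reading the whole page.
> > > class OffsetPageReader {
> > >     private final RandomAccessFile file;
> > >
> > >     OffsetPageReader(String pageFile) throws IOException {
> > >         this.file = new RandomAccessFile(pageFile, "r");
> > >     }
> > >
> > >     byte[] readMessage(PagePositionSketch pos) throws IOException {
> > >         file.seek(pos.fileOffset());       // jump straight to the record
> > >         int size = file.readInt();         // assumed length prefix
> > >         byte[] body = new byte[size];
> > >         file.readFully(body);
> > >         return body;
> > >     }
> > > }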
> > >
> > > I ran some tests; results are below:
> > > 1. 51MB page size and 1 page cache, with 100 multicast queues:
> > > https://filebin.net/wnyan7d2n1qgfsvg
> > > 2. 5MB page size and 100 page caches, with 100 multicast queues:
> > > https://filebin.net/re0989vz7ib1c5mc
> > > 3. 51MB page size and 1 page cache, with 1 queue:
> > > https://filebin.net/3qndct7f11qckrus
> > >
> > > The results look good, similar to the implementation in the PR. Most
> > > importantly, the index cache data is removed, so there's no worry
> > > about extra overhead :)
> > >
> > > yw yw  wrote on Thu, Jul 4, 2019 at 5:38 PM:
> > >
> > > > Hi, Michael
> > > >
> > > > Thanks for the advice. For the current PR, we can use two arrays,
> > > > where one records the message number and the other the
> > > > corresponding offset, to optimize the memory usage. For Franz's
> > > > approach, we will also work on an early prototype implementation.
> > > > After that, we will run some basic tests in different scenarios.
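> > > >
> > > > For the two-array idea, a sketch of what we mean (illustrative
> > > > only):
> > > >
> > > > import java.util.Arrays;
> > > >
> > > > // Sketch: one int[] of message numbers and one long[] of offsets,
> > > > // instead of a boxed map, to keep the per-page index small. Entries
> > > > // are appended in increasing message-number order, so lookup is a
> > > > // binary search (an open primitive int-to-int map would also work).
> > > > class PageOffsetIndex {
> > > >     private int[] messageNrs = new int[1024];
> > > >     private long[] offsets = new long[1024];
> > > >     private int size;
> > > >
> > > >     void add(int messageNr, long offset) {
> > > >         if (size == messageNrs.length) {
> > > >             messageNrs = Arrays.copyOf(messageNrs, size * 2);
> > > >             offsets = Arrays.copyOf(offsets, size * 2);
> > > >         }
> > > >         messageNrs[size] = messageNr;
> > > >         offsets[size] = offset;
> > > >         size++;
> > > >     }
> > > >
> > > >     // Returns -1 if the message number is not indexed.
> > > >     long offsetOf(int messageNr) {
> > > >         int i = Arrays.binarySearch(messageNrs, 0, size, messageNr);
> > > >         return i >= 0 ? offsets[i] : -1;
> > > >     }
> > > > }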
> > > >
> > > >  wrote on Tue, Jul 2, 2019 at 7:08 AM:
> > > >
> > > >> The point, though, is that an extra index cache layer is needed.
> > > >> The overhead of that means the total paged capacity will be more
> > > >> limited, as that overhead isn't just an extra int per reference.
> > > >> E.g. in the PR the current impl isn't very memory-optimized; could
> > > >> an int array be used, or at worst an open-addressing primitive
> > > >> int-to-int hashmap?
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> This is why I really prefer Franz's approach.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Also, whatever we do, we need the new behaviour to be
> > > >> configurable, so that a use case we haven't thought about won't be
> > > >> impacted. I.e. the change should not be a surprise; it should be
> > > >> something you toggle on.
> > > >>
> > > >> On Mon, Jul 1, 2019 at 1:01 PM +0100, "yw yw"  wrote:
> > > >>
> > > >> Hi,
> > > >> We ran a test against your suggested configuration: 5MB page size,
> > > >> 100 page caches, 10MB max size.
> > > >> Current code: 7000 msg/s sent and 18000 msg/s received.
> > > >> PR code: 8200 msg/s sent and 16000 msg/s received.
> > > >> Like you said, the current code's performance improves by using a
> > > >> much smaller page file and holding many more of them.
> > > >>
> > > >> I'm not sure what implications using a smaller page file would
> > > >> have: producer performance may drop since switching files is more
> > > >> frequent, and the number of file handles would increase.
> > > >>
> > > >> While our consumer in the test just echoes, doing nothing after
> > > >> receiving a message, a consumer in the real world may be busy with
> > > >> business logic. This means references and page caches reside in
> > > >> memory longer and may be evicted more easily while producers are
> > > >> sending all the time.
> > > >>
> > > >> Since we don't know how many subscribers there are, it is not a
> > > >> scalable approach. We can't reduce the page file size indefinitely
> > > >> to fit the number of subscribers. The code should accommodate all
> > > >> kinds of configurations; we adjust configuration for trade-offs as
> > > >> needed, not as a workaround, IMO.
> > > >> In our company, ~200 queues (60% belonging to a few addresses) are
> > > >> deployed on the broker. We can't set them all to e.g. 100 page
> > > >> caches (too much memory), and neither can we set different sizes
> > > >> per address pattern (hard to operate). In the multi-tenant cluster
> > > >> we prefer availability, and to avoid exhausting memory we set
> > > >> pageSize to 30MB, max cache size to 1 and max size to 31MB. It's
> > > >> running well in one of our clusters now :)
> > > >>
> > > >>  wrote on Sat, Jun 29, 2019 at 2:35 AM:
> > > >>
> > > >> > I think some of that is down to configuration. You could
> > > >> > configure paging to have much smaller page files but hold many
> > > >> > more of them. That way the reference sizes will be far smaller,
> > > >> > and pages will drop in and out less. E.g. if you expect 100
> > > >> > pages being read, make the cache 100 but make the page sizes
> > > >> > smaller, so the overhead is far less.
> > > >> >
> > > >> > On Thu, Jun 27, 2019 at 11:10 AM +0100, "yw yw"  wrote:
> > > >> >
> > > >> > "At last for one message we maybe read twice: first we read page and
> > > >> create
> > > >> > pagereference; second we requery message after its reference is
> > > >> removed.  "
> > > >> >
> > > >> > I just realized it was wrong. One message maybe read many times.
> > Think
> > > >> of
> > > >> > this: When #1~#2000 msg is delivered, need to depage #2001-#4000
> > msg,
> > > >> > reading the whole page; When #2001~#4000 msg is deliverd, need to
> > depage
> > > >> > #4001~#6000 msg, reading page again, etc.
> > > >> >
> > > >> > One message may even be read three times if we don't depage
> > > >> > until all messages are delivered. For example, say we have 3
> > > >> > pages p1, p2, p3 and a message m1 in the top part of p2. In our
> > > >> > case (max-size-bytes=51MB, a little bigger than the page size),
> > > >> > the first depage round reads the bottom half of p1 and the top
> > > >> > part of p2; the second depage round reads the bottom half of p2
> > > >> > and the top part of p3. Therefore p2 is read twice, and m1 may
> > > >> > be read three times if re-queried.
> > > >> >
> > > >> > To be honest, I don't know how to fix the problem above with
> > > >> > the decentralized approach. The point is not whether we rely on
> > > >> > the OS cache; it's that we do it the wrong way: we shouldn't
> > > >> > read a whole page (50MB) just for ~2000 messages. Also, there is
> > > >> > no need to keep 51MB of PagedReferenceImpl in memory. When 100
> > > >> > queues occupy 5100MB of memory, the message references are very
> > > >> > likely to be removed.
> > > >> >
> > > >> >
> > > >> > Francesco Nigro  wrote on Thu, Jun 27, 2019 at 5:05 PM:
> > > >> >
> > > >> > > >
> > > >> > > > which means the offset info is 100 times larger compared to
> > > >> > > > the shared page index cache.
> > > >> > >
> > > >> > > I would check with the JOL plugin for exact numbers.
> > > >> > > With it I see that we would have an increase of 4 bytes for
> > > >> > > each PagedReferenceImpl, fully decentralized, vs a centralized
> > > >> > > approach (the cache). In the economy of a fully loaded broker,
> > > >> > > if we care about scaling we need to understand whether the
> > > >> > > memory tradeoff is important enough to choose one of the 2
> > > >> > > approaches.
> > > >> > > My point is that paging could be made totally based on the OS
> > > >> > > page cache, with GC getting in the middle, deleting any
> > > >> > > previous mechanism of page caching... simplifying the process
> > > >> > > as it is.
> > > >> > > Using a 2-level cache with such a centralized approach can
> > > >> > > work, but it will add a level of complexity that IMO could be
> > > >> > > saved...
> > > >> > > What do you think the benefit of the decentralized solution
> > > >> > > would be, compared with the one proposed in the PR?
> > > >> > >
> > > >> > >
> > > >> > > On Thu, Jun 27, 2019 at 10:41 AM, yw yw  wrote:
> > > >> > >
> > > >> > > > Sorry, I missed the PageReference part.
> > > >> > > >
> > > >> > > > The lifecycle of a PageReference is: depage (into
> > > >> > > > intermediateMessageReferences) -> deliver (in
> > > >> > > > messageReferences) -> waiting for ack (in deliveringRefs) ->
> > > >> > > > removed. Every queue creates its own PageReference, which
> > > >> > > > means the offset info is 100 times larger compared to the
> > > >> > > > shared page index cache.
> > > >> > > > If we keep 51MB of PageReferences in memory then, as I said
> > > >> > > > in the PR, "For multiple subscribers to the same address,
> > > >> > > > just one executor is responsible for delivering, which means
> > > >> > > > at any given moment only one queue is delivering. Thus a
> > > >> > > > queue may be stalled for a long time. We get queueMemorySize
> > > >> > > > messages into memory, and when we deliver them after a long
> > > >> > > > time, we probably need to query the message and read the
> > > >> > > > page file again." In the end, one message may be read twice:
> > > >> > > > first we read the page and create the PageReference; second,
> > > >> > > > we re-query the message after its reference is removed.
> > > >> > > >
> > > >> > > > With the shared page index cache design, each message needs
> > > >> > > > to be read from the file only once.
> > > >> > > >
> > > >> > > > Michael Pearce  wrote on Thu, Jun 27, 2019 at 3:03 PM:
> > > >> > > >
> > > >> > > > > Hi
> > > >> > > > >
> > > >> > > > > First of all, I think this is an excellent effort and
> > > >> > > > > could be a potentially massive positive change.
> > > >> > > > >
> > > >> > > > > Before making any change on such a scale, I do think we
> > > >> > > > > need to ensure we have sufficient benchmarks on a number
> > > >> > > > > of scenarios, not just one use case, and the benchmark
> > > >> > > > > tool used needs to be openly available so that others can
> > > >> > > > > verify the measurements and check on their own setups.
> > > >> > > > >
> > > >> > > > > Some additional scenarios I would want/need covered:
> > > >> > > > >
> > > >> > > > > PageCache set to 5 and all consumers keeping up, but
> > > >> > > > > lagging just enough to be reading from the same 1st page
> > > >> > > > > cache; latency and throughput need to be measured for all.
> > > >> > > > > PageCache set to 5 and all consumers but one keeping up,
> > > >> > > > > lagging just enough to be reading from the same 1st page
> > > >> > > > > cache, with the one falling off the end and causing page
> > > >> > > > > cache swapping; measure latency and throughput of those
> > > >> > > > > keeping up in the 1st page cache, not caring about the one.
> > > >> > > > >
> > > >> > > > > Regarding the solution, some alternative approaches to
> > > >> > > > > discuss.
> > > >> > > > >
> > > >> > > > > In your scenario, if I understand correctly, each
> > > >> > > > > subscriber effectively has its own queue (1-to-1 mapping),
> > > >> > > > > not shared.
> > > >> > > > > You mention Kafka and say multiple consumers don't read
> > > >> > > > > serially on the address, and this is true, but per-queue
> > > >> > > > > processing of messages (dispatch) is still serial, even
> > > >> > > > > with multiple shared consumers on a queue.
> > > >> > > > >
> > > >> > > > > What about keeping the existing mechanism, but having a
> > > >> > > > > queue hold a reference to the page cache the queue is
> > > >> > > > > currently on, kept from GC (i.e. not soft)? That way the
> > > >> > > > > page cache isn't swapped around when you have queues (in
> > > >> > > > > your case, subscribers) swapping page caches back and
> > > >> > > > > forth, avoiding the constant re-read issue.
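> > > >> > > > >
> > > >> > > > > i.e. roughly (a sketch, hypothetical names):
> > > >> > > > >
> > > >> > > > > // Sketch: the queue pins the page cache it is currently
> > > >> > > > > // reading with a strong reference, so the soft-value cache
> > > >> > > > > // cannot evict that one page out from under it; moving to
> > > >> > > > > // another page drops the pin and the old page becomes
> > > >> > > > > // evictable again.
> > > >> > > > > class QueuePageAnchor {
> > > >> > > > >     private Object pinnedPageCache;   // strong ref defeats soft-ref eviction
> > > >> > > > >     private long pinnedPageId = -1;
> > > >> > > > >
> > > >> > > > >     void moveTo(long pageId, Object pageCache) {
> > > >> > > > >         if (pageId != pinnedPageId) {
> > > >> > > > >             pinnedPageCache = pageCache;
> > > >> > > > >             pinnedPageId = pageId;
> > > >> > > > >         }
> > > >> > > > >     }
> > > >> > > > > }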
> > > >> > > > >
> > > >> > > > > Also, I think Franz had an excellent idea: do away with
> > > >> > > > > the page cache in its current form entirely, ensure the
> > > >> > > > > offset is kept with the reference, and rely on OS caching
> > > >> > > > > to keep hot blocks/data.
> > > >> > > > >
> > > >> > > > > Best
> > > >> > > > > Michael
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Thu, 27 Jun 2019 at 05:13, yw yw  wrote:
> > > >> > > > >
> > > >> > > > > > Hi, folks
> > > >> > > > > >
> > > >> > > > > > This is the discussion about "ARTEMIS-2399 Fix performance
> > > >> > > degradation
> > > >> > > > > > when there are a lot of subscribers".
> > > >> > > > > >
> > > >> > > > > > First, apologies that I didn't clarify our thoughts.
> > > >> > > > > >
> > > >> > > > > > As noted in the Environment section, page-max-cache-size
> > > >> > > > > > is set to 1, meaning at most one page is allowed in the
> > > >> > > > > > softValueCache. We have tested with the default
> > > >> > > > > > page-max-cache-size of 5; it takes some time to see the
> > > >> > > > > > performance degradation, since at the start the cursor
> > > >> > > > > > positions of the 100 subscribers are close together and
> > > >> > > > > > all reads hit the softValueCache. But after some time the
> > > >> > > > > > cursor positions diverge. Once these positions are spread
> > > >> > > > > > over more than 5 pages, some pages are read back and
> > > >> > > > > > forth. This can be seen in the trace log "adding
> > > >> > > > > > pageCache pageNr=xxx into cursor = test-topic" in
> > > >> > > > > > PageCursorProviderImpl, where some pages are read many
> > > >> > > > > > times for the same subscriber. From that point on, the
> > > >> > > > > > performance starts to degrade. So we set
> > > >> > > > > > page-max-cache-size to 1 here just to make the test run
> > > >> > > > > > faster; it doesn't change the final result.
> > > >> > > > > >
> > > >> > > > > > The softValueCache is evicted if memory is really low,
> > > >> > > > > > or when the map size reaches capacity (default 5). In
> > > >> > > > > > most cases the subscribers do tailing reads, which are
> > > >> > > > > > served by the softValueCache (no need to touch the
> > > >> > > > > > disk), so we need to keep it. But when some subscribers
> > > >> > > > > > fall behind, they need to read pages not in the
> > > >> > > > > > softValueCache. After looking at the code, we found that
> > > >> > > > > > one depage round follows at most MAX_SCHEDULED_RUNNERS
> > > >> > > > > > deliver rounds in most situations, which is to say at
> > > >> > > > > > most MAX_DELIVERIES_IN_LOOP * MAX_SCHEDULED_RUNNERS
> > > >> > > > > > messages are depaged next. If you set the QueueImpl
> > > >> > > > > > logger to debug level, you will see logs like "Queue
> > > >> > > > > > Memory Size after depage on queue=sub4 is 53478769 with
> > > >> > > > > > maxSize = 52428800. Depaged 68 messages,
> > > >> > > > > > pendingDelivery=1002,
> > > >> > > > > > intermediateMessageReferences=23162, queueDelivering=0".
> > > >> > > > > > In order to depage fewer than 2000 messages, each
> > > >> > > > > > subscriber has to read a whole page, which is
> > > >> > > > > > unnecessary and wasteful. In our test, where one page
> > > >> > > > > > (50MB) contains ~40000 messages, one subscriber may read
> > > >> > > > > > the page 40000/2000 = 20 times, if the softValueCache is
> > > >> > > > > > evicted, to finish delivering it. This drastically slows
> > > >> > > > > > down the process and burdens the disk. So we added the
> > > >> > > > > > PageIndexCacheImpl and read one message at a time rather
> > > >> > > > > > than reading all messages of the page. This way, for
> > > >> > > > > > each subscriber, each page is read only once to finish
> > > >> > > > > > delivering it.
> > > >> > > > > >
> > > >> > > > > > Having said that, the softValueCache is used for
> > > >> > > > > > tailing reads. If it's evicted, it won't be reloaded, to
> > > >> > > > > > prevent the issue illustrated above; the pageIndexCache
> > > >> > > > > > is used instead.
> > > >> > > > > >
> > > >> > > > > > Regarding implementation details, we noted that before
> > > >> > > > > > delivering a page, a PageCursorInfo is constructed,
> > > >> > > > > > which requires reading the whole page. We can take this
> > > >> > > > > > opportunity to construct the pageIndexCache; it's very
> > > >> > > > > > simple to code (see the sketch after the summary below).
> > > >> > > > > > We also thought of building an offset index file, but
> > > >> > > > > > some concerns stemmed from the following:
> > > >> > > > > >
> > > >> > > > > >    1. When to write and sync the index file? Would it
> > > >> > > > > >    have performance implications?
> > > >> > > > > >    2. If we have an index file, we can construct the
> > > >> > > > > >    PageCursorInfo from it (no need to read the page as
> > > >> > > > > >    before), but we need to write the total message
> > > >> > > > > >    number into it first. It seems a little weird putting
> > > >> > > > > >    this into the index file.
> > > >> > > > > >    3. If we experience a hard crash, a recovery
> > > >> > > > > >    mechanism would be needed to recover the page and
> > > >> > > > > >    page index files, e.g. truncating to the valid size.
> > > >> > > > > >    So how do we know which files need to be
> > > >> > > > > >    sanity-checked?
> > > >> > > > > >    4. A variant binary search algorithm may be needed;
> > > >> > > > > >    see
> > > >> > > > > >    https://github.com/apache/kafka/blob/70ddd8af71938b4f5f6d1bb3df6243ef13359bcf/core/src/main/scala/kafka/log/AbstractIndex.scala
> > > >> > > > > >    5. Unlike Kafka, where the user fetches lots of
> > > >> > > > > >    messages at once and the broker just needs to look up
> > > >> > > > > >    the start offset in the index file once, Artemis
> > > >> > > > > >    delivers messages one by one, which means we have to
> > > >> > > > > >    look up the index every time we deliver a message.
> > > >> > > > > >    Although the index file is probably in the OS page
> > > >> > > > > >    cache, there are still chances we miss the cache.
> > > >> > > > > >    6. Compatibility with old files.
> > > >> > > > > >
> > > >> > > > > > To sum up: Kafka uses an mmapped index file and we use
> > > >> > > > > > an index cache. Both are designed to find the physical
> > > >> > > > > > file position from an offset (Kafka) or a message number
> > > >> > > > > > (Artemis). We prefer the index cache because it's easy
> > > >> > > > > > to understand and maintain.
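> > > >> > > > > >
> > > >> > > > > > A sketch of what "taking the opportunity" looks like
> > > >> > > > > > (reusing the two-array PageOffsetIndex sketch shown
> > > >> > > > > > earlier in this thread; the record layout is assumed to
> > > >> > > > > > be length-prefixed):
> > > >> > > > > >
> > > >> > > > > > import java.io.IOException;
> > > >> > > > > > import java.io.RandomAccessFile;
> > > >> > > > > >
> > > >> > > > > > class PageIndexBuilder {
> > > >> > > > > >     // Build the index during the same single scan that
> > > >> > > > > >     // creates the PageCursorInfo, so each page file is
> > > >> > > > > >     // read only once.
> > > >> > > > > >     static PageOffsetIndex buildIndex(RandomAccessFile pageFile)
> > > >> > > > > >             throws IOException {
> > > >> > > > > >         PageOffsetIndex index = new PageOffsetIndex();
> > > >> > > > > >         int messageNr = 0;
> > > >> > > > > >         long offset = 0;
> > > >> > > > > >         while (offset < pageFile.length()) {
> > > >> > > > > >             pageFile.seek(offset);
> > > >> > > > > >             int size = pageFile.readInt();  // assumed length prefix
> > > >> > > > > >             index.add(messageNr++, offset);
> > > >> > > > > >             offset += 4 + size;             // skip to next record
> > > >> > > > > >         }
> > > >> > > > > >         return index;  // messageNr is also the total message count
> > > >> > > > > >     }
> > > >> > > > > > }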
> > > >> > > > > >
> > > >> > > > > > We also tested the single-subscriber case with the same
> > > >> > > > > > setup.
> > > >> > > > > > The original:
> > > >> > > > > > consumer tps (11000 msg/s) and latency:
> > > >> > > > > > [image: orig_single_subscriber.png]
> > > >> > > > > > producer tps (30000 msg/s) and latency:
> > > >> > > > > > [image: orig_single_producer.png]
> > > >> > > > > > The PR:
> > > >> > > > > > consumer tps (14000 msg/s) and latency:
> > > >> > > > > > [image: pr_single_consumer.png]
> > > >> > > > > > producer tps (30000 msg/s) and latency:
> > > >> > > > > > [image: pr_single_producer.png]
> > > >> > > > > > The results are similar, and even a little better in the
> > > >> > > > > > single-subscriber case.
> > > >> > > > > >
> > > >> > > > > > We used our internal test platform, and I think JMeter
> > > >> > > > > > can also be used to test against it.
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> >
> > --
> > Clebert Suconic
> >



-- 
Clebert Suconic




