It shouldn’t need to read it unless it is a rollback and it’s being redelivered. Those would be exceptional cases.
On Mon, Jul 22, 2019 at 5:46 AM yw yw <[email protected]> wrote:

Yes, the reader per queue would not close the page file until moving to the next page. However, there are cases where files might be opened constantly:
1. Paged transactions. Suppose the current cursor position is at page2 and the paged transactions are at page1; when the transactions are committed, page1 might be opened constantly to read the messages.
2. Scheduled or rolled-back messages. The positions of these messages might fall behind the current cursor position, leading to page files being opened constantly.

<[email protected]> wrote on Fri, Jul 19, 2019 at 3:45 PM:

Surely you keep the file open, else you incur the perf penalty of having to open the file constantly. It would be faster to have the reader hold the file open and have one per queue, avoiding the constant opening and closing of a file and all the overhead of that at the OS level.

On Fri, Jul 19, 2019 at 8:30 AM +0100, "yw yw" <[email protected]> wrote:

> But the real problem here will be the number of open files. Each Page will have an open file, which will keep a lot of open files on the system. Correct?

Not sure I made it clear enough. My thought is: since PageCursorInfo decides whether an entire page is consumed based on numberOfMessages, and PageSubscriptionImpl decides whether to move to the next page based on the current cursor page position and numberOfMessages, we store a map of (pageNr, numberOfMessages) in PageCursorProviderImpl after a page is evicted. From numberOfMessages, each PageSubscriptionImpl can build its PageCursorInfo, and if the current cursor page position is within the range of the current page, a PageReader can be built to help read messages. So there are really no open page files in PageCursorProviderImpl. Without this map, each PageSubscriptionImpl would have to read the page file first to get numberOfMessages and then build the PageCursorInfo/PageReader.

I agree with putting the PageReader into PageSubscriptionImpl, just not sure about the specific implementation details :)
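[As a minimal sketch of the map described above (hypothetical names, not actual Artemis code): the provider retains only a per-page message count after eviction, letting each subscription build its cursor state without rescanning the page file.]

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of the (pageNr -> numberOfMessages) map described
    // above; names are illustrative, not Artemis APIs.
    final class PageCountCache {
        private final Map<Long, Integer> messagesPerPage = new ConcurrentHashMap<>();

        // Called by the provider when a page's cache is evicted.
        void onPageEvicted(long pageNr, int numberOfMessages) {
            messagesPerPage.put(pageNr, numberOfMessages);
        }

        // A subscription asks for the count to build its PageCursorInfo;
        // -1 means the page file must be scanned once to recover it.
        int numberOfMessages(long pageNr) {
            return messagesPerPage.getOrDefault(pageNr, -1);
        }

        // Once the page is fully consumed and deleted, the entry can go.
        void onPageDeleted(long pageNr) {
            messagesPerPage.remove(pageNr);
        }
    }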
On Fri, Jul 19, 2019 at 2:10 PM, … wrote:

+1 for having one per queue. Definitely a better idea than having to hold a cache.

On Fri, Jul 19, 2019 at 4:37 AM +0100, "Clebert Suconic" <[email protected]> wrote:

But the real problem here will be the number of open files. Each Page will have an open file, which will keep a lot of open files on the system. Correct?

I believe the impact of moving the files to the Subscription wouldn't be that much, and we would fix the problem. We wouldn't need a cache at all, as we would just keep the file we need at the current cursor.

On Tue, Jul 16, 2019 at 10:40 PM yw yw wrote:

I did consider the case where all pages are instantiated as PageReaders. That's really a problem.

The pro of the PR is that every page is read only once to build a PageReader shared by all the queues. The con of the PR is that many PageReaders are probably instantiated if consumers make slow/no progress in several queues while making fast progress in others (I think that's the only cause leading to the corner case, right?). This means too many open files and too much memory.

The pro of duplicated PageReaders is that there is a fixed number of PageReaders, matching the number of queues, at any one time. The con is that each queue has to read the page once to build its own PageReader if the page cache is evicted. I'm not sure how this will affect performance.

The point is that we need the number of messages in the page, which is used by PageCursorInfo and PageSubscription::internalGetNext, so we have to read the page file. How about we cache only the number of messages in each page instead of the PageReader, and build a PageReader in each queue? If we hit the corner case, only the (pageNr, numberOfMessages) pair data is permanently in memory, which I assume is smaller than the complete PageCursorInfo data. This way we achieve the performance gain at a small price.

Clebert Suconic wrote on Tue, Jul 16, 2019 at 10:18 PM:

I just came back after a well-deserved 2-week break and I was looking at this, and I can say it's well done. Nice job! It's a lot simpler!

However, there's one question now, which is probably a further improvement: shouldn't the pageReader be instantiated at the PageSubscription? That means, if there's no page cache, in case of the page being evicted, the Subscription would then create a new Page/PageReader pair and dispose of it when it's done (meaning, when it has moved to a different page).

As you are solving the case with many subscriptions, wouldn't you hit a corner case where all Pages are instantiated as PageReaders? I feel it would be better to eventually duplicate a PageReader and close it when done. Or did you already consider that possibility and still think it's best to keep this cache of PageReaders?
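[A minimal sketch of the per-subscription lifecycle Clebert suggests, under assumed names (PageFiles and openPage are stand-ins, not Artemis APIs): the subscription lazily opens a reader for its current page and disposes of it only when the cursor moves to a different page.]

    import java.io.Closeable;
    import java.io.IOException;

    // Hypothetical per-subscription reader lifecycle: at most one open page
    // file per queue, closed when the cursor moves to the next page.
    final class SubscriptionPageReader implements Closeable {
        interface PageFiles {                     // stand-in for the paging store
            Closeable openPage(long pageNr) throws IOException;
        }

        private final PageFiles store;
        private long currentPageNr = -1;
        private Closeable currentReader;

        SubscriptionPageReader(PageFiles store) { this.store = store; }

        // Called on each read; reopens only when the subscription's cursor
        // has moved to a different page.
        Closeable readerFor(long pageNr) throws IOException {
            if (pageNr != currentPageNr) {
                close();                          // dispose the previous Page/PageReader pair
                currentReader = store.openPage(pageNr);
                currentPageNr = pageNr;
            }
            return currentReader;
        }

        @Override
        public void close() throws IOException {
            if (currentReader != null) {
                currentReader.close();
                currentReader = null;
            }
        }
    }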
On Sat, Jul 13, 2019 at 12:15 AM, … wrote:

Could a squashed PR be sent?

On Fri, Jul 12, 2019 at 2:23 PM +0100, "yw yw" wrote:

Hi,

I have finished work on the new implementation (not yet tests and configuration) as suggested by franz.

I put the fileOffset in the PagePosition and added a new class, PageReader, which is a wrapper around the page that implements the PageCache interface. The PageReader class is used to read the page file if the cache is evicted. For details, see
https://github.com/wy96f/activemq-artemis/commit/3f388c2324738f01f53ce806b813220d28d40987

I ran some tests, with the results below:
1. Running with a 51MB page size and 1 page cache in the case of 100 multicast queues: https://filebin.net/wnyan7d2n1qgfsvg
2. Running with a 5MB page size and 100 page caches in the case of 100 multicast queues: https://filebin.net/re0989vz7ib1c5mc
3. Running with a 51MB page size and 1 page cache in the case of 1 queue: https://filebin.net/3qndct7f11qckrus

The results seem good, similar to the implementation in the PR. Most importantly, the index cache data is removed, so there is no worry about extra overhead :)
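[To illustrate why storing the fileOffset in the PagePosition removes the need to rescan an evicted page, a hedged sketch of the read path, assuming a simple length-prefixed record layout (the real Artemis page format and PageReader differ):]

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Hypothetical "read one message at its recorded offset" path. The real
    // PageReader implements the PageCache interface; this shows only the seek.
    final class OffsetPageReader implements AutoCloseable {
        private final RandomAccessFile file;

        OffsetPageReader(String pageFile) throws IOException {
            this.file = new RandomAccessFile(pageFile, "r");
        }

        // fileOffset comes from the PagePosition stored at depage time, so
        // only this record is read, not the whole 50MB page.
        byte[] readMessageAt(long fileOffset) throws IOException {
            file.seek(fileOffset);
            int length = file.readInt();          // assumed length-prefixed record
            byte[] record = new byte[length];
            file.readFully(record);
            return record;
        }

        @Override
        public void close() throws IOException { file.close(); }
    }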
yw yw wrote on Thu, Jul 4, 2019 at 5:38 PM:

Hi, michael,

Thanks for the advice. For the current PR, we can use two arrays, where one records the message number and the other the corresponding offset, to optimize the memory usage. For franz's approach, we will also work on an early prototype implementation. After that, we will run some basic tests in different scenarios.

On Tue, Jul 2, 2019 at 7:08 AM, … wrote:

The point, though, is that an extra index cache layer is needed. Its overhead means the total paged capacity will be more limited, as that overhead isn't just an extra int per reference. E.g. in the PR the current impl isn't very memory-optimised: could an int array be used, or at worst an open-addressing primitive int-int hashmap? This is why I really prefer franz's approach.

Also, whatever we do, we need the new behaviour to be configurable, so that any use case we haven't thought about won't be impacted. The change should not be a surprise; it should be something you toggle on.
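[A sketch of the two-array idea above (illustrative names only): parallel primitive arrays pair each message number with its file offset, avoiding the per-entry boxing and node overhead of a Map<Integer, Long>.]

    // Hypothetical two-array index: entry i pairs messageNrs[i] with
    // fileOffsets[i]. Roughly 12 bytes per entry vs. ~32+ for boxed map entries.
    final class TwoArrayIndex {
        private final int[] messageNrs;
        private final long[] fileOffsets;
        private int size;

        TwoArrayIndex(int capacity) {
            this.messageNrs = new int[capacity];
            this.fileOffsets = new long[capacity];
        }

        // Entries must be appended in increasing messageNr order.
        void add(int messageNr, long fileOffset) {
            messageNrs[size] = messageNr;
            fileOffsets[size] = fileOffset;
            size++;
        }

        // Linear scan kept simple here; a binary search also works because
        // entries are sorted by messageNr. Returns -1 if not indexed.
        long offsetOf(int messageNr) {
            for (int i = 0; i < size; i++) {
                if (messageNrs[i] == messageNr) return fileOffsets[i];
            }
            return -1;
        }
    }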
On Mon, Jul 1, 2019 at 1:01 PM +0100, "yw yw" wrote:

Hi,
We ran a test against your configuration (5MB page size / 100 page caches / 10MB max size).
The current code: 7000 msg/s sent and 18000 msg/s received.
The PR code: 8200 msg/s sent and 16000 msg/s received.
Like you said, the current code's performance improves by using a much smaller page file and holding many more of them.

I'm not sure what implications a smaller page file would have: producer performance may drop since file switching is more frequent, and the number of file handles would increase?

While our consumer in the test just echoes, doing nothing after receiving a message, a consumer in the real world may be busy doing business. This means references and page caches reside in memory longer and may be evicted more easily while producers are sending all the time.

Since we don't know how many subscribers there are, this is not a scalable approach. We can't reduce the page file size without limit to fit the number of subscribers. The code should accommodate all kinds of configurations; we adjust configuration as a trade-off when needed, not as a workaround, IMO. In our company, ~200 queues (60% owned by a few addresses) are deployed on the broker. We can't set all of them to e.g. 100 page caches (too much memory), and neither can we set different sizes per address pattern (hard for operations). In the multi-tenant cluster we prefer availability, so to avoid memory exhaustion we set pageSize to 30MB, max cache size to 1, and max size to 31MB. It's running well in one of our clusters now :)

On Sat, Jun 29, 2019 at 2:35 AM, … wrote:

I think some of that is down to configuration. You could configure paging to have much smaller page files but hold many more of them. That way the reference sizes will be far smaller and pages will drop in and out less. E.g. if you expect 100 being read, make it 100, but make the page sizes smaller so the overhead is far less.

On Thu, Jun 27, 2019 at 11:10 AM +0100, "yw yw" wrote:

> At last for one message we maybe read twice: first we read page and create pagereference; second we requery message after its reference is removed.

I just realized this was wrong: one message may be read many times. Think of this: when messages #1~#2000 are delivered, we need to depage #2001~#4000, reading the whole page; when #2001~#4000 are delivered, we need to depage #4001~#6000, reading the page again, and so on.

One message may be read three times if we don't depage until all messages are delivered. For example, say we have 3 pages p1, p2, p3 and a message m1 in the top part of p2. In our case (max-size-bytes=51MB, a little bigger than the page size), the first depage round reads the bottom half of p1 and the top part of p2; the second depage round reads the bottom half of p2 and the top part of p3. Therefore p2 is read twice, and m1 may be read three times if re-queried.

To be honest, I don't know how to fix the problem above with the decentralized approach. The point is not whether we rely on the OS cache; it's that we do it the wrong way: we shouldn't read a whole page (50MB) just for ~2000 messages. Also, there is no need to keep 51MB of PagedReferenceImpl in memory. When 100 queues occupy 5100MB of memory, the message references are very likely to be removed.
Francesco Nigro wrote on Thu, Jun 27, 2019 at 5:05 PM:

> which means the offset info is 100 times larger compared to the shared page index cache.

I would check with the JOL plugin for exact numbers. I see with it that we would have an increase of 4 bytes for each PagedReferenceImpl, totally decentralized, vs a centralized approach (the cache). In the economy of a fully loaded broker, if we care about scaling, we need to understand whether the memory trade-off is important enough to choose one of the 2 approaches. My point is that paging could be based entirely on the OS page cache, with GC getting in the middle, deleting any previous mechanism of page caching and simplifying the process as it is. Using a 2-level cache with such a centralized approach can work, but it will add a level of complexity that IMO could be saved. What do you think the benefit of the decentralized solution would be compared with the one proposed in the PR?

On Thu, Jun 27, 2019 at 10:41 AM, yw yw wrote:

Sorry, I missed the PageReference part.

The lifecycle of a PageReference is: depage (in intermediateMessageReferences) -> deliver (in messageReferences) -> waiting for ack (in deliveringRefs) -> removed. Every queue creates its own PageReference, which means the offset info is 100 times larger compared to the shared page index cache. If we keep 51MB of pageReference data in memory, then as I said in the PR: "For multiple subscribers to the same address, just one executor is responsible for delivering, which means at any given moment only one queue is delivering. Thus a queue may be stalled for a long time. We get queueMemorySize messages into memory, and when we deliver them after a long time, we probably need to query the message and read the page file again." So for one message we may read twice: first we read the page and create the PageReference; second we re-query the message after its reference is removed.

For the shared page index cache design, each message needs to be read from file only once.

Michael Pearce wrote on Thu, Jun 27, 2019 at 3:03 PM:

Hi,

First of all, I think this is an excellent effort, and it could be a potentially massive positive change.

Before making any change on such a scale, I do think we need sufficient benchmarks for a number of scenarios, not just one use case, and the benchmark tool used needs to be openly available so that others can verify the measurements and check them on their own setups. Some additional scenarios I would want/need covered:

- PageCache set to 5, and all consumers keeping up, but lagging enough to be reading from the same 1st page cache; latency and throughput need to be measured for all.
- PageCache set to 5, and all consumers but one keeping up, lagging enough to be reading from the same 1st page cache, but the one falling off the end, causing the page cache swapping; measure latency and throughput of those keeping up in the 1st page cache, not caring about the one.

Regarding a solution, some alternative approaches to discuss.

In your scenario, if I understand correctly, each subscriber effectively has its own queue (a 1-to-1 mapping), not shared. You mention Kafka and say multiple consumers don't read serially on the address, and this is true, but per-queue processing of messages (dispatch) is still serial, even with multiple shared consumers on a queue.

What about keeping the existing mechanism but having a queue hold a reference to the page cache the queue is currently on, kept from GC (e.g. not soft)? That way the page cache isn't swapped around when you have queues (in your case, subscribers) swapping page caches back and forth, avoiding the constant re-read issue.

Also, I think Franz had an excellent idea: do away with the page cache in its current form entirely, ensure the offset is kept with the reference, and rely on OS caching to keep hot blocks/data.

Best,
Michael
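[A loose illustration of the pinning idea above, assuming a soft-value cache map (hypothetical types; the real Artemis classes differ): entries stay softly referenced, but each queue keeps one strong reference to the page cache it is currently on, so only unused page caches can be collected.]

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical "pin the current page cache": entries are soft so memory
    // pressure can evict them, but a queue's strong reference to the page it
    // is on keeps that one cache alive and avoids the constant re-read.
    final class PinnedPageCaches<P> {
        private final Map<Long, SoftReference<P>> caches = new ConcurrentHashMap<>();

        void put(long pageNr, P pageCache) {
            caches.put(pageNr, new SoftReference<>(pageCache));
        }

        P get(long pageNr) {
            SoftReference<P> ref = caches.get(pageNr);
            return ref == null ? null : ref.get();    // null once GC'd
        }
    }

    final class QueueCursor<P> {
        private P pinned;                 // strong ref: survives GC while in use

        // Re-reading from disk when the cache entry is gone is omitted here.
        void moveTo(long pageNr, PinnedPageCaches<P> caches) {
            pinned = caches.get(pageNr);
        }

        P currentPage() { return pinned; }
    }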
On Thu, 27 Jun 2019 at 05:13, yw yw wrote:

Hi, folks,

This is the discussion about "ARTEMIS-2399 Fix performance degradation when there are a lot of subscribers".

First, apologies that I didn't clarify our thoughts.

As noted in the Environment part, page-max-cache-size is set to 1, meaning at most one page is allowed in the softValueCache. We have tested with the default page-max-cache-size of 5; it takes some time to see the performance degradation, since at the start the cursor positions of the 100 subscribers are similar and all message reads hit the softValueCache. But after some time the cursor positions diverge. When these positions span more than 5 pages, some pages get read back and forth. This can be seen in the trace log "adding pageCache pageNr=xxx into cursor = test-topic" in PageCursorProviderImpl, where some pages are read many times for the same subscriber. From that point on, performance starts to degrade. So we set page-max-cache-size to 1 here just to make the test run faster; it doesn't change the final result.

Entries in the softValueCache are removed if memory is really low, or when the map size reaches capacity (default 5). In most cases the subscribers are doing tailing reads, which are served by the softValueCache (no need to touch the disk), so we need to keep it. But when some subscribers fall behind, they need to read pages that are not in the softValueCache.
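[For context, a minimal sketch of a capacity-bounded soft-value cache in the spirit of the softValueCache described above (a model only; the actual implementation lives in Artemis's paging code and differs in detail): values are held through SoftReferences, so they survive until memory runs low or the capacity bound pushes them out.]

    import java.lang.ref.SoftReference;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch: the LRU bound models page-max-cache-size, the SoftReference
    // models eviction under memory pressure. Not the actual Artemis code.
    final class SoftValuePageCache<P> {
        private final Map<Long, SoftReference<P>> cache;

        SoftValuePageCache(int maxCachedPages) {
            // access-order LinkedHashMap evicts the eldest entry past capacity
            this.cache = new LinkedHashMap<Long, SoftReference<P>>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, SoftReference<P>> e) {
                    return size() > maxCachedPages;
                }
            };
        }

        synchronized void put(long pageNr, P page) {
            cache.put(pageNr, new SoftReference<>(page));
        }

        // null means the page was never cached, was evicted by the capacity
        // bound, or had its soft reference cleared by the GC.
        synchronized P get(long pageNr) {
            SoftReference<P> ref = cache.get(pageNr);
            return ref == null ? null : ref.get();
        }
    }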
After looking at the code, we found that one depage round follows at most MAX_SCHEDULED_RUNNERS deliver rounds in most situations; that is to say, at most MAX_DELIVERIES_IN_LOOP * MAX_SCHEDULED_RUNNERS messages will be depaged next. If you set the QueueImpl logger to debug level, you will see logs like "Queue Memory Size after depage on queue=sub4 is 53478769 with maxSize = 52428800. Depaged 68 messages, pendingDelivery=1002, intermediateMessageReferences=23162, queueDelivering=0". So to depage fewer than 2000 messages, each subscriber has to read a whole page, which is unnecessary and wasteful. In our test, where one page (50MB) contains ~40000 messages, one subscriber may read the page 40000/2000 = 20 times to finish delivering it if the softValueCache is evicted. This has drastically slowed down the process and burdened the disk. So we added the PageIndexCacheImpl and read one message at a time rather than all the messages of a page. This way, for each subscriber, each page is read only once to finish delivering it.

Having said that, the softValueCache is used for tailing reads. If it's evicted, it won't be reloaded, to prevent the issue illustrated above. Instead, the pageIndexCache is used.

Regarding implementation details: we noted that before delivering a page, a pageCursorInfo is constructed, which needs to read the whole page. We can take this opportunity to construct the pageIndexCache; it's very simple to code.
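[A sketch of building such an index during the single sequential scan that pageCursorInfo construction already performs (a length-prefixed record layout is assumed; names are illustrative, not Artemis's page format):]

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical one-pass index build: while scanning the page file once,
    // record each message's starting offset so later reads can seek directly
    // to a single message instead of re-reading the whole page.
    final class PageIndexBuilder {
        static long[] buildOffsets(String pageFile) throws IOException {
            List<Long> offsets = new ArrayList<>();
            try (RandomAccessFile file = new RandomAccessFile(pageFile, "r")) {
                long pos = 0, end = file.length();
                while (pos < end) {
                    offsets.add(pos);                 // offset of message #offsets.size()
                    file.seek(pos);
                    int length = file.readInt();      // assumed length-prefixed record
                    pos += Integer.BYTES + length;    // skip the body without reading it
                }
            }
            long[] index = new long[offsets.size()];
            for (int i = 0; i < index.length; i++) index[i] = offsets.get(i);
            return index;                             // index[n] = file offset of message n
        }
    }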
We also thought of building an offset index file, but had some concerns:

1. When to write and sync the index file? Would it have performance implications?
2. If we have an index file, we can construct the pageCursorInfo through it (no need to read the page as before), but we would need to write the total message number into it first. It seems a little weird putting this into the index file.
3. After a hard crash, a recovery mechanism would be needed to recover the page and page index files, e.g. truncating them to the valid size. So how do we know which files need to be sanity-checked?
4. A variant binary search algorithm may be needed; see
https://github.com/apache/kafka/blob/70ddd8af71938b4f5f6d1bb3df6243ef13359bcf/core/src/main/scala/kafka/log/AbstractIndex.scala
5. Unlike Kafka, where the user fetches lots of messages at once and the broker only needs to look up the start offset in the index file once, Artemis delivers messages one by one, which means we would have to look up the index every time we deliver a message. Although the index file would probably be in the OS page cache, there are still chances we miss the cache.
6. Compatibility with old files.

To sum up, Kafka uses an mmapped index file and we use an index cache. Both are designed to find the physical file position from an offset (Kafka) or a message number (Artemis). We prefer the index cache because it's easy to understand and maintain.
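[For the variant binary search in point 4: the essential operation over a Kafka-style sparse index is a floor lookup, i.e. the largest indexed entry at or below the target. A generic sketch (not Kafka's or Artemis's actual code):]

    import java.util.Arrays;

    // Hypothetical floor lookup over a sparse index: entries[i] holds the
    // message number of the i-th indexed record. Returns the slot of the
    // largest entry <= target, or -1 if target precedes the first entry.
    final class SparseIndexLookup {
        static int floorSlot(long[] entries, long target) {
            int lo = 0, hi = entries.length - 1, result = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;        // overflow-safe midpoint
                if (entries[mid] <= target) {
                    result = mid;                 // candidate; keep looking right
                    lo = mid + 1;
                } else {
                    hi = mid - 1;                 // too big; look left
                }
            }
            return result;
        }

        public static void main(String[] args) {
            long[] indexedMessageNrs = {0, 2000, 4000, 6000};
            // message #4321 starts somewhere after the entry for #4000
            System.out.println(Arrays.toString(indexedMessageNrs)
                + " floor(4321) -> slot " + floorSlot(indexedMessageNrs, 4321));
        }
    }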
We also tested the one-subscriber case with the same setup.
The original:
consumer tps (11000 msg/s) and latency: [image: orig_single_subscriber.png]
producer tps (30000 msg/s) and latency: [image: orig_single_producer.png]
The PR:
consumer tps (14000 msg/s) and latency: [image: pr_single_consumer.png]
producer tps (30000 msg/s) and latency: [image: pr_single_producer.png]
The results are similar, and even a little better in the single-subscriber case.

We used our internal test platform, and I think jmeter can also be used to test it.
