[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915797#action_12915797 ]
Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Hmm so we also copy-on-write a given byte[] block? Is this because JMM can't make the guarantees we need about other threads reading the bytes written?{quote}

Correct. The example of where everything could go wrong is the rewriting of a byte slice forwarding address while a reader is traversing the same slice. The forwarding address could be half-written, and suddenly we're bowling in lane 6 when we should be in lane 9. By making a [read-only] ref copy of the byte[]s we're ensuring that the byte[]s are in a consistent state while being read.

So I'm using a boolean[] to tell the writer whether it needs to make a copy of a given byte[]; the same boolean[] also tells the writer whether it has already made that copy. By contrast, in IndexReader.clone we keep ref counts on the norms byte[], decrementing each time a copy is released until the count reaches 0, at which point we give the byte[] to the GC (here we'd do the same, or hand it back to the allocator).

{quote}But even if we do reuse, we will cause tons of garbage, until the still-open readers are closed? Ie we cannot re-use the byte[] being "held open" by any NRT reader that's still referencing the in-RAM segment after that segment had been flushed to disk.{quote}

If we do pool, it won't be very difficult to implement: we have a single point of check-in/check-out of the byte[]s in the allocator class. For the first implementation, though, we should by all means minimize "tricky" areas of the code by not implementing skip lists or byte[] pooling.

{quote}It's not like 3.x's situation with FieldCache or terms dict index, for example....{quote}

What's the GC issue with FieldCache and the terms dict?

{quote}BTW I'm assuming IW will now be modal? Ie caller must tell IW up front if NRT readers will be used? Because non-NRT users shouldn't have to pay all this added RAM cost?{quote}

At present it's still all on demand.
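To make the boolean[] idea concrete, here's a minimal sketch (class and member names are my own invention, not from the attached patches): the writer clones a byte[] block the first time it dirties it after a reader snapshot, so readers holding the older refs always see internally consistent blocks.

```java
// Hypothetical sketch of the copy-on-write block pool discussed above.
class CopyOnWriteBlockPool {
  byte[][] blocks;    // shared with point-in-time readers
  boolean[] copied;   // has block i been copied since the last snapshot?

  CopyOnWriteBlockPool(int numBlocks, int blockSize) {
    blocks = new byte[numBlocks][blockSize];
    copied = new boolean[numBlocks];
  }

  // Writer path: copy the block once per snapshot interval, then mutate.
  void write(int block, int offset, byte b) {
    if (!copied[block]) {
      blocks[block] = blocks[block].clone(); // readers keep the old array
      copied[block] = true;
    }
    blocks[block][offset] = b;
  }

  // Reader path: a point-in-time view is a shallow [read-only] ref copy
  // of blocks[]; resetting the flags forces the next write to any block
  // to clone it again, leaving this view untouched.
  byte[][] snapshot() {
    byte[][] view = blocks.clone();
    java.util.Arrays.fill(copied, false);
    return view;
  }
}
```

Blocks that were never written between two snapshots are shared by ref, which is where the garbage savings over cloning everything come from.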
Skip lists will require going modal because we need to build them upfront (well, we could go back and build them on demand; that'd be fun). There's also the term-freq parallel array; however, if getReader is never called, it's a single additional array that's essentially innocuous, if useful.

{quote}Hmm you're right that each reader needs a private copy, to remain truly "point in time". This (4 bytes per unique term X number of readers reading that term) is a non-trivial addition of RAM.{quote}

PagedInts time? However, even that's not going to help much if, in between getReader calls, 10,000s of terms were seen: we could have updated 1000s of pages. An AtomicIntegerArray does not help because concurrency isn't the issue; it's point-in-timeness that's required. Still, I guess PagedInts won't hurt, and in the case of minimal term-freq changes we'd still potentially be saving RAM. Is there some other data structure we could pull out of a hat and use?

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?

--
This message is automatically generated by JIRA.
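The paged point-in-time idea for the term-freq parallel array discussed in the comment above can be sketched as follows (all names here are assumptions for illustration, not the actual PagedInts implementation): split the int array into fixed-size pages and have each getReader snapshot copy only the pages dirtied since the previous snapshot, so unchanged pages stay shared across readers.

```java
// Hypothetical sketch: per-page copy-on-write over an int array, so a
// reader's snapshot only costs RAM for pages written since the last one.
class PagedIntSnapshot {
  static final int PAGE_SIZE = 1024;
  int[][] pages;
  boolean[] dirty;   // has page i been cloned since the last snapshot?

  PagedIntSnapshot(int numPages) {
    pages = new int[numPages][PAGE_SIZE];
    dirty = new boolean[numPages];
  }

  // Writer path: clone a page the first time it's dirtied per interval.
  void set(int index, int value) {
    int page = index / PAGE_SIZE;
    if (!dirty[page]) {
      pages[page] = pages[page].clone(); // older snapshots keep the old page
      dirty[page] = true;
    }
    pages[page][index % PAGE_SIZE] = value;
  }

  // getReader path: shallow ref copy of the page table; reset the flags
  // so the next write to any page clones it, preserving this view.
  int[][] snapshot() {
    int[][] view = pages.clone();
    java.util.Arrays.fill(dirty, false);
    return view;
  }
}
```

As noted above, this degrades toward a full copy when 1000s of pages are dirtied between getReader calls, but in the minimal-change case most pages are shared by ref.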