[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915713#action_12915713 ]
Simon Willnauer commented on LUCENE-2662: ----------------------------------------- {quote} How about renaming key back to ord? And then maybe rename values to bytesStart? And in their decls add comments saying they are indexed by hash code? And maybe rename addByOffset -> addByBytesStart? {quote} I don't like addByBytesStart I would like to keep offset since it really is an offset into the pool. addByPoolOffset? The names ord and bytesStart are a good compromise :) lets shoot for that. {quote} On the nocommit in ByteBlockPool - I think that's fine? It's an internal class.... {quote} you refer to this: // nocommit - public arrays are not nice! ? yeah that more of an style thing but if somebody changes them its their fault for being stupid I guess. {quote} The nocommit in BytesRefHash seems wrong? (Ie, compact is used internally)... though maybe we make it private if it's not used externally? {quote} Ah yeah thats bogus - its from a previous iteration which was wrong as well, I will remove. {quote} On the "nocommit factor this out!" in THPF.java... I agree, the postingsArray.textStarts should go away right? Ie, it's a [wasteful] copy of what the BytesRefHash is already storing? {quote} Yeah that is the reason for that nocommit. Yet, I though about this a little and I have two options for this. * we could factor out a super class from ParallelPostingArray which only has the textStart int array, the grow and copy method and let ParallelPostingArray subclass it. BytesRefHash would accept this class, don't have a good name for it but lets call it TextStartArray for now, and use it internally. It would call grow() once needed inside BytesRefHash and all the other code would be unchanged since PPA is a subclass. * the other way would be to bind the ByteRefHash to the postings array which seems odd to me though. More ideas? {quote} Can we impl BytesRefHash.bytesUsed as an AtomicLong (hmm maybe AtomicInt - none of these classes can address > 2GB)? Then the pool would add in blockSize every time it binds a new block. That method (DW.bytesUsed) is called alot - at least once on every addDoc. {quote} I did exactly that in the not yet uploaded patch. But I figured that it would maybe make more sense to use that AtomicInt in the allocator as well as in THPF or is that what you mean? {quote} I'm confused again - when do we use RecyclingByteBlockAllocator from a single thread...? Ie, why did the sync need to be conditional for this class, again....? It seems like we always need it sync'd (both the main pool & per-doc pool need this)? If so we can simplify and make these methods sync'd? {quote} man, I am sorry - I thought I will use this in LUCENE-2186 in a single threaded env but if so I should change it there if needed. I was one step ahead though. I will change and maybe have a second one if needed. Agree? simon > BytesHash > --------- > > Key: LUCENE-2662 > URL: https://issues.apache.org/jira/browse/LUCENE-2662 > Project: Lucene - Java > Issue Type: Improvement > Components: Index > Affects Versions: Realtime Branch, 4.0 > Reporter: Jason Rutherglen > Assignee: Simon Willnauer > Priority: Minor > Fix For: Realtime Branch, 4.0 > > Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, > LUCENE-2662.patch > > > This issue will have the BytesHash separated out from LUCENE-2186 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org