[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915713#action_12915713
 ] 

Simon Willnauer commented on LUCENE-2662:
-----------------------------------------

{quote}
How about renaming key back to ord? And then maybe rename values to
bytesStart? And in their decls add comments saying they are indexed
by hash code? And maybe rename addByOffset -> addByBytesStart?
{quote}
I don't like addByBytesStart I would like to keep offset since it really is an 
offset into the pool. addByPoolOffset?
The names ord and bytesStart are a good compromise :) lets shoot for that.


{quote}
On the nocommit in ByteBlockPool - I think that's fine? It's an
internal class....
{quote}
you refer to this: // nocommit - public arrays are not nice! ?
yeah that more of an style thing but if somebody changes them its their fault 
for being stupid I guess.

{quote}
The nocommit in BytesRefHash seems wrong? (Ie, compact is used
internally)... though maybe we make it private if it's not used
externally?
{quote}

Ah yeah thats bogus - its from a previous iteration which was wrong as well, I 
will remove.

{quote}
On the "nocommit factor this out!" in THPF.java... I agree, the
postingsArray.textStarts should go away right? Ie, it's a
[wasteful] copy of what the BytesRefHash is already storing?
{quote}
Yeah that is the reason for that nocommit. Yet, I though about this a little 
and I have two options for this.
 * we could factor out a super class from ParallelPostingArray which only has 
the textStart int array, the grow and copy method and let ParallelPostingArray 
subclass it.
BytesRefHash would accept this class, don't have a good name for it but lets 
call it TextStartArray for now, and use it internally. It would call grow() 
once needed inside BytesRefHash and all the other code would be unchanged since 
PPA is a subclass. 
* the other way would be to bind the ByteRefHash to the postings array which 
seems odd to me though.

More ideas?

{quote}
Can we impl BytesRefHash.bytesUsed as an AtomicLong (hmm maybe
AtomicInt - none of these classes can address > 2GB)? Then the
pool would add in blockSize every time it binds a new block. That
method (DW.bytesUsed) is called alot - at least once on every
addDoc.
{quote}

I did exactly that in the not yet uploaded patch. But I figured that it would 
maybe make more sense to use that AtomicInt in the allocator as well as in THPF 
or is that what you mean?

{quote}
I'm confused again - when do we use RecyclingByteBlockAllocator
from a single thread...? Ie, why did the sync need to be
conditional for this class, again....? It seems like we always
need it sync'd (both the main pool & per-doc pool need this)? If
so we can simplify and make these methods sync'd?
{quote}

man, I am sorry - I  thought I will use this in LUCENE-2186 in a single 
threaded env but if so I should change it there if needed. I was one step ahead 
though.
I will change and maybe have a second one if needed. Agree?

simon








> BytesHash
> ---------
>
>                 Key: LUCENE-2662
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2662
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch, 4.0
>            Reporter: Jason Rutherglen
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch, 4.0
>
>         Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to