[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708486#comment-13708486
 ] 

Han Jiang edited comment on LUCENE-3069 at 7/15/13 2:35 PM:
------------------------------------------------------------

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.
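A minimal sketch of that assertion follows. It assumes a hypothetical sorted-array enum and a stand-in SeekStatus enum that mirrors the three values of Lucene's TermsEnum.SeekStatus (END, FOUND, NOT_FOUND). It uses String terms instead of BytesRef, and it is not the patch's actual code:

```java
import java.util.Arrays;

// Hypothetical stand-in for Lucene's TermsEnum.SeekStatus (same three values).
enum SeekStatus { END, FOUND, NOT_FOUND }

// Hypothetical minimal terms enum over a sorted String array (illustration only).
class SortedArrayTermsEnum {
    private final String[] terms; // must already be sorted
    private int pos = -1;         // -1 = unpositioned

    SortedArrayTermsEnum(String... sortedTerms) {
        this.terms = sortedTerms;
    }

    // Positions the enum on the smallest term >= target.
    SeekStatus seekCeil(String target) {
        int idx = Arrays.binarySearch(terms, target);
        if (idx >= 0) {
            pos = idx;
            return SeekStatus.FOUND;
        }
        int insertion = -idx - 1;
        if (insertion == terms.length) {
            pos = -1; // nothing >= target: unpositioned, term() must not be called
            return SeekStatus.END;
        }
        pos = insertion;
        return SeekStatus.NOT_FOUND;
    }

    String term() {
        return terms[pos];
    }
}

class SeekCeilAssertDemo {
    // Re-seeks to a term the caller already knows is in the dictionary, so anything
    // other than FOUND signals a bug; the assert is only checked under java -ea.
    static String reseekKnownTerm(SortedArrayTermsEnum te, String knownTerm) {
        SeekStatus status = te.seekCeil(knownTerm);
        assert status == SeekStatus.FOUND : "expected FOUND for " + knownTerm + ", got " + status;
        return te.term();
    }
}
```

Because Java assertions are disabled by default, the check costs nothing in production and only fires in test runs with -ea.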

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to move the enum's pointer
based on the input term and fetch the related metadata from the term dict.

However, seekExact(BytesRef, TermState) simply copies the value of termState
into the enum; it doesn't actually perform a 'seek' on the dictionary.
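A minimal sketch of the distinction follows. It uses a hypothetical enum over an in-memory TreeMap, with a hypothetical TermMeta class standing in for a codec's TermState and String terms instead of BytesRef. The one-argument seekExact consults the dictionary, while the two-argument overload, shaped like seekExact(BytesRef, TermState), only adopts the state it is handed:

```java
import java.util.TreeMap;

// Hypothetical per-term metadata, standing in for a codec's TermState.
final class TermMeta {
    final int docFreq;
    final long totalTermFreq;

    TermMeta(int docFreq, long totalTermFreq) {
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
    }
}

// Hypothetical enum over an in-memory dictionary (illustration only).
class DictTermsEnum {
    private final TreeMap<String, TermMeta> dict;
    private String term;
    private TermMeta meta;
    int dictLookups; // how many times the dictionary was actually consulted

    DictTermsEnum(TreeMap<String, TermMeta> dict) {
        this.dict = dict;
    }

    // A real seek: looks the term up and loads its metadata.
    // On a miss the previous position is left unchanged.
    boolean seekExact(String target) {
        dictLookups++;
        TermMeta found = dict.get(target);
        if (found == null) {
            return false;
        }
        term = target;
        meta = found;
        return true;
    }

    // Same shape as seekExact(BytesRef, TermState): the caller already holds the
    // metadata (e.g. saved from an earlier seek), so the enum just adopts it and
    // never touches the dictionary.
    void seekExact(String target, TermMeta state) {
        term = target;
        meta = state;
    }

    String term() { return term; }

    int docFreq() { return meta.docFreq; }
}
```

In this sketch the state-based overload even accepts a term that is absent from the dictionary, which makes it plain that no lookup happens there.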

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but I'm not sure: can the callee always
make sure that, when 'term()' is called, it will return a valid term?
The code in MemoryPF just returns 'pair.output' regardless of whether
pair==null; is that safe?
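A minimal sketch of holding a single current pair with null-guarded accessors follows. TermPair and PairHoldingEnum are hypothetical names, and the TreeMap stands in for the FST enum. Returning null from term() and throwing from meta() is one possible choice, not necessarily what MemoryPF does:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical (term, metadata) pair, standing in for the FST enum's
// input/output pair.
final class TermPair {
    final String term;
    final long meta;

    TermPair(String term, long meta) {
        this.term = term;
        this.meta = meta;
    }
}

// Hypothetical enum that keeps a single "current" pair instead of separate
// term and meta fields (illustration only).
class PairHoldingEnum {
    private final TreeMap<String, Long> dict;
    private TermPair current; // null while unpositioned or after running off the end

    PairHoldingEnum(TreeMap<String, Long> dict) {
        this.dict = dict;
    }

    // Positions on the smallest term >= target; clears current if none exists.
    boolean seekCeil(String target) {
        Map.Entry<String, Long> e = dict.ceilingEntry(target);
        current = (e == null) ? null : new TermPair(e.getKey(), e.getValue());
        return current != null;
    }

    // Null-guarded: returns null rather than dereferencing a missing pair.
    String term() {
        return current == null ? null : current.term;
    }

    // Fails with a descriptive exception instead of a bare NullPointerException.
    long meta() {
        if (current == null) {
            throw new IllegalStateException("enum is not positioned on a term");
        }
        return current.meta;
    }
}
```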

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!

bq. It doesn't impl equals (must it really impl hashCode?)

-Hmm, do we need equals? Also, NodeHash relies on hashCode to judge whether two FST nodes can be 'merged'.-
Oops, I forgot it still relies on equals to make sure two instances really
match. OK, I'll add that.
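A minimal sketch of an FST output class whose hashCode mixes in docFreq and totalTermFreq and whose equals compares every field follows. The class name TermMetaOutput and its fields (longs, bytes, docFreq, totalTermFreq) are assumptions, not the patch's actual TempMetaData:

```java
import java.util.Arrays;

// Hypothetical FST output holding per-term metadata; names are assumptions.
final class TermMetaOutput {
    final long[] longs;       // codec-specific values (e.g. file pointers)
    final byte[] bytes;       // codec-specific opaque bytes
    final int docFreq;
    final long totalTermFreq;

    TermMetaOutput(long[] longs, byte[] bytes, int docFreq, long totalTermFreq) {
        this.longs = longs;
        this.bytes = bytes;
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
    }

    @Override
    public int hashCode() {
        int h = Arrays.hashCode(longs);
        h = 31 * h + Arrays.hashCode(bytes);
        h = 31 * h + docFreq;                      // mix in docFreq ...
        h = 31 * h + Long.hashCode(totalTermFreq); // ... and totalTermFreq
        return h;
    }

    // Equal hashes only make two outputs candidates for sharing; equals has to
    // confirm that every field really matches.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof TermMetaOutput)) return false;
        TermMetaOutput o = (TermMetaOutput) other;
        return docFreq == o.docFreq
            && totalTermFreq == o.totalTermFreq
            && Arrays.equals(longs, o.longs)
            && Arrays.equals(bytes, o.bytes);
    }
}
```

Keeping both methods over the same set of fields preserves the usual contract: equal objects always produce equal hash codes.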
                
      was (Author: billy):
    bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to move the enum's pointer
based on the input term and fetch the related metadata from the term dict.

However, seekExact(BytesRef, TermState) simply copies the value of termState
into the enum; it doesn't actually perform a 'seek' on the dictionary.

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but I'm not sure: can the callee always
make sure that, when 'term()' is called, it will return a valid term?
The code in MemoryPF just returns 'pair.output' regardless of whether
pair==null; is that safe?

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!

bq. It doesn't impl equals (must it really impl hashCode?)

Hmm, do we need equals? Also, NodeHash relies on hashCode to judge whether two
FST nodes can be 'merged'.
                  
> Lucene should have an entirely memory resident term dictionary
> --------------------------------------------------------------
>
>                 Key: LUCENE-3069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3069
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Simon Willnauer
>            Assignee: Han Jiang
>              Labels: gsoc2013
>             Fix For: 4.4
>
>         Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch
>
>
> The FST-based TermDictionary has been a great improvement, yet it still uses a
> delta codec file for scanning to terms. Some environments have enough memory
> available to keep the entire FST-based term dict in memory. We should add a
> TermDictionary implementation that encodes all needed information for each
> term into the FST (custom fst.Output) and builds an FST from the entire term,
> not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
