Hello Mark. I was just wondering about the following piece of code from your latest TokenSources class:
  public static TokenStream getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer)
      throws IOException {
    TokenStream ts = null;
    TermFreqVector tfv = (TermFreqVector) reader.getTermFreqVector(docId, field);
    if (tfv != null) {
      if (tfv instanceof TermPositionVector) {
        // read pre-parsed token position info stored on disk
        TermPositionVector tpv = (TermPositionVector) reader.getTermFreqVector(docId, field);
        ts = getTokenStream(tpv);
      }
    }
    // No token info stored so fall back to analyzing raw content
    if (ts == null) {
      ts = getTokenStream(reader, docId, field, analyzer);
    }
    return ts;
  }

Haven't you called getTermFreqVector(docId, field) twice? Why not just reuse the tfv reference you already have:

  if (tfv instanceof TermPositionVector) {
    ts = getTokenStream((TermPositionVector) tfv);
  }

Max

Friday, November 5, 2004, 12:25:13 AM, you wrote:

m> Having revisited the original TokenSources code it looks like one of the
m> optimisations I put in will fail if fields are stored with
m> non-contiguous position info (i.e. the analyzer has messed with token
m> position numbers so they overlap or have gaps like ..3,3,7,8,9,..).
m> I've now made the TokenSources code safe by default by assuming token
m> position values are not contiguous and should not be used for sorting.
m> For those who know what they are doing I have added a parameter to one
m> of the methods to turn the optimisation back on if they can guarantee
m> positions are contiguous.
m> New code is at the same place:
m> http://www.inperspective.com/lucene/TokenSources.java
m> Cheers
m> Mark
m> ---------------------------------------------------------------------
m> To unsubscribe, e-mail: [EMAIL PROTECTED]
m> For additional commands, e-mail: [EMAIL PROTECTED]
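For what it's worth, the single-lookup shape of that change can be sketched in a self-contained way. The types below are simplified stand-ins I made up for illustration (they are not the real Lucene IndexReader/TermFreqVector classes); the point is just that the vector is fetched once and the same reference is then tested and cast:

```java
// Stand-in types modeled loosely on the Lucene interfaces (hypothetical, for illustration only)
interface TermFreqVector {}
interface TermPositionVector extends TermFreqVector {}

class TokenSourcesSketch {
    static int vectorReads = 0; // counts calls to the "expensive" vector lookup

    // stand-in for reader.getTermFreqVector(docId, field): one index read per call
    static TermFreqVector getTermFreqVector() {
        vectorReads++;
        return new TermPositionVector() {}; // pretend position info was stored
    }

    // single-lookup shape: fetch the vector once, then test and cast that same reference
    static String getAnyTokenStream() {
        TermFreqVector tfv = getTermFreqVector();
        if (tfv instanceof TermPositionVector) {
            // would be: ts = getTokenStream((TermPositionVector) tfv);
            return "from positions";
        }
        // no vector (or no positions) stored: fall back to re-analyzing raw content
        return "from analyzer";
    }

    public static void main(String[] args) {
        System.out.println(getAnyTokenStream() + ", reads=" + vectorReads);
    }
}
```

With this shape the index is consulted once per document/field instead of twice when position info is present, and the instanceof test plus cast operate on the same object, so they cannot disagree.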