Hello Mark. I was just wondering about the following piece of code from your latest TokenSources class:
  public static TokenStream getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer)
      throws IOException {
    TokenStream ts = null;
    TermFreqVector tfv = (TermFreqVector) reader.getTermFreqVector(docId, field);
    if (tfv != null) {
      if (tfv instanceof TermPositionVector) {
        // read pre-parsed token position info stored on disk
        TermPositionVector tpv = (TermPositionVector) reader.getTermFreqVector(docId, field);
        ts = getTokenStream(tpv);
      }
    }
    // No token info stored so fall back to analyzing raw content
    if (ts == null) {
      ts = getTokenStream(reader, docId, field, analyzer);
    }
    return ts;
  }

Haven't you called getTermFreqVector(docId, field) twice? Why not just reuse the tfv reference you already have:

  if (tfv instanceof TermPositionVector) {
    ts = getTokenStream((TermPositionVector) tfv);
  }

Max

Friday, November 5, 2004, 12:25:13 AM, you wrote:

m> Having revisited the original TokenSources code it looks like one of the
m> optimisations I put in will fail if fields are stored with
m> non-contiguous position info (i.e. the analyzer has messed with token
m> position numbers so they overlap or have gaps like ..3,3,7,8,9,..).
m> I've now made the TokenSources code safe by default by assuming token
m> position values are not contiguous and should not be used for sorting.
m> For those who know what they are doing I have added a parameter to one
m> of the methods to turn the optimisation back on if they can guarantee
m> positions are contiguous.
m> New code is at the same place:
m> http://www.inperspective.com/lucene/TokenSources.java
m> Cheers
m> Mark
m> ---------------------------------------------------------------------
m> To unsubscribe, e-mail: [EMAIL PROTECTED]
m> For additional commands, e-mail: [EMAIL PROTECTED]
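For what it's worth, the single-lookup shape of that change can be sketched in a self-contained way. The types below are simplified stand-ins I made up for illustration (they are not the real Lucene IndexReader/TermFreqVector classes); the point is just that the vector is fetched once and the same reference is then tested and cast:

```java
// Stand-in types modeled loosely on the Lucene interfaces (hypothetical, for illustration only)
interface TermFreqVector {}
interface TermPositionVector extends TermFreqVector {}

class TokenSourcesSketch {
    static int vectorReads = 0; // counts calls to the "expensive" vector lookup

    // stand-in for reader.getTermFreqVector(docId, field): one index read per call
    static TermFreqVector getTermFreqVector() {
        vectorReads++;
        return new TermPositionVector() {}; // pretend position info was stored
    }

    // single-lookup shape: fetch the vector once, then test and cast that same reference
    static String getAnyTokenStream() {
        TermFreqVector tfv = getTermFreqVector();
        if (tfv instanceof TermPositionVector) {
            // would be: ts = getTokenStream((TermPositionVector) tfv);
            return "from positions";
        }
        // no vector (or no positions) stored: fall back to re-analyzing raw content
        return "from analyzer";
    }

    public static void main(String[] args) {
        System.out.println(getAnyTokenStream() + ", reads=" + vectorReads);
    }
}
```

With this shape the index is consulted once per document/field instead of twice when position info is present, and the instanceof test plus cast operate on the same object, so they cannot disagree.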