I just committed LUCENE-1301, which is a first step (top down) towards
flexible indexing.  I hope I didn't break anything....

While flexible indexing should make this simpler, it's not too bad to
modify Lucene to do this today, if you want.  I think this is what
you'll need to do (but I haven't tested!):

  * Add something to Fieldable/AbstractField/Field that "knows"
    whether a field should store the tf.  Also add this to
    FieldInfo.java, and make sure that bit is saved to the fnm file.

  * In the new oal.index.DocFieldProcessorPerThread, in the
    processDocument method, fix the FieldInfos.add call to also pass
    in your new "storeTermFreq" bit.  You should probably also assert
    that this cannot change, i.e. a field must be created with
    storeTermFreq=true or false and must never change after that.

  * The new oal.index.FreqProxTermsWriter, in appendPostings, has the
    code that creates a new segment.  Change that to skip writing tf
    if the FieldInfo says so.

  * Fix SegmentTermDocs to not read tf if FieldInfo says so.

  * Fix SegmentMerger.appendPostings to not merge/write tf if
    FieldInfo says so.  Likewise assert here that the "storeTermFreq"
    does not change in the merged segments.
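
To make the write/read steps above concrete, here is a rough standalone
sketch (not Lucene code, and not tested against a real index) of what
skipping tf does to the postings bytes.  It assumes the .frq encoding
from the file-format docs, where the doc delta is shifted left one bit
and the low bit flags tf == 1.  PostingsSketch and its methods are
made-up names, and leaving freq at 0 when it was not stored is just an
assumption of the sketch:

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch, not Lucene code: encodes and decodes one term's postings. */
final class PostingsSketch {

    /** Writes a non-negative int as a VInt: 7 bits per byte, high bit means "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    /** Reads a VInt starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] in, int[] pos) {
        byte b = in[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    /**
     * docs must be strictly increasing; freqs[i] is the tf of docs[i].
     * storeTermFreq stands in for the per-field bit carried by FieldInfo.
     */
    static byte[] encode(int[] docs, int[] freqs, boolean storeTermFreq) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - lastDoc;
            lastDoc = docs[i];
            if (!storeTermFreq) {
                writeVInt(out, delta);             // doc delta only, no tf at all
            } else if (freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1);  // low bit set: tf == 1
            } else {
                writeVInt(out, delta << 1);        // low bit clear: tf follows
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    /** Returns {docs, freqs}; freqs stay 0 when tf was not stored (sketch assumption). */
    static int[][] decode(byte[] in, int count, boolean storeTermFreq) {
        int[] docs = new int[count];
        int[] freqs = new int[count];
        int[] pos = {0};
        int doc = 0;
        for (int i = 0; i < count; i++) {
            int code = readVInt(in, pos);
            if (!storeTermFreq) {
                doc += code;
            } else {
                doc += code >>> 1;
                freqs[i] = (code & 1) != 0 ? 1 : readVInt(in, pos);
            }
            docs[i] = doc;
        }
        return new int[][] {docs, freqs};
    }
}
```

The reader has to be told the same storeTermFreq value the writer
used, which is why the bit must be saved per field and must not change
between segments.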

It's also possible to fix FreqProxTermsWriterPerField to not even
compute & store the tf in its RawPostingList, per term.  This is an
optimization (saves RAM & CPU) that you can do after first getting the
above working...
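
One way to picture where that saving comes from, as a toy sketch only:
RamPostingSketch below is a made-up stand-in for the per-term state,
not the real RawPostingList, and it buffers plain ints instead of
bytes.  With tf you have to count occurrences and hold the doc back
until the next doc shows up; without tf you can record the doc ID on
the first occurrence and ignore the repeats:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one term's in-RAM posting; not Lucene's RawPostingList. */
final class RamPostingSketch {
    private final boolean storeTermFreq;
    private final List<Integer> buffered = new ArrayList<>(); // recorded ints, in order
    private int lastDocID = -1;  // doc currently being accumulated, -1 if none
    private int pendingFreq = 0; // only maintained when storeTermFreq is true

    RamPostingSketch(boolean storeTermFreq) {
        this.storeTermFreq = storeTermFreq;
    }

    /** Called once per occurrence of the term; docIDs arrive in non-decreasing order. */
    void addOccurrence(int docID) {
        if (docID == lastDocID) {
            if (storeTermFreq) {
                pendingFreq++; // repeats only matter when tf is kept
            }
            return;
        }
        flushPending();
        lastDocID = docID;
        if (storeTermFreq) {
            pendingFreq = 1;     // tf is not final until the next doc arrives
        } else {
            buffered.add(docID); // doc ID can be recorded right away
        }
    }

    /** With tf, emits the (docID, freq) pair of the doc that just ended. */
    private void flushPending() {
        if (storeTermFreq && lastDocID != -1) {
            buffered.add(lastDocID);
            buffered.add(pendingFreq);
        }
    }

    /** Flushes any pending doc and returns a copy of everything recorded so far. */
    List<Integer> finish() {
        flushPending();
        lastDocID = -1;
        return new ArrayList<>(buffered);
    }
}
```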

On the search side, you'll need to fix scoring to be OK with tf=0.
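
For example, with a sqrt-shaped tf factor like the one in
DefaultSimilarity, tf=0 zeroes out the whole score, so the scorer
needs to substitute something for fields that do not store tf.  A
minimal standalone sketch (TfFactorSketch and tfFactor are hypothetical
names, and returning a constant 1.0 is just one reasonable choice):

```java
/** Hypothetical sketch, not Lucene code: tf factor that tolerates fields without tf. */
final class TfFactorSketch {

    /** Same sqrt(freq) shape as DefaultSimilarity.tf, reimplemented here to stay standalone. */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Uses the real tf when the field stores it, else a constant so the score is not zeroed. */
    static float tfFactor(int freq, boolean storeTermFreq) {
        return storeTermFreq ? tf(freq) : 1.0f;
    }
}
```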

I think this would be a useful addition to Lucene (it comes up every
so often), even before we fully work out flexible indexing.

Mike

eks dev wrote:

hi all,
is there any solution to have pure postings lists without interleaved tf? This eats a lot of CPU for VInt decoding on dense terms (and also doubles IO) in our case. It can be an untested patch, tips on how to do it, or whatever. I know about flexible indexing, but cannot wait (I guess it will take some time?).

Does it make sense to start working on it? Can this somehow be incorporated into Flexible Indexing later? I would hate to do it and then throw it away when Mike does his magic with Flexible Indexing.

We are simply sure this could help performance a lot (some dense fields always have a constant tf, so there is no need to read it from the index). I am just asking for help in case somebody happens to have a quick 'n' dirty solution or idea.

thanks, eks




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


