You could do that, though it's less optimized: you'd use fewer bytes if you encoded the docDelta directly (rather than 2*docDelta+1), and you'd save some CPU when decoding as well. But maybe do it this way first, then explore the optimization later if it turns out to be necessary or helpful?
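
To make the byte-count point concrete, here is a minimal sketch (not Lucene code) that compares the two encodings of the same docDeltas. The class and method names are made up for illustration, and writeVInt is a hand-written stand-in for the seven-bits-per-byte VInt format, not the library's own method.

```java
import java.io.ByteArrayOutputStream;

final class DocDeltaEncoding {

    // Writes a non-negative int as a VInt: seven data bits per byte,
    // with the high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Existing layout with tf always 1: the code is 2*docDelta+1,
    // where the low bit means "tf == 1, no separate freq follows".
    static int bytesWithFlagBit(int[] docDeltas) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int delta : docDeltas) {
            writeVInt(out, (delta << 1) | 1);
        }
        return out.size();
    }

    // Hypothetical optimized layout (an assumption, not an existing format):
    // the docDelta is written as-is, with no flag bit.
    static int bytesDirect(int[] docDeltas) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int delta : docDeltas) {
            writeVInt(out, delta);
        }
        return out.size();
    }

    public static void main(String[] args) {
        int[] deltas = {1, 63, 64, 100, 8191, 8192};
        System.out.println("2*docDelta+1: " + bytesWithFlagBit(deltas) + " bytes");
        System.out.println("docDelta:     " + bytesDirect(deltas) + " bytes");
    }
}
```

For these six deltas the flag-bit form takes 11 bytes and the direct form 8. The saving only appears once a delta reaches 64, because below that both forms fit in a single byte, so very dense terms would gain little in size from the direct encoding.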

Mike

eks dev wrote:

am I boring :)

would it be ok to assume tf == 1 always if we use omitTf? In that case the encoded docDelta remains odd, and the current index format interprets this as tf == 1... if all terms have tf == 1, the relative score is factored out, so it makes no difference.


In that case, there is no need to change anything on reader side!
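
A small sketch of why an unmodified reader would cope, assuming the layout discussed in this thread (odd code means tf == 1, even code means a separate freq follows). It works on plain int codes rather than real VInt bytes, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OddDocCode {

    // Proposed omitTf writer: every doc is written as 2*docDelta+1,
    // and no separate freq value is ever emitted.
    static int[] encodeAsTfOne(int[] sortedDocIds) {
        int[] codes = new int[sortedDocIds.length];
        int lastDoc = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            codes[i] = ((sortedDocIds[i] - lastDoc) << 1) | 1;
            lastDoc = sortedDocIds[i];
        }
        return codes;
    }

    // Reader logic for the existing layout: the upper bits are the docDelta;
    // an odd code means tf == 1, an even code is followed by the freq.
    // Returns {doc, freq} pairs.
    static List<int[]> decode(int[] stream) {
        List<int[]> postings = new ArrayList<>();
        int doc = 0;
        int i = 0;
        while (i < stream.length) {
            int code = stream[i++];
            doc += code >>> 1;
            int freq = (code & 1) != 0 ? 1 : stream[i++];
            postings.add(new int[] {doc, freq});
        }
        return postings;
    }

    public static void main(String[] args) {
        for (int[] p : decode(encodeAsTfOne(new int[] {3, 7, 200}))) {
            System.out.println("doc=" + p[0] + " freq=" + p[1]);
        }
    }
}
```

Because the writer never emits an even code, the decoder never takes the "read a separate freq" branch, which is the sense in which nothing has to change on the reader side.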


----- Original Message ----
From: eks dev <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, 18 July, 2008 9:48:04 PM
Subject: Re: Index without tf, anyone?

also, another one:

what should happen with the payloads and omitTf options in the case
storePayloads==true && omitTf==true
should we say:
1. ignore omitTf and go on with payloads
or
2. disable payloads and omit tf

the other combinations are clear



----- Original Message ----
From: eks dev
To: java-dev@lucene.apache.org
Sent: Friday, 18 July, 2008 9:20:09 PM
Subject: Re: Index without tf, anyone?

Mike,
I have started playing with this, holy cow... it is a lot of code

Question

SegmentMerger.mergeFields()... there is a big block

else {
  addIndexed(reader, fieldInfos,
      reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET),
      true, true, true, false);
  addIndexed(reader, fieldInfos,
      reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION),
      true, true, false, false);
  addIndexed(reader, fieldInfos,
      reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET),
      true, false, true, false);
  addIndexed(reader, fieldInfos,
      reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR),
      true, false, false, false);
  addIndexed(reader, fieldInfos,
      reader.getFieldNames(IndexReader.FieldOption.STORES_PAYLOADS),
      false, false, false, true);
  addIndexed(reader, fieldInfos,
      reader.getFieldNames(IndexReader.FieldOption.INDEXED),
      false, false, false, false);
  fieldInfos.add(reader.getFieldNames(IndexReader.FieldOption.UNINDEXED),
      false);
}


I simply do not understand it. I have changed the addIndexed(...) signature to
include omitTf, but I am not sure what needs to be done here?
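
One way to read the block above is as a toy model in which each addIndexed call contributes flags for the fields named in one option list. Judging only from the calls quoted, the four booleans appear to be (term vector, positions in term vector, offsets in term vector, payloads). The sketch below is an illustration under that reading, not the real SegmentMerger or FieldInfos code; the class, its fields, and the "a flag once set stays set" rule are all assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldFlagMerge {

    // Per-field flags, in the order the booleans appear in the quoted calls.
    static final class Flags {
        boolean indexed, termVector, tvPositions, tvOffsets, payloads;
    }

    final Map<String, Flags> byName = new LinkedHashMap<>();

    // ASSUMPTION: flags accumulate with OR, so a later call that passes
    // "false" does not clear what an earlier call set.
    void addIndexed(List<String> names, boolean termVector, boolean tvPositions,
                    boolean tvOffsets, boolean payloads) {
        for (String name : names) {
            Flags f = byName.computeIfAbsent(name, k -> new Flags());
            f.indexed = true;
            f.termVector |= termVector;
            f.tvPositions |= tvPositions;
            f.tvOffsets |= tvOffsets;
            f.payloads |= payloads;
        }
    }

    public static void main(String[] args) {
        FieldFlagMerge merged = new FieldFlagMerge();
        // Hypothetical option lists for one segment: "body" shows up in three of them.
        merged.addIndexed(Arrays.asList("body"), true, true, false, false);
        merged.addIndexed(Arrays.asList("body"), false, false, false, true);
        merged.addIndexed(Arrays.asList("body", "id"), false, false, false, false);
        Flags body = merged.byName.get("body");
        System.out.println("body: termVector=" + body.termVector
            + " tvPositions=" + body.tvPositions + " payloads=" + body.payloads);
    }
}
```

Under this reading, the final INDEXED call with all-false flags only makes sure every indexed field is registered. How a new omitTf flag should combine across segments is exactly the open question here, and this sketch does not settle it.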





----- Original Message ----
From: Michael McCandless
To: java-dev@lucene.apache.org
Sent: Friday, 18 July, 2008 11:48:20 AM
Subject: Re: Index without tf, anyone?

I just committed LUCENE-1301, which is a first step (top down) towards
flexible indexing.  I hope I didn't break anything....

While flexible indexing should make this simpler, it's not too bad to
modify Lucene to do this today, if you want.  I think this is what
you'll need to do (but I haven't tested!):

  * Add something to Fieldable/AbstractField/Field that "knows"
    whether a field should store the tf.  Also add this to
    FieldInfo.java, and make sure that bit is saved to the fnm file.

  * In the new oal.index.DocFieldProcessorPerThread, in the
    processDocument method, fix the FieldInfos.add call to also pass
    in your new "storeTermFreq" bit.  Probably, assert that this
    cannot change -- ie a field must be created with
    storeTermFreq=true or false and must never change.

  * The new oal.index.FreqProxTermsWriter, in appendPostings, has the
    code that creates a new segment.  Change that to skip writing tf
    if the FieldInfo says so.

  * Fix SegmentTermDocs to not read tf if FieldInfo says so.

  * Fix SegmentMerger.appendPostings to not merge/write tf if
    FieldInfo says so.  Likewise assert here that the "storeTermFreq"
    does not change in the merged segments.
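
The "must never change" check from the steps above could look roughly like the following. This is a standalone sketch with hypothetical names (TermFreqFlagRegistry, add, storesTermFreq), not the real FieldInfos class; it throws an exception instead of using assert so the behavior is visible without enabling assertions, and the default of "store tf" for unknown fields is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class TermFreqFlagRegistry {

    private final Map<String, Boolean> storeTermFreqByField = new HashMap<>();

    // Records the flag the first time a field is seen and rejects any later
    // attempt to register the same field with a different value.
    void add(String field, boolean storeTermFreq) {
        Boolean previous = storeTermFreqByField.putIfAbsent(field, storeTermFreq);
        if (previous != null && previous.booleanValue() != storeTermFreq) {
            throw new IllegalStateException(
                "storeTermFreq changed for field '" + field + "'");
        }
    }

    // ASSUMPTION: fields that were never registered default to storing tf.
    boolean storesTermFreq(String field) {
        return storeTermFreqByField.getOrDefault(field, Boolean.TRUE);
    }

    public static void main(String[] args) {
        TermFreqFlagRegistry registry = new TermFreqFlagRegistry();
        registry.add("id", false);
        registry.add("id", false);   // same value again: accepted
        try {
            registry.add("id", true); // conflicting value: rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same check would apply in both places the steps mention it: when documents are added and when segments are merged.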

It's also possible to fix FreqProxTermsWriterPerField to not even
compute & store the tf in its RawPostingList, per term.  This is an
optimization (saves RAM & CPU) that you can do after first getting the
above working...

On the search side, you'll need to fix scoring to be OK with tf=0.

I think this would be a useful addition to Lucene (it comes up every
so often), even before we fully work out flexible indexing.

Mike

eks dev wrote:

hi all,
is there any solution for having pure postings lists without
interleaved tf? In our case this eats a lot of CPU for VInt decoding
on dense terms (and also doubles IO). It can be an untested patch,
tips on how to do it, or whatever... I know about flexible indexing,
but cannot wait (I guess it will take some time?).

Does it make sense to start working on it? Can this somehow be
incorporated into Flexible Indexing later... I would hate to do it
and then throw it away when Mike does his magic with Flexible Indexing.

We are simply sure this could help performance a lot (some dense
fields always have constant tf, so there is no need to read it from
the index). Just asking for help in case somebody happens to have
some Quick 'n Dirty solution/idea.

thanks, eks



    __________________________________________________________
Not happy with your email address?.
Get the one you really want - millions of new email addresses
available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
