Wow you've jumped right in -- yay!
Yeah, that else clause is spooky. It's only used when merging an
IndexReader that is not a SegmentReader. I think, as was done for
payloads, you should add another FieldOption that selects the
fields that do not store TF, and add an addIndexed call for it.
Mike
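A minimal sketch of that suggestion, not the actual SegmentMerger code. It models only the payload and "omit tf" bits, and every name in it (MiniFieldInfos, Flags, the OMIT_TF option mentioned in the comments) is hypothetical. It assumes the flags are sticky, which the existing else block seems to rely on, since the final INDEXED call passes false for everything without clearing what earlier calls set.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public class OmitTfMergeSketch {
    /** Hypothetical per-field flags; the real FieldInfo carries more. */
    static final class Flags {
        boolean storePayloads;
        boolean omitTf;
    }

    /** Simplified stand-in for FieldInfos: field name -> flags. */
    static final class MiniFieldInfos {
        final Map<String, Flags> byName = new LinkedHashMap<>();

        // Assumption: flags are sticky, so a later call with false
        // never clears a flag that an earlier call switched on.
        void addIndexed(Collection<String> names, boolean storePayloads, boolean omitTf) {
            for (String name : names) {
                Flags f = byName.computeIfAbsent(name, k -> new Flags());
                f.storePayloads |= storePayloads;
                f.omitTf |= omitTf;
            }
        }
    }

    /** Same call order as the else block, plus one extra step for omit-tf fields. */
    static MiniFieldInfos merge(Collection<String> payloadFields,
                                Collection<String> omitTfFields,
                                Collection<String> indexedFields) {
        MiniFieldInfos infos = new MiniFieldInfos();
        infos.addIndexed(payloadFields, true, false);  // like FieldOption.STORES_PAYLOADS
        infos.addIndexed(omitTfFields, false, true);   // hypothetical FieldOption.OMIT_TF
        infos.addIndexed(indexedFields, false, false); // like FieldOption.INDEXED
        return infos;
    }

    public static void main(String[] args) {
        MiniFieldInfos infos = merge(
            Arrays.asList("body"),
            Arrays.asList("category"),
            Arrays.asList("body", "category", "title"));
        for (Map.Entry<String, Flags> e : infos.byName.entrySet()) {
            System.out.println(e.getKey()
                + " payloads=" + e.getValue().storePayloads
                + " omitTf=" + e.getValue().omitTf);
        }
    }
}
```

The omit-tf call has to come before the catch-all INDEXED call only in the sense that every field selected by the new option is also indexed; with sticky flags the order does not change the result.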
eks dev wrote:
Mike,
I have started playing with this, holy cow... it is a lot of code.
Question: in SegmentMerger.mergeFields() there is a big block:
    else {
      addIndexed(reader, fieldInfos,
          reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET),
          true, true, true, false);
      addIndexed(reader, fieldInfos,
          reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION),
          true, true, false, false);
      addIndexed(reader, fieldInfos,
          reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET),
          true, false, true, false);
      addIndexed(reader, fieldInfos,
          reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR),
          true, false, false, false);
      addIndexed(reader, fieldInfos,
          reader.getFieldNames(IndexReader.FieldOption.STORES_PAYLOADS),
          false, false, false, true);
      addIndexed(reader, fieldInfos,
          reader.getFieldNames(IndexReader.FieldOption.INDEXED),
          false, false, false, false);
      fieldInfos.add(reader.getFieldNames(IndexReader.FieldOption.UNINDEXED), false);
    }
I simply do not understand it. I have changed the addIndexed(...)
signature to include omitTf, but I am not sure what needs to be done here.
----- Original Message ----
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, 18 July, 2008 11:48:20 AM
Subject: Re: Index without tf, anyone?
I just committed LUCENE-1301, which is a first step (top down) towards
flexible indexing. I hope I didn't break anything....
While flexible indexing should make this simpler, it's not too bad to
modify Lucene to do this today, if you want. I think this is what
you'll need to do (but I haven't tested!):
* Add something to Fieldable/AbstractField/Field that "knows"
whether a field should store the tf. Also add this to
FieldInfo.java, and make sure that bit is saved to the fnm file.
* In the new oal.index.DocFieldProcessorPerThread, in the
processDocument method, fix the FieldInfos.add call to also pass
in your new "storeTermFreq" bit. Probably, assert that this
cannot change -- ie a field must be created with
storeTermFreq=true or false and must never change.
* The new oal.index.FreqProxTermsWriter, in appendPostings, has the
code that creates a new segment. Change that to skip writing tf
if the FieldInfo says so.
* Fix SegmentTermDocs to not read tf if FieldInfo says so.
* Fix SegmentMerger.appendPostings to not merge/write tf if
FieldInfo says so. Likewise assert here that the "storeTermFreq"
does not change in the merged segments.
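To make the write and read steps above concrete, here is a small self-contained sketch of a postings list encoded with and without tf. It is not the FreqProxTermsWriter or SegmentTermDocs code; the class and method names are hypothetical. It assumes the usual .frq convention where the doc delta is shifted left by one and the low bit marks freq == 1, and it assumes that a field without tf would simply write the plain doc delta.

```java
import java.io.ByteArrayOutputStream;

public class PostingsSketch {
    /** Writes a non-negative int as a VInt: 7 bits per byte, high bit means "more follows". */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    /** Reads one VInt starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] data, int[] pos) {
        byte b = data[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    /** Encodes ascending doc IDs; freqs are interleaved only when omitTf is false. */
    static byte[] encode(int[] docs, int[] freqs, boolean omitTf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int k = 0; k < docs.length; k++) {
            int delta = docs[k] - lastDoc;
            lastDoc = docs[k];
            if (omitTf) {
                writeVInt(out, delta);            // assumed layout: plain doc delta, no freq
            } else if (freqs[k] == 1) {
                writeVInt(out, (delta << 1) | 1); // low bit set: freq is 1, nothing follows
            } else {
                writeVInt(out, delta << 1);       // low bit clear: freq follows as a VInt
                writeVInt(out, freqs[k]);
            }
        }
        return out.toByteArray();
    }

    /** Decodes doc IDs only, skipping any stored freqs. */
    static int[] decodeDocs(byte[] data, int count, boolean omitTf) {
        int[] docs = new int[count];
        int[] pos = {0};
        int doc = 0;
        for (int k = 0; k < count; k++) {
            int code = readVInt(data, pos);
            if (omitTf) {
                doc += code;
            } else {
                doc += code >>> 1;
                if ((code & 1) == 0) {
                    readVInt(data, pos); // skip the explicit freq
                }
            }
            docs[k] = doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8};
        int[] freqs = {1, 4, 1};
        System.out.println("with tf:    " + encode(docs, freqs, false).length + " bytes");
        System.out.println("without tf: " + encode(docs, freqs, true).length + " bytes");
    }
}
```

The reader has to know which layout it is looking at before it decodes the first byte, which is why the bit needs to live in FieldInfo and be saved to the fnm file, and why it must not change for a field across segments.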
It's also possible to fix FreqProxTermsWriterPerField to not even
compute & store the tf in its RawPostingList, per term. This is an
optimization (saves RAM & CPU) that you can do after first getting
the above working...
On the search side, you'll need to fix scoring to be OK with tf=0.
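A small illustration of why tf=0 is a problem for scoring, and one possible way around it. The tf() method below has the same shape as the default similarity (square root of the frequency); effectiveFreq is a hypothetical helper, and reporting a constant 1 for fields without tf is an assumption, not something settled in this thread.

```java
public class OmitTfScoringSketch {
    /** Same shape as the default similarity's tf factor: sqrt(freq). */
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    /** Hypothetical helper: report a constant freq of 1 when tf was not stored. */
    static int effectiveFreq(boolean omitTf, int storedFreq) {
        return omitTf ? 1 : storedFreq;
    }

    public static void main(String[] args) {
        // With freq reported as 0 the tf factor is 0, which zeroes the whole score.
        System.out.println("tf(0) = " + tf(0));
        // Reporting 1 instead keeps matching documents scorable.
        System.out.println("tf(effectiveFreq(true, 0)) = " + tf(effectiveFreq(true, 0)));
    }
}
```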
I think this would be a useful addition to Lucene (it comes up every
so often), even before we fully work out flexible indexing.
Mike
eks dev wrote:
hi all,
is there any solution to have pure postings lists without
interleaved tf? ... this eats a lot of CPU for VInt decoding on dense
terms (also doubles IO...) in our case. It can be an untested patch,
tips on how to do it, or whatever... I know about flexible indexing, but
cannot wait (I guess it will take some time?).
Does it make sense to start working on it? Can this somehow be
incorporated later into Flexible Indexing... I would hate to do it and then
throw it away when Mike does his magic with Flexible Indexing.
Simply, we are sure this could help performance a lot (some dense
fields always have constant tf, no need to read them from the index).
Simply asking for help if somebody accidentally happens to have some
Quick 'n Dirty solution/idea.
thanks, eks
__________________________________________________________
Not happy with your email address?.
Get the one you really want - millions of new email addresses
available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
__________________________________________________________
Not happy with your email address?.
Get the one you really want - millions of new email addresses
available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]