Am I being boring? :) Would it be OK to assume tf == 1 always when omitTf is used? In that case the written docDelta code stays odd, and the current index format already interprets an odd code as tf == 1. If all terms have tf == 1, tf contributes the same factor to every score, so the relative ranking is unchanged and it makes no difference.
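To make the point concrete, here is a minimal sketch, assuming the usual .frq convention: the doc delta is shifted left by one bit, a set low bit means tf == 1, and otherwise tf follows as a separate VInt. The class and method names (OmitTfSketch, encode, decode) are made up for illustration and are not Lucene APIs. With omitTf the writer always sets the low bit, and the reader is left untouched.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class OmitTfSketch {

    /** Writes a non-negative int as a VInt: 7 bits per byte, high bit set when more bytes follow. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Reads one VInt starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] data, int[] pos) {
        byte b = data[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /**
     * Hypothetical writer. Docs must be in increasing order.
     * With omitTf == true every entry is written as an odd code and no tf is stored.
     */
    static byte[] encode(int[] docs, int[] freqs, boolean omitTf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - lastDoc;
            lastDoc = docs[i];
            if (omitTf || freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1); // odd code: tf is implicitly 1
            } else {
                writeVInt(out, delta << 1);       // even code: tf follows
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    /** Unmodified reader: returns {doc, tf} pairs, treating an odd code as tf == 1. */
    static List<int[]> decode(byte[] data) {
        List<int[]> postings = new ArrayList<>();
        int[] pos = {0};
        int doc = 0;
        while (pos[0] < data.length) {
            int code = readVInt(data, pos);
            doc += code >>> 1;
            int tf = (code & 1) != 0 ? 1 : readVInt(data, pos);
            postings.add(new int[] {doc, tf});
        }
        return postings;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 20};
        int[] freqs = {2, 1, 5};
        byte[] withTf = encode(docs, freqs, false);
        byte[] withoutTf = encode(docs, freqs, true);
        System.out.println("bytes with tf: " + withTf.length + ", with omitTf: " + withoutTf.length);
        for (int[] p : decode(withoutTf)) {
            System.out.println("doc=" + p[0] + " tf=" + p[1]);
        }
    }
}
```

The same decode method handles both byte arrays, which is the whole idea: only the writer needs to know about omitTf.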
In that case, there is no need to change anything on the reader side!

----- Original Message ----
> From: eks dev <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Friday, 18 July, 2008 9:48:04 PM
> Subject: Re: Index without tf, anyone?
>
> Also, another one:
>
> What should happen with the payloads and omitTf options in the case of
> storePayloads==true && omitTf==true?
> Should we say:
> 1. ignore omitTf and go on with payloads
> or
> 2. disable payloads and omit tf
>
> The other combinations are clear.
>
> ----- Original Message ----
> > From: eks dev
> > To: java-dev@lucene.apache.org
> > Sent: Friday, 18 July, 2008 9:20:09 PM
> > Subject: Re: Index without tf, anyone?
> >
> > Mike,
> > I have started playing with this, holy cow... it is a lot of code.
> >
> > Question:
> >
> > In SegmentMerger.mergeFields() there is a big block:
> >
> > else {
> >   addIndexed(reader, fieldInfos,
> >       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET),
> >       true, true, true, false);
> >   addIndexed(reader, fieldInfos,
> >       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION),
> >       true, true, false, false);
> >   addIndexed(reader, fieldInfos,
> >       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET),
> >       true, false, true, false);
> >   addIndexed(reader, fieldInfos,
> >       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR),
> >       true, false, false, false);
> >   addIndexed(reader, fieldInfos,
> >       reader.getFieldNames(IndexReader.FieldOption.STORES_PAYLOADS),
> >       false, false, false, true);
> >   addIndexed(reader, fieldInfos,
> >       reader.getFieldNames(IndexReader.FieldOption.INDEXED),
> >       false, false, false, false);
> >   fieldInfos.add(reader.getFieldNames(IndexReader.FieldOption.UNINDEXED), false);
> > }
> >
> > I simply do not understand it. I have changed the addIndexed(...) signature
> > to include omitTf, but I am not sure what needs to be done here.
> >
> > ----- Original Message ----
> > > From: Michael McCandless
> > > To: java-dev@lucene.apache.org
> > > Sent: Friday, 18 July, 2008 11:48:20 AM
> > > Subject: Re: Index without tf, anyone?
> > >
> > > I just committed LUCENE-1301, which is a first step (top down) towards
> > > flexible indexing. I hope I didn't break anything....
> > >
> > > While flexible indexing should make this simpler, it's not too bad to
> > > modify Lucene to do this today, if you want. I think this is what
> > > you'll need to do (but I haven't tested!):
> > >
> > >   * Add something to Fieldable/AbstractField/Field that "knows"
> > >     whether a field should store the tf. Also add this to
> > >     FieldInfo.java, and make sure that bit is saved to the fnm file.
> > >
> > >   * In the new oal.index.DocFieldProcessorPerThread, in the
> > >     processDocument method, fix the FieldInfos.add call to also pass
> > >     in your new "storeTermFreq" bit. Probably, assert that this
> > >     cannot change -- ie a field must be created with
> > >     storeTermFreq=true or false and must never change.
> > >
> > >   * The new oal.index.FreqProxTermsWriter, in appendPostings, has the
> > >     code that creates a new segment. Change that to skip writing tf
> > >     if the FieldInfo says so.
> > >
> > >   * Fix SegmentTermDocs to not read tf if FieldInfo says so.
> > >
> > >   * Fix SegmentMerger.appendPostings to not merge/write tf if
> > >     FieldInfo says so. Likewise assert here that the "storeTermFreq"
> > >     does not change in the merged segments.
> > >
> > > It's also possible to fix FreqProxTermsWriterPerField to not even
> > > compute & store the tf in its RawPostingList, per term. This is an
> > > optimization (saves RAM & CPU) that you can do after first getting the
> > > above working...
> > >
> > > On the search side, you'll need to fix scoring to be OK with tf=0.
> > >
> > > I think this would be a useful addition to Lucene (it comes up every
> > > so often), even before we fully work out flexible indexing.
> > >
> > > Mike
> > >
> > > eks dev wrote:
> > >
> > > > hi all,
> > > > Is there any solution to have pure postings lists without
> > > > interleaved tf? It eats a lot of CPU for VInt decoding on dense
> > > > terms (and also doubles IO) in our case. It can be an untested patch,
> > > > tips on how to do it, or whatever... I know about flexible indexing, but
> > > > cannot wait (I guess it will take some time?).
> > > >
> > > > Does it make sense to start working on it? Can this somehow be
> > > > incorporated into Flexible Indexing later? I would hate to do it and then
> > > > throw it away when Mike does his magic with Flexible Indexing.
> > > >
> > > > We are simply sure this could help performance a lot (some dense
> > > > fields always have constant tf, so there is no need to read them from
> > > > the index). Just asking for help in case somebody happens to have some
> > > > Quick 'n Dirty solution/idea.
> > > >
> > > > thanks, eks
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]