Also, another one: what should happen with the payloads and omitTf options in the case of storePayloads==true && omitTf==true? Should we say:
1. ignore omitTf and go on with payloads, or
2. disable payloads and omit tf?
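To make the two choices concrete, here is a minimal sketch of each policy. OmitTfPolicy, Flags, payloadsWin and omitTfWins are hypothetical names for illustration only, not existing Lucene classes or methods; only the storePayloads and omitTf flags come from the question above.

```java
// Hypothetical sketch: two ways to resolve storePayloads==true && omitTf==true.
final class OmitTfPolicy {

    /** Resolved per-field flags (illustrative holder, not a Lucene class). */
    static final class Flags {
        final boolean storePayloads;
        final boolean omitTf;

        Flags(boolean storePayloads, boolean omitTf) {
            this.storePayloads = storePayloads;
            this.omitTf = omitTf;
        }
    }

    /** Option 1: payloads win, so omitTf is ignored whenever payloads are stored. */
    static Flags payloadsWin(boolean storePayloads, boolean omitTf) {
        return new Flags(storePayloads, omitTf && !storePayloads);
    }

    /** Option 2: omitTf wins, so payloads are disabled whenever tf is omitted. */
    static Flags omitTfWins(boolean storePayloads, boolean omitTf) {
        return new Flags(storePayloads && !omitTf, omitTf);
    }
}
```

Both methods leave the three non-conflicting combinations untouched and differ only in the (true, true) case.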
Other combinations are clear.

----- Original Message ----
> From: eks dev <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Friday, 18 July, 2008 9:20:09 PM
> Subject: Re: Index without tf, anyone?
>
> Mike,
> I have started playing with this, holy cow.... it is a lot of code
>
> Question
>
> SegmentMerger.mergeFields()... there is a big block
>
>     else {
>       addIndexed(reader, fieldInfos,
>           reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET),
>           true, true, true, false);
>       addIndexed(reader, fieldInfos,
>           reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION),
>           true, true, false, false);
>       addIndexed(reader, fieldInfos,
>           reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET),
>           true, false, true, false);
>       addIndexed(reader, fieldInfos,
>           reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR),
>           true, false, false, false);
>       addIndexed(reader, fieldInfos,
>           reader.getFieldNames(IndexReader.FieldOption.STORES_PAYLOADS),
>           false, false, false, true);
>       addIndexed(reader, fieldInfos,
>           reader.getFieldNames(IndexReader.FieldOption.INDEXED),
>           false, false, false, false);
>
>       fieldInfos.add(reader.getFieldNames(IndexReader.FieldOption.UNINDEXED), false);
>     }
>
> I simply do not understand it. I have changed the addIndexed(...) signature
> to include omitTf, but I am not sure what needs to be done here.
>
> ----- Original Message ----
> > From: Michael McCandless
> > To: java-dev@lucene.apache.org
> > Sent: Friday, 18 July, 2008 11:48:20 AM
> > Subject: Re: Index without tf, anyone?
> >
> > I just committed LUCENE-1301, which is a first step (top down) towards
> > flexible indexing. I hope I didn't break anything....
> >
> > While flexible indexing should make this simpler, it's not too bad to
> > modify Lucene to do this today, if you want.
> > I think this is what you'll need to do (but I haven't tested!):
> >
> >   * Add something to Fieldable/AbstractField/Field that "knows"
> >     whether a field should store the tf. Also add this to
> >     FieldInfo.java, and make sure that bit is saved to the fnm file.
> >
> >   * In the new oal.index.DocFieldProcessorPerThread, in the
> >     processDocument method, fix the FieldInfos.add call to also pass
> >     in your new "storeTermFreq" bit. Probably, assert that this
> >     cannot change -- ie a field must be created with
> >     storeTermFreq=true or false and must never change.
> >
> >   * The new oal.index.FreqProxTermsWriter, in appendPostings, has the
> >     code that creates a new segment. Change that to skip writing tf
> >     if the FieldInfo says so.
> >
> >   * Fix SegmentTermDocs to not read tf if FieldInfo says so.
> >
> >   * Fix SegmentMerger.appendPostings to not merge/write tf if
> >     FieldInfo says so. Likewise assert here that the "storeTermFreq"
> >     does not change in the merged segments.
> >
> > It's also possible to fix FreqProxTermsWriterPerField to not even
> > compute & store the tf in its RawPostingList, per term. This is an
> > optimization (saves RAM & CPU) that you can do after first getting the
> > above working...
> >
> > On the search side, you'll need to fix scoring to be OK with tf=0.
> >
> > I think this would be a useful addition to Lucene (it comes up every
> > so often), even before we fully work out flexible indexing.
> >
> > Mike
> >
> > eks dev wrote:
> >
> > > hi all,
> > > is there any solution to have pure postings lists without
> > > interleaved tf ... this eats a lot of CPU for VInt decoding on dense
> > > terms (also doubles IO...) in our case. Can be an untested patch,
> > > tips how to do it or whatever... I know about flexible indexing, but
> > > cannot wait (I guess it will take some time?).
> > >
> > > Does it make sense to start working on it? Can this somehow later
> > > be incorporated into Flexible Indexing...
> > > I hate to do it and then throw it away when Mike does his magic
> > > with Flexible Indexing.
> > >
> > > Simply we are sure this could help performance a lot (some dense
> > > fields have always constant tf, no need to read them from index).
> > > Simply asking for help if somebody accidentally happens to have some
> > > Quick 'n Dirty solution/idea.
> > >
> > > thanks, eks
> > >
> > > __________________________________________________________
> > > Not happy with your email address?.
> > > Get the one you really want - millions of new email addresses
> > > available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
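For what it's worth, the postings-side change from Mike's list (skip writing tf when the field says so) might look roughly like the sketch below. It assumes the usual .frq layout, where the doc delta is shifted left one bit and the low bit flags tf == 1, and it assumes the tf-less variant simply writes the plain doc delta. PostingsSketch, encode and writeVInt are hypothetical stand-alone helpers for illustration, not the actual FreqProxTermsWriter code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: encode one term's postings with or without tf.
final class PostingsSketch {

    /** Writes a non-negative int as a VInt: 7 bits per byte, high bit = "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    /** docs must be ascending doc ids; freqs[k] is the tf of docs[k] (ignored when omitTf). */
    static byte[] encode(int[] docs, int[] freqs, boolean omitTf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int k = 0; k < docs.length; k++) {
            int delta = docs[k] - lastDoc;
            lastDoc = docs[k];
            if (omitTf) {
                // Assumed tf-less layout: plain doc delta, nothing interleaved.
                writeVInt(out, delta);
            } else if (freqs[k] == 1) {
                // Low bit set means tf == 1, so no separate tf value is written.
                writeVInt(out, (delta << 1) | 1);
            } else {
                // Low bit clear means the tf follows as its own VInt.
                writeVInt(out, delta << 1);
                writeVInt(out, freqs[k]);
            }
        }
        return out.toByteArray();
    }
}
```

With docs {3, 5, 200} and freqs {1, 4, 1}, the tf-less list is 4 bytes against 5 with tf; the reader side would need the matching branch so it never tries to decode a tf that was not written.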