You are right here; I had already changed my mind on this one. Almost all of my terms have tf = 1... so it would not make sense,
but I will hard-code tf to 1 in that case, as it does no harm and makes the tf = 0 problem go away.

----- Original Message ----
> From: Michael McCandless <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Friday, 18 July, 2008 10:19:19 PM
> Subject: Re: Index without tf, anyone?
>
>
> You could do that, though, it's not as optimized because you'd use
> fewer bytes if you directly encoded the docDelta (not 2*docDelta+1),
> and you'd save some CPU when decoding as well.  But maybe first do it
> this way, then if necessary/it helps/etc, explore the optimization?
>
> Mike
>
> eks dev wrote:
>
> > am I boring :)
> >
> > would it be ok to assume tf == 1 always if we use omitTf? In that
> > case docDelta remains odd and the current index format interprets this
> > as tf == 1... if all terms have tf == 1, the relative score is factored
> > out, so it makes no difference.
> >
> >
> > In that case, there is no need to change anything on the reader side!
> >
> >
> > ----- Original Message ----
> >> From: eks dev
> >> To: java-dev@lucene.apache.org
> >> Sent: Friday, 18 July, 2008 9:48:04 PM
> >> Subject: Re: Index without tf, anyone?
> >>
> >> also, another one:
> >>
> >> what should happen with the payloads and omitTf options in the case
> >> storePayloads==true && omitTf==true
> >> should we say:
> >> 1. ignore omitTf and go on with payloads
> >> or
> >> 2. disable payloads and omit tf
> >>
> >> the other combinations are clear
> >>
> >>
> >>
> >> ----- Original Message ----
> >>> From: eks dev
> >>> To: java-dev@lucene.apache.org
> >>> Sent: Friday, 18 July, 2008 9:20:09 PM
> >>> Subject: Re: Index without tf, anyone?
> >>>
> >>> Mike,
> >>> I have started playing with this, holy cow.... it is a lot of code
> >>>
> >>> Question
> >>>
> >>> SegmentMerger.mergeFields()...
> >>> there is a big block
> >>>
> >>> else {
> >>>   addIndexed(reader, fieldInfos,
> >>>       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET),
> >>>       true, true, true, false);
> >>>   addIndexed(reader, fieldInfos,
> >>>       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION),
> >>>       true, true, false, false);
> >>>   addIndexed(reader, fieldInfos,
> >>>       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET),
> >>>       true, false, true, false);
> >>>   addIndexed(reader, fieldInfos,
> >>>       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR),
> >>>       true, false, false, false);
> >>>   addIndexed(reader, fieldInfos,
> >>>       reader.getFieldNames(IndexReader.FieldOption.STORES_PAYLOADS),
> >>>       false, false, false, true);
> >>>   addIndexed(reader, fieldInfos,
> >>>       reader.getFieldNames(IndexReader.FieldOption.INDEXED),
> >>>       false, false, false, false);
> >>>   fieldInfos.add(reader.getFieldNames(IndexReader.FieldOption.UNINDEXED), false);
> >>> }
> >>>
> >>>
> >>> I simply do not understand it. I have changed the addIndexed(...)
> >>> signature to include omitTf, but I am not sure what needs to be done here.
> >>>
> >>>
> >>> ----- Original Message ----
> >>>> From: Michael McCandless
> >>>> To: java-dev@lucene.apache.org
> >>>> Sent: Friday, 18 July, 2008 11:48:20 AM
> >>>> Subject: Re: Index without tf, anyone?
> >>>>
> >>>> I just committed LUCENE-1301, which is a first step (top down) towards
> >>>> flexible indexing.  I hope I didn't break anything....
> >>>>
> >>>> While flexible indexing should make this simpler, it's not too bad to
> >>>> modify Lucene to do this today, if you want.  I think this is what
> >>>> you'll need to do (but I haven't tested!):
> >>>>
> >>>>   * Add something to Fieldable/AbstractField/Field that "knows"
> >>>>     whether a field should store the tf.
> >>>>     Also add this to FieldInfo.java, and make sure that bit is
> >>>>     saved to the fnm file.
> >>>>
> >>>>   * In the new oal.index.DocFieldProcessorPerThread, in the
> >>>>     processDocument method, fix the FieldInfos.add call to also pass
> >>>>     in your new "storeTermFreq" bit.  Probably, assert that this
> >>>>     cannot change -- ie a field must be created with
> >>>>     storeTermFreq=true or false and must never change.
> >>>>
> >>>>   * The new oal.index.FreqProxTermsWriter, in appendPostings, has the
> >>>>     code that creates a new segment.  Change that to skip writing tf
> >>>>     if the FieldInfo says so.
> >>>>
> >>>>   * Fix SegmentTermDocs to not read tf if FieldInfo says so.
> >>>>
> >>>>   * Fix SegmentMerger.appendPostings to not merge/write tf if
> >>>>     FieldInfo says so.  Likewise assert here that the "storeTermFreq"
> >>>>     does not change in the merged segments.
> >>>>
> >>>> It's also possible to fix FreqProxTermsWriterPerField to not even
> >>>> compute & store the tf in its RawPostingList, per term.  This is an
> >>>> optimization (saves RAM & CPU) that you can do after first getting the
> >>>> above working...
> >>>>
> >>>> On the search side, you'll need to fix scoring to be OK with tf=0.
> >>>>
> >>>> I think this would be a useful addition to Lucene (it comes up every
> >>>> so often), even before we fully work out flexible indexing.
> >>>>
> >>>> Mike
> >>>>
> >>>> eks dev wrote:
> >>>>
> >>>>> hi all,
> >>>>> is there any solution to have pure postings lists without
> >>>>> interleaved tf ... this eats a lot of CPU for VInt decoding on dense
> >>>>> terms (also doubles IO...) in our case. It can be an untested patch,
> >>>>> tips on how to do it, or whatever... I know about flexible indexing,
> >>>>> but cannot wait (I guess it will take some time?).
> >>>>>
> >>>>> Does it make sense to start working on it? Can this somehow be
> >>>>> incorporated later into Flexible Indexing...
> >>>>> I would hate to do it and then throw it away when Mike does his
> >>>>> magic with Flexible Indexing.
> >>>>>
> >>>>> We are simply sure this could help performance a lot (some dense
> >>>>> fields always have a constant tf, so there is no need to read it
> >>>>> from the index). I am simply asking for help, in case somebody
> >>>>> happens to have a quick 'n' dirty solution/idea.
> >>>>>
> >>>>> thanks, eks
> >>>>>
> >>>>>
> >>>>>
> >>>>> __________________________________________________________
> >>>>> Not happy with your email address?.
> >>>>> Get the one you really want - millions of new email addresses
> >>>>> available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>>>> For additional commands, e-mail: [EMAIL PROTECTED]
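
For anyone following the docDelta part of this thread, here is a minimal sketch of the two encodings being compared. It is not Lucene's actual code: the class DocDeltaSketch and all of its method names are made up for illustration, and it assumes only what is said in the thread and in the documented frq layout (the docDelta is shifted left by one, and a set low bit means tf == 1, otherwise tf follows as a separate VInt). Variant A always writes 2*docDelta+1, so an unchanged reader decodes tf == 1; variant B writes the raw docDelta, which needs a reader that knows tf is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Lucene's writer/reader code. */
class DocDeltaSketch {

    /** Writes a non-negative int as a VInt: 7 bits per byte, high bit means "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Minimal cursor over an encoded byte array. */
    static final class Input {
        private final byte[] bytes;
        private int pos = 0;

        Input(byte[] bytes) { this.bytes = bytes; }

        boolean hasMore() { return pos < bytes.length; }

        int readVInt() {
            int b = bytes[pos++];
            int value = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
            }
            return value;
        }
    }

    /** Variant A: always emit 2*docDelta+1, so a standard reader sees tf == 1. */
    static byte[] encodeAlwaysOdd(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int doc : sortedDocIds) {
            writeVInt(out, ((doc - last) << 1) | 1);
            last = doc;
        }
        return out.toByteArray();
    }

    /** Variant B: emit docDelta directly; only a tf-aware reader can decode this. */
    static byte[] encodeRawDelta(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int doc : sortedDocIds) {
            writeVInt(out, doc - last);
            last = doc;
        }
        return out.toByteArray();
    }

    /** Standard-style decoding: low bit set means tf == 1, else tf follows. Returns {doc, tf} pairs. */
    static List<int[]> decodeStandard(byte[] bytes) {
        Input in = new Input(bytes);
        List<int[]> postings = new ArrayList<>();
        int doc = 0;
        while (in.hasMore()) {
            int code = in.readVInt();
            doc += code >>> 1;
            int tf = (code & 1) != 0 ? 1 : in.readVInt();
            postings.add(new int[] {doc, tf});
        }
        return postings;
    }

    /** Decoding for variant B: every VInt is a plain docDelta, tf is implied. */
    static int[] decodeRawDelta(byte[] bytes) {
        Input in = new Input(bytes);
        List<Integer> docs = new ArrayList<>();
        int doc = 0;
        while (in.hasMore()) {
            doc += in.readVInt();
            docs.add(doc);
        }
        return docs.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

The saving from variant B only shows up when shifting the delta pushes it across a 7-bit VInt boundary (for example a delta of 64 becomes 129, which needs two bytes instead of one), so for very dense terms with tiny deltas the two variants may well produce the same number of bytes.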