I have created https://issues.apache.org/jira/browse/LUCENE-1340 for this, with a patch attached. It is not properly tested and is still missing asserts and unit tests, but a basic "ant test-core" run passed. Released early for feedback.
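To make the storePayloads/omitTf question from the quoted thread concrete, here is a minimal sketch of the "ignore payloads" choice. It is not taken from the patch: the class FieldFlagsSketch and its methods are hypothetical names used only for illustration, and OR-ing storePayloads across segments is an assumption.

```java
/**
 * Hypothetical per-field indexing flags. The class and method names are
 * illustrative only and are not part of Lucene's API.
 */
final class FieldFlagsSketch {
    final boolean omitTf;
    final boolean storePayloads;

    private FieldFlagsSketch(boolean omitTf, boolean storePayloads) {
        this.omitTf = omitTf;
        this.storePayloads = storePayloads;
    }

    /** Option 2 from the thread: omitTf wins and payloads are silently dropped. */
    static FieldFlagsSketch resolve(boolean omitTf, boolean storePayloads) {
        return new FieldFlagsSketch(omitTf, storePayloads && !omitTf);
    }

    /**
     * Combines the flags seen for the same field in two segments. Following
     * the suggestion in the thread, a change in omitTf is treated as an error.
     * OR-ing storePayloads is an assumption made for this sketch.
     */
    FieldFlagsSketch mergeWith(FieldFlagsSketch other) {
        if (omitTf != other.omitTf) {
            throw new IllegalStateException("omitTf must not change for a field");
        }
        return resolve(omitTf, storePayloads || other.storePayloads);
    }
}
```

With this rule, resolve(true, true) yields a field that omits tf and stores no payloads, while the other three combinations pass through unchanged.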
----- Original Message ----
> From: eks dev <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Friday, 18 July, 2008 10:40:41 PM
> Subject: Re: Index without tf, anyone?
>
> For now I will ignore payloads. It is simpler to get some working code this
> way, and it is neither worse nor better than the other option (anyhow, this
> mumbo jumbo with options will have to be cleaned up for flexible indexing,
> or we will have a problem keeping it under control).
>
> ----- Original Message ----
> > From: Michael McCandless
> > To: java-dev@lucene.apache.org
> > Sent: Friday, 18 July, 2008 10:17:29 PM
> > Subject: Re: Index without tf, anyone?
> >
> > Hmm -- maybe ignore payloads?
> >
> > I was going to say "maybe throw an exception", but I can imagine
> > you'd want to index a TokenStream once with a field that's storing tf,
> > positions & payloads, and then again as a field that doesn't.
> >
> > Mike
> >
> > eks dev wrote:
> >
> > > also, another one:
> > >
> > > What should happen with the payloads and omitTf options in the case
> > > storePayloads==true && omitTf==true?
> > > Should we:
> > > 1. ignore omitTf and go on with payloads
> > > or
> > > 2. disable payloads and omit tf
> > >
> > > The other combinations are clear.
> > >
> > > ----- Original Message ----
> > >> From: eks dev
> > >> To: java-dev@lucene.apache.org
> > >> Sent: Friday, 18 July, 2008 9:20:09 PM
> > >> Subject: Re: Index without tf, anyone?
> > >>
> > >> Mike,
> > >> I have started playing with this, holy cow... it is a lot of code.
> > >>
> > >> Question:
> > >>
> > >> SegmentMerger.mergeFields()...
> > >> there is a big block
> > >>
> > >>   else {
> > >>     addIndexed(reader, fieldInfos,
> > >>         reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET),
> > >>         true, true, true, false);
> > >>     addIndexed(reader, fieldInfos,
> > >>         reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION),
> > >>         true, true, false, false);
> > >>     addIndexed(reader, fieldInfos,
> > >>         reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET),
> > >>         true, false, true, false);
> > >>     addIndexed(reader, fieldInfos,
> > >>         reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR),
> > >>         true, false, false, false);
> > >>     addIndexed(reader, fieldInfos,
> > >>         reader.getFieldNames(IndexReader.FieldOption.STORES_PAYLOADS),
> > >>         false, false, false, true);
> > >>     addIndexed(reader, fieldInfos,
> > >>         reader.getFieldNames(IndexReader.FieldOption.INDEXED),
> > >>         false, false, false, false);
> > >>
> > >>     fieldInfos.add(reader.getFieldNames(IndexReader.FieldOption.UNINDEXED),
> > >>         false);
> > >>   }
> > >>
> > >> I simply do not understand it. I have changed the addIndexed(...)
> > >> signature to include omitTf, but I am not sure what needs to be done here.
> > >>
> > >> ----- Original Message ----
> > >>> From: Michael McCandless
> > >>> To: java-dev@lucene.apache.org
> > >>> Sent: Friday, 18 July, 2008 11:48:20 AM
> > >>> Subject: Re: Index without tf, anyone?
> > >>>
> > >>> I just committed LUCENE-1301, which is a first step (top down) towards
> > >>> flexible indexing. I hope I didn't break anything....
> > >>>
> > >>> While flexible indexing should make this simpler, it's not too bad to
> > >>> modify Lucene to do this today, if you want. I think this is what
> > >>> you'll need to do (but I haven't tested!):
> > >>>
> > >>>   * Add something to Fieldable/AbstractField/Field that "knows"
> > >>>     whether a field should store the tf.
> > >>>     Also add this to FieldInfo.java, and make sure that bit is
> > >>>     saved to the fnm file.
> > >>>
> > >>>   * In the new oal.index.DocFieldProcessorPerThread, in the
> > >>>     processDocument method, fix the FieldInfos.add call to also pass
> > >>>     in your new "storeTermFreq" bit. Probably, assert that this
> > >>>     cannot change -- i.e. a field must be created with
> > >>>     storeTermFreq=true or false and must never change.
> > >>>
> > >>>   * The new oal.index.FreqProxTermsWriter, in appendPostings, has the
> > >>>     code that creates a new segment. Change that to skip writing tf
> > >>>     if the FieldInfo says so.
> > >>>
> > >>>   * Fix SegmentTermDocs to not read tf if FieldInfo says so.
> > >>>
> > >>>   * Fix SegmentMerger.appendPostings to not merge/write tf if
> > >>>     FieldInfo says so. Likewise assert here that the "storeTermFreq"
> > >>>     does not change in the merged segments.
> > >>>
> > >>> It's also possible to fix FreqProxTermsWriterPerField to not even
> > >>> compute & store the tf in its RawPostingList, per term. This is an
> > >>> optimization (saves RAM & CPU) that you can do after first getting the
> > >>> above working...
> > >>>
> > >>> On the search side, you'll need to fix scoring to be OK with tf=0.
> > >>>
> > >>> I think this would be a useful addition to Lucene (it comes up every
> > >>> so often), even before we fully work out flexible indexing.
> > >>>
> > >>> Mike
> > >>>
> > >>> eks dev wrote:
> > >>>
> > >>>> hi all,
> > >>>> is there any solution to have pure postings lists without
> > >>>> interleaved tf? This eats a lot of CPU for VInt decoding on dense
> > >>>> terms (and also doubles IO) in our case. It can be an untested patch,
> > >>>> tips on how to do it, or whatever... I know about flexible indexing,
> > >>>> but cannot wait (I guess it will take some time?).
> > >>>>
> > >>>> Does it make sense to start working on it? Can this somehow be
> > >>>> incorporated into Flexible Indexing later...
> > >>>> I hate to do it and then throw it away when Mike does his magic with
> > >>>> Flexible Indexing.
> > >>>>
> > >>>> We are simply sure this could help performance a lot (some dense
> > >>>> fields always have a constant tf, so there is no need to read it
> > >>>> from the index). Simply asking for help in case somebody happens to
> > >>>> have some Quick 'n Dirty solution/idea.
> > >>>>
> > >>>> thanks, eks
> > >>>>
> > >>>> ---------------------------------------------------------------------
> > >>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> > >>>> For additional commands, e-mail: [EMAIL PROTECTED]
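As a rough illustration of why the original question expects savings from postings without interleaved tf, the sketch below encodes a postings list as VInt doc-ID deltas, with and without a tf value after each delta. This is a deliberately simplified layout and not Lucene's actual on-disk format; PostingsSketch and its methods are hypothetical names, and nothing here comes from the LUCENE-1340 patch.

```java
import java.io.ByteArrayOutputStream;

/** Simplified postings encoder; NOT Lucene's real file format. */
final class PostingsSketch {

    /** Writes a non-negative int using 7 bits per byte; the high bit means "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Reads one VInt starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] data, int[] pos) {
        int b = data[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /** Encodes ascending doc IDs as deltas; each delta is followed by its tf unless omitTf is set. */
    static byte[] encode(int[] docIds, int[] freqs, boolean omitTf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int i = 0; i < docIds.length; i++) {
            writeVInt(out, docIds[i] - lastDoc);
            if (!omitTf) {
                writeVInt(out, freqs[i]);
            }
            lastDoc = docIds[i];
        }
        return out.toByteArray();
    }

    /** Decodes doc IDs; with omitTf there is no tf value to decode and skip. */
    static int[] decodeDocIds(byte[] data, int count, boolean omitTf) {
        int[] docs = new int[count];
        int[] pos = {0};
        int lastDoc = 0;
        for (int i = 0; i < count; i++) {
            lastDoc += readVInt(data, pos);
            docs[i] = lastDoc;
            if (!omitTf) {
                readVInt(data, pos); // tf is decoded only to be thrown away
            }
        }
        return docs;
    }
}
```

For a dense term whose tf is always 1, every posting in this layout carries one extra byte and one extra VInt decode when tf is stored, which is the overhead the omitTf option is meant to avoid.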