For now I will ignore payloads: it is simpler to get some working code this way, and it is neither worse nor better than the other option. (Anyhow, this mumbo jumbo with options will have to be cleaned up for flexible indexing, or we will have trouble keeping it under control.)
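The rule settled on here, which Mike also suggests in the quoted reply, amounts to one line of flag resolution. When a field omits tf, a request to store payloads is silently dropped instead of raising an exception, so the same TokenStream can still feed one field with tf, positions and payloads and another field without them. This is a minimal sketch: `FieldFlags` is a hypothetical holder class, not part of Lucene's API.

```java
// Hypothetical illustration only: FieldFlags is not a Lucene class.
final class FieldFlags {
    final boolean omitTf;
    final boolean storePayloads;

    FieldFlags(boolean omitTf, boolean storePayloads) {
        this.omitTf = omitTf;
        // storePayloads==true && omitTf==true: keep omitTf and ignore payloads
        // (option 2 in the thread) rather than throwing an exception.
        this.storePayloads = storePayloads && !omitTf;
    }

    public static void main(String[] args) {
        // Print the effective payload setting for all four combinations.
        for (boolean omitTf : new boolean[] {false, true}) {
            for (boolean payloads : new boolean[] {false, true}) {
                FieldFlags f = new FieldFlags(omitTf, payloads);
                System.out.println("omitTf=" + omitTf + " storePayloads=" + payloads
                        + " -> payloads stored: " + f.storePayloads);
            }
        }
    }
}
```

Only the omitTf=true, storePayloads=true case changes. The other three combinations pass through unchanged, which matches "the other combinations are clear".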
----- Original Message ----
> From: Michael McCandless <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Friday, 18 July, 2008 10:17:29 PM
> Subject: Re: Index without tf, anyone?
>
> Hmm -- maybe ignore payloads?
>
> I was going to say "maybe throw an exception", but I can imagine
> you'd want to index a TokenStream once with a field that's storing tf,
> positions & payloads, and then again as a field that doesn't.
>
> Mike
>
> eks dev wrote:
>
> > also, another one:
> >
> > what should happen with the payloads and omitTf options in the case
> > storePayloads==true && omitTf==true?
> > Should we say:
> > 1. ignore omitTf and go on with payloads
> > or
> > 2. disable payloads and omit tf
> >
> > The other combinations are clear.
> >
> > ----- Original Message ----
> >> From: eks dev
> >> To: java-dev@lucene.apache.org
> >> Sent: Friday, 18 July, 2008 9:20:09 PM
> >> Subject: Re: Index without tf, anyone?
> >>
> >> Mike,
> >> I have started playing with this, holy cow.... it is a lot of code.
> >>
> >> Question:
> >>
> >> SegmentMerger.mergeFields()...
> >> there is a big block:
> >>
> >> else {
> >>   addIndexed(reader, fieldInfos,
> >>       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET),
> >>       true, true, true, false);
> >>   addIndexed(reader, fieldInfos,
> >>       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION),
> >>       true, true, false, false);
> >>   addIndexed(reader, fieldInfos,
> >>       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET),
> >>       true, false, true, false);
> >>   addIndexed(reader, fieldInfos,
> >>       reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR),
> >>       true, false, false, false);
> >>   addIndexed(reader, fieldInfos,
> >>       reader.getFieldNames(IndexReader.FieldOption.STORES_PAYLOADS),
> >>       false, false, false, true);
> >>   addIndexed(reader, fieldInfos,
> >>       reader.getFieldNames(IndexReader.FieldOption.INDEXED),
> >>       false, false, false, false);
> >>
> >>   fieldInfos.add(reader.getFieldNames(IndexReader.FieldOption.UNINDEXED), false);
> >> }
> >>
> >> I simply do not understand it. I have changed the addIndexed(...)
> >> signature to include omitTf, but I am not sure what needs to be done here.
> >>
> >> ----- Original Message ----
> >>> From: Michael McCandless
> >>> To: java-dev@lucene.apache.org
> >>> Sent: Friday, 18 July, 2008 11:48:20 AM
> >>> Subject: Re: Index without tf, anyone?
> >>>
> >>> I just committed LUCENE-1301, which is a first step (top down)
> >>> towards flexible indexing. I hope I didn't break anything....
> >>>
> >>> While flexible indexing should make this simpler, it's not too bad
> >>> to modify Lucene to do this today, if you want. I think this is what
> >>> you'll need to do (but I haven't tested!):
> >>>
> >>>   * Add something to Fieldable/AbstractField/Field that "knows"
> >>>     whether a field should store the tf. Also add this to
> >>>     FieldInfo.java, and make sure that bit is saved to the fnm file.
> >>>
> >>>   * In the new oal.index.DocFieldProcessorPerThread, in the
> >>>     processDocument method, fix the FieldInfos.add call to also pass
> >>>     in your new "storeTermFreq" bit. Probably, assert that this
> >>>     cannot change, i.e. a field must be created with
> >>>     storeTermFreq=true or false and must never change.
> >>>
> >>>   * The new oal.index.FreqProxTermsWriter, in appendPostings, has the
> >>>     code that creates a new segment. Change that to skip writing tf
> >>>     if the FieldInfo says so.
> >>>
> >>>   * Fix SegmentTermDocs to not read tf if FieldInfo says so.
> >>>
> >>>   * Fix SegmentMerger.appendPostings to not merge/write tf if
> >>>     FieldInfo says so. Likewise, assert here that "storeTermFreq"
> >>>     does not change in the merged segments.
> >>>
> >>> It's also possible to fix FreqProxTermsWriterPerField to not even
> >>> compute & store the tf in its RawPostingList, per term. This is an
> >>> optimization (saves RAM & CPU) that you can do after first getting the
> >>> above working...
> >>>
> >>> On the search side, you'll need to fix scoring to be OK with tf=0.
> >>>
> >>> I think this would be a useful addition to Lucene (it comes up every
> >>> so often), even before we fully work out flexible indexing.
> >>>
> >>> Mike
> >>>
> >>> eks dev wrote:
> >>>
> >>>> hi all,
> >>>> is there any solution for having pure postings lists without
> >>>> interleaved tf? In our case this eats a lot of CPU for VInt decoding
> >>>> on dense terms (and also doubles IO...). It can be an untested patch,
> >>>> tips on how to do it, or whatever... I know about flexible indexing,
> >>>> but we cannot wait (I guess it will take some time?).
> >>>>
> >>>> Does it make sense to start working on it? Could this somehow be
> >>>> incorporated into Flexible Indexing later? I would hate to do it and
> >>>> then throw it away when Mike does his magic with Flexible Indexing.
> >>>>
> >>>> We are sure this could help performance a lot (some dense fields
> >>>> always have a constant tf, so there is no need to read it from the
> >>>> index). I am simply asking for help in case somebody happens to have
> >>>> a quick 'n dirty solution or idea.
> >>>>
> >>>> thanks, eks
> >>>>
> >>>> __________________________________________________________
> >>>> Not happy with your email address?.
> >>>> Get the one you really want - millions of new email addresses
> >>>> available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>>> For additional commands, e-mail: [EMAIL PROTECTED]
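Mike's steps come down to one change in the postings (.frq) stream. The writer side (FreqProxTermsWriter.appendPostings, SegmentMerger.appendPostings) stops emitting tf, and the reader side (SegmentTermDocs) stops decoding it. The standalone sketch below illustrates that on-disk difference. It assumes the pre-flexible-indexing layout described in Lucene's file-format docs: each posting is a VInt `docDelta << 1` whose low bit marks freq == 1, and a separate VInt freq follows otherwise. With tf omitted, only the plain doc delta is written. `PostingsCodec`, `encode` and `decode` are hypothetical names, not Lucene classes. Reporting a tf of 1 for omitted-tf postings is also an assumption, based on eks's "constant tf" fields. Mike's note suggests that scoring would instead have to cope with tf=0.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the .frq-style encoding, not Lucene code.
final class PostingsCodec {

    // VInt: 7 bits per byte, low-order bits first, high bit set on every
    // byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Reads one VInt starting at pos[0] and advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    // With tf: docDelta << 1, low bit set when freq == 1, else freq follows.
    // Without tf: just the doc delta.
    static byte[] encode(int[] docs, int[] freqs, boolean omitTf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int k = 0; k < docs.length; k++) {
            int delta = docs[k] - lastDoc;
            lastDoc = docs[k];
            if (omitTf) {
                writeVInt(out, delta);
            } else if (freqs[k] == 1) {
                writeVInt(out, (delta << 1) | 1);
            } else {
                writeVInt(out, delta << 1);
                writeVInt(out, freqs[k]);
            }
        }
        return out.toByteArray();
    }

    // Returns {doc, freq} pairs; with omitTf the freq is assumed to be 1.
    static List<int[]> decode(byte[] buf, boolean omitTf) {
        List<int[]> postings = new ArrayList<>();
        int[] pos = {0};
        int doc = 0;
        while (pos[0] < buf.length) {
            int code = readVInt(buf, pos);
            int freq;
            if (omitTf) {
                doc += code;
                freq = 1; // assumption: constant tf for omitted-tf fields
            } else {
                doc += code >>> 1;
                freq = ((code & 1) != 0) ? 1 : readVInt(buf, pos);
            }
            postings.add(new int[] {doc, freq});
        }
        return postings;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 200};
        int[] freqs = {1, 4, 1};
        byte[] withTf = encode(docs, freqs, false);
        byte[] noTf = encode(docs, freqs, true);
        System.out.println(withTf.length + " bytes with tf, " + noTf.length + " bytes without");
    }
}
```

Running `main` prints `5 bytes with tf, 4 bytes without`. The byte saved here is the separate freq VInt of the second posting (freq 4). Postings with freq 1 already fold the tf into the low bit of the delta.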