You could do that, though it's not as optimized: you'd use fewer bytes if you directly encoded the docDelta (not 2*docDelta+1), and you'd save some CPU when decoding as well. But maybe first do it this way, then, if necessary or if it helps, explore the optimization?
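To make the byte-count trade-off concrete, here is a minimal, self-contained sketch (not Lucene code). It assumes the usual VInt layout of 7 data bits per byte and the interleaved convention discussed in this thread, where an odd code means tf == 1. The class and method names are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: compares "always write 2*docDelta+1" with "write docDelta directly".
class DocDeltaEncodingSketch {

    // VInt: 7 data bits per byte, high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(byte[] buf, int[] pos) {
        int b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Option A: keep the interleaved format and always emit the "tf == 1" form.
    static byte[] encodeAsTfOne(int[] docDeltas) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int d : docDeltas) {
            writeVInt(out, (d << 1) | 1);
        }
        return out.toByteArray();
    }

    // Option B: write the raw delta; needs a reader that knows tf is omitted.
    static byte[] encodeRaw(int[] docDeltas) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int d : docDeltas) {
            writeVInt(out, d);
        }
        return out.toByteArray();
    }

    // Interleaved-style reader: odd code means tf == 1, even code means tf follows as a VInt.
    // Returns {docDelta, tf} pairs.
    static int[][] decodeInterleaved(byte[] buf, int count) {
        int[] pos = {0};
        int[][] result = new int[count][2];
        for (int i = 0; i < count; i++) {
            int code = readVInt(buf, pos);
            result[i][0] = code >>> 1;
            result[i][1] = (code & 1) != 0 ? 1 : readVInt(buf, pos);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] deltas = {1, 64, 100, 9000};
        System.out.println("2*docDelta+1: " + encodeAsTfOne(deltas).length + " bytes");
        System.out.println("raw docDelta: " + encodeRaw(deltas).length + " bytes");
    }
}
```

For these four deltas the first form takes 8 bytes and the raw form takes 5, because the extra bit pushes 64, 100 and 9000 over a VInt byte boundary. The first form still decodes with an unchanged interleaved reader, which is the appeal of doing it that way first.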
Mike

eks dev wrote:
Am I boring? :)

Would it be OK to assume tf == 1 always if we use omitTf? In that case the encoded docDelta remains odd and the current index format interprets this as tf == 1. If all terms have tf == 1, the relative score is factored out, so it makes no difference. In that case, there is no need to change anything on the reader side!

----- Original Message ----
From: eks dev <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, 18 July, 2008 9:48:04 PM
Subject: Re: Index without tf, anyone?

Also, another one: what should happen with the payloads and omitTf options in the case of storePayloads==true && omitTf==true? Should we say:

1. ignore omitTf and go on with payloads, or
2. disable payloads and omit tf?

The other combinations are clear.

----- Original Message ----
From: eks dev
To: java-dev@lucene.apache.org
Sent: Friday, 18 July, 2008 9:20:09 PM
Subject: Re: Index without tf, anyone?

Mike,

I have started playing with this, holy cow... it is a lot of code.

Question: in SegmentMerger.mergeFields() there is a big block:

    else {
      addIndexed(reader, fieldInfos, reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET), true, true, true, false);
      addIndexed(reader, fieldInfos, reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_POSITION), true, true, false, false);
      addIndexed(reader, fieldInfos, reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET), true, false, true, false);
      addIndexed(reader, fieldInfos, reader.getFieldNames(IndexReader.FieldOption.TERMVECTOR), true, false, false, false);
      addIndexed(reader, fieldInfos, reader.getFieldNames(IndexReader.FieldOption.STORES_PAYLOADS), false, false, false, true);
      addIndexed(reader, fieldInfos, reader.getFieldNames(IndexReader.FieldOption.INDEXED), false, false, false, false);
      fieldInfos.add(reader.getFieldNames(IndexReader.FieldOption.UNINDEXED), false);
    }

I simply do not understand it. I have changed the addIndexed(...) signature to include omitTf, but I am not sure what needs to be done here.
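One possible reading of that block, offered as an assumption rather than something confirmed in this thread: each addIndexed(...) call handles the fields the reader reports under one FieldOption, the four booleans say which per-field flags that category implies, and flags accumulate when the same field shows up in several categories. The toy model below illustrates only that idea. The Flags class, the string field names and the OR-style merge rule are hypothetical stand-ins, not Lucene's FieldInfo or FieldInfos.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of per-field flags accumulating across addIndexed-style calls.
class FieldFlagMergeSketch {

    // Stand-in for a per-field entry; NOT Lucene's FieldInfo.
    static final class Flags {
        boolean termVector, positions, offsets, payloads;

        @Override
        public String toString() {
            return "tv=" + termVector + " pos=" + positions
                    + " off=" + offsets + " payloads=" + payloads;
        }
    }

    final Map<String, Flags> byName = new LinkedHashMap<>();

    // Same boolean order as the calls quoted above: term vector, positions, offsets, payloads.
    // Assumption: a flag that was set by an earlier call is never cleared by a later one.
    void addIndexed(List<String> names, boolean tv, boolean pos, boolean off, boolean payloads) {
        for (String name : names) {
            Flags f = byName.computeIfAbsent(name, k -> new Flags());
            f.termVector |= tv;
            f.positions |= pos;
            f.offsets |= off;
            f.payloads |= payloads;
        }
    }

    public static void main(String[] args) {
        FieldFlagMergeSketch merged = new FieldFlagMergeSketch();
        // "body" is reported under three categories, "id" only under the plain indexed one.
        merged.addIndexed(Arrays.asList("body"), true, true, false, false);
        merged.addIndexed(Arrays.asList("body"), false, false, false, true);
        merged.addIndexed(Arrays.asList("body", "id"), false, false, false, false);
        merged.byName.forEach((name, flags) -> System.out.println(name + ": " + flags));
    }
}
```

Under this reading, an omitTf option would be one more boolean threaded through every call. Which value each call should pass, and how conflicting values for the same field should merge, is a design decision this thread does not settle.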
----- Original Message ----
From: Michael McCandless
To: java-dev@lucene.apache.org
Sent: Friday, 18 July, 2008 11:48:20 AM
Subject: Re: Index without tf, anyone?

I just committed LUCENE-1301, which is a first step (top down) towards flexible indexing. I hope I didn't break anything....

While flexible indexing should make this simpler, it's not too bad to modify Lucene to do this today, if you want. I think this is what you'll need to do (but I haven't tested!):

  * Add something to Fieldable/AbstractField/Field that "knows" whether a field should store the tf. Also add this to FieldInfo.java, and make sure that bit is saved to the fnm file.

  * In the new oal.index.DocFieldProcessorPerThread, in the processDocument method, fix the FieldInfos.add call to also pass in your new "storeTermFreq" bit. Probably, assert that this cannot change -- ie a field must be created with storeTermFreq=true or false and must never change.

  * The new oal.index.FreqProxTermsWriter, in appendPostings, has the code that creates a new segment. Change that to skip writing tf if the FieldInfo says so.

  * Fix SegmentTermDocs to not read tf if FieldInfo says so.

  * Fix SegmentMerger.appendPostings to not merge/write tf if FieldInfo says so. Likewise assert here that the "storeTermFreq" does not change in the merged segments.

It's also possible to fix FreqProxTermsWriterPerField to not even compute & store the tf in its RawPostingList, per term. This is an optimization (saves RAM & CPU) that you can do after first getting the above working...

On the search side, you'll need to fix scoring to be OK with tf=0.

I think this would be a useful addition to Lucene (it comes up every so often), even before we fully work out flexible indexing.

Mike

eks dev wrote:

hi all, is there any solution to have pure postings lists without interleaved tf ... this eats a lot of CPU for VInt decoding on dense terms (also doubles IO...) in our case. Can be an untested patch, tips how to do it or whatever...
I know about flexible indexing, but cannot wait (I guess it will take some time?).

Does it make sense to start working on it? Can this somehow be incorporated later into Flexible Indexing? I hate to do it and then throw it away when Mike does his magic with Flexible Indexing. We are simply sure this could help performance a lot (some dense fields always have constant tf, so there is no need to read them from the index). Simply asking for help if somebody accidentally happens to have some quick 'n dirty solution/idea.

thanks, eks

__________________________________________________________
Not happy with your email address?
Get the one you really want - millions of new email addresses available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]