It seems someone else had the same idea to "inline" very short postings into the term dictionary (even for an in-memory index) and save one pointer (and one seek, in a disk setup)... nice reading
http://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf

----- Original Message ----
> From: Eks Dev (JIRA) <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Sunday, 20 July, 2008 1:02:31 PM
> Subject: [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary
>
> [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077 ]
>
> Eks Dev commented on LUCENE-1278:
> ---------------------------------
>
> In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM),
> I think it is worth mentioning that I am working on LUCENE-1340, that is,
> storing postings without additional frq info.
>
> Correct me if I am wrong: the only difference is that this approach with
> *.frq needs one more seek... at the same time, this could potentially
> increase term dict size, so we lose some locality.
>
> Your last proposal sounds interesting, "inline short postings" into the
> term dict, so for short postings (about the size of the offset pointer into
> *.frq) with tf==1 (that is always the case if you use omitTf(true) from
> LUCENE-1340) we spare one seek()... this could be a lot.
> Also, there is no need to store postings in *.frq (this complicates
> maintenance, I guess).
>
> > Add optional storing of document numbers in term dictionary
> > -----------------------------------------------------------
> >
> >                 Key: LUCENE-1278
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: Index
> >    Affects Versions: 2.3.1
> >            Reporter: Jason Rutherglen
> >            Priority: Minor
> >         Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch,
> >                      lucene.1278.5.5.2008.patch, lucene.1278.5.7.2008.patch,
> >                      lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java
> >
> > Add optional storing of document numbers in term dictionary. String index
> > field cache and range filter creation will be faster.
> >
> > Example read code:
> > {noformat}
> > TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> > do {
> >   Term term = termEnum.term();
> >   if (term == null || term.field() != field) break;
> >   int[] docs = termEnum.docs();
> > } while (termEnum.next());
> > {noformat}
> > Example write code:
> > {noformat}
> > Document document = new Document();
> > document.add(new Field("tag", "dog", Field.Store.YES, Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> > indexWriter.addDocument(document);
> > {noformat}
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
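
PS: to make the "inline short postings" idea concrete, here is a rough sketch of how I picture it. This is a toy, not Lucene code and not the LUCENE-1278 patch: the class InlinePostingsSketch, its Entry type and the seek counter are all made up for illustration, and a plain in-memory list stands in for the *.frq file. A term with a single document keeps its doc id directly in the dictionary entry; anything longer keeps an offset into the separate postings store and pays one extra lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy term dictionary; none of these names come from Lucene. */
class InlinePostingsSketch {

    /** One dictionary entry: either an inlined doc id or an offset into the postings store. */
    private static final class Entry {
        final boolean inlined;
        final int value; // doc id when inlined, otherwise index into postingsStore

        Entry(boolean inlined, int value) {
            this.inlined = inlined;
            this.value = value;
        }
    }

    private final Map<String, Entry> dictionary = new TreeMap<>();
    private final List<int[]> postingsStore = new ArrayList<>(); // stands in for *.frq
    private int seeks = 0; // simulated accesses to the postings store

    /** Adds a term with its doc ids; a single-doc posting is inlined into the dictionary. */
    void add(String term, int... docs) {
        if (docs.length == 1) {
            dictionary.put(term, new Entry(true, docs[0]));
        } else {
            postingsStore.add(docs.clone());
            dictionary.put(term, new Entry(false, postingsStore.size() - 1));
        }
    }

    /** Returns the doc ids for a term; only non-inlined terms touch the postings store. */
    int[] docs(String term) {
        Entry e = dictionary.get(term);
        if (e == null) {
            return new int[0];
        }
        if (e.inlined) {
            return new int[] { e.value }; // answered from the dictionary alone
        }
        seeks++; // would be a seek into *.frq in a disk setup
        return postingsStore.get(e.value);
    }

    int seeks() {
        return seeks;
    }

    int storedPostingLists() {
        return postingsStore.size();
    }

    public static void main(String[] args) {
        InlinePostingsSketch dict = new InlinePostingsSketch();
        dict.add("dog", 7);       // single doc, inlined
        dict.add("cat", 1, 4, 9); // longer list, kept in the postings store
        dict.docs("dog");
        dict.docs("cat");
        System.out.println("seeks=" + dict.seeks()
                + " storedLists=" + dict.storedPostingLists());
    }
}
```

The trade-off from the quoted comment shows up directly: inlined entries never reach the postings store, but every inlined doc id makes the dictionary entry carry the posting itself, which is where the concern about term dict size and locality comes from.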