It seems someone else had the same idea: "inline" very short postings into 
the term dictionary (even for an in-memory index) and save one pointer (and 
one seek, in a disk setup)... nice reading:

http://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf
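
A rough sketch of the idea, only to make the trade-off concrete. It assumes 
a toy in-memory dictionary; the names (InlinePostingsDict, INLINE_MAX, 
storeReads) are made up for illustration and are neither Lucene API nor the 
method from the paper. Short lists live in the dictionary entry itself, 
longer ones go to a separate store that stands in for *.frq:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy term dictionary; every name here is hypothetical, not Lucene API. */
class InlinePostingsDict {
    /** Lists with at most this many doc ids are kept in the entry itself (assumed threshold). */
    static final int INLINE_MAX = 2;

    /** Dictionary entry: either inline doc ids, or (offset, length) into the store. */
    static final class Entry {
        final int[] inlineDocs; // non-null only for short lists
        final int offset;       // position in the store, -1 when inlined
        final int length;

        Entry(int[] inlineDocs, int offset, int length) {
            this.inlineDocs = inlineDocs;
            this.offset = offset;
            this.length = length;
        }
    }

    private final Map<String, Entry> dict = new HashMap<>();
    private final List<Integer> postingsStore = new ArrayList<>(); // stands in for *.frq
    int storeReads = 0; // lookups that had to touch the store (the extra "seek")

    void put(String term, int[] docs) {
        if (docs.length <= INLINE_MAX) {
            // Short list: no pointer, no entry in the store.
            dict.put(term, new Entry(docs.clone(), -1, docs.length));
        } else {
            int offset = postingsStore.size();
            for (int d : docs) {
                postingsStore.add(d);
            }
            dict.put(term, new Entry(null, offset, docs.length));
        }
    }

    int[] docs(String term) {
        Entry e = dict.get(term);
        if (e == null) {
            return new int[0];
        }
        if (e.inlineDocs != null) {
            return e.inlineDocs.clone(); // answered from the dictionary alone
        }
        storeReads++; // one extra access, as with a seek into *.frq
        int[] out = new int[e.length];
        for (int i = 0; i < e.length; i++) {
            out[i] = postingsStore.get(e.offset + i);
        }
        return out;
    }
}
```

With this layout a term like "dog" with a single document never touches 
the store, at the price of a somewhat bigger dictionary entry.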




----- Original Message ----
> From: Eks Dev (JIRA) <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Sunday, 20 July, 2008 1:02:31 PM
> Subject: [jira] Commented: (LUCENE-1278) Add optional storing of document 
> numbers in term dictionary
> 
> 
>     [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077 ]
> 
> Eks Dev commented on LUCENE-1278:
> ---------------------------------
> 
> In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I 
> think it is worth mentioning that I am working on LUCENE-1340, that is, 
> storing postings without additional frq info. 
> 
> Correct me if I am wrong, the only difference is that this approach with 
> *.frq needs one seek more... at the same time, this could potentially 
> increase term dict size, so we lose some locality.
> 
> Your last proposal sounds interesting: "inline short postings" into the 
> term dict, so for short postings (about the size of the offset pointer into 
> *.frq) with tf==1 (that is always the case if you use omitTf(true) from 
> LUCENE-1340) we spare one seek()... this could be a lot. Also, there is no 
> need to store postings into *.frq (this complicates maintenance I guess)  
> 
> > Add optional storing of document numbers in term dictionary
> > -----------------------------------------------------------
> >
> >                 Key: LUCENE-1278
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: Index
> >    Affects Versions: 2.3.1
> >            Reporter: Jason Rutherglen
> >            Priority: Minor
> >         Attachments: lucene.1278.5.4.2008.patch, 
> > lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch, 
> > lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, 
> > TestTermEnumDocs.java
> >
> >
> > Add optional storing of document numbers in term dictionary.  String index 
> > field cache and range filter creation will be faster.  
> > Example read code:
> > {noformat}
> > TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> > do {
> >   Term term = termEnum.term();
> >   if (term == null || term.field() != field) break;
> >   int[] docs = termEnum.docs();
> > } while (termEnum.next());
> > {noformat}
> > Example write code:
> > {noformat}
> > Document document = new Document();
> > document.add(new Field("tag", "dog", Field.Store.YES, 
> >     Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> > indexWriter.addDocument(document);
> > {noformat}
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]


