This also reminds me of the "pulsing" technique described in:

http://citeseer.ist.psu.edu/cutting90optimizations.html

Doug

eks dev wrote:
It seems someone else had the same idea to "inline" very short postings into 
the term dictionary (even for an in-memory index) and save one pointer (and one seek, in a disk 
setup)... nice reading

http://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf




----- Original Message ----
From: Eks Dev (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Sunday, 20 July, 2008 1:02:31 PM
Subject: [jira] Commented: (LUCENE-1278) Add optional storing of document 
numbers in term dictionary


[ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077 ]
Eks Dev commented on LUCENE-1278:
---------------------------------

In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I think it is worth mentioning that I am working on LUCENE-1340, that is, storing postings without additional frq info. Correct me if I am wrong: the only difference is that this approach with *.frq needs one more seek... at the same time, this could potentially increase term dict size, so we lose some locality.

Your last proposal sounds interesting: "inline short postings" into the term dict, so for short postings (about the size of the offset pointer into *.frq) with tf==1 (that is always the case if you use omitTf(true) from LUCENE-1340) we spare one seek()... this could be a lot. Also, there is no need to store postings in *.frq (this complicates maintenance, I guess).
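
To make the "inline short postings" idea concrete, here is a minimal sketch in plain Java. It is not Lucene code and not what any of the attached patches do: the class InlinePostingsSketch, the TermEntry type, the MAX_INLINE_DOCS cutoff of 2 and the seeks counter are all made up for illustration, and a list stands in for the *.frq file. It only shows the shape of the trade-off: a short posting list lives in the term dictionary entry itself, a long one is reached through an offset and costs one extra (simulated) seek.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch only; none of these names are Lucene APIs.
final class InlinePostingsSketch {

    // Assumed cutoff: inline a posting list when it holds at most this many docs.
    static final int MAX_INLINE_DOCS = 2;

    /** Term dictionary entry: either inlined doc numbers or an offset into the postings store. */
    static final class TermEntry {
        final int[] inlineDocs;    // non-null when the postings are inlined
        final int postingsOffset;  // meaningful only when inlineDocs == null

        TermEntry(int[] inlineDocs, int postingsOffset) {
            this.inlineDocs = inlineDocs;
            this.postingsOffset = postingsOffset;
        }
    }

    private final TreeMap<String, TermEntry> termDict = new TreeMap<>();
    private final List<int[]> postingsStore = new ArrayList<>(); // stand-in for *.frq
    int seeks = 0; // counts simulated seeks into the postings store

    void addTerm(String term, int[] docs) {
        if (docs.length <= MAX_INLINE_DOCS) {
            // Short list: keep it in the dictionary entry, nothing goes to the store.
            termDict.put(term, new TermEntry(docs.clone(), -1));
        } else {
            // Long list: write it to the store and keep only the offset.
            postingsStore.add(docs.clone());
            termDict.put(term, new TermEntry(null, postingsStore.size() - 1));
        }
    }

    int[] docs(String term) {
        TermEntry entry = termDict.get(term);
        if (entry == null) {
            return new int[0];
        }
        if (entry.inlineDocs != null) {
            return entry.inlineDocs.clone(); // answered from the dictionary alone
        }
        seeks++; // one extra seek to follow the offset
        return postingsStore.get(entry.postingsOffset).clone();
    }

    public static void main(String[] args) {
        InlinePostingsSketch index = new InlinePostingsSketch();
        index.addTerm("dog", new int[] {7});
        index.addTerm("the", new int[] {1, 2, 3, 7});
        System.out.println(Arrays.toString(index.docs("dog")) + " seeks=" + index.seeks);
        System.out.println(Arrays.toString(index.docs("the")) + " seeks=" + index.seeks);
    }
}
```

The cost side mentioned above is visible here as well: every inlined list makes its dictionary entry larger, which is where the loss of locality in the term dict would come from.
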
Add optional storing of document numbers in term dictionary
-----------------------------------------------------------

                Key: LUCENE-1278
                URL: https://issues.apache.org/jira/browse/LUCENE-1278
            Project: Lucene - Java
         Issue Type: New Feature
         Components: Index
   Affects Versions: 2.3.1
           Reporter: Jason Rutherglen
           Priority: Minor
        Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch, lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java

Add optional storing of document numbers in term dictionary. String index
field cache and range filter creation will be faster.
Example read code:
{noformat}
TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
do {
  Term term = termEnum.term();
  if (term == null || term.field() != field) break;
  int[] docs = termEnum.docs();
} while (termEnum.next());
{noformat}
Example write code:
{noformat}
Document document = new Document();
document.add(new Field("tag", "dog", Field.Store.YES,
    Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
indexWriter.addDocument(document);
{noformat}
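
As a rough illustration of why range filter creation could get faster when document numbers sit next to the terms, here is a sketch in plain Java. It does not use Lucene: RangeFilterSketch and its methods are hypothetical, and a TreeMap from term text to doc numbers stands in for the term dictionary of a single field. The filter is built in one pass over the sorted terms, without a separate postings lookup per term.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Hypothetical sketch only; a TreeMap stands in for one field's term dictionary.
final class RangeFilterSketch {

    // Sorted terms, each carrying its document numbers directly.
    private final TreeMap<String, int[]> termDocs = new TreeMap<>();

    void add(String term, int... docs) {
        termDocs.put(term, docs.clone());
    }

    /** Sets a bit for every document that has a term in [lower, upper], both inclusive. */
    BitSet rangeFilter(String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // One ordered scan over the dictionary slice; no per-term postings fetch.
        for (int[] docs : termDocs.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        RangeFilterSketch field = new RangeFilterSketch();
        field.add("ant", 0, 4);
        field.add("cat", 2);
        field.add("dog", 1, 4);
        field.add("emu", 3);
        System.out.println(field.rangeFilter("b", "dog", 5));
    }
}
```
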
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



