Re: [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary

Doug Cutting Mon, 21 Jul 2008 15:18:57 -0700

This also reminds me of the "pulsing" technique described in:


http://citeseer.ist.psu.edu/cutting90optimizations.html

Doug

eks dev wrote:

It seems someone else had the same idea to "inline" very short postings into 
the term dictionary (even for an in-memory index) and save one pointer (and a seek, in a disk 
setup)... nice reading

http://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf




----- Original Message ----
From: Eks Dev (JIRA) <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, 20 July, 2008 1:02:31 PM
Subject: [jira] Commented: (LUCENE-1278) Add optional storing of document 
numbers in term dictionary
[https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077]
Eks Dev commented on LUCENE-1278:
---------------------------------
In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I think it is worth mentioning that I am working on LUCENE-1340, that is, storing postings without additional frq info. Correct me if I am wrong, but the only difference is that this approach with *.frq needs one more seek... at the same time, this could potentially increase term dict size, so we lose some locality.
Your last proposal sounds interesting: "inline short postings" into the term dict, so for short postings (about the size of the offset pointer into *.frq) with tf==1 (that is always the case if you use omitTf(true) from LUCENE-1340) we spare one seek()... this could be a lot. Also, there is no need to store postings into *.frq (this complicates maintenance, I guess).
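
To make the "inline short postings" idea concrete, here is a minimal sketch in plain Java. It is not Lucene code and not the patch attached to this issue: the class InlinePostingsSketch, its methods, and the inlineThreshold parameter are all hypothetical, and the postings file is simulated by an in-memory list so that a "seek" can simply be counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
class InlinePostingsSketch {
    // One term dictionary entry: either inlined doc numbers or a pointer.
    private static final class Entry {
        final int[] inlineDocs; // non-null when the postings are inlined
        final int pointer;      // index into postingsFile, -1 when inlined

        Entry(int[] inlineDocs, int pointer) {
            this.inlineDocs = inlineDocs;
            this.pointer = pointer;
        }
    }

    // Stand-in for the *.frq file: longer postings lists live here.
    private final List<int[]> postingsFile = new ArrayList<>();
    // Stand-in for the term dictionary.
    private final Map<String, Entry> termDict = new HashMap<>();
    // Postings with at most this many doc numbers are stored inline.
    private final int inlineThreshold;
    // Counts simulated seeks into the postings file.
    private int seeks = 0;

    InlinePostingsSketch(int inlineThreshold) {
        this.inlineThreshold = inlineThreshold;
    }

    void add(String term, int[] docs) {
        if (docs.length <= inlineThreshold) {
            // Short list: keep it in the dictionary, write nothing to the file.
            termDict.put(term, new Entry(docs.clone(), -1));
        } else {
            // Long list: write to the file and keep only a pointer.
            postingsFile.add(docs.clone());
            termDict.put(term, new Entry(null, postingsFile.size() - 1));
        }
    }

    int[] docs(String term) {
        Entry e = termDict.get(term);
        if (e == null) {
            return new int[0];
        }
        if (e.inlineDocs != null) {
            return e.inlineDocs.clone(); // no seek needed
        }
        seeks++; // following the pointer costs one simulated seek
        return postingsFile.get(e.pointer).clone();
    }

    int seekCount() {
        return seeks;
    }
}
```

With a threshold of 1, a term that occurs in a single document is answered from the dictionary alone, while a term with several documents still costs one seek. The trade-off mentioned above shows up directly: every inlined list makes the dictionary entry larger.
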
Add optional storing of document numbers in term dictionary
-----------------------------------------------------------

                Key: LUCENE-1278
                URL: https://issues.apache.org/jira/browse/LUCENE-1278
            Project: Lucene - Java
         Issue Type: New Feature
         Components: Index
   Affects Versions: 2.3.1
           Reporter: Jason Rutherglen
           Priority: Minor
        Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch,
lucene.1278.5.5.2008.patch, lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java
Add optional storing of document numbers in term dictionary. String index
field cache and range filter creation will be faster.
Example read code:
{noformat}
TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
do {
  Term term = termEnum.term();
  if (term == null || term.field() != field) break;
  int[] docs = termEnum.docs();
} while (termEnum.next());
{noformat}
Example write code:
{noformat}
Document document = new Document();
document.add(new Field("tag", "dog", Field.Store.YES,
    Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
indexWriter.addDocument(document);
{noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]