This also reminds me of the "pulsing" technique described in:
http://citeseer.ist.psu.edu/cutting90optimizations.html
Doug
eks dev wrote:
It seems someone else had the same idea: "inline" very short postings into the
term dictionary (even for an in-memory index) and save one pointer (and one seek,
in a disk setup)... nice reading
http://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf
----- Original Message ----
From: Eks Dev (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Sunday, 20 July, 2008 1:02:31 PM
Subject: [jira] Commented: (LUCENE-1278) Add optional storing of document
numbers in term dictionary
[
https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077
]
Eks Dev commented on LUCENE-1278:
---------------------------------
In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I
think it is worth mentioning that I am working on LUCENE-1340, which stores
postings without additional frq info.
Correct me if I am wrong, but the only difference is that the approach with *.frq
needs one more seek... at the same time, storing document numbers in the term
dict could potentially increase its size, so we lose some locality.
Your last proposal sounds interesting: "inline short postings" into the term
dict, so for short postings (about the size of the offset pointer into *.frq) with
tf==1 (which is always the case if you use omitTf(true) from LUCENE-1340) we
spare one seek()... this could be a lot. Also, there is no need to store those
postings in *.frq (this complicates maintenance, I guess).
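
To make the "inline short postings" idea concrete, here is a minimal sketch in plain Java. It is not Lucene code and not what any of the attached patches do: InlineTermDict, Entry, INLINE_MAX_DOCS and the seeks counter are hypothetical names, an int[] stands in for the *.frq file, and tf is assumed to be 1 for every posting (as with omitTf(true)). It only shows the lookup shape: short postings are answered from the dictionary entry, longer ones cost one extra "seek".

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Lucene.
final class InlineTermDict {
    // Assumed threshold: postings with at most this many docs are inlined.
    static final int INLINE_MAX_DOCS = 2;

    // One dictionary entry: either inlined doc numbers or an offset into the postings store.
    static final class Entry {
        final int[] inlineDocs;   // non-null when the postings are inlined
        final int postingsOffset; // offset into the postings store, -1 when inlined
        final int docCount;

        Entry(int[] inlineDocs, int postingsOffset, int docCount) {
            this.inlineDocs = inlineDocs;
            this.postingsOffset = postingsOffset;
            this.docCount = docCount;
        }
    }

    private final Map<String, Entry> dict = new TreeMap<>();
    private int[] postings = new int[0]; // stand-in for the *.frq file
    int seeks = 0;                       // counts simulated seeks into the postings store

    void add(String term, int... docs) {
        if (docs.length <= INLINE_MAX_DOCS) {
            // Short postings live in the dictionary; nothing is written to the postings store.
            dict.put(term, new Entry(docs.clone(), -1, docs.length));
        } else {
            // Longer postings are appended to the store and referenced by offset.
            int offset = postings.length;
            postings = Arrays.copyOf(postings, offset + docs.length);
            System.arraycopy(docs, 0, postings, offset, docs.length);
            dict.put(term, new Entry(null, offset, docs.length));
        }
    }

    int[] docs(String term) {
        Entry e = dict.get(term);
        if (e == null) return new int[0];
        if (e.inlineDocs != null) return e.inlineDocs.clone(); // no seek needed
        seeks++; // one extra seek to reach the postings store
        return Arrays.copyOfRange(postings, e.postingsOffset, e.postingsOffset + e.docCount);
    }
}
```

The trade-off mentioned above shows up directly: every inlined entry makes the dictionary a little larger, so the threshold would need to stay close to the size of the pointer it replaces.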
Add optional storing of document numbers in term dictionary
-----------------------------------------------------------
Key: LUCENE-1278
URL: https://issues.apache.org/jira/browse/LUCENE-1278
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 2.3.1
Reporter: Jason Rutherglen
Priority: Minor
Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch,
lucene.1278.5.5.2008.patch, lucene.1278.5.7.2008.patch,
lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java
Add optional storing of document numbers in term dictionary. String index
field cache and range filter creation will be faster.
Example read code:
{noformat}
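// TermEnum.LOAD_DOCS and TermEnum.docs() are the additions proposed in this issue
TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
do {
  Term term = termEnum.term();
  // stop at the end of the enumeration or once we leave the field
  if (term == null || !term.field().equals(field)) break;
  int[] docs = termEnum.docs(); // document numbers stored with this term
} while (termEnum.next());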
{noformat}
Example write code:
{noformat}
Document document = new Document();
document.add(new Field("tag", "dog", Field.Store.YES,
    Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
indexWriter.addDocument(document);
{noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]