[jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary

Paul Elschot (JIRA) Wed, 14 May 2008 06:46:22 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596762#action_12596762 ]


Paul Elschot commented on LUCENE-1278:
--------------------------------------

Some comments on the 5.7.2008 patch:

The test showing a 7.6x speedup for terms with very few docs makes me wonder 
why this never showed up as a performance problem before. It certainly shows 
an advantage of flexible indexing when the within-document term frequencies 
are not needed, for example for primary/foreign keys, which normally end up 
in a keyword field.

In the patch, DocIdSetIterator is used in the org.apache.lucene.index package, 
so it would be a good idea to move it from o.a.l.search to o.a.l.index or to 
o.a.l.util to avoid a circular dependency involving the index and search 
packages. As DocIdSetIterator is not yet released, this move should be no 
problem.

The DocIdSetReader class in the patch has so much code in common with 
SortedVIntList that it might be better to merge the two into a single class 
and refactor the common code into new methods there. 
That would also be an easy way to get rid of the unsupported skipTo() operation.
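
To make the skipTo() point concrete, here is a minimal sketch of a sorted, 
VInt delta-encoded list of document numbers whose iterator supports skipTo() 
by a plain linear scan. VIntDocIdList and DocIterator are hypothetical names 
used only for illustration; this is neither the patch's DocIdSetReader nor 
Lucene's SortedVIntList, and the next()/doc()/skipTo() shape is only assumed 
to mirror the DocIdSetIterator contract.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch (not Lucene code): sorted doc ids stored as
 * VInt-encoded deltas in a byte array.
 */
final class VIntDocIdList {
  private final byte[] bytes;

  VIntDocIdList(int[] sortedDocs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int doc : sortedDocs) {
      if (doc < prev) {
        throw new IllegalArgumentException("doc ids must be sorted and non-negative");
      }
      int delta = doc - prev;
      // VInt: 7 data bits per byte; a set high bit means more bytes follow.
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80);
        delta >>>= 7;
      }
      out.write(delta);
      prev = doc;
    }
    bytes = out.toByteArray();
  }

  int sizeInBytes() {
    return bytes.length;
  }

  DocIterator iterator() {
    return new DocIterator();
  }

  /** Iterator with assumed next()/doc()/skipTo() semantics. */
  final class DocIterator {
    private int pos = 0; // read position in the byte array
    private int doc = 0; // running sum of deltas; valid after next() returns true

    int doc() {
      return doc;
    }

    boolean next() {
      if (pos >= bytes.length) {
        return false;
      }
      int delta = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        delta |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      doc += delta;
      return true;
    }

    /** Linear-scan fallback: always advances at least once, then stops at the first doc >= target. */
    boolean skipTo(int target) {
      do {
        if (!next()) {
          return false;
        }
      } while (doc < target);
      return true;
    }
  }
}
```

With a fallback like this in the shared class, skipTo() would not have to be 
unsupported; whether a linear scan is fast enough for longer lists, or skip 
data would be needed, is a separate question.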



> Add optional storing of document numbers in term dictionary
> -----------------------------------------------------------
>
>                 Key: LUCENE-1278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene.1278.5.4.2008.patch, 
> lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch, 
> lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, 
> TestTermEnumDocs.java
>
>
> Add optional storing of document numbers in term dictionary.  String index 
> field cache and range filter creation will be faster.  
> Example read code:
> {noformat}
> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> do {
>   Term term = termEnum.term();
>   if (term == null || term.field() != field) break;
>   int[] docs = termEnum.docs();
> } while (termEnum.next());
> {noformat}
> Example write code:
> {noformat}
> Document document = new Document();
> document.add(new Field("tag", "dog", Field.Store.YES, 
> Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> indexWriter.addDocument(document);
> {noformat}


