[jira] [Commented] (LUCENE-4694) Add back IndexReader.fields() -> Multi*, or discourage term vectors in some better way

Robert Muir (JIRA) Sun, 10 Mar 2013 00:31:16 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598189#comment-13598189 ]


Robert Muir commented on LUCENE-4694:
-------------------------------------

{quote}
I personally think it's ok if IndexReader lets you get docsValues(doc), 
document(doc), getTV(doc) and termDocsEnum(term). There's nothing inefficient 
about supporting them, as far as I can see.
{quote}

This is not correct at all.

For the sorted types, we need to iterate through all of the values and build a 
data structure mapping per-segment ordinals to global ones, and also cache it 
somewhere.
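
To make the ordinal-mapping cost concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class and method names are hypothetical, segments are modelled as sorted `String[]` term dictionaries, and Java's natural string order stands in for whatever term order the index actually uses. The point is only that building the map has to visit every term of every segment, and that the result has to be kept around.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical illustration only; not part of Lucene. */
final class GlobalOrdinalsSketch {

    /** Merges the per-segment sorted term dictionaries into one sorted, de-duplicated list. */
    static String[] globalTerms(String[][] segmentTerms) {
        TreeSet<String> all = new TreeSet<>();
        for (String[] terms : segmentTerms) {
            all.addAll(Arrays.asList(terms)); // every term of every segment is visited
        }
        return all.toArray(new String[0]);
    }

    /** map[seg][segmentOrd] is the global ordinal of that segment's term. */
    static int[][] buildOrdinalMap(String[][] segmentTerms, String[] global) {
        int[][] map = new int[segmentTerms.length][];
        for (int seg = 0; seg < segmentTerms.length; seg++) {
            map[seg] = new int[segmentTerms[seg].length];
            for (int ord = 0; ord < segmentTerms[seg].length; ord++) {
                // global is sorted and contains every segment term, so this always finds it
                map[seg][ord] = Arrays.binarySearch(global, segmentTerms[seg][ord]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[][] segments = {
            {"apple", "cherry"},
            {"banana", "cherry", "date"},
        };
        String[] global = globalTerms(segments);
        int[][] map = buildOrdinalMap(segments, global);
        System.out.println(Arrays.toString(global));
        System.out.println(Arrays.deepToString(map));
    }
}
```

A real implementation would presumably merge the already-sorted dictionaries and store the map more compactly, but the work is still proportional to the total number of terms across segments.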

Additionally, all docvalues types and norms on a composite reader would pay the 
cost of a binary search for *each* docid access, and because of the way they 
are used, typically many docids are accessed.
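
As a rough sketch of where that per-access cost comes from (plain Java, hypothetical names, not Lucene's actual classes): a composite view has to find which segment a global docid falls in before it can read the value, whereas per-segment code already knows the segment and just indexes into it. The `docStarts` array here is assumed to hold the first global docid of each segment, with no empty segments.

```java
/** Hypothetical illustration only; not part of Lucene. */
final class CompositeLookupSketch {

    /** Binary search for the segment containing globalDoc. docStarts[i] is segment i's first global docid. */
    static int subIndex(int globalDoc, int[] docStarts) {
        int lo = 0;
        int hi = docStarts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docStarts[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Composite-style access: one binary search for every single lookup. */
    static long valueAt(int globalDoc, int[] docStarts, long[][] perSegment) {
        int seg = subIndex(globalDoc, docStarts);
        return perSegment[seg][globalDoc - docStarts[seg]];
    }

    /** Per-segment access: walk each segment directly, no search at all. */
    static long sumPerSegment(long[][] perSegment) {
        long sum = 0;
        for (long[] segment : perSegment) {
            for (long value : segment) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[][] perSegment = {{1, 2, 3}, {10, 20}, {100}};
        int[] docStarts = {0, 3, 5};
        long sum = 0;
        for (int doc = 0; doc < 6; doc++) {
            sum += valueAt(doc, docStarts, perSegment); // six lookups, six binary searches
        }
        System.out.println(sum + " == " + sumPerSegment(perSegment));
    }
}
```

Both loops compute the same total; the composite one simply pays a search per document, which is the overhead that adds up when every scored or sorted document is touched.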

Stored fields are used for summary results, so even on a 100 million doc index, 
who cares if you do 10 or 20 binary searches?

Term vectors are used for highlighting summary results, MoreLikeThis, etc., both 
of which are small top-N cases just like stored fields, so that's also fine.

But docvalues are used in scoring and sorting, so on that same index this would 
be 100 million binary searches. It's a big damn difference.

Postings are pretty much just an additional check per document, so it's a 
little more up in the air what to do. But as mentioned in the description, 
users look at IndexReader.java and the only postings API they see is term 
vectors.

                
> Add back IndexReader.fields() -> Multi*, or discourage term vectors in some 
> better way
> --------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4694
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-4694.patch
>
>
> Users can easily get term vectors from any IndexReader, but not postings 
> lists. This encourages them to do really slow things, like pulling term 
> vectors for every single document.
> This is so much worse than going through MultiFields or 
> whatever. 

