[ https://issues.apache.org/jira/browse/LUCENE-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598189#comment-13598189 ]
Robert Muir commented on LUCENE-4694:
-------------------------------------
{quote}
I personally think it's ok if IndexReader lets you get docsValues(doc),
document(doc), getTV(doc) and termDocsEnum(term). There's nothing inefficient
about supporting them, as far as I can see.
{quote}
This is not correct at all.
For the sorted types, we need to iterate through all of the values, build a
data structure mapping per-segment ordinals to global ones, and also cache it
somewhere.
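To make the cost of the sorted types concrete, here is a minimal plain-Java
sketch (no Lucene classes; the class and method names are hypothetical) of
building the per-segment-to-global ordinal map. It assumes each segment exposes
its unique values in sorted order, with a value's ordinal equal to its array
index. Every value in every segment has to be visited before a single global
ordinal can be handed out.

```java
import java.util.Arrays;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not Lucene API): map each segment's ordinals for a
 * sorted doc values field to global ordinals across all segments.
 */
class GlobalOrdinalsSketch {

    /** segmentValues[s] holds segment s's unique values, sorted; a value's ord is its index. */
    static int[][] buildOrdinalMap(String[][] segmentValues) {
        // Pass 1: visit every value of every segment to get the global sorted set.
        TreeSet<String> all = new TreeSet<>();
        for (String[] values : segmentValues) {
            all.addAll(Arrays.asList(values));
        }
        String[] global = all.toArray(new String[0]);

        // Pass 2: translate each segment ord into its position in the global order.
        int[][] map = new int[segmentValues.length][];
        for (int s = 0; s < segmentValues.length; s++) {
            map[s] = new int[segmentValues[s].length];
            for (int ord = 0; ord < segmentValues[s].length; ord++) {
                map[s][ord] = Arrays.binarySearch(global, segmentValues[s][ord]);
            }
        }
        // A composite reader would also have to cache this table somewhere.
        return map;
    }

    public static void main(String[] args) {
        String[][] segments = {
            {"apple", "cherry"},
            {"banana", "cherry", "date"}
        };
        System.out.println(Arrays.deepToString(buildOrdinalMap(segments)));
    }
}
```

For the two toy segments in `main`, this prints `[[0, 2], [1, 2, 3]]`:
"cherry" is ordinal 1 in both segments but global ordinal 2. A production
version would likely merge the already-sorted segment lists instead of copying
everything into a `TreeSet`; the sketch trades that for brevity.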
Additionally, all docvalues types and norms on a composite reader would pay the
cost of a binary search (to find which segment owns the docid) for *each* docid
access, and because of the way they are used, many docids are typically
accessed.
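A minimal sketch of that per-access cost, assuming a composite reader that
keeps an ascending array of segment start docids (the class name
`CompositeDocLookup` is hypothetical, not Lucene API). Every top-level docid
has to be binary-searched against that array before the segment-local value
can be read.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: resolve a top-level docid to (segment, local docid)
 * the way a composite reader must on every single access.
 */
class CompositeDocLookup {
    private final int[] docStarts; // docStarts[i] = first top-level docid of segment i, ascending

    CompositeDocLookup(int... segmentSizes) {
        docStarts = new int[segmentSizes.length];
        int start = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            docStarts[i] = start;
            start += segmentSizes[i];
        }
    }

    /** Segment owning docID: one O(log #segments) binary search per call. */
    int segmentFor(int docID) {
        int idx = Arrays.binarySearch(docStarts, docID);
        if (idx >= 0) {
            // Empty segments share a start offset; the last one with this start owns docID.
            while (idx + 1 < docStarts.length && docStarts[idx + 1] == docID) {
                idx++;
            }
            return idx;
        }
        // Miss: binarySearch returns -(insertionPoint) - 1; the owner is insertionPoint - 1.
        return -idx - 2;
    }

    int localDoc(int docID) {
        return docID - docStarts[segmentFor(docID)];
    }

    public static void main(String[] args) {
        CompositeDocLookup lookup = new CompositeDocLookup(100, 50, 25);
        System.out.println(lookup.segmentFor(120) + " " + lookup.localDoc(120));
    }
}
```

With segment sizes 100, 50 and 25, `main` prints `1 20`. The search is cheap on
its own, but it is paid again for every docid the caller touches.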
Stored fields are used for summary results, so on a 100 million doc index, who
cares if you do 10 or 20 binary searches?
Term vectors are used for highlighting summary results, MoreLikeThis, etc., both
of which are small top-N cases just like stored fields, so that's also fine.
But docvalues are used in scoring and sorting, so this would be 100 million
binary searches. That's a big damn difference.
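A back-of-the-envelope sketch of that difference, assuming worst-case binary
search comparisons (floor(log2 n) + 1 for n segments) and a hypothetical
30-segment index. It is a cost model, not a benchmark; code that walks each
segment with its own local docids skips the search entirely, so these numbers
are compared against zero.

```java
/**
 * Hypothetical cost model: extra comparisons paid when docids are resolved
 * through a composite reader one at a time.
 */
class AccessPatternCost {
    /** Worst-case comparisons for one binary search over numSegments starts: floor(log2 n) + 1. */
    static int searchCost(int numSegments) {
        return 32 - Integer.numberOfLeadingZeros(numSegments);
    }

    /** Total extra comparisons when every accessed docid goes through the search. */
    static long topLevelComparisons(long docsAccessed, int numSegments) {
        return docsAccessed * searchCost(numSegments);
    }

    public static void main(String[] args) {
        int segments = 30; // hypothetical segment count
        // Stored fields / term vectors for a page of 20 hits: negligible.
        System.out.println(topLevelComparisons(20, segments));           // 100
        // Docvalues read for every doc while scoring or sorting 100M docs.
        System.out.println(topLevelComparisons(100_000_000L, segments)); // 500000000
    }
}
```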
Postings are pretty much just an additional check per document, so it's a
little more up in the air what to do there. But as mentioned in the description,
users look at IndexReader.java, and the only postings API they see is term
vectors.
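To illustrate why pulling term vectors for every document is slow, here is a
hypothetical, Lucene-free sketch contrasting the two directions of the index.
Term vectors are a per-document (forward) view, so finding the documents that
contain a term means visiting every document; a postings list is the inverted
view and yields only the matching docids. In a real index the postings are
written at index time; the `invert` helper here builds them only so the demo is
self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: forward (term vector) vs inverted (postings) lookup of a term. */
class PostingsVsTermVectors {

    /** Forward index: scan every doc's terms, so the cost grows with the whole index. */
    static List<Integer> viaTermVectors(List<Set<String>> termVectors, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < termVectors.size(); doc++) {
            if (termVectors.get(doc).contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Build term -> ascending docids; a real index does this once, at index time. */
    static Map<String, List<Integer>> invert(List<Set<String>> termVectors) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int doc = 0; doc < termVectors.size(); doc++) {
            for (String t : termVectors.get(doc)) {
                postings.computeIfAbsent(t, k -> new ArrayList<>()).add(doc);
            }
        }
        return postings;
    }

    /** Inverted index: one lookup, then only the matching docs are touched. */
    static List<Integer> viaPostings(Map<String, List<Integer>> postings, String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("lucene", "index"),
            Set.of("search"),
            Set.of("lucene", "postings"));
        System.out.println(viaTermVectors(docs, "lucene"));       // [0, 2]
        System.out.println(viaPostings(invert(docs), "lucene"));  // [0, 2]
    }
}
```

Both calls return the same docids; the difference is that the term-vector path
had to visit all three documents to find them, which on a large index is
exactly the "pull term vectors for every single document" pattern.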
> Add back IndexReader.fields() -> Multi*, or discourage term vectors in some better way
> --------------------------------------------------------------------------------------
>
> Key: LUCENE-4694
> URL: https://issues.apache.org/jira/browse/LUCENE-4694
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-4694.patch
>
>
> Users can easily get term vectors from any IndexReader, but not postings
> lists. This encourages them to do really slow things, like pulling term
> vectors for every single document.
> This is really, really so much worse than going through MultiFields or
> whatever.