[jira] [Commented] (LUCENE-4694) Add back IndexReader.fields() -> Multi*, or discourage term vectors in some better way

Shai Erera (JIRA) Sun, 10 Mar 2013 00:19:15 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598186#comment-13598186 ]


Shai Erera commented on LUCENE-4694:
------------------------------------

bq. we should never ever expose postings on composite readers

I don't know if I agree with this. Iterating on a posting list is something 
very basic IMO, and a CompositeReader can implement it without any 
inefficiency. The returned CompositeDocsEnum can iterate over the 
sub-DocsEnums itself, using the liveDocs of each (not MultiLiveDocs). You get 
a friendly API, and it does exactly what users would have to do themselves 
anyway ...
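
A minimal sketch of that iteration scheme, written against hypothetical 
stand-in classes rather than Lucene's real API ({{SegmentPostings}} and 
{{CompositeDocsEnumSketch}} are made up for illustration, and the liveDocs are 
modelled as a plain boolean array). It only shows that the composite enum can 
walk the sub-enums one after another, filter with each segment's own liveDocs, 
and rebase doc ids by the segment's docBase, without building any merged view:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one segment's postings of a single term. */
final class SegmentPostings {
    final int docBase;        // offset of this segment in the composite doc-id space
    final int[] docs;         // segment-local doc ids containing the term, ascending
    final boolean[] liveDocs; // per-segment deletions: true means the doc is live

    SegmentPostings(int docBase, int[] docs, boolean[] liveDocs) {
        this.docBase = docBase;
        this.docs = docs;
        this.liveDocs = liveDocs;
    }
}

/** Sketch of a composite enum: visits each sub-enum in segment order. */
final class CompositeDocsEnumSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final List<SegmentPostings> segments;
    private int seg = 0; // index of the segment currently being read
    private int pos = 0; // position inside that segment's docs array

    CompositeDocsEnumSketch(List<SegmentPostings> segments) {
        this.segments = segments;
    }

    /** Returns the next live doc id in composite space, or NO_MORE_DOCS. */
    int nextDoc() {
        while (seg < segments.size()) {
            SegmentPostings s = segments.get(seg);
            while (pos < s.docs.length) {
                int local = s.docs[pos++];
                if (s.liveDocs[local]) {
                    return s.docBase + local; // rebase into the composite id space
                }
            }
            seg++;   // this segment is exhausted, move on to the next one
            pos = 0;
        }
        return NO_MORE_DOCS;
    }

    /** Convenience helper: drains the enum into a list. */
    static List<Integer> collect(List<SegmentPostings> segments) {
        CompositeDocsEnumSketch e = new CompositeDocsEnumSketch(segments);
        List<Integer> out = new ArrayList<>();
        for (int d = e.nextDoc(); d != NO_MORE_DOCS; d = e.nextDoc()) {
            out.add(d);
        }
        return out;
    }
}
```

Because segments are visited in order and each one is rebased by its docBase, 
the doc ids come out ascending and no cross-segment state is needed.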

bq. For example IR has nothing in its javadocs about how to work the leaves per 
segment

I don't think it's just a matter of javadocs. If your application has an 
{{IndexReader r}}, it's not at all clear today how to get to the 
postings/DocValues API. {{leaves()}} doesn't give a hint, and the fact that 
document() and getTV() are there is even more confusing.

I personally think it's ok if IndexReader lets you get docsValues(doc), 
document(doc), getTV(doc) and termDocsEnum(term). There's nothing inefficient 
about supporting them, as far as I can see.

About API that needs to "merge" things, like fields(), terms(field), 
getLiveDocs() ... well, what's the harm in exposing it with documentation 
saying that these are implemented inefficiently, and that you should use the 
respective API on the AtomicReader returned from leaves()? We don't need to 
make everyone an expert Lucene developer, especially when it doesn't matter 
(e.g. for simple stupid tests that need to wrap an IR with SlowComposite) ... 
However, since I appreciate all the work that was done to separate the API, 
I'm fine if users need to wrap w/ SlowComposite. That makes the 'slowness' 
more evident. But let's force users to do that only for the API that really 
cannot be implemented efficiently?
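
To illustrate why the "merge" style API differs from plain postings iteration, 
here is a small sketch using a hypothetical {{MergedTermsSketch}} class 
(plain Java, not Lucene code; per-segment term dictionaries are modelled as 
sorted lists of strings). A composite terms(field) view has to interleave the 
sorted terms of all segments, which takes a priority queue and a comparison 
per step, whereas postings for a single term can simply be read segment after 
segment:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch of a k-way merge over per-segment sorted term lists. */
final class MergedTermsSketch {

    /** One cursor per segment: the current term plus the remaining ones. */
    private static final class Cursor {
        String current;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.current = it.next(); // callers only pass non-empty iterators
        }
    }

    /** Merges sorted per-segment term lists into one sorted, de-duplicated list. */
    static List<String> mergeTerms(List<List<String>> perSegmentTerms) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (List<String> terms : perSegmentTerms) {
            if (!terms.isEmpty()) {
                queue.add(new Cursor(terms.iterator()));
            }
        }

        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll(); // smallest current term across all segments
            boolean duplicate = !merged.isEmpty()
                && merged.get(merged.size() - 1).equals(c.current);
            if (!duplicate) {
                merged.add(c.current);
            }
            if (c.rest.hasNext()) {
                c.current = c.rest.next();
                queue.add(c); // re-queue the cursor at its next term
            }
        }
        return merged;
    }
}
```

Every step pays for a queue operation across all segments, which is the kind 
of overhead that wrapping with SlowComposite makes explicit.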
                
> Add back IndexReader.fields() -> Multi*, or discourage term vectors in some 
> better way
> --------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4694
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-4694.patch
>
>
> Users can easily get term vectors from any indexreader, but not postings 
> lists. this encourages them to do really slow things: like pulling term 
> vectors for every single document.
> this is really really so much worse than going through multifields or 
> whatever. 

