[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

Hoss Man (JIRA) Thu, 03 Apr 2014 14:33:54 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959254#comment-13959254
 ]


Hoss Man commented on LUCENE-5527:
----------------------------------

bq. The patch only moves 3 methods of the Collector class into a new 
LeafCollector class that gets instantiated per-segment. It doesn't add any 
functionnality but just changes the way documents are collected.

sure ... but "in for a penny in for a pound" ... this change is going to 
require every existing implementation of "Collector" to change, so *if* we're 
already going to require that of all existing classes, why not also add a 
{{close()}} method (that can easily be a No-Op for most existing 
implementations) at the same time?

bq. I think having callbacks upon collection completion would be useful and 
help clean up code as Shikhar Bhushan mentionned. But this LeafCollector 
refactoring has already been stalled in LUCENE-5299 because LUCENE-5299 also 
discussed parallelizing collection so I would like to keep this change as small 
as possible and get it in.

Fair enough -- as i said, i don't have strong opinions about any of this, i 
just wanted to raise the question.  But personally i think that adding 
{{Closable}} here and now as part of this API change issue makes sense (since 
it's about the "life cycle" of the Collector/LeafCollector) independent of any 
question of parallelization.,  Even if parallelization is controversial, adding 
{{close()}} doesn't seem to be, so why not do all of the known desirable, 
non-controversial, Collector API changes at once?

If i'm wrong, and there is some dissent about whether {{close()}} makes sense 
on these classes, then by all means yes: ignore the suggestion and focus purely 
on what you've got in the current patch.

> Make the Collector API work per-segment
> ---------------------------------------
>
>                 Key: LUCENE-5527
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5527
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

Reply via email to