[
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959080#comment-13959080
]
Hoss Man commented on LUCENE-5527:
----------------------------------
This isn't an area of Lucene i tend to think much about, but if we're talking
about changing the API, i have a few questions i wanted to put out there for
discussion:
* is {{setNextReader}} really an appropriate name anymore? or should it be
something like {{getLeafCollector(AtomicReaderContext)}}
* Should we bite the bullet and make {{LeafCollector}} and {{Collector}} extend
{{Closable}} ?
** I believe this was discussed at one point before, and it was considered an
obnoxious interface change with minimal gain - but if we're going to change the
interface API now anyway, it may be worth reconsidering
** Solr works around this by always using {{DelegatingCollector}} which has a
{{finish()}} method - i can't imagine other lucene apps don't have similar
wishes for {{close()}} like this.
* Should we go ahead and think about if/how this API should be tweaked (now or
in the future) to allow a {{Collector}} to indicate that it's
{{LeafCollectors}} can be used to collect documents from different segments in
parallel threads? If we think a marker interface would be the best way to go,
so we can easily add that later -- but i wanted to ask the question
** One possibility that just occurred to me: Document right now that
{{LeafCollectors}} _may_ be processed in parallel, and that if you want to
force single threaded collection you should implement
{{Collector.getLeafCollector(AtomicReaderContext)}} such that it blocks until
the previously returned {{LeafCollector}} has been closed (which should be easy
enough to do in things like {{SimpleCollector}} using a basic sync lock, right?)
(I have no strong feelings about most of this)
> Make the Collector API work per-segment
> ---------------------------------------
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but
> there is one of them that I really really like that consists in refactoring
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
> AtomicCollector setNextReader(AtomicReaderContext context) throws
> IOException;
>
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
> void setScorer(Scorer scorer) throws IOException;
> void collect(int doc) throws IOException;
> boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more
> easily decide to use different strategies on different segments. In
> particular, it makes the early-termination collector much cleaner since it
> can return different atomic collectors implementations depending on whether
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it
> easier to migrate by having a Collector that would implement both Collector
> and AtomicCollector, return {{this}} in setNextReader and make current
> concrete Collector implementations extend this class instead of directly
> extending Collector.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]