Hi Robust Links:
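I think you want to build a class that implements LeafCollector. In Lucene 5
the per-segment callbacks (setScorer and collect) live there, and
Collector.getLeafCollector takes over the role of setNextReader.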
For example:
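// needs java.io.IOException, org.apache.lucene.index.LeafReaderContext,
// org.apache.lucene.search.LeafCollector and org.apache.lucene.search.Scorer
public class theLeafCollectorDocid implements LeafCollector
{
    theLeafCollectorDocid( final LeafReaderContext context )
    {
        // per-segment setup goes here, e.g. remember context.docBase
    }
    public void setScorer( Scorer scorer ) throws IOException
    {
        // nothing to do when only doc ids are collected
    }
    public void collect( int doc ) throws IOException
    {
        // doc is relative to the current segment
    }
}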
Once you have done this, build another class that implements Collector.
For example:
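// needs java.io.IOException, org.apache.lucene.index.LeafReaderContext,
// org.apache.lucene.search.Collector and org.apache.lucene.search.LeafCollector
public class docCollectorKeyDocid implements Collector
{
    public LeafCollector getLeafCollector( final LeafReaderContext context )
        throws IOException
    {
        // called once for each index segment
        return new theLeafCollectorDocid( context );
    }
    // I believe 5.1 and later also declare this method on Collector
    public boolean needsScores()
    {
        return false;
    }
}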
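This will, I believe, allow you to realize your goal. Pass an instance of
docCollectorKeyDocid to IndexSearcher.search( query, collector ); a
MatchAllDocsQuery should visit every live document.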
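One thing that carries over from AtomicReaderContext to LeafReaderContext is the numbering: collect( int doc ) receives an id relative to the current segment, and the index-wide id is docBase + doc. Below is a toy model of that arithmetic in plain Java, so it runs without the Lucene jars. ToySegment, layout and collectGlobalIds are made-up names for illustration, not Lucene classes or methods, and the segment sizes are arbitrary. Unlike a real search, it visits every slot and ignores deleted documents.

```java
import java.util.ArrayList;
import java.util.List;

public class DocBaseDemo {

    /** Toy stand-in for one index segment (hypothetical, not a Lucene class). */
    static final class ToySegment {
        final int docBase; // index-wide id of this segment's first document
        final int maxDoc;  // number of documents in this segment

        ToySegment(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }
    }

    /** Lays segments end to end, so each docBase is the sum of the earlier sizes. */
    static List<ToySegment> layout(int... segmentSizes) {
        List<ToySegment> segments = new ArrayList<>();
        int docBase = 0;
        for (int size : segmentSizes) {
            segments.add(new ToySegment(docBase, size));
            docBase += size;
        }
        return segments;
    }

    /** Walks every segment-relative id and re-bases it, as a collector would. */
    static List<Integer> collectGlobalIds(List<ToySegment> segments) {
        List<Integer> globalIds = new ArrayList<>();
        for (ToySegment segment : segments) {
            for (int doc = 0; doc < segment.maxDoc; doc++) {
                globalIds.add(segment.docBase + doc);
            }
        }
        return globalIds;
    }

    public static void main(String[] args) {
        List<ToySegment> segments = layout(3, 2, 4);
        for (ToySegment segment : segments) {
            System.out.println("docBase=" + segment.docBase + " maxDoc=" + segment.maxDoc);
        }
        System.out.println(collectGlobalIds(segments));
    }
}
```

With segments of 3, 2 and 4 documents the bases come out as 0, 3 and 5, so local doc 1 in the second segment is index-wide doc 4. In the real collector the same addition would go inside collect, using the docBase taken from the LeafReaderContext passed to the constructor.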
regards,
west suhanic
On Wed, Apr 29, 2015 at 10:41 AM, Robust Links <[email protected]>
wrote:
> Hi
>
> I need help porting my lucene code from 4 to 5. In particular, I need to
> customize a collector (to collect all doc Ids in the index - which can be
> >30MM docs..). Below is how I achieved this in lucene 4. Is there some
> guidelines how to do this in lucene 5, specially on semantics changes of
> AtomicReaderContext (which seems deprecated) and the new LeafReaderContext?
>
> thank you in advance
>
>
> public class CustomCollector extends Collector {
>
> private HashSet<String> data = new HashSet<String>();
>
> private Scorer scorer;
>
> private int docBase;
>
> private BinaryDocValues dataList;
>
>
> public boolean acceptsDocsOutOfOrder() {
>
> return true;
>
> }
>
> public void setScorer(Scorer scorer) {
>
> this.scorer = scorer;
>
> }
>
> public void setNextReader(AtomicReaderContext ctx) throws IOException{
>
> this.docBase = ctx.docBase;
>
> dataList = FieldCache.DEFAULT.getTerms(ctx.reader(),"title",false);
>
> }
>
> public void collect(int doc) throws IOException {
>
> BytesRef t = new BytesRef();
>
> dataList.get(doc, t);
>
> if (t.bytes != BytesRef.EMPTY_BYTES && t.bytes != BytesRef.EMPTY_BYTES) {
>
> data.add(t.utf8ToString());
>
> }
>
> }
>
> public void reset() {
>
> data.clear();
>
> dataList = null;
>
> }
>
> public HashSet<String> getData() {
>
> return data;
>
> }
>
> }
>