Hi Erick,

The index I am searching is a Lucene index, and I am trying to perform some
operations over ALL the documents in it. I could rebuild the index as a Solr
index and then use the export functionality, but up to now I've been using
the Lucene IndexSearcher with a custom collector. Would the code below be
correct if I want to continue down the Lucene path?

Thank you, Erick
import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.SimpleCollector;

import com.google.common.collect.HashBiMap;

public class DocIDCollector extends SimpleCollector {

    // Maps the global Lucene doc ID -> the "id" doc value. Note that a
    // BiMap rejects duplicate values, so the "id" field must be unique.
    private final HashBiMap<Integer, Long> idSet = HashBiMap.create();

    private NumericDocValues ids;
    private int docBase;

    @Override
    public boolean needsScores() {
        // Replaces acceptsDocsOutOfOrder(), which is gone in 5.0;
        // we only read doc values, so scores are not needed.
        return false;
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        // collect(int) receives segment-local doc IDs, so remember this
        // segment's base to build index-wide IDs.
        docBase = context.docBase;
        ids = DocValues.getNumeric(context.reader(), "id");
    }

    @Override
    public void collect(int doc) throws IOException {
        idSet.put(docBase + doc, ids.get(doc));
    }

    public void reset() {
        idSet.clear();
    }

    public HashBiMap<Integer, Long> getWikiIds() {
        return idSet;
    }
}
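
For completeness, this is roughly how I am driving the collector (the index
path below is just a placeholder):

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.FSDirectory;

public class CollectAllIds {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            DocIDCollector collector = new DocIDCollector();
            // MatchAllDocsQuery visits every live document in the index.
            searcher.search(new MatchAllDocsQuery(), collector);
            System.out.println("collected " + collector.getWikiIds().size() + " ids");
        }
    }
}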
On Wed, Apr 29, 2015 at 11:32 AM, Erick Erickson <[email protected]>
wrote:
> Hmmm, it's not clear to me whether you're using Solr or not, but if
> you are, have you considered using the export functionality? This is
> already built to stream large result sets back to the client. And
> lately (5.1), you can combine that with "streaming aggregation" to do
> some pretty cool stuff.
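>
> For example, a request along these lines (collection and field names
> are placeholders; /export requires an explicit sort and docValues
> fields in fl):
>
>   curl "http://localhost:8983/solr/collection1/export?q=*:*&sort=id+asc&fl=id"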
>
> Not sure it applies in your situation, as you didn't state the use case,
> but I thought I'd at least mention it.
>
> Best,
> Erick
>
> On Wed, Apr 29, 2015 at 7:41 AM, Robust Links <[email protected]>
> wrote:
> > Hi
> >
> > I need help porting my Lucene code from 4 to 5. In particular, I need to
> > customize a collector (to collect all doc IDs in the index, which can be
> > >30MM docs). Below is how I achieved this in Lucene 4. Are there any
> > guidelines on how to do this in Lucene 5, especially on the semantic
> > changes around AtomicReaderContext (which seems deprecated) and the new
> > LeafReaderContext?
> >
> > Thank you in advance
> >
> >
> > public class CustomCollector extends Collector {
> >
> >     private HashSet<String> data = new HashSet<String>();
> >     private Scorer scorer;
> >     private int docBase;
> >     private BinaryDocValues dataList;
> >
> >     public boolean acceptsDocsOutOfOrder() {
> >         return true;
> >     }
> >
> >     public void setScorer(Scorer scorer) {
> >         this.scorer = scorer;
> >     }
> >
> >     public void setNextReader(AtomicReaderContext ctx) throws IOException {
> >         this.docBase = ctx.docBase;
> >         dataList = FieldCache.DEFAULT.getTerms(ctx.reader(), "title", false);
> >     }
> >
> >     public void collect(int doc) throws IOException {
> >         BytesRef t = new BytesRef();
> >         dataList.get(doc, t);
> >         if (t.bytes != BytesRef.EMPTY_BYTES) {
> >             data.add(t.utf8ToString());
> >         }
> >     }
> >
> >     public void reset() {
> >         data.clear();
> >         dataList = null;
> >     }
> >
> >     public HashSet<String> getData() {
> >         return data;
> >     }
> > }