Hmmm, it's not clear to me whether you're using Solr or not, but if you are, have you considered using the export functionality? This is already built to stream large result sets back to the client. And as of 5.1, you can combine that with "streaming aggregation" to do some pretty cool stuff.
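For concreteness, a request to the export handler is just an ordinary HTTP call against /export; it requires a sort and an explicit fl, and the fields involved must have docValues enabled in the schema. Here is a minimal sketch in plain Java (the host, collection name, and field name are illustrative, not taken from your setup):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ExportExample {
    public static void main(String[] args) throws Exception {
        // /export streams the *entire* result set, not one page of it,
        // so it needs a total order (sort) and a field list (fl) whose
        // fields all have docValues.
        String url = "http://localhost:8983/solr/collection1/export"
                + "?q=" + URLEncoder.encode("*:*", "UTF-8")
                + "&sort=" + URLEncoder.encode("id asc", "UTF-8")
                + "&fl=id";
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL(url).openStream(), StandardCharsets.UTF_8))) {
            // Consume the response incrementally; the server never
            // builds the full result set in memory, and neither should we.
            char[] buf = new char[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                System.out.print(new String(buf, 0, n));
            }
        }
    }
}

The 5.1 streaming-aggregation work builds on this same /export handler.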
Not sure it applies in your situation as you didn't state the use case, but I
thought I'd at least mention it.

Best,
Erick

On Wed, Apr 29, 2015 at 7:41 AM, Robust Links <pey...@robustlinks.com> wrote:
> Hi
>
> I need help porting my Lucene code from 4 to 5. In particular, I need to
> customize a collector (to collect all doc IDs in the index - which can be
> >30MM docs..). Below is how I achieved this in Lucene 4. Are there some
> guidelines on how to do this in Lucene 5, especially on the semantic
> changes to AtomicReaderContext (which seems deprecated) and the new
> LeafReaderContext?
>
> thank you in advance
>
> import java.io.IOException;
> import java.util.HashSet;
>
> import org.apache.lucene.index.AtomicReaderContext;
> import org.apache.lucene.index.BinaryDocValues;
> import org.apache.lucene.search.Collector;
> import org.apache.lucene.search.FieldCache;
> import org.apache.lucene.search.Scorer;
> import org.apache.lucene.util.BytesRef;
>
> public class CustomCollector extends Collector {
>
>     private HashSet<String> data = new HashSet<String>();
>     private Scorer scorer;
>     private int docBase;
>     private BinaryDocValues dataList;
>
>     public boolean acceptsDocsOutOfOrder() {
>         return true;
>     }
>
>     public void setScorer(Scorer scorer) {
>         this.scorer = scorer;
>     }
>
>     public void setNextReader(AtomicReaderContext ctx) throws IOException {
>         this.docBase = ctx.docBase;
>         // Uninverts the indexed "title" terms into per-segment doc values.
>         dataList = FieldCache.DEFAULT.getTerms(ctx.reader(), "title", false);
>     }
>
>     public void collect(int doc) throws IOException {
>         // Fill-style 4.x API: get(doc, t) writes the value into t.
>         BytesRef t = new BytesRef();
>         dataList.get(doc, t);
>         if (t.bytes != BytesRef.EMPTY_BYTES) {
>             data.add(t.utf8ToString());
>         }
>     }
>
>     public void reset() {
>         data.clear();
>         dataList = null;
>     }
>
>     public HashSet<String> getData() {
>         return data;
>     }
> }

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
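On the Lucene side of the question, a minimal sketch of a 5.x port of the collector above might look like the following. It assumes the "title" field carries binary doc values; FieldCache is gone from the public API in 5.0 (uninverting an indexed-only field now goes through UninvertingReader in the misc module), AtomicReaderContext has been renamed to LeafReaderContext, and custom collectors typically extend SimpleCollector, which replaces setNextReader with doSetNextReader and acceptsDocsOutOfOrder with needsScores:

import java.io.IOException;
import java.util.HashSet;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;
import org.apache.lucene.util.BytesRef;

public class CustomCollector extends SimpleCollector {

    private final HashSet<String> data = new HashSet<>();
    private BinaryDocValues dataList;

    // Called once per segment; replaces setNextReader(AtomicReaderContext).
    @Override
    protected void doSetNextReader(LeafReaderContext ctx) throws IOException {
        // Assumes "title" is indexed with binary doc values; returns an
        // empty (never null) instance when the field is absent.
        dataList = DocValues.getBinary(ctx.reader(), "title");
    }

    @Override
    public void collect(int doc) throws IOException {
        // In 5.x, get(int) returns the value instead of filling a BytesRef.
        BytesRef t = dataList.get(doc);
        if (t.length > 0) {
            data.add(t.utf8ToString());
        }
    }

    // Replaces acceptsDocsOutOfOrder(); this collector never looks at
    // scores, so the searcher can skip computing them entirely.
    @Override
    public boolean needsScores() {
        return false;
    }

    public HashSet<String> getData() {
        return data;
    }
}

With >30MM docs, it may also be worth reconsidering the HashSet<String> and consuming the BytesRef values incrementally rather than materializing a String per document, but that's a separate concern.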