Robert Muir created LUCENE-4771:
-----------------------------------
Summary: Query-time join collectors could maybe be more efficient
Key: LUCENE-4771
URL: https://issues.apache.org/jira/browse/LUCENE-4771
Project: Lucene - Core
Issue Type: Improvement
Components: modules/join
Reporter: Robert Muir
I was looking at these collectors on LUCENE-4765 and I noticed:
* SingleValued collector (SV) pulls FieldCache.getTerms and adds the bytes to a
BytesRefHash per-collect.
* MultiValued collector (MV) pulls FieldCache.getDocTermsOrds, but doesn't use
the ords; it just looks up each value and adds the bytes per-collect.
I think instead it's worth investigating whether SV should use getTermsIndex,
and whether both collectors could just collect their per-segment ords in
something like a BitSet[maxOrd].
When asked for the terms at the end in getCollectorTerms(), they could merge
these into one BytesRefHash.
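A minimal sketch of the idea above, using plain Java only and no Lucene classes. SegmentSketch and OrdCollectorSketch are hypothetical names: the String[] stands in for a segment's sorted term dictionary, the int[] for the doc-to-ord mapping that getTermsIndex would provide, and a TreeSet stands in for the final BytesRefHash. It only illustrates that collect() touches a bit per document and that terms are materialized once, at the end.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for one segment: ord -> term, and doc -> ord. */
final class SegmentSketch {
    final String[] sortedTerms; // term dictionary, indexed by ord
    final int[] docToOrd;       // ord for each doc, -1 if the doc has no value

    SegmentSketch(String[] sortedTerms, int[] docToOrd) {
        this.sortedTerms = sortedTerms;
        this.docToOrd = docToOrd;
    }
}

/** Hypothetical single-valued collector that records ords instead of bytes. */
final class OrdCollectorSketch {
    private final List<SegmentSketch> segments = new ArrayList<>();
    private final List<BitSet> seenOrds = new ArrayList<>();
    private SegmentSketch current;
    private BitSet currentOrds;

    /** Called once per segment; allocates one bit per ord (maxOrd bits). */
    void setNextSegment(SegmentSketch segment) {
        current = segment;
        currentOrds = new BitSet(segment.sortedTerms.length);
        segments.add(segment);
        seenOrds.add(currentOrds);
    }

    /** Per-document work is a single bit set; no term bytes are looked up. */
    void collect(int doc) {
        int ord = current.docToOrd[doc];
        if (ord >= 0) {
            currentOrds.set(ord);
        }
    }

    /** Resolves ords to terms only here, merging all segments into one set. */
    SortedSet<String> getCollectorTerms() {
        SortedSet<String> merged = new TreeSet<>();
        for (int i = 0; i < segments.size(); i++) {
            String[] terms = segments.get(i).sortedTerms;
            BitSet ords = seenOrds.get(i);
            for (int ord = ords.nextSetBit(0); ord >= 0; ord = ords.nextSetBit(ord + 1)) {
                merged.add(terms[ord]);
            }
        }
        return merged;
    }
}
```

With this shape, a term that matches many documents in a segment is looked up once per segment rather than once per collected document.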
Of course, if you are going to turn around and execute the query against the
same searcher anyway (is this the typical case?), this could be even more
efficient: no need to hash or instantiate all the terms in memory; we could
postpone the lookups to SeekingTermSetTermsEnum.accept()/nextSeekTerm(), I
think... somehow :)
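One possible reading of "postpone the lookups", sketched in plain Java and not taken from any existing Lucene code: keep only the per-segment ord bitsets, and produce the next seek term on demand by merging the segments' sorted dictionaries. LazySeekTermsSketch is a hypothetical name, the String[] arrays stand in for per-segment term dictionaries, and String ordering stands in for BytesRef ordering.

```java
import java.util.BitSet;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical lazy source of seek terms: no hash, no full term list in memory. */
final class LazySeekTermsSketch implements Iterator<String> {

    /** Position within one segment's collected ords. */
    private static final class Cursor {
        final String[] terms; // sorted term dictionary, indexed by ord
        final BitSet ords;    // ords collected for this segment
        int ord;

        Cursor(String[] terms, BitSet ords) {
            this.terms = terms;
            this.ords = ords;
            this.ord = ords.nextSetBit(0);
        }

        String term() {
            return terms[ord];
        }

        boolean advance() {
            ord = ords.nextSetBit(ord + 1);
            return ord >= 0;
        }
    }

    // Ordered by each cursor's current term, so the head is the smallest pending term.
    private final PriorityQueue<Cursor> queue =
        new PriorityQueue<Cursor>(Comparator.comparing((Cursor c) -> c.term()));

    LazySeekTermsSketch(List<String[]> segmentTerms, List<BitSet> segmentOrds) {
        for (int i = 0; i < segmentTerms.size(); i++) {
            Cursor cursor = new Cursor(segmentTerms.get(i), segmentOrds.get(i));
            if (cursor.ord >= 0) {
                queue.add(cursor);
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !queue.isEmpty();
    }

    /** Returns the next distinct term in sorted order, resolving ords only now. */
    @Override
    public String next() {
        if (queue.isEmpty()) {
            throw new NoSuchElementException();
        }
        String term = queue.peek().term();
        // Advance every segment currently positioned on this term (deduplication).
        while (!queue.isEmpty() && queue.peek().term().equals(term)) {
            Cursor cursor = queue.poll();
            if (cursor.advance()) {
                queue.add(cursor);
            }
        }
        return term;
    }
}
```

A nextSeekTerm()-style method could pull from such an iterator one term at a time; whether that beats a single up-front BytesRefHash would need measuring.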