I have a complicated problem to solve, and I don't know enough about
Lucene/Solr to phrase the question properly, so this is something of a shot
in the dark. My requirement is to always return search results in completely
"collapsed" form, rolling up duplicates with a count. Duplicates are defined
by whatever fields are requested: if the search requests fields A, B, and C,
then all matched documents that have identical values for those three fields
are "dupes". The field list may change with every search request. What I do
know at index time is the superset of all fields that may ever appear in the
field list.
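
To make that concrete, suppose a search requests fields A, B, C and matches
four documents (the values here are made up):

    doc1: A=x  B=y  C=z
    doc2: A=x  B=y  C=z
    doc3: A=x  B=q  C=z
    doc4: A=x  B=y  C=z

The response I'm after would contain just two rows: (x, y, z) with count=3
and (x, q, z) with count=1.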

I know this can't be done with configuration alone, and it doesn't seem
performant to retrieve all 1M+ matching docs and post-process them in Java.
A very smart person told me that a custom hit collector should be able to do
the filtering for me. So maybe I create a custom search handler that somehow
exposes a custom hit collector, which could use the FieldCache or DocValues
to examine all the matches and filter the results in the way I've described
above.
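
In case it helps show what I mean, here is a rough, untested sketch of the
kind of collector I'm imagining, written against the Lucene 4.x Collector API
as best I remember it (class and method names like CollapsingCollector are
mine, and the exact DocValues signatures may be off):

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.index.SortedDocValues;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.Scorer;
    import org.apache.lucene.util.BytesRef;

    // Rough sketch only: collapse hits on a composite key built from the
    // requested fields' doc values, keeping one representative doc and a count.
    public class CollapsingCollector extends Collector {

      private final String[] keyFields;   // the fields requested on this search
      private final Map<String, Integer> counts = new HashMap<String, Integer>();
      private final Map<String, Integer> firstDoc = new HashMap<String, Integer>();

      private SortedDocValues[] dvs;      // per-segment doc values, one per key field
      private int docBase;                // maps segment-local doc ids to global ids

      public CollapsingCollector(String[] keyFields) {
        this.keyFields = keyFields;
      }

      @Override
      public void setScorer(Scorer scorer) {
        // scores aren't needed for collapsing/counting
      }

      @Override
      public void setNextReader(AtomicReaderContext context) throws IOException {
        docBase = context.docBase;
        dvs = new SortedDocValues[keyFields.length];
        for (int i = 0; i < keyFields.length; i++) {
          // null if the field has no doc values in this segment
          dvs[i] = context.reader().getSortedDocValues(keyFields[i]);
        }
      }

      @Override
      public void collect(int doc) throws IOException {
        // Build the "dupe key" from the values of the requested fields.
        StringBuilder key = new StringBuilder();
        for (SortedDocValues dv : dvs) {
          int ord = (dv == null) ? -1 : dv.getOrd(doc);
          if (ord != -1) {
            BytesRef term = new BytesRef();
            dv.lookupOrd(ord, term);      // 4.x-style lookup, if I have the signature right
            key.append(term.utf8ToString());
          }
          key.append('\u0000');           // separator so field values can't run together
        }
        String k = key.toString();
        Integer c = counts.get(k);
        if (c == null) {
          counts.put(k, 1);
          firstDoc.put(k, docBase + doc); // remember one representative doc per group
        } else {
          counts.put(k, c + 1);
        }
      }

      @Override
      public boolean acceptsDocsOutOfOrder() {
        return true;                      // collection order doesn't matter for counting
      }

      public Map<String, Integer> getCounts() { return counts; }
      public Map<String, Integer> getRepresentativeDocs() { return firstDoc; }
    }

The plan would then be for the custom handler to run something like
searcher.search(query, new CollapsingCollector(requestedFields)) and write
the counts and representative docs out in the response.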

So, assuming this is a viable solution path, can anyone suggest some helpful
posts, code fragments, or books for me to review? I admit to being out of my
depth, but this requirement isn't going away. I'm grasping at straws right
now.

thanks
(using Solr 4.9)