We have been working on a search index of archived web pages that have been 
collected over a number of years. As a result, the same page (URL) can be 
collected on many dates. We wanted to group results by site (domain), but this 
left us with the same page being found many times, so we needed a second level 
of grouping.
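
To illustrate what is meant by a second level of grouping, here is a minimal 
in-memory sketch in plain Java. It is not the code from the repository and does 
not use any SOLR API; the class name, the Capture type and its fields (domain, 
url, date) are hypothetical names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoLevelGroupingSketch {

    /** One archived copy of a page. Hypothetical type, for illustration only. */
    public static final class Capture {
        public final String domain;
        public final String url;
        public final String date;

        public Capture(String domain, String url, String date) {
            this.domain = domain;
            this.url = url;
            this.date = date;
        }
    }

    /**
     * Groups hits by domain (first level) and then by URL (second level),
     * so each page appears once per site with all of its captures beneath it.
     * Insertion order is preserved at both levels.
     */
    public static Map<String, Map<String, List<Capture>>> group(List<Capture> hits) {
        Map<String, Map<String, List<Capture>>> bySite = new LinkedHashMap<>();
        for (Capture c : hits) {
            bySite.computeIfAbsent(c.domain, d -> new LinkedHashMap<>())
                  .computeIfAbsent(c.url, u -> new ArrayList<>())
                  .add(c);
        }
        return bySite;
    }

    public static void main(String[] args) {
        List<Capture> hits = new ArrayList<>();
        hits.add(new Capture("example.org", "http://example.org/a", "2010-01-01"));
        hits.add(new Capture("example.org", "http://example.org/a", "2012-06-30"));
        hits.add(new Capture("example.org", "http://example.org/b", "2011-03-15"));
        hits.add(new Capture("example.net", "http://example.net/", "2013-09-09"));

        // Print each site, then each page within it and how often it was captured.
        group(hits).forEach((domain, pages) -> {
            System.out.println(domain);
            pages.forEach((url, captures) ->
                System.out.println("  " + url + " (" + captures.size() + " captures)"));
        });
    }
}
```

With only the first level, the page http://example.org/a would appear twice 
under example.org; with the second level it appears once, with two captures.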

I have extended the SOLR 5.5.3 grouping code to allow for two-level grouping. 
In discussions with some of the people involved with archiving websites, it was 
requested that the code be shared with the SOLR developers. I have made the 
code public on GitHub: SOLR Grouping <https://github.com/nla/solr-grouping>.

When extending the SOLR grouping code I tried to keep the code generic so that 
it could possibly be used elsewhere, but I did not try to make all existing 
features work; I focused only on the parts that we needed for our system. Along 
the way I found a couple of bugs that I fixed in this code:
(1) an integer overflow in the value holding the total record count, and
(2) not all shards being searched when performing the second phase of the 
query (getting all records within a group).
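
As a generic illustration of the first kind of bug (this is not the SOLR code, 
and the per-shard counts are made-up numbers), the sketch below shows how a 
total kept in a 32-bit int silently wraps around once the combined count 
exceeds Integer.MAX_VALUE, while the same sum kept in a long stays correct. 
The class and method names are hypothetical.

```java
public class TotalCountSketch {

    /** Sums per-shard counts into an int; wraps silently past 2,147,483,647. */
    public static int sumAsInt(int[] shardCounts) {
        int total = 0;
        for (int c : shardCounts) {
            total += c;
        }
        return total;
    }

    /** Sums per-shard counts into a long, which has ample range for this. */
    public static long sumAsLong(int[] shardCounts) {
        long total = 0L;
        for (int c : shardCounts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical shards, each reporting 1.5 billion matching records.
        int[] shardCounts = {1_500_000_000, 1_500_000_000};

        System.out.println(sumAsInt(shardCounts));  // prints -1294967296
        System.out.println(sumAsLong(shardCounts)); // prints 3000000000
    }
}
```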


Ian Caldwell
National Library of Australia
