On Mon, 2015-09-14 at 12:34 +0200, Leonardo Foderaro wrote:

> Should you have any issue or suggestion on how to improve it please
> let me know. 

I can explain my planned project, as it seems relevant in a broader
scope. Maybe you can tell me if such a project fits into your framework?


We have a SolrCloud setup with billions of documents, with 200-300M
documents in each shard. We need to define multiple "sub-corpora", with
a granularity that can be at single-document-level. In Solr-speak that
could be done with filters. A filter could be (id:1234 OR id:5678),
which is easy enough. But that does not scale to millions of IDs.

The idea is to introduce named filters, where the construction of the
filters themselves is done internally in Solr.

Creating a filter could be a call with a user-specified name (aka
filter-ID) and a URL to a filter-setup. The filter-setup would just be
a list of queries, one per line:
 id:1234
 id:5678
 domain:example.com
 id:7654
The lines are processed one at a time and each match is OR'ed to the
named filter being constructed. As this is a streaming process, there is
no real limit to the size.
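
To make the construction step concrete, here is a rough sketch in plain
Java. Everything in it is an assumption for illustration: the class name
NamedFilterBuilder, the build method and the matcher function (a stand-in
for whatever resolves one query to its matching docIDs) are hypothetical
and not part of Solr or alba. It only shows the line-by-line OR'ing into
a bitmap.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.BitSet;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Solr or alba.
class NamedFilterBuilder {

    /**
     * Streams a filter-setup, one query per line, and ORs every match
     * into a single bitmap of docIDs. The caller owns the reader.
     *
     * @param setup   the filter-setup, e.g. the body fetched from the URL
     * @param matcher stand-in for the searcher: maps one query string to
     *                the bitmap of docIDs it matches
     */
    static BitSet build(Reader setup, Function<String, BitSet> matcher) {
        BitSet filter = new BitSet();
        new BufferedReader(setup).lines()            // read lazily, line by line
                .map(String::trim)
                .filter(query -> !query.isEmpty())   // skip blank lines
                .forEach(query -> filter.or(matcher.apply(query)));
        return filter;
    }

    public static void main(String[] args) {
        // Toy matcher over an imaginary three-document index.
        Function<String, BitSet> toyMatcher = query -> {
            BitSet hits = new BitSet();
            switch (query) {
                case "id:1234": hits.set(0); break;
                case "id:5678": hits.set(1); break;
                case "domain:example.com": hits.set(1); hits.set(2); break;
                default: break; // unknown queries match nothing
            }
            return hits;
        };
        String setup = "id:1234\nid:5678\ndomain:example.com\nid:7654\n";
        System.out.println(build(new StringReader(setup), toyMatcher));
    }
}
```

Only the bitmap and the current line are held in memory, so the size of
the filter-setup itself does not matter.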

Using a previously constructed named filter would (guessing here) be a
matter of writing a small alba-annotated class that takes the filter-ID
as input and returns the corresponding custom-made Filter, which really
is just a list of docIDs underneath (probably represented as a bitmap).
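
The lookup side could be sketched along these lines. Again this is plain
Java with hypothetical names (NamedFilterRegistry, put, get, apply); it
does not show how an alba-annotated class would be wired up, and it
ignores that docIDs may change when the index changes. It only
illustrates "filter-ID in, bitmap of docIDs out".

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the class and method names are illustrative only.
class NamedFilterRegistry {
    private final Map<String, BitSet> filters = new ConcurrentHashMap<>();

    /** Stores a defensive copy of the bitmap under the given filter-ID. */
    void put(String filterId, BitSet docIds) {
        filters.put(filterId, (BitSet) docIds.clone());
    }

    /** Returns a copy of the docID bitmap for the filter-ID; fails if unknown. */
    BitSet get(String filterId) {
        BitSet docIds = filters.get(filterId);
        if (docIds == null) {
            throw new IllegalArgumentException("Unknown filter-ID: " + filterId);
        }
        return (BitSet) docIds.clone();
    }

    /** Restricts a result set (bitmap of docIDs) to the named sub-corpus. */
    BitSet apply(String filterId, BitSet hits) {
        BitSet restricted = (BitSet) hits.clone();
        restricted.and(get(filterId)); // keep only docIDs inside the filter
        return restricted;
    }
}
```

A query against a sub-corpus would then amount to something like
registry.apply("corpusA", hits), where "corpusA" is the user-specified
filter-ID.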


- Toke Eskildsen, State and University Library, Denmark



