On Mon, 2015-09-14 at 12:34 +0200, Leonardo Foderaro wrote:
> Should you have any issue or suggestion on how to improve it please
> let me know.

I can explain my planned project, as it seems relevant in a broader scope. Maybe you can tell me if such a project fits into your framework?

We have a SolrCloud setup with billions of documents, with 200-300M documents in each shard. We need to define multiple "sub-corpora", with a granularity that can go down to the single-document level.

In Solr-speak that could be done with filters. A filter could be (id:1234 OR id:5678), which is easy enough. But that does not scale to millions of IDs.

The idea is to introduce named filters, where the construction of the filters themselves is done internally in Solr. Creating a filter could be a call with a user-specified name (aka filter-ID) and a URL to a filter-setup. The filter-setup would just be a list of queries, one on each line:

  id:1234
  id:5678
  domain:example.com
  id:7654

The lines are processed one at a time and each match is OR'ed into the named filter being constructed. As this is a streaming process, there is no real limit to the size.

Using a previously constructed named filter would (guessing here) be a matter of writing a small alba-annotated class that takes the filter-ID as input and returns the corresponding custom-made Filter, which really is just a list of docIDs underneath (probably represented as a bitmap).

- Toke Eskildsen, State and University Library, Denmark
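
To make the construction step concrete, here is a minimal sketch of the streaming idea in plain Python. It is not Solr or alba code: the in-memory `docs` list stands in for a shard's index, `match` stands in for running one query, and the names `build_named_filter`, `create_filter`, `filter_doc_ids` and `named_filters` are all made up for illustration. A Python integer is used as the bitmap, with bit N set when docID N belongs to the filter.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a shard: position in the list is the docID.
Doc = Dict[str, str]

# Registry of named filters: filter-ID -> bitmap (int used as a bitset).
named_filters: Dict[str, int] = {}


def match(docs: List[Doc], query: str) -> List[int]:
    """Return the docIDs matching one 'field:value' line (exact match only)."""
    field, _, value = query.partition(":")
    return [doc_id for doc_id, doc in enumerate(docs) if doc.get(field) == value]


def build_named_filter(docs: List[Doc], query_lines: Iterable[str]) -> int:
    """Consume query lines one at a time and OR every match into the bitmap."""
    bitmap = 0
    for line in query_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the filter-setup
        for doc_id in match(docs, line):
            bitmap |= 1 << doc_id
    return bitmap


def create_filter(name: str, docs: List[Doc], query_lines: Iterable[str]) -> None:
    """Build a filter from the streamed lines and store it under its filter-ID."""
    named_filters[name] = build_named_filter(docs, query_lines)


def filter_doc_ids(name: str) -> List[int]:
    """Look up a named filter and list the docIDs whose bits are set."""
    bitmap = named_filters[name]
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Example data (invented for this sketch).
docs = [
    {"id": "1234", "domain": "example.org"},  # docID 0
    {"id": "5678", "domain": "example.net"},  # docID 1
    {"id": "9999", "domain": "example.com"},  # docID 2
    {"id": "7654", "domain": "example.org"},  # docID 3
    {"id": "1111", "domain": "example.net"},  # docID 4
]
setup_lines = ["id:1234", "id:5678", "domain:example.com", "id:7654"]

create_filter("sub-corpus-a", docs, setup_lines)
print(filter_doc_ids("sub-corpus-a"))  # [0, 1, 2, 3]
```

Because `query_lines` is any iterable, the lines could just as well come from a file or an HTTP response read line by line, so the whole setup never has to be held in memory; only the bitmap grows, and that is bounded by the number of documents in the shard.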
