Hi Shai,

On Thu, Dec 13, 2012 at 12:21 PM, Shai Erera <[email protected]> wrote:
> As I said, if someone volunteers to do some work on the Solr side, I will
> gladly participate in that effort.
> I just don't even know where to start w/ Solr :).

The entry point for Solr facets is
org.apache.solr.request.SimpleFacets.getFacetCounts (called from
FacetComponent).

> One thing that would be really great is if we can build an adapter (I think
> someone mentioned that word here)
> which supports basic facets capabilities, so that we can at least benchmark
> Solr's current
> implementation vs the implementation w/ the module.

Comparing both implementations would be great, but an adapter might be hard
to write given how Lucene faceting differs from Solr faceting. The Lucene
module requires users to decide at indexing time what to facet on and how,
whereas Solr does everything at search time using FieldCache and
UninvertedField, meaning that you can compute facets on any field that is
indexed (there is even an open issue about computing facet counts based on
arbitrary functions [1]). So Lucene faceting would probably require an
additional field property in the schema to let Solr know that it should add
category paths to documents. (Please correct me if anything I wrote here is
wrong.)

I have a few questions regarding the faceting module:
 - Do you have any rough idea of how speed and memory usage vary depending
   on the number of docs to collect, the number of distinct field values,
   etc.?
 - TaxonomyReader seems to use ints as ordinals for category paths. Does
   that mean the faceting module can't handle paths that have more than 2B
   distinct values? Is it fixable? (Or maybe it doesn't make sense to handle
   such large numbers of distinct values?)

[1] https://issues.apache.org/jira/browse/SOLR-1581

-- 
Adrien
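
To make the index-time model and the ordinal question above concrete, here is a toy sketch in plain Java. It is not the Lucene facet module's API: the class and method names (ToyTaxonomyFacets, ordinalFor, addDocument, count) are hypothetical, and the only assumptions taken from the discussion are that category paths are attached to documents at indexing time and that each distinct path gets an int ordinal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical names, not the Lucene facet module's API. */
class ToyTaxonomyFacets {
    // Category path -> int ordinal, assigned in order of first appearance.
    private final Map<String, Integer> ordinals = new HashMap<>();
    // Ordinals attached to each document at "indexing time".
    private final List<int[]> docOrdinals = new ArrayList<>();

    /** Returns the ordinal of a path, assigning the next free one if it is new. */
    int ordinalFor(String path) {
        Integer existing = ordinals.get(path);
        if (existing != null) {
            return existing;
        }
        // An int ordinal space cannot address more than Integer.MAX_VALUE paths.
        if (ordinals.size() == Integer.MAX_VALUE) {
            throw new IllegalStateException("int ordinal space exhausted");
        }
        int ord = ordinals.size();
        ordinals.put(path, ord);
        return ord;
    }

    /** "Indexing": the categories of a document are fixed here. Returns the doc id. */
    int addDocument(String... paths) {
        int[] ords = new int[paths.length];
        for (int i = 0; i < paths.length; i++) {
            ords[i] = ordinalFor(paths[i]);
        }
        docOrdinals.add(ords);
        return docOrdinals.size() - 1;
    }

    /** "Searching": one int counter per known ordinal, incremented per matching doc. */
    int[] count(int... matchingDocs) {
        int[] counts = new int[ordinals.size()];
        for (int doc : matchingDocs) {
            for (int ord : docOrdinals.get(doc)) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ToyTaxonomyFacets index = new ToyTaxonomyFacets();
        int d0 = index.addDocument("author/adrien", "year/2012");
        int d1 = index.addDocument("author/shai", "year/2012");
        int[] counts = index.count(d0, d1);
        System.out.println("year/2012 count: " + counts[index.ordinalFor("year/2012")]);
        System.out.println("largest int ordinal: " + (Integer.MAX_VALUE - 1));
    }
}
```

In this toy, the counting array holds one int per distinct category, so search-time memory grows with the number of distinct values rather than with the number of collected documents. Whether the real module behaves the same way is exactly what the first question above asks.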
