Hello everyone,

We are trying to choose between the Taxonomy and SortedSetDocValuesFacetField implementations for faceted search. Based on the available information and our quick tests, the differences are as follows:
- Taxonomy is faster at query time (on our test workload, the difference is sometimes higher than the documented 25%). SortedSet also adds latency to an NRT refresh.
- Taxonomy is slower at index time and, unlike the SortedSet implementation, it does not scale as well beyond 4 threads (there is a lot of contention in the synchronized blocks of DirectoryTaxonomyWriter#addCategory() and UTF8TaxonomyWriterCache.get()).
- SortedSet does not support hierarchical queries.
- SortedSet does not require a sidecar index.
- The two implementations break ties differently for labels with the same count.

Am I missing something, or is that everything we should take into account as of today?

I know that Solr and ES use their own faceting for historical reasons, but are there any other large Lucene-based products that have chosen one implementation over the other? Do we know why?

Any insight into lesser-known trade-offs and production experience is greatly appreciated!

--
Thank you,
Alex