Taxonomy vs SSDVFF for faceted search

Alexander Lukyanchikov Wed, 28 Apr 2021 10:26:20 -0700

Hello everyone,

We are trying to choose between the Taxonomy and SortedSetDocValuesFacetField
implementations for faceted search. Based on the available information and
our quick tests, the differences are as follows:


- Taxonomy is faster at query time (on our test workload, the difference is
sometimes larger than the documented 25%). SortedSet also adds latency to an
NRT refresh.
- Taxonomy is slower at index time and, unlike the SortedSet implementation, it
does not scale as well beyond 4 threads (a lot of contention in the
DirectoryTaxonomyWriter#addCategory() and UTF8TaxonomyWriterCache.get()
synchronized blocks)
- SortedSet does not support hierarchical queries
- SortedSet does not require a sidecar index
- The two implementations break ties differently for labels with the same count
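
To make the tie-break point concrete, here is a minimal plain-Java sketch that
does not use Lucene at all. It rests on assumptions not stated above: that both
implementations prefer the lower ordinal when counts are equal, that Taxonomy
assigns ordinals in first-seen order, and that SortedSet ordinals follow sorted
label order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class TieBreakSketch {

    // Count how many times each label occurs.
    static Map<String, Integer> count(List<String> labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    // ASSUMPTION: Taxonomy-style ordinals, assigned in first-seen order.
    static Map<String, Integer> firstSeenOrdinals(List<String> labels) {
        Map<String, Integer> ordinals = new LinkedHashMap<>();
        for (String label : labels) {
            ordinals.putIfAbsent(label, ordinals.size());
        }
        return ordinals;
    }

    // ASSUMPTION: SortedSet-style ordinals, assigned in sorted label order.
    static Map<String, Integer> sortedOrdinals(List<String> labels) {
        Map<String, Integer> ordinals = new LinkedHashMap<>();
        for (String label : new TreeSet<>(labels)) {
            ordinals.put(label, ordinals.size());
        }
        return ordinals;
    }

    // Rank labels by count, highest first; equal counts go to the lower ordinal.
    static List<String> rank(Map<String, Integer> counts, Map<String, Integer> ordinals) {
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : Integer.compare(ordinals.get(a), ordinals.get(b));
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Labels in the order they were indexed: "red" and "blue" tie at 2.
        List<String> indexed = List.of("red", "blue", "red", "blue", "green");
        Map<String, Integer> counts = count(indexed);
        System.out.println(rank(counts, firstSeenOrdinals(indexed))); // [red, blue, green]
        System.out.println(rank(counts, sortedOrdinals(indexed)));    // [blue, red, green]
    }
}
```

Under these assumptions, the same data yields a different order for the tied
labels, so a top-N cutoff that falls inside a tie can return different labels.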

Am I missing something, or is that everything we should take into account as
of today?

I know that Solr and ES use their own faceting for historical reasons, but
are there any other large Lucene-based products that have chosen one
implementation over the other? Do we know why?
Any insight into lesser-known trade-offs and production experience is greatly
appreciated!

--
Thank you,
Alex
