Re: Taxonomy vs SSDVFF for faceted search

Alexander Lukyanchikov Wed, 28 Apr 2021 14:49:09 -0700

Hi Matt,
It's very interesting, thanks for the response! Did you have any issues
with Taxonomy indexing performance, or did you try to optimize it somehow?
Also, did you run into any problems maintaining a sidecar index, or do you
have experience building a distributed system around it with
sharding/rebalancing?


--
Regards,
Alex


On Wed, Apr 28, 2021 at 11:18 AM Matt Davis <kryptonics...@gmail.com> wrote:

> Alex,
>
> With our Lucene-based implementation, Zulia
> (https://github.com/zuliaio/zuliasearch), we have gone back and forth.  We
> started with Taxonomy, switched away from it, and then switched back to
> Taxonomy.  In our experience, the Taxonomy-based approach is more scalable
> and performant.  We run large searches (sometimes returning millions of
> results) with about 20 facets, some of them high-cardinality.
> A small-dataset version of the Zulia-backed tool that we released for
> COVID-19 is available here:
> https://icite.od.nih.gov/covid19/search/#search:searchId=6089a5b7218c6902d422e907
> If you click on the facet tab, you can see how we use facets.  I believe the
> use case may largely drive the choice.
>
> Thanks,
> Matt
>
> On Wed, Apr 28, 2021 at 1:26 PM Alexander Lukyanchikov <alexanderlukyanchi...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > We are trying to choose between the Taxonomy and SortedSetDocValuesFacetField
> > implementations for faceted search. Based on the available information and
> > our quick tests, the differences are as follows:
> >
> > - Taxonomy is faster at query time (on our test workload, the difference
> > sometimes exceeds the documented 25%). SortedSet also adds latency to an
> > NRT refresh.
> > - Taxonomy is slower at index time and, unlike the SortedSet implementation,
> > does not scale as well beyond 4 threads (there is a lot of contention on the
> > synchronized blocks in DirectoryTaxonomyWriter#addCategory() and
> > UTF8TaxonomyWriterCache.get())
> > - SortedSet does not support hierarchical queries
> > - SortedSet does not require a sidecar index
> > - Tie-breaking differs for labels with the same count
> >
> > Am I missing something, or is that everything we should take into account
> > as of today?
> >
> > I know that Solr and ES use their own faceting for historical reasons, but
> > are there any other large Lucene-based products that have chosen one
> > implementation over the other? Do we know why?
> > Any insight into lesser-known trade-offs and production experience would be
> > greatly appreciated!
> >
> > --
> > Thank you,
> > Alex
> >
>
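Several of the trade-offs listed in the question (index-time contention, extra work on NRT refresh, tie-breaking) trace back to how each implementation assigns ordinals to facet labels. Below is a toy, JDK-only model of that difference; it is not Lucene code. The names `FacetOrdinalsSketch`, `ToyTaxonomy`, `taxonomyTop` and `sortedSetTop` are hypothetical. It ignores hierarchy, the taxonomy root ordinal and per-segment ordinals. It also assumes, as we understand Lucene's top-N facet queue, that equal counts are resolved in favor of the lower ordinal. Under that assumption, taxonomy ordinals follow first-indexing order while SortedSet ordinals follow sorted label order, so ties can resolve differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model (NOT Lucene code) of how the two facet implementations
// assign ordinals to labels such as "Author/Bob". All names are hypothetical.
public class FacetOrdinalsSketch {

  // Taxonomy-style sidecar dictionary: each new label gets the next global
  // ordinal at index time. The method is synchronized, so all indexing
  // threads funnel through it (cf. contention on addCategory()).
  static final class ToyTaxonomy {
    private final Map<String, Integer> ordByLabel = new HashMap<>();
    private final List<String> labelByOrd = new ArrayList<>();

    synchronized int addCategory(String label) {
      Integer ord = ordByLabel.get(label);
      if (ord == null) {
        ord = labelByOrd.size();
        ordByLabel.put(label, ord);
        labelByOrd.add(label);
      }
      return ord;
    }

    synchronized String label(int ord) { return labelByOrd.get(ord); }

    synchronized int size() { return labelByOrd.size(); }
  }

  // Highest count wins; on a tie the lower ordinal is kept (an assumption
  // about how Lucene's top-N facet queue breaks ties).
  static int bestOrd(int[] counts) {
    int best = 0;
    for (int ord = 1; ord < counts.length; ord++) {
      if (counts[ord] > counts[best]) {
        best = ord;
      }
    }
    return best;
  }

  // Taxonomy style: documents store global ordinals assigned at index time,
  // so query-time counting is a plain array increment.
  public static String taxonomyTop(List<List<String>> docs) {
    ToyTaxonomy taxo = new ToyTaxonomy();
    List<int[]> index = new ArrayList<>();
    for (List<String> doc : docs) {
      int[] ords = new int[doc.size()];
      for (int i = 0; i < ords.length; i++) {
        ords[i] = taxo.addCategory(doc.get(i));
      }
      index.add(ords);
    }
    if (taxo.size() == 0) {
      return null;
    }
    int[] counts = new int[taxo.size()];
    for (int[] ords : index) {
      for (int ord : ords) {
        counts[ord]++;
      }
    }
    return taxo.label(bestOrd(counts));
  }

  // SortedSet style: documents store raw labels; ordinals are positions in
  // sorted label order and are recomputed whenever a reader is opened.
  public static String sortedSetTop(List<List<String>> docs) {
    TreeSet<String> unique = new TreeSet<>();
    for (List<String> doc : docs) {
      unique.addAll(doc);
    }
    if (unique.isEmpty()) {
      return null;
    }
    String[] sorted = unique.toArray(new String[0]);
    int[] counts = new int[sorted.length];
    for (List<String> doc : docs) {
      for (String label : doc) {
        counts[Arrays.binarySearch(sorted, label)]++;
      }
    }
    return sorted[bestOrd(counts)];
  }

  public static void main(String[] args) {
    // "Author/Zeta" is indexed first; both labels end up with a count of 2.
    List<List<String>> docs = List.of(
        List.of("Author/Zeta"),
        List.of("Author/Alpha", "Author/Zeta"),
        List.of("Author/Alpha"));
    System.out.println("taxonomy top:  " + taxonomyTop(docs));  // Author/Zeta
    System.out.println("sortedset top: " + sortedSetTop(docs)); // Author/Alpha
  }
}
```

In this model the taxonomy side pays at index time (a shared, synchronized dictionary kept next to the main index). The SortedSet side pays when a reader is opened (rebuilding the label-to-ordinal mapping), which loosely mirrors the index-time vs. NRT-refresh costs described in the thread.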
