[jira] [Commented] (LUCENE-10062) Explore using SORTED_NUMERIC doc values to encode taxonomy ordinals for faceting

Greg Miller (Jira) Thu, 26 Aug 2021 06:19:04 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405218#comment-17405218
 ]


Greg Miller commented on LUCENE-10062:
--------------------------------------

Hmm, so I ran an internal benchmarking tool against our Lucene application 
(Amazon Product Search) and the results were not nearly as compelling. It looks 
like there wasn't much impact on red-line QPS or latency (in particular, that of 
our facet-counting step). It also looks like the index got bigger with this 
change by ~4%. I suspect there's a significant difference between the two tests 
with respect to how many facet categories each doc is storing on average, 
probably highlighting the gap between these solutions where one is doing delta 
encoding and one isn't.
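
To make the delta-encoding point concrete, here is a rough sketch of how the 
per-doc ordinal count could change the picture. This is not the actual Lucene 
encoder for either format; the class name, method names, and ordinal values 
are all made up for illustration. It only counts how many bytes a doc's 
ordinals would take as plain varints versus as varints of sorted deltas.

```java
import java.util.Arrays;

// Hypothetical illustration only: not Lucene's binary or SORTED_NUMERIC encoder.
public class OrdinalEncodingSketch {

    /** Bytes needed to store a non-negative int as a varint (7 payload bits per byte). */
    public static int varintLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Total bytes when each ordinal is written as its own varint. */
    public static int plainSize(int[] ordinals) {
        int total = 0;
        for (int ord : ordinals) {
            total += varintLength(ord);
        }
        return total;
    }

    /** Total bytes when ordinals are sorted and only the gaps are written as varints. */
    public static int deltaSize(int[] ordinals) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        int total = 0;
        int prev = 0;
        for (int ord : sorted) {
            total += varintLength(ord - prev);
            prev = ord;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up ordinals: one doc with a single category, one with several nearby ones.
        int[] fewOrdinals = {70_000};
        int[] manyOrdinals = {70_000, 70_003, 70_010, 70_500, 71_000};

        System.out.println("few:  plain=" + plainSize(fewOrdinals)
            + " delta=" + deltaSize(fewOrdinals));
        System.out.println("many: plain=" + plainSize(manyOrdinals)
            + " delta=" + deltaSize(manyOrdinals));
    }
}
```

With these made-up values, the single-ordinal doc costs 3 bytes either way, 
while the five-ordinal doc costs 15 bytes as plain varints and 9 bytes with 
delta encoding. So the benefit of delta encoding would depend heavily on how 
many ordinals each doc carries, which may be part of why the two benchmarks 
disagree.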

I'm certainly not saying this should be a show-stopper for trying to move 
forward with this change, but it would be really good to understand if our 
internal use-case is an outlier here or if the {{luceneutil}} testing is the 
outlier. I'd obviously want to avoid a situation where our benchmarks think 
this is a great improvement but most common Lucene users see a regression! If 
anyone else has an application they're able to benchmark the change with, that 
could provide some more interesting data points. I'll also see if I can dig in 
more on our internal application and look to see if things can be sped up.

> Explore using SORTED_NUMERIC doc values to encode taxonomy ordinals for 
> faceting
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10062
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10062
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Greg Miller
>            Assignee: Greg Miller
>            Priority: Minor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> We currently encode taxonomy ordinals using varint style packing in a binary 
> doc values field. I suspect there have been a number of improvements to 
> SortedNumericDocValues since taxonomy faceting was first introduced, and I 
> plan to explore replacing the custom binary format we have today with a 
> SORTED_NUMERIC type dv field instead.
> I'll report benchmark results and index size impact here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
