[ https://issues.apache.org/jira/browse/LUCENE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405218#comment-17405218 ]
Greg Miller commented on LUCENE-10062:
--------------------------------------

Hmm, so I ran an internal benchmarking tool against our Lucene application (Amazon Product Search) and the results were not nearly as compelling. It looks like there wasn't much impact on red-line QPS or on latency (in particular, of our facet-counting step). It also looks like the index got about 4% bigger with this change. I suspect there's a significant difference between the two tests in how many facet categories each doc stores on average, which probably highlights the gap between these solutions: the existing binary format delta-encodes each doc's sorted ordinals, while the SORTED_NUMERIC field doesn't.

I'm certainly not saying this should be a show-stopper for trying to move forward with this change, but it would be really good to understand whether our internal use case is the outlier here or whether the {{luceneutil}} testing is. I'd obviously want to avoid a situation where our benchmarks think this is a great improvement but most common Lucene users see a regression!

If anyone else has an application they're able to benchmark the change with, that could provide some more interesting data points. I'll also see if I can dig in more on our internal application and look for ways to speed things up.

> Explore using SORTED_NUMERIC doc values to encode taxonomy ordinals for
> faceting
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10062
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10062
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Greg Miller
>            Assignee: Greg Miller
>            Priority: Minor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> We currently encode taxonomy ordinals using varint style packing in a binary
> doc values field.
> I suspect there have been a number of improvements to
> SortedNumericDocValues since taxonomy faceting was first introduced, and I
> plan to explore replacing the custom binary format we have today with a
> SORTED_NUMERIC type dv field instead.
> I'll report benchmark results and index size impact here.
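To make the delta-encoding point in the comment a bit more concrete, here is a rough, self-contained Java sketch comparing two ways of storing one doc's taxonomy ordinals. The delta+vInt side sorts and dedupes the ordinals and writes each gap as a varint, roughly in the spirit of the "varint style packing" the issue describes. The fixed-width side is a deliberately crude stand-in for SORTED_NUMERIC bit-packing (every value costs the bits needed for the largest ordinal); it is NOT Lucene's actual doc values codec, which has additional encoding strategies and per-doc bookkeeping that this ignores. The class name, the taxonomy size, and the ordinal values are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Back-of-the-envelope comparison of two ways to store one doc's taxonomy
 * ordinals. Simplified model for illustration only; not Lucene's codec code.
 */
public class OrdinalEncodingSketch {

    /** Sorts and dedupes ordinals, then writes each gap as a variable-length int. */
    static byte[] deltaVarintEncode(int[] ords) {
        int[] sorted = Arrays.stream(ords).sorted().distinct().toArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int ord : sorted) {
            int delta = ord - last; // first ordinal is stored as a delta from 0
            last = ord;
            // 7 payload bits per byte; a set high bit means "more bytes follow"
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes the output of deltaVarintEncode back into sorted, distinct ordinals. */
    static int[] deltaVarintDecode(byte[] bytes) {
        int[] tmp = new int[bytes.length]; // every ordinal takes at least one byte
        int count = 0, pos = 0, last = 0;
        while (pos < bytes.length) {
            int value = 0, shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            last += value; // undo the delta
            tmp[count++] = last;
        }
        return Arrays.copyOf(tmp, count);
    }

    /** Bits per value if every value is packed at the width needed for maxOrd (simplified model). */
    static int fixedWidthBits(int maxOrd) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOrd));
    }

    public static void main(String[] args) {
        int maxOrd = 5_000_000;            // hypothetical taxonomy size
        int[] sparseDoc = {4_000_000};     // one far-away ordinal
        int[] denseDoc = new int[40];      // many clustered ordinals
        for (int i = 0; i < denseDoc.length; i++) {
            denseDoc[i] = 1_000_000 + i * 3;
        }
        for (int[] doc : new int[][] {sparseDoc, denseDoc}) {
            int varintBytes = deltaVarintEncode(doc).length;
            long fixedBytes = ((long) doc.length * fixedWidthBits(maxOrd) + 7) / 8;
            System.out.printf("%d ords: delta+vInt=%d bytes, fixed-width=%d bytes%n",
                    doc.length, varintBytes, fixedBytes);
        }
    }
}
```

Under these made-up inputs the single far-away ordinal costs 4 bytes with delta+vInt versus 3 bytes fixed-width, while the 40 clustered ordinals cost 42 bytes versus 115. The only point is that delta+vInt cost scales with the gaps between a doc's ordinals, while the fixed-width model pays the full width per value, so the average number of ordinals per doc (and how clustered they are) could plausibly swing an index-size comparison either way.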