[ https://issues.apache.org/jira/browse/LUCENE-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741566#comment-15741566 ]
Michael McCandless commented on LUCENE-7589:
--------------------------------------------
bq. However if you add a new field that stores the average number of miles per
hour as a long doc values field, then it highlights the quality issues of this
dataset and disk usage for this field goes from 40 to 15.7 bits per value
(-60%) with the patch.
Ahhh, I see! The taxis that go faster than the speed of light are not apparent
now since we don't store that field directly... makes sense.
> Prevent outliers from raising the number of bits of everyone with numeric doc
> values
> ------------------------------------------------------------------------------------
>
> Key: LUCENE-7589
> URL: https://issues.apache.org/jira/browse/LUCENE-7589
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7589.patch
>
>
> Today we encode entire segments with a single number of bits per value. It
> was done this way because it was faster, but it also means a single outlier
> can significantly increase the space requirements. I think we should have
> protection against that.
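
To illustrate the issue description above, here is a rough sketch of how one outlier inflates the cost of a single segment-wide bits-per-value, and how choosing the width per block limits the damage. This is not the code from LUCENE-7589.patch: the function names, the block size of 4, and the sample speeds are all made up for illustration, and per-block metadata overhead (the stored minimum and bit width) is ignored.

```python
def bits_required(values):
    """Bits needed to store each value as an offset from the minimum."""
    span = max(values) - min(values)
    return max(1, span.bit_length())


def single_width_bits(values):
    """Total bits when the whole segment shares one bits-per-value."""
    return bits_required(values) * len(values)


def blockwise_bits(values, block_size=4):
    """Total bits when each block picks its own bits-per-value.

    block_size is a hypothetical parameter, chosen small for readability.
    """
    total = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        total += bits_required(block) * len(block)
    return total


# Hypothetical data: plausible speeds in miles per hour plus one absurd outlier.
speeds = [12, 30, 25, 18, 22, 27, 15, 10**9, 20, 24, 19, 28]

print("single width:", single_width_bits(speeds), "bits")
print("per block:   ", blockwise_bits(speeds), "bits")
```

With a single width, every value pays for the outlier. With per-block widths, only the block that contains the outlier does.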