[ https://issues.apache.org/jira/browse/LUCENE-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026380#comment-14026380 ]
Adrien Grand commented on LUCENE-5748:
--------------------------------------

+1 I like it!

> SORTED_NUMERIC dv type
> ----------------------
>
>                 Key: LUCENE-5748
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5748
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>         Attachments: LUCENE-5748.patch
>
>
> Currently for Strings you have SORTED and SORTED_SET, capable of single and
> multiple values per document respectively.
> For multi-numerics, there are only a few choices:
> * encode with NumericUtils into byte[]s and store with SORTED_SET.
> * encode yourself per-document into BINARY.
> Both of these techniques have problems:
> SORTED_SET isn't bad if you just want to do basic sorting (e.g. min/max) or
> faceting counts: most of the bloat in the "terms dict" is compressed away,
> and it optimizes the case where the data is actually single-valued. But it
> falls apart performance-wise if you want to do more complex stuff like Solr's
> analytics component or Elasticsearch's aggregations: the ordinals just get in
> your way and cause additional work, deref'ing each to a byte[] and then
> decoding that back to a number. Worst of all, any mathematical calculations
> are off because it discards frequency (deduplicates).
> Using your own custom encoding in BINARY removes the unnecessary ordinal
> dereferencing, but you trade off bad compression and access: you have no real
> choice but to do something like vInt within each byte[] for the doc, which
> means even basic sorting (e.g. max) is slow as it's not constant time. There
> is no chance for the codec to optimize things like dates with GCD compression
> or optimize the single-valued case because it's just an opaque byte[].
> So I think it would be good to explore a simple long[] type that solves these
> problems.
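
To make the frequency point in the quoted description concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class name DedupDemo and its methods are hypothetical, and a TreeSet merely stands in for set-style storage that deduplicates values, while a sorted long[] stands in for the proposed per-document representation.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Lucene.
final class DedupDemo {

    // Set-style storage (stand-in for SORTED_SET semantics): duplicates collapse,
    // so the per-value frequency is lost before the mean is computed.
    static double meanDeduplicated(long[] values) {
        TreeSet<Long> unique = new TreeSet<>();
        for (long v : values) {
            unique.add(v);
        }
        long sum = 0;
        for (long v : unique) {
            sum += v;
        }
        return (double) sum / unique.size();
    }

    // Sorted long[] storage: every occurrence is kept, so the mean is exact.
    static double meanWithDuplicates(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (long v : sorted) {
            sum += v;
        }
        return (double) sum / sorted.length;
    }

    // With a sorted array, min and max are constant-time lookups.
    static long min(long[] sorted) {
        return sorted[0];
    }

    static long max(long[] sorted) {
        return sorted[sorted.length - 1];
    }

    public static void main(String[] args) {
        long[] docValues = {5, 5, 5, 20};
        System.out.println(meanDeduplicated(docValues));   // 12.5
        System.out.println(meanWithDuplicates(docValues)); // 8.75
    }
}
```

For a document holding 5, 5, 5, 20, the deduplicated view averages only 5 and 20, whereas the sorted array keeps all four values. The same sorted array also gives min and max by index, which a vInt-encoded byte[] cannot do without decoding.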