[
https://issues.apache.org/jira/browse/LUCENE-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689447#comment-15689447
]
Toke Eskildsen commented on LUCENE-7521:
----------------------------------------
I was involved in the original PackedInts implementation, where I did quite a
bit of performance testing of the two different approaches: Optimal memory
packing (Packed64) and word-aligned packing (Packed64SingleBlock). They were
named differently back then, but the principles and the performance-relevant code
parts were about the same. The JIRA is LUCENE-1990. The conclusion then was
that aligned won in a few cases but added quite a lot of complexity, so it was
scrapped.
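
As a rough illustration of the two approaches, here is a minimal Java sketch
of get/set for a contiguous ("optimal") layout and a word-aligned layout.
It is not the Lucene code: the class and method names are hypothetical, values
are stored least-significant-bit first for simplicity, and bpv is assumed to be
below 64. The point is only that the contiguous variant must handle values that
straddle two longs, while the aligned variant never does.

```java
public final class PackedLayouts {
    // Contiguous layout: value i starts at bit i*bpv and may span two longs.
    static long getPacked(long[] blocks, int bpv, int index) {
        long bitPos = (long) index * bpv;
        int block = (int) (bitPos >>> 6);      // which long holds the first bit
        int offset = (int) (bitPos & 63);      // bit offset inside that long
        long mask = (1L << bpv) - 1;           // assumes bpv < 64
        long value = blocks[block] >>> offset;
        int bitsInFirst = 64 - offset;
        if (bitsInFirst < bpv) {
            // The value straddles a block boundary: fetch the remaining high bits.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    static void setPacked(long[] blocks, int bpv, int index, long value) {
        long bitPos = (long) index * bpv;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long mask = (1L << bpv) - 1;           // assumes bpv < 64
        long v = value & mask;
        blocks[block] = (blocks[block] & ~(mask << offset)) | (v << offset);
        int bitsInFirst = 64 - offset;
        if (bitsInFirst < bpv) {
            // Write the high bits that did not fit into the first long.
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst))
                    | (v >>> bitsInFirst);
        }
    }

    // Word-aligned layout: each long holds floor(64/bpv) values, rest is padding.
    static long getAligned(long[] blocks, int bpv, int index) {
        int valuesPerBlock = 64 / bpv;
        int block = index / valuesPerBlock;
        int shift = (index % valuesPerBlock) * bpv;
        long mask = (1L << bpv) - 1;           // assumes bpv < 64
        return (blocks[block] >>> shift) & mask;
    }

    static void setAligned(long[] blocks, int bpv, int index, long value) {
        int valuesPerBlock = 64 / bpv;
        int block = index / valuesPerBlock;
        int shift = (index % valuesPerBlock) * bpv;
        long mask = (1L << bpv) - 1;           // assumes bpv < 64
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
    }
}
```

The aligned getter trades a division and a modulo for the absence of the
two-block branch, which is one reason the relative performance can depend on
the CPU.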
Two years later the aligned version was re-introduced in LUCENE-4062. Again
there was some performance testing. Performance characteristics differed
depending on CPU structure and in-memory array size (cache utilization really).
Overall it seemed that aligned packing was faster, but not by much on the i7
(desktop & Xeon).
One important observation from the JIRA is that only the BPVs (Bits Per Value)
3, 5, 6, 7, 9, 10, 12 and 21 differ in representation (and get/set
algorithm) between packed and aligned. There are some poor graphs from an old
comparison of those values on http://ekot.dk/misc/packedints/padding.html where
contiguous=packed and padding=aligned. This was for a small (10M values, AFAIR)
set. Note how the performance difference between the implementations varies a
lot, depending on CPU type.
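
The list of differing BPVs above can be reproduced with a small sketch: a BPV
that divides 64 evenly fills every long completely, so contiguous and aligned
storage coincide. Only the remaining BPVs leave padding bits in the aligned
layout. The set of BPVs offered by the aligned format is an assumption here
(hard-coded in ALIGNED_BPVS), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public final class DifferingBpvs {
    // Assumed set of BPVs supported by the word-aligned format.
    static final int[] ALIGNED_BPVS = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 16, 21, 32};

    // BPVs whose aligned layout differs from the contiguous one,
    // i.e. those that do not divide 64 evenly.
    static List<Integer> differing() {
        List<Integer> result = new ArrayList<>();
        for (int bpv : ALIGNED_BPVS) {
            if (64 % bpv != 0) {
                result.add(bpv);
            }
        }
        return result;
    }

    // Unused bits per 64-bit block in the aligned layout.
    static int paddingBits(int bpv) {
        return 64 - (64 / bpv) * bpv;
    }

    public static void main(String[] args) {
        for (int bpv : differing()) {
            System.out.println("bpv=" + bpv + " padding bits per long=" + paddingBits(bpv));
        }
    }
}
```

Under that assumption the padding is at most 4 bits per long, so the memory
overhead of the aligned layout is small for these BPVs.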
Long story longer, I still favour having only 1 underlying format ("optimal"
packed): Too little gain in too few cases for a high code complexity cost with
aligned. On a related note, a high-quality micro-benchmark for structures like
these would be great.
> Simplify PackedInts
> -------------------
>
> Key: LUCENE-7521
> URL: https://issues.apache.org/jira/browse/LUCENE-7521
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7521.patch
>
>
> We have a lot of specialization in PackedInts about how to keep packed arrays
> of longs in memory. However, most use-cases have slowly moved to DirectWriter
> and DirectMonotonicWriter and most specializations we have are barely used
> for performance-sensitive operations, so I'd like to clean this up a bit.