[
https://issues.apache.org/jira/browse/LUCENE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435401#comment-17435401
]
Michael McCandless commented on LUCENE-9673:
--------------------------------------------
OK phew catching up on this issue again [~mashudong]! Sorry for the crazy long
delay.
It turns out nothing in Lucene's {{core}} uses any of this complex growable
{{int[]}} logic – only {{MemoryIndex}} does (today anyways). {{core}}'s
{{int[]}} allocation needs are simpler: just allocating 1, 2 or 3 ints per
new term encountered during indexing (depending on whether docs, freqs, prox are
enabled). For {{byte[]}} storage, we do still use/need the growing slices to
account for longer and shorter vInt encoded postings lists.
I will open a follow-on issue to promote this out of {{core}} into
{{MemoryIndex}}.
For this issue let's just fix this sneaky {{IntBlockPool}} performance bug!
Oh and I also found this long-standing {{TODO}}:
{noformat}
// TODO: figure out why this is 2*streamCount here. streamCount should be enough?
{noformat}
And indeed it is over-allocating – we are wasting half of the {{int[]}} RAM we
are allocating! I fixed that, tests pass. So this will be a little RAM
efficiency improvement for {{IndexWriter}}.
Separately, I wonder if we could run a static "locally dead code detector" from
gradle that would crawl the source graph dependencies, excluding tests? I.e.
this code was not technically dead, since unit tests were indeed exercising it,
and another Lucene module was also using it, but nothing in Lucene's {{core}}
was in fact using it. I wish such code were automatically removed from our
repository, or proposed to be moved out to the module that really needs it :)
Sort of a source code garbage collector ...
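To make the idea concrete, here is a rough sketch of what such a detector might compute: plain reachability over a "uses" graph in which test classes are left out and other modules do not count as roots. Everything here is hypothetical, including the class name {{LocallyDeadSketch}}, the toy graph and the choice of roots. It is not an existing gradle task or plugin.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LocallyDeadSketch {

  /**
   * Returns the classes of one module that cannot be reached from that module's
   * own roots by following "uses" edges. Test classes are assumed to have been
   * dropped from the graph already.
   */
  static Set<String> locallyDead(
      Set<String> moduleClasses, Set<String> roots, Map<String, List<String>> uses) {
    Set<String> seen = new HashSet<>(roots);
    Deque<String> queue = new ArrayDeque<>(roots);
    while (!queue.isEmpty()) {
      String cls = queue.poll();
      for (String dep : uses.getOrDefault(cls, List.of())) {
        // Only follow edges that stay inside the module under inspection.
        if (moduleClasses.contains(dep) && seen.add(dep)) {
          queue.add(dep);
        }
      }
    }
    Set<String> dead = new TreeSet<>(moduleClasses);
    dead.removeAll(seen);
    return dead;
  }

  /** Toy graph: the names are illustrative, not a real scan of the Lucene sources. */
  static Set<String> demo() {
    Set<String> core = Set.of("core.IndexWriter", "core.IntBlockPool", "core.SliceWriter");
    Map<String, List<String>> uses = Map.of(
        "core.IndexWriter", List.of("core.IntBlockPool"),
        // Another module uses the class, but it is not a root for core.
        "memory.MemoryIndex", List.of("core.SliceWriter"));
    return locallyDead(core, Set.of("core.IndexWriter"), uses);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The hard part in practice would be choosing the roots, since every public class of a library is arguably an entry point. The sketch only shows the crawl itself.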
> The level of IntBlockPool slice is always 1
> --------------------------------------------
>
> Key: LUCENE-9673
> URL: https://issues.apache.org/jira/browse/LUCENE-9673
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Reporter: mashudong
> Priority: Minor
> Attachments: LUCENE-9673.patch
>
>
> First slice is allocated by IntBlockPool.newSlice(), and its level is 1,
>
> {code:java}
> private int newSlice(final int size) {
>   if (intUpto > INT_BLOCK_SIZE-size) {
>     nextBuffer();
>     assert assertSliceBuffer(buffer);
>   }
>
>   final int upto = intUpto;
>   intUpto += size;
>   buffer[intUpto-1] = 1;
>   return upto;
> }{code}
>
>
> If one slice is not enough, IntBlockPool.allocSlice() is called to allocate
> more slices.
> As the following code shows, level is 1 and newLevel is NEXT_LEVEL_ARRAY[0],
> which is also 1.
>
> The result is that the level of an IntBlockPool slice is always 1: the first
> slice is 2 ints long, and all subsequent slices are 4 ints long.
>
> {code:java}
> private static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
> private int allocSlice(final int[] slice, final int sliceOffset) {
>   final int level = slice[sliceOffset];
>   final int newLevel = NEXT_LEVEL_ARRAY[level - 1];
>   final int newSize = LEVEL_SIZE_ARRAY[newLevel];
>   // Maybe allocate another block
>   if (intUpto > INT_BLOCK_SIZE - newSize) {
>     nextBuffer();
>     assert assertSliceBuffer(buffer);
>   }
>   final int newUpto = intUpto;
>   final int offset = newUpto + intOffset;
>   intUpto += newSize;
>   // Write forwarding address at end of last slice:
>   slice[sliceOffset] = offset;
>   // Write new level:
>   buffer[intUpto - 1] = newLevel;
>   return newUpto;
> }
> {code}
>
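The level progression described in the quoted report can be reproduced in isolation. The following is a minimal sketch, not the Lucene code and not the attached patch: {{NEXT_LEVEL_ARRAY}} is copied from the report, while the {{LEVEL_SIZE_ARRAY}} values are an assumption chosen to match the 2-then-4 slice sizes the report describes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SliceLevelSketch {
  // Copied from the issue description.
  static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
  // Assumption: not quoted in the issue; consistent with the sizes it reports.
  static final int[] LEVEL_SIZE_ARRAY = {2, 4, 8, 16, 32, 64, 128, 256, 512, 1024};

  /** Sizes (in ints) of the first {@code count} slices under the quoted logic. */
  static List<Integer> buggySizes(int count) {
    List<Integer> sizes = new ArrayList<>();
    sizes.add(LEVEL_SIZE_ARRAY[0]); // newSlice(): first slice, tagged with level 1
    int level = 1;
    for (int i = 1; i < count; i++) {
      // allocSlice(): NEXT_LEVEL_ARRAY[1 - 1] == 1, so the level never advances.
      level = NEXT_LEVEL_ARRAY[level - 1];
      sizes.add(LEVEL_SIZE_ARRAY[level]);
    }
    return sizes;
  }

  /** Sizes if every new slice moved up one level (for comparison only). */
  static List<Integer> growingSizes(int count) {
    List<Integer> sizes = new ArrayList<>();
    int level = 0;
    sizes.add(LEVEL_SIZE_ARRAY[level]);
    for (int i = 1; i < count; i++) {
      level = NEXT_LEVEL_ARRAY[level];
      sizes.add(LEVEL_SIZE_ARRAY[level]);
    }
    return sizes;
  }

  public static void main(String[] args) {
    System.out.println("buggy:   " + buggySizes(6));
    System.out.println("growing: " + growingSizes(6));
  }
}
```

Under these assumptions the quoted logic yields slice sizes 2, 4, 4, 4, 4, 4, whereas a progression that really advances would yield 2, 4, 8, 16, 32, 64. A long stream therefore ends up chained through many small slices, each with its own forwarding address.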