[
https://issues.apache.org/jira/browse/LUCENE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551992#comment-17551992
]
Patrick Zhai commented on LUCENE-10605:
---------------------------------------
I'm definitely not an expert on this, but after some research I found:
# The real problem is probably that we're assuming the object alignment in a
32-bit JVM is 4 bytes, but it actually defaults to 8 bytes in the HotSpot JVM
and can't be less than 8 bytes
([https://stackoverflow.com/questions/44468639/memory-alignment-of-java-classes])
# The object header can shift the array data relative to the alignment
boundary. In your jol analysis the header is 12 bytes long, which leaves a
12 % 8 = 4 byte offset, so the array data has to fill those remaining 4 bytes
(plus whole 8-byte blocks) to avoid padding. That's why for {{byte[]}} the
sizes 4, 12, 20, ... are optimal. But I *think* the header length can vary
depending on the JVM or the system, since I've seen some posts with 2 mark
words in the header, which makes the header 16 bytes.
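To make the arithmetic concrete, here is a minimal sketch of the footprint calculation, assuming the 32-bit layout from the jol analysis (a 12-byte array header and 8-byte object alignment). `ArrayFootprint`, `align` and `footprint` are hypothetical helpers for illustration, not Lucene or JDK APIs:

```java
public class ArrayFootprint {
    // Assumed 32-bit HotSpot array layout: 8-byte object header + 4-byte length field.
    static final int ARRAY_HEADER_BYTES = 12;
    // Assumed default HotSpot object alignment.
    static final int OBJECT_ALIGNMENT = 8;

    // Rounds size up to the next multiple of alignment (alignment must be a power of two).
    static long align(long size, int alignment) {
        return (size + alignment - 1) & -(long) alignment;
    }

    // Heap bytes occupied by an array of the given length and element size,
    // under the assumed header size and alignment.
    static long footprint(int length, int bytesPerElement) {
        return align(ARRAY_HEADER_BYTES + (long) length * bytesPerElement, OBJECT_ALIGNMENT);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 21; n++) {
            long total = footprint(n, 1);
            long padding = total - (ARRAY_HEADER_BYTES + n);
            System.out.println("byte[" + n + "] -> " + total + " bytes (padding " + padding + ")");
        }
    }
}
```

Under these assumptions, lengths 5 through 12 all come out at 24 bytes, and only 4, 12 and 20 have zero padding, which matches the jol output.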
So there should be something we can optimize here, but we would probably need
a way to determine how many bytes are in the array header. Ah,
[RamUsageEstimator|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java#L179,L187]
lists the details: on a 64-bit JVM the header is already aligned, so we don't
need to worry about the offset, and on a 32-bit JVM the header is a constant
12 bytes, which gives a 4-byte offset.
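One way the optimization could look, sketched under assumptions: take the length the growth policy wants (`newSize`), compute its padded footprint, and return the largest length that still fits in that footprint. The header size and alignment are plain parameters here (12 and 8 for the 32-bit case). `AlignedOversize`, `largestLengthInSameFootprint` and `objectAlignment` are hypothetical names, not existing Lucene methods. Reading `ObjectAlignmentInBytes` through `HotSpotDiagnosticMXBean` only works on HotSpot-based JVMs, so the sketch falls back to 8:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class AlignedOversize {
    // Rounds size up to the next multiple of alignment (a power of two).
    static long align(long size, int alignment) {
        return (size + alignment - 1) & -(long) alignment;
    }

    // Hypothetical helper: given the length the growth policy asked for,
    // return the largest length whose array still occupies the same aligned
    // number of bytes, so the padding is used instead of wasted.
    // Not clamped to the maximum array length; a real version would need that.
    static int largestLengthInSameFootprint(int newSize, int bytesPerElement,
                                            int headerBytes, int alignment) {
        long footprint = align(headerBytes + (long) newSize * bytesPerElement, alignment);
        return (int) ((footprint - headerBytes) / bytesPerElement);
    }

    // Reads -XX:ObjectAlignmentInBytes on HotSpot; falls back to the
    // HotSpot default of 8 if the MXBean or the option is unavailable.
    static int objectAlignment() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return Integer.parseInt(bean.getVMOption("ObjectAlignmentInBytes").getValue());
        } catch (RuntimeException | LinkageError e) {
            return 8;
        }
    }

    public static void main(String[] args) {
        // 32-bit case from the discussion: 12-byte header, 8-byte alignment.
        System.out.println(largestLengthInSameFootprint(5, 1, 12, 8));
        System.out.println(largestLengthInSameFootprint(5, 2, 12, 8));
        System.out.println(largestLengthInSameFootprint(5, 4, 12, 8));
        // Assumed 64-bit case with an already aligned 16-byte header: no offset to fill.
        System.out.println(largestLengthInSameFootprint(5, 1, 16, 8));
        System.out.println("ObjectAlignmentInBytes = " + objectAlignment());
    }
}
```

With a 12-byte header, a requested {{byte[]}} length of 5 becomes 12 and a {{short[]}}-sized request of 5 becomes 6, while for 4-byte elements the padding is too small to hold another element and the length stays 5. With an aligned 16-byte header the byte case only rounds up to 8, which is why the offset matters only on 32-bit.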
> fix error in 32bit jvm object alignment gap calculation
> -------------------------------------------------------
>
> Key: LUCENE-10605
> URL: https://issues.apache.org/jira/browse/LUCENE-10605
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 8.11.1
> Environment: jdk 7 32-bit
> jdk 8 32-bit
> Reporter: sun wuqiang
> Priority: Trivial
> Attachments: image-2022-06-08-20-50-27-712.png,
> image-2022-06-08-21-24-57-674.png, image-2022-06-09-08-25-55-289.png,
> image-2022-06-09-08-26-36-528.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> ArrayUtil.{*}oversize{*}(int minTargetSize, int bytesPerElement)
> This method is used to calculate the optimal length of an array during
> expansion.
>
> Under the current logic, which tries to avoid space wasted by the *object
> alignment gap*, a *32-bit* JVM picks array lengths from the +current
> optional+ column of the table below. But the results are not optimal.
> For example, to grow a byte[2], I call oversize(2,1) to get the size of the
> next array, and it returns 8.
> But byte[8] is not the best result:
> byte[8] and byte[12] use the same amount of memory (24 bytes each, due to
> the alignment gap),
> so it would be better to return 12 here.
> See the table below.
> !image-2022-06-09-08-26-36-528.png!
>
> I used *jol-core* to calculate the object alignment gap:
> {code:java}
> <dependency>
> <groupId>org.openjdk.jol</groupId>
> <artifactId>jol-core</artifactId>
> <version>0.16</version>
> <scope>compile</scope>
> </dependency> {code}
>
> Execute the following code:
> {code:java}
> import org.openjdk.jol.info.ClassLayout;
> System.out.println(ClassLayout.parseInstance(new byte[6]).toPrintable());
> {code}
>
> !image-2022-06-08-21-24-57-674.png!
>
> To further verify that the tool's results are correct, I wrote the following
> code, which infers how much space arrays of different lengths actually occupy
> based on when the OOM occurs. The conclusion is consistent with jol-core.
> {code:java}
> // Run with -Xms16m -Xmx16m on a 32-bit JVM.
> // Allocates fixed-length arrays until an OutOfMemoryError occurs; the count
> // reached at that point shows how much memory each array actually occupies.
> public class ArrayFootprintProbe {
>     public static void main(String[] args) {
>         byte[][] arr = new byte[1024 * 1024][];
>         for (int i = 0; i < arr.length; i++) {
>             if (i % 100 == 0) {
>                 System.out.println(i);
>             }
>             // Judging by when the OOM occurs on a 32-bit JVM,
>             // arrays of length 5 to 12 occupy the same amount of memory.
>             arr[i] = new byte[5];
>         }
>     }
> }
> {code}
> *new byte[5]* and *new byte[12]* use the same amount of memory
> ----
>
> In addition, +*-XX:ObjectAlignmentInBytes*+ should also affect the return
> value of this method, but I don't know whether it is necessary to support
> it. If it is, I will change that as well. Thank you very much!
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)