somandal commented on code in PR #8953:
URL: https://github.com/apache/pinot/pull/8953#discussion_r910139202
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java:
##########
@@ -77,7 +77,8 @@ public MultiValueFixedByteRawIndexCreator(File baseIndexDir,
ChunkCompressionTyp
boolean deriveNumDocsPerChunk, int writerVersion)
throws IOException {
File file = new File(baseIndexDir, column +
Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
- int totalMaxLength = maxNumberOfMultiValueElements *
valueType.getStoredType().size();
+ // Store the length followed by the values
+ int totalMaxLength = Integer.BYTES + (maxNumberOfMultiValueElements *
valueType.getStoredType().size());
Review Comment:
So the main finding:
This change is needed because today when we setup the `totalMaxLength`
without this change, we don't account for the additional 'length' of the MV row
that we need to add. The writer accounts for the additional bytes used in the
header as the offset but not this 'length' of the MV row since this writer is
used by both SV and MV types. With this change, the reader side will also get
the current max length and work as expected (otherwise we would have run into
issues on the reader side too).
Next question to answer was why the test
`MultiValueFixedByteRawIndexCreatorTest` wasn't failing. My findings on that:
We calculate the worst case chunk size large enough to fit all of the
multi-value rows in the below code:
```
int totalMaxLength = maxNumberOfMultiValueElements *
valueType.getStoredType().size();
chunkSize = numDocsPerChunk * (CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE + (long)
totalMaxLength)
```
Which means that if you have different number of multi-value entries per MV
row, they collectively won't take up the full size of the chunk, as some rows
will be smaller than `totalMaxLength` and some will the at `totalMaxLength`.
You need sufficient multi-value rows to have 'maxNumberOfMultiValueElements'
or enough elements to fully fill up the chunk (due to the header taking up
totalDocs * 4 bytes, we will hit the buffer overflow before we write out all
docs in case all MVs are at the max length). In the test, the number of entries
per MV row varies greatly, and the total collective length was less than the
space for data in the chunk buffer. This means we just don't exhaust the full
chunk size available. thus never hitting a buffer overflow error.
To validate this theory, I modified one of the functions to always choose 49
elements per MV int array like this (instead of choosing the array length at
random):
```
private static List<int[]> ints() {
return IntStream.range(0, 1000)
.mapToObj(i -> new int[49])
.peek(array -> {
for (int i = 0; i < array.length; i++) {
array[i] = RANDOM.nextInt();
}
})
.collect(Collectors.toList());
}
```
I was able to reproduce the buffer overflow (and this test passed when I
added the Integer.BYTES to the 'totalMaxLength' above):
```
java.nio.BufferOverflowException
at java.base/java.nio.DirectByteBuffer.put(DirectByteBuffer.java:409)
at java.base/java.nio.ByteBuffer.put(ByteBuffer.java:906)
at
org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter.putBytes(VarByteChunkSVForwardIndexWriter.java:108)
at
org.apache.pinot.segment.local.segment.creator.impl.fwd.MultiValueFixedByteRawIndexCreator.putIntMV(MultiValueFixedByteRawIndexCreator.java:119)
at
org.apache.pinot.segment.local.segment.index.creator.MultiValueFixedByteRawIndexCreatorTest.lambda$testMV$14(MultiValueFixedByteRawIndexCreatorTest.java:124)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at
org.apache.pinot.segment.local.segment.index.creator.MultiValueFixedByteRawIndexCreatorTest.testMV(MultiValueFixedByteRawIndexCreatorTest.java:124)
at
org.apache.pinot.segment.local.segment.index.creator.MultiValueFixedByteRawIndexCreatorTest.testMVInt(MultiValueFixedByteRawIndexCreatorTest.java:75)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at
org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:108)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:661)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:869)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1193)
at
org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:126)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
at org.testng.TestRunner.privateRun(TestRunner.java:744)
at org.testng.TestRunner.run(TestRunner.java:602)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:380)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:375)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:340)
at org.testng.SuiteRunner.run(SuiteRunner.java:289)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1301)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1226)
at org.testng.TestNG.runSuites(TestNG.java:1144)
at org.testng.TestNG.run(TestNG.java:1115)
at
com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
at
com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:109)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]