kirkrodrigues opened a new issue, #10130:
URL: https://github.com/apache/pinot/issues/10130
When streaming data into a no-dictionary MV column, the segment fails to be
built with the following exception:
```
2023/01/14 03:28:20.982 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment
java.nio.BufferOverflowException: null
    at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:409) ~[?:?]
    at java.nio.ByteBuffer.put(ByteBuffer.java:914) ~[?:?]
    at org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter.putBytes(VarByteChunkSVForwardIndexWriter.java:118) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba
    at org.apache.pinot.segment.local.segment.creator.impl.fwd.MultiValueFixedByteRawIndexCreator.putIntMV(MultiValueFixedByteRawIndexCreator.java:119) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca0
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(SegmentColumnarIndexCreator.java:677) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074a
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:240) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba0
    at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:110) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074af1d4d492b92
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:903) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475b
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:814) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:713) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
    at java.lang.Thread.run(Thread.java:829) [?:?]
2023/01/14 03:28:21.003 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment for bug__0__0__20230114T0826Z
```
Glancing at the code, it seems like
`MutableNoDictionaryColStatistics::getMaxNumberOfMultiValues` returns 0,
whereas it should probably return
`_dataSource.getDataSourceMetadata().getMaxNumValuesPerMVEntry()`, similar to
what `MutableColStatistics::getMaxNumberOfMultiValues` does. (I may be totally
wrong though; I didn't look at all the code.)
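If that diagnosis is right, the failure mode can be shown in isolation: the raw-index writer presumably sizes its per-row buffer from the reported max number of multi-values, so a stats object that reports 0 leaves no room for any row's values. A minimal sketch (the buffer-sizing logic and names below are assumptions for illustration, not Pinot's actual code):

```java
import java.nio.ByteBuffer;
import java.nio.BufferOverflowException;

public class MvBufferSketch {
    // Hypothetical: the writer allocates room for a 4-byte value count plus
    // 4 bytes per INT value, based on the stats-reported max MV count.
    static ByteBuffer allocateRowBuffer(int maxNumValuesPerMVEntry) {
        return ByteBuffer.allocate(Integer.BYTES * (1 + maxNumValuesPerMVEntry));
    }

    // Writes one MV row; returns false if the buffer is too small.
    static boolean writeRow(ByteBuffer buf, int[] values) {
        try {
            buf.putInt(values.length);
            for (int v : values) {
                buf.putInt(v);
            }
            return true;
        } catch (BufferOverflowException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int[] row = {1, 2, 3};
        // Correct max (3) -> the row fits.
        System.out.println(writeRow(allocateRowBuffer(3), row));
        // Stats returning 0 -> BufferOverflowException on the first value.
        System.out.println(writeRow(allocateRowBuffer(0), row));
    }
}
```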
# Version
ca86ef
# Environment
* OpenJDK 11.0.16
* Ubuntu 18.04
# Reproduction steps
* Add the following schema to Pinot:
```json
{
  "schemaName": "bug",
  "dimensionFieldSpecs": [
    {
      "name": "integers",
      "dataType": "INT",
      "singleValueField": false
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```
* Add the following table to Pinot:
```json
{
  "tableName": "bug",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "bug",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "noDictionaryColumns": [
      "integers"
    ],
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "bug-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.rows": "500000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
```
* Ingest more than a segment's worth of JSON records (500K+, per
`realtime.segment.flush.threshold.rows`) containing the field `integers` into
the topic, so that a segment build is triggered.
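For the last step, a throwaway generator like the one below (a hypothetical helper, not part of the original report) can emit matching JSON lines to pipe into `kafka-console-producer`:

```java
public class GenBugRecords {
    // Builds one JSON record matching the "bug" schema above.
    static String record(long timestamp, int a, int b) {
        return "{\"timestamp\": " + timestamp + ", \"integers\": [" + a + ", " + b + "]}";
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Emit more rows than realtime.segment.flush.threshold.rows (500000)
        // so a segment build is triggered.
        for (int i = 0; i < 600_000; i++) {
            System.out.println(record(now + i, i, i + 1));
        }
    }
}
```

For example: `java GenBugRecords | kafka-console-producer.sh --topic bug-topic --bootstrap-server localhost:9876` (broker address from the table config above).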