szehon-ho opened a new pull request #3760:
URL: https://github.com/apache/iceberg/pull/3760
A certain string in the input data (with a prefix of more than 16 unparseable
chars, such as high/low surrogates) triggered a NullPointerException in the
Parquet writer's flush, which I reproduced in the accompanying unit test.
```
java.lang.NullPointerException
	at org.apache.iceberg.parquet.ParquetUtil.toBufferMap(ParquetUtil.java:307)
	at org.apache.iceberg.parquet.ParquetUtil.footerMetrics(ParquetUtil.java:166)
	at org.apache.iceberg.parquet.ParquetUtil.footerMetrics(ParquetUtil.java:88)
	at org.apache.iceberg.parquet.ParquetWriter.metrics(ParquetWriter.java:126)
	at org.apache.iceberg.io.DataWriter.close(DataWriter.java:89)
	at org.apache.iceberg.parquet.TestParquetDataWriter.testCorruptString(TestParquetDataWriter.java:158)
```
The problem is that UnicodeUtil and BinaryUtil return null when they fail to
compute a truncated upper bound for the string/binary value. That null ends up
in the upper bounds map, and ParquetUtil.toBufferMap then throws an NPE because
it calls .value() on the null returned by entry.getValue():
```
private static Map<Integer, ByteBuffer> toBufferMap(Schema schema, Map<Integer, Literal<?>> map) {
  Map<Integer, ByteBuffer> bufferMap = Maps.newHashMap();
  for (Map.Entry<Integer, Literal<?>> entry : map.entrySet()) {
    // entry.getValue() is null when no truncated upper bound could be computed,
    // so .value() throws the NPE seen at ParquetUtil.java:307
    bufferMap.put(entry.getKey(),
        Conversions.toByteBuffer(schema.findType(entry.getKey()), entry.getValue().value()));
  }
  return bufferMap;
}
```
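To make the failure mode concrete, here is a self-contained Java sketch. It is not Iceberg's code: `UpperBoundSketch`, `truncateMax`, and this String-based `toBufferMap` are hypothetical stand-ins (plain Strings replace `Literal<?>`, and UTF-8 bytes replace `Conversions.toByteBuffer`). `truncateMax` shows, in simplified form, how a "truncate, then increment the last code point" upper bound can come back null, and `toBufferMap` shows one possible guard: skipping columns that have no bound instead of dereferencing null. The exact inputs for which UnicodeUtil/BinaryUtil return null may differ from this simplified version (the input here fails only when every code point in the prefix is U+10FFFF).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins, not Iceberg classes: Strings replace Literal<?>,
// and UTF-8 encoding replaces Conversions.toByteBuffer.
final class UpperBoundSketch {

  private UpperBoundSketch() {}

  /**
   * Simplified truncated upper bound: keep at most `length` code points and
   * increment the last one that can still be incremented. Returns null when
   * no such code point exists (every code point in the prefix is U+10FFFF).
   */
  static String truncateMax(String input, int length) {
    int[] codePoints = input.codePoints().toArray();
    if (codePoints.length <= length) {
      return input; // short enough to be its own upper bound
    }
    // Walk backwards through the truncated prefix looking for a code point to bump.
    for (int i = length - 1; i >= 0; i--) {
      int next = codePoints[i] + 1;
      if (Character.isValidCodePoint(next)) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < i; j++) {
          sb.appendCodePoint(codePoints[j]);
        }
        return sb.appendCodePoint(next).toString();
      }
    }
    return null; // no upper bound of this length exists
  }

  /** Null-safe conversion: columns whose bound is null are left out of the result. */
  static Map<Integer, ByteBuffer> toBufferMap(Map<Integer, String> bounds) {
    Map<Integer, ByteBuffer> bufferMap = new HashMap<>();
    for (Map.Entry<Integer, String> entry : bounds.entrySet()) {
      String value = entry.getValue();
      if (value != null) { // without this check, value.getBytes(...) would throw an NPE
        bufferMap.put(entry.getKey(), ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
      }
    }
    return bufferMap;
  }
}
```

Skipping the entry leaves that column without an upper bound in the file metrics. This assumes readers treat a missing bound as unknown, which keeps file pruning conservative; whether the actual fix filters at this point or earlier (for example, when the bounds map is built) is not shown by this sketch.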