parthchandra commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1039958501
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,36 @@ public DataPage readPage() {
public DataPage visit(DataPageV1 dataPageV1) {
try {
BytesInput bytes = dataPageV1.getBytes();
- if (null != blockDecryptor) {
- bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+ BytesInput decompressed;
+
+ if (options.getAllocator().isDirect() && options.useOffHeapDecryptBuffer()) {
+ ByteBuffer byteBuffer = bytes.toByteBuffer();
+ if (!byteBuffer.isDirect()) {
+ throw new ParquetDecodingException("Expected a direct buffer");
+ }
+ if (blockDecryptor != null) {
+ byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+ }
+ long compressedSize = byteBuffer.limit();
+
+ ByteBuffer decompressedBuffer =
+ options.getAllocator().allocate(dataPageV1.getUncompressedSize());
+ decompressor.decompress(byteBuffer, (int) compressedSize, decompressedBuffer,
+ dataPageV1.getUncompressedSize());
+
+ // HACKY: sometimes we need to do `flip` because the position of output bytebuffer is
Review Comment:
Some direct-buffer decompressors write into the output buffer but do not
reset its position afterward. (It is not clear to me where in the direct
decompression path this happens; it might be worth looking into.) It is safe
(and not expensive) to call flip.
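
To illustrate the buffer state being discussed, here is a minimal sketch using only `java.nio.ByteBuffer`. The class and the two `write...` helpers are hypothetical stand-ins for decompressors that do or do not leave the output position advanced; they are not parquet-mr APIs. The guard on `position()` is an assumption that the quoted hunk does not show. It is there because `flip` sets the limit to the current position, so flipping a buffer whose position is already 0 would leave it with nothing to read.

```java
import java.nio.ByteBuffer;

public class FlipAfterDecompress {
    // Hypothetical decompressor stand-in: writes output and leaves the
    // position at the end of the written data.
    static void writeAndLeavePosition(ByteBuffer out, byte[] data) {
        out.put(data);
    }

    // Hypothetical decompressor stand-in: writes output and already
    // prepares the buffer for reading (position 0, limit = bytes written).
    static void writeAndRewind(ByteBuffer out, byte[] data) {
        out.put(data);
        out.flip();
    }

    // Make the buffer readable from index 0 in either case. The position
    // check is an assumption for this sketch, not taken from the PR.
    static ByteBuffer prepareForRead(ByteBuffer out) {
        if (out.position() != 0) {
            out.flip();
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] page = {1, 2, 3, 4};

        ByteBuffer a = ByteBuffer.allocateDirect(8);
        writeAndLeavePosition(a, page);
        prepareForRead(a);

        ByteBuffer b = ByteBuffer.allocateDirect(8);
        writeAndRewind(b, page);
        prepareForRead(b);

        // Both buffers now expose exactly the four written bytes.
        System.out.println(a.remaining() + " " + b.remaining());
    }
}
```

In both cases the caller ends up with position 0 and a limit equal to the number of bytes written, whichever way the decompressor left the buffer.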
##########
parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java:
##########
@@ -44,6 +44,8 @@ public class ParquetReadOptions {
private static final int ALLOCATION_SIZE_DEFAULT = 8388608; // 8MB
private static final boolean PAGE_VERIFY_CHECKSUM_ENABLED_DEFAULT = false;
private static final boolean BLOOM_FILTER_ENABLED_DEFAULT = true;
+ // Default to true if JDK 17 or newer.
Review Comment:
Oops. This comment got left behind from the original version. I changed the
initialization after some review comments.