vvysotskyi commented on a change in pull request #2143:
URL: https://github.com/apache/drill/pull/2143#discussion_r615276673



##########
File path: 
exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java
##########
@@ -260,14 +260,16 @@ public long getMemSize() {
     }
 
     /**
-     * Writes a number of pages within corresponding column chunk
+     * Writes a number of pages within corresponding column chunk <br>
+     * // TODO: the Bloom Filter can be useful in filtering entire row groups,
+     *     see <a href="https://issues.apache.org/jira/browse/DRILL-7895">DRILL-7895</a>

Review comment:
      @vdiravka, are you sure that heap memory usage is the same? I assumed that the main reason for using `ParquetColumnChunkPageWriteStore` was to use direct memory instead of heap memory...
   From the code perspective, it looks like nothing was done in this direction for `ColumnChunkPageWriteStore`: it still uses `ConcatenatingByteArrayCollector` to collect data before writing it to the file, whereas our version uses `CapacityByteArrayOutputStream`, which allocates from the provided allocator.
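The heap-versus-direct distinction at issue here can be sketched in plain JDK terms. This is a hypothetical illustration only, not the actual Parquet/Drill code: `ConcatenatingByteArrayCollector` accumulates `byte[]` chunks on the JVM heap, while routing page bytes through a direct-memory allocator (as Drill's `CapacityByteArrayOutputStream` usage is intended to do) keeps them off-heap, analogous to the two `ByteBuffer` allocation modes below:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: contrasts heap-backed and direct (off-heap) buffers.
// It only illustrates the memory distinction raised in the review; it is not
// the actual ConcatenatingByteArrayCollector / CapacityByteArrayOutputStream code.
public class HeapVsDirectSketch {

  // Heap allocation: the buffer's storage is a byte[] on the JVM heap,
  // like the byte[] chunks collected by ConcatenatingByteArrayCollector.
  static ByteBuffer heapPageBuffer(int size) {
    return ByteBuffer.allocate(size);
  }

  // Direct allocation: storage lives in native memory outside the heap,
  // which is what writing through a direct-memory allocator achieves.
  static ByteBuffer directPageBuffer(int size) {
    return ByteBuffer.allocateDirect(size);
  }

  public static void main(String[] args) {
    ByteBuffer heap = heapPageBuffer(4096);
    ByteBuffer direct = directPageBuffer(4096);
    System.out.println("heap buffer direct?   " + heap.isDirect());   // false
    System.out.println("direct buffer direct? " + direct.isDirect()); // true
  }
}
```

Only the direct variant keeps page data out of the heap; accumulating many heap-backed pages before a flush is exactly the pressure the custom write store was meant to avoid.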



