vvysotskyi commented on a change in pull request #2143:
URL: https://github.com/apache/drill/pull/2143#discussion_r617911298



##########
File path: exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java
##########
@@ -260,14 +260,16 @@ public long getMemSize() {
     }
 
     /**
-     * Writes a number of pages within corresponding column chunk
+     * Writes a number of pages within corresponding column chunk <br>
+     * // TODO: the Bloom Filter can be useful in filtering entire row groups,
+     *     see <a href="https://issues.apache.org/jira/browse/DRILL-7895">DRILL-7895</a>

Review comment:
       @vdiravka, thanks for sharing screenshots and providing more details.
   
   > 3. And we converted that buf to bytes via BytesInput.from(buf) and compressedBytes.writeAllTo(buf). So all data still placed in heap.
   
   Please note that when calling `BytesInput.from(buf)`, it doesn't convert all bytes of the buffer at once; it creates a `CapacityBAOSBytesInput` that wraps the provided `CapacityByteArrayOutputStream` and uses it only when writing to the OutputStream.
   Regarding the `compressedBytes.writeAllTo(buf)` call, it is fine to have the bytes here since GC will take care of them and there is no reason for leaks; the data that should be processed later will be stored in direct memory.
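   
   To make the "lazy wrapper" point concrete, here is a minimal hypothetical sketch, not Parquet's real implementation: the `LazyBytes` name and the use of plain `ByteArrayOutputStream` are illustrative stand-ins for `CapacityBAOSBytesInput` / `CapacityByteArrayOutputStream`. Wrapping copies nothing; the bytes are drained only when `writeAllTo` runs.
   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.OutputStream;
   
   // Illustrative analogue of a lazy BytesInput: creating it copies no data.
   final class LazyBytes {
     private final ByteArrayOutputStream source;
   
     private LazyBytes(ByteArrayOutputStream source) {
       this.source = source;
     }
   
     // Wraps the stream without materializing its contents.
     static LazyBytes from(ByteArrayOutputStream source) {
       return new LazyBytes(source);
     }
   
     // Bytes are written to the target only at this point.
     void writeAllTo(OutputStream out) throws IOException {
       source.writeTo(out);
     }
   }
   ```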
   
   But when using `ConcatenatingByteArrayCollector`, all bytes will be stored on the heap (including data that should be processed later), so GC has no power here.
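   
   For contrast, a similarly hypothetical sketch of the "eager" behaviour (not the real `ConcatenatingByteArrayCollector`; only the idea is borrowed): each collected chunk is copied into a heap `byte[]` immediately, so everything stays on the heap until the final flush regardless of what GC does in between.
   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.OutputStream;
   import java.util.ArrayList;
   import java.util.List;
   
   // Illustrative analogue of an eager collector: every chunk is copied to the heap up front.
   final class EagerByteCollector {
     private final List<byte[]> slabs = new ArrayList<>();
   
     // Copies the stream's current contents onto the heap right away.
     void collect(ByteArrayOutputStream source) {
       slabs.add(source.toByteArray());
       source.reset();
     }
   
     // Only at flush time are the heap copies written out.
     void writeAllTo(OutputStream out) throws IOException {
       for (byte[] slab : slabs) {
         out.write(slab);
       }
     }
   }
   ```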
   
   Not sure why the heap usage you provided is similar; perhaps it will make a difference when we have more data, or GC does its work right before flushing data from the `buf`...



