Re: [PR] PARQUET-2366: Optimize random seek during rewriting [parquet-mr]

via GitHub Tue, 17 Oct 2023 04:48:12 -0700


ConeyLiu commented on code in PR #1174:
URL: https://github.com/apache/parquet-mr/pull/1174#discussion_r1361986315



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -265,6 +265,10 @@ private void processBlocksFromReader() throws IOException {
       BlockMetaData blockMetaData = meta.getBlocks().get(blockId);
       List<ColumnChunkMetaData> columnsInOrder = blockMetaData.getColumns();
 
+      List<ColumnIndex> columnIndexes = readAllColumnIndexes(reader, columnsInOrder, descriptorsMap);

Review Comment:
   I could add an option for this if anyone is concerned about memory usage. It
caches the metadata for only one block at a time, which should be smaller than
what file writing already needs, since that has to cache the metadata of all
blocks.
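
   To illustrate the memory argument, here is a minimal, self-contained sketch
of per-block caching. It is not the code in this PR: `BlockIndexCacheSketch`,
`IndexLoader`, `readAllIndexes` and `processBlocks` are hypothetical names, and
plain strings stand in for the real column index objects. It only shows that
the cache never holds more than one block's worth of entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; none of these are parquet-mr classes.
public class BlockIndexCacheSketch {

  /** Hypothetical loader: returns the index of one column in one block. */
  interface IndexLoader {
    String load(int blockId, int columnId);
  }

  /** Reads the indexes of all columns of a single block, in column order. */
  static List<String> readAllIndexes(IndexLoader loader, int blockId, int columnCount) {
    List<String> indexes = new ArrayList<>(columnCount);
    for (int col = 0; col < columnCount; col++) {
      indexes.add(loader.load(blockId, col));
    }
    return indexes;
  }

  /** Processes blocks one by one and returns the peak number of cached entries. */
  static int processBlocks(IndexLoader loader, int blockCount, int columnCount) {
    int peak = 0;
    for (int blockId = 0; blockId < blockCount; blockId++) {
      // One up-front pass per block instead of one lookup per column.
      List<String> cached = readAllIndexes(loader, blockId, columnCount);
      peak = Math.max(peak, cached.size());
      for (int col = 0; col < columnCount; col++) {
        String index = cached.get(col); // served from memory, no extra read
      }
      // 'cached' goes out of scope here, so only one block is held at a time.
    }
    return peak;
  }

  public static void main(String[] args) {
    IndexLoader loader = (b, c) -> "index[block=" + b + ", col=" + c + "]";
    System.out.println(processBlocks(loader, 4, 3)); // prints 3
  }
}
```

   With 4 blocks of 3 columns each, the peak is 3 entries rather than 12, which
is the sense in which the cache is bounded by a single block.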



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
