[ https://issues.apache.org/jira/browse/PARQUET-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770034#comment-17770034 ]
ASF GitHub Bot commented on PARQUET-2348:
-----------------------------------------

ConeyLiu commented on code in PR #1143:
URL: https://github.com/apache/parquet-mr/pull/1143#discussion_r1340067800


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -366,6 +366,10 @@ private void processChunk(ColumnChunkMetaData chunk,
     ColumnIndex columnIndex = reader.readColumnIndex(chunk);
     OffsetIndex offsetIndex = reader.readOffsetIndex(chunk);
+    BloomFilter bloomFilter = reader.readBloomFilter(chunk);
+    if (bloomFilter != null) {
+      writer.addBloomFilter(chunk.getPath().toDotString(), bloomFilter);

Review Comment:
   Sorry for the late response. I have added the related unit tests.


> Recompression/Re-encrypt should rewrite bloomfilter
> ---------------------------------------------------
>
>                 Key: PARQUET-2348
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2348
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Xianyang Liu
>            Priority: Major
>
> The bloom filter data is lost after rewriting a file with recompression or
> re-encryption. We should rewrite the bloom filter data as well.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)