mapleFU commented on code in PR #37400:
URL: https://github.com/apache/arrow/pull/37400#discussion_r1309657769


##########
cpp/src/parquet/file_writer.cc:
##########
@@ -484,6 +516,22 @@ class FileSerializer : public ParquetFileWriter::Contents {
     }
   }
 
+  void WriteBloomFilter() {
+    if (bloom_filter_builder_ != nullptr) {
+      if (properties_->file_encryption_properties()) {
+        throw ParquetException("Encryption is not supported with bloom filter");
+      }
+      // Serialize page index after all row groups have been written and report
+      // location to the file metadata.
+      BloomFilterLocation bloom_filter_location;
+      bloom_filter_builder_->Finish();
+      bloom_filter_builder_->WriteTo(sink_.get(), &bloom_filter_location);
+      metadata_->SetBloomFilterLocation(bloom_filter_location);
+      // Release the memory for BloomFilter.
+      //      bloom_filter_builder_ = nullptr;

Review Comment:
   > Actually I think we'd better place bloom filters between row groups so we 
can proactively release the memory of serialized row groups as early as 
possible. But that can be a future optimization.
   
   We can support that later. I think it would be a bit tricky for row group 
split handling, but that's fine with me. I can implement this first.
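
   To make the trade-off concrete, here is a rough, self-contained sketch of the two layouts. It is not the Parquet writer code: `ToySink`, `ToyLocation`, `WriteResult`, `WriteFiltersAtEnd` and `WriteFiltersInterleaved` are all hypothetical names, row groups and filters are modelled as plain byte strings, and "peak buffered bytes" only counts the filter bytes held in memory. It assumes one filter per row group and ignores row group splitting entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are Arrow/Parquet types.
struct ToySink {
  std::string bytes;
  int64_t Tell() const { return static_cast<int64_t>(bytes.size()); }
  void Write(const std::string& data) { bytes += data; }
};

struct ToyLocation {
  int64_t offset;  // position of the serialized filter in the sink
  int64_t length;  // serialized size in bytes
};

struct WriteResult {
  std::vector<ToyLocation> locations;  // one entry per row group
  std::size_t peak_buffered_bytes;     // most filter bytes held at once
};

// Layout A: write every row group first, then all filters at the end.
// Every filter stays buffered until the last row group has been written.
WriteResult WriteFiltersAtEnd(const std::vector<std::string>& row_group_data,
                              const std::vector<std::string>& filters,
                              ToySink* sink) {
  WriteResult result{{}, 0};
  std::vector<std::string> pending;
  std::size_t buffered = 0;
  for (std::size_t i = 0; i < row_group_data.size(); ++i) {
    sink->Write(row_group_data[i]);
    pending.push_back(filters[i]);
    buffered += filters[i].size();
    result.peak_buffered_bytes = std::max(result.peak_buffered_bytes, buffered);
  }
  for (const std::string& filter : pending) {
    result.locations.push_back(
        {sink->Tell(), static_cast<int64_t>(filter.size())});
    sink->Write(filter);
  }
  return result;
}

// Layout B: write each filter right after its row group, so only one
// filter needs to be held in memory at any time.
WriteResult WriteFiltersInterleaved(const std::vector<std::string>& row_group_data,
                                    const std::vector<std::string>& filters,
                                    ToySink* sink) {
  WriteResult result{{}, 0};
  for (std::size_t i = 0; i < row_group_data.size(); ++i) {
    sink->Write(row_group_data[i]);
    result.peak_buffered_bytes =
        std::max(result.peak_buffered_bytes, filters[i].size());
    result.locations.push_back(
        {sink->Tell(), static_cast<int64_t>(filters[i].size())});
    sink->Write(filters[i]);
  }
  return result;
}
```

   Both layouts write the same total number of bytes. What differs is where the filter offsets land and how many filter bytes stay in memory before they can be flushed.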


