rok commented on code in PR #41821:
URL: https://github.com/apache/arrow/pull/41821#discussion_r1792615650
##########
cpp/src/parquet/metadata.cc:
##########
@@ -1078,6 +1096,36 @@ void FileMetaData::WriteTo(::arrow::io::OutputStream* dst,
return impl_->WriteTo(dst, encryptor);
}
+::arrow::Result<std::shared_ptr<parquet::FileMetaData>> FileMetaData::CoalesceMetadata(
+ std::vector<std::shared_ptr<parquet::FileMetaData>>& metadata_list,
+ std::shared_ptr<parquet::WriterProperties>& writer_props) {
+ if (metadata_list.empty()) {
+ return ::arrow::Status::Invalid("No metadata to coalesce");
+ }
+
+ std::vector<std::string> values, keys;
+
+ // Read metadata from all dataset files and store AADs.
+ for (size_t i = 0; i < metadata_list.size(); i++) {
+ const auto& file_metadata = metadata_list[i];
+ keys.push_back("row_group_aad_" + std::to_string(i));
+ values.push_back(file_metadata->file_aad());
+ if (i > 0) {
+ metadata_list[0]->AppendRowGroups(*file_metadata);
+ }
+ }
+
+ // Create a new FileMetadata object with the created AADs as key_value_metadata.
+ auto fmd_builder =
+ parquet::FileMetaDataBuilder::Make(metadata_list[0]->schema(), writer_props);
+ const std::shared_ptr<const KeyValueMetadata> file_aad_metadata =
+ ::arrow::key_value_metadata(keys, values);
+ auto metadata = fmd_builder->Finish(file_aad_metadata);
+ metadata->AppendRowGroups(*metadata_list[0]);
Review Comment:
@ggershinsky As proposed, I now decrypt all footers and then coalesce them
into a single footer. Since decrypting the data files at read time requires
their AADs, I also store those in `key_value_metadata` under
`row_group_aad_{i}` keys, where `i` is the index of the source file in
`metadata_list`. Does this seem like a reasonable design?
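
To make the key scheme concrete, here is a minimal standalone sketch. It is not the PR code: `AadKey`, `BuildAadEntries`, and `LookupAad` are hypothetical helpers, and plain standard-library containers stand in for `KeyValueMetadata`. It only illustrates how the `row_group_aad_{i}` entries would be written and then looked up again at read time.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: key under which the AAD of the i-th source file is stored.
inline std::string AadKey(std::size_t file_index) {
  return "row_group_aad_" + std::to_string(file_index);
}

// Write side: build parallel key/value lists, one entry per source file,
// mirroring the loop in the diff above.
inline std::pair<std::vector<std::string>, std::vector<std::string>>
BuildAadEntries(const std::vector<std::string>& file_aads) {
  std::vector<std::string> keys;
  std::vector<std::string> values;
  keys.reserve(file_aads.size());
  values.reserve(file_aads.size());
  for (std::size_t i = 0; i < file_aads.size(); ++i) {
    keys.push_back(AadKey(i));
    values.push_back(file_aads[i]);
  }
  return {std::move(keys), std::move(values)};
}

// Read side: fetch the AAD for a given source file, or std::nullopt if the
// coalesced footer carries no entry for that index.
inline std::optional<std::string> LookupAad(
    const std::unordered_map<std::string, std::string>& key_value_metadata,
    std::size_t file_index) {
  const auto it = key_value_metadata.find(AadKey(file_index));
  if (it == key_value_metadata.end()) {
    return std::nullopt;
  }
  return it->second;
}
```

A reader would call something like `LookupAad(kv, i)` before decrypting the row groups that came from file `i`. A missing entry should probably be surfaced as an error rather than silently skipped.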