ggershinsky commented on code in PR #41821:
URL: https://github.com/apache/arrow/pull/41821#discussion_r1818960363


##########
cpp/src/parquet/metadata.cc:
##########
@@ -1078,6 +1096,36 @@ void FileMetaData::WriteTo(::arrow::io::OutputStream* dst,
   return impl_->WriteTo(dst, encryptor);
 }
 
+::arrow::Result<std::shared_ptr<parquet::FileMetaData>> FileMetaData::CoalesceMetadata(
+    std::vector<std::shared_ptr<parquet::FileMetaData>>& metadata_list,
+    std::shared_ptr<parquet::WriterProperties>& writer_props) {
+  if (metadata_list.empty()) {
+    return ::arrow::Status::Invalid("No metadata to coalesce");
+  }
+
+  std::vector<std::string> values, keys;
+
+  // Read metadata from all dataset files and store AADs.
+  for (size_t i = 0; i < metadata_list.size(); i++) {
+    const auto& file_metadata = metadata_list[i];
+    keys.push_back("row_group_aad_" + std::to_string(i));
+    values.push_back(file_metadata->file_aad());
+    if (i > 0) {
+      metadata_list[0]->AppendRowGroups(*file_metadata);
+    }
+  }
+
+  // Create a new FileMetadata object with the created AADs as key_value_metadata.
+  auto fmd_builder =
+      parquet::FileMetaDataBuilder::Make(metadata_list[0]->schema(), writer_props);
+  const std::shared_ptr<const KeyValueMetadata> file_aad_metadata =
+      ::arrow::key_value_metadata(keys, values);
+  auto metadata = fmd_builder->Finish(file_aad_metadata);
+  metadata->AppendRowGroups(*metadata_list[0]);

Review Comment:
   @rok sorry for the delay, I've been away for a while. 
   > store those into key_value_metadata with row_group_aad_{i} keys.
   
   Do we need to store the aad_prefixes? Once the footer of a parquet file is decrypted, the file key and aad_prefix can be dropped. The aad_prefix is a user-provided unique ID of a file, so we can(*) generate a new one for the new _metadata file that keeps the coalesced footer. It'd also be good to generate a new key. The coalesced footer can then be encrypted in the _metadata file.
   
   (*) It's possible to write/encrypt the _metadata file without a new aad_prefix if the user's application doesn't check the file ID. In that case you can simply pass a null pointer.
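
A minimal sketch of the idea, under stated assumptions: `CoalescedFooterParams`, `RandomBytes`, and `MakeCoalescedFooterParams` are hypothetical names used only for illustration and are not part of the Parquet C++ API. The point is that the AAD prefixes of the source files are not inputs at all. Only a fresh key, and optionally a fresh aad_prefix, are produced for the _metadata file.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <string>

// Hypothetical holder for the parameters used to encrypt the coalesced
// footer. Illustration only, not a Parquet C++ type.
struct CoalescedFooterParams {
  std::string footer_key;                 // fresh key for the _metadata file
  std::optional<std::string> aad_prefix;  // fresh file ID, empty if unused
};

// Returns n random bytes. std::random_device keeps the sketch short; it is
// an assumption here, and real code should use a cryptographically secure
// key source (for example, a KMS).
inline std::string RandomBytes(std::size_t n) {
  std::random_device rd;
  std::uniform_int_distribution<int> dist(0, 255);
  std::string out(n, '\0');
  for (auto& c : out) c = static_cast<char>(dist(rd));
  return out;
}

// Builds parameters for the new _metadata file. The per-file keys and
// aad_prefixes of the source files are intentionally not taken as input:
// they were only needed to decrypt the source footers.
inline CoalescedFooterParams MakeCoalescedFooterParams(bool use_aad_prefix) {
  CoalescedFooterParams params;
  params.footer_key = RandomBytes(16);  // 128-bit key
  if (use_aad_prefix) {
    params.aad_prefix = RandomBytes(8);  // new unique ID for _metadata
  }
  assert(params.footer_key.size() == 16);
  return params;
}
```

Calling `MakeCoalescedFooterParams(false)` corresponds to the (*) case above: the footer is still encrypted with a new key, but no aad_prefix is supplied.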



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
