westonpace commented on code in PR #34616:
URL: https://github.com/apache/arrow/pull/34616#discussion_r1261979449


##########
cpp/src/arrow/dataset/file_parquet.h:
##########
@@ -136,6 +138,20 @@ class ARROW_DS_EXPORT ParquetFileFormat : public 
FileFormat {
       fs::FileLocator destination_locator) const override;
 
   std::shared_ptr<FileWriteOptions> DefaultWriteOptions() override;
+
+  /// \brief A getter function to retrieve the dataset encryption configuration
+  std::shared_ptr<DatasetEncryptionConfiguration> GetDatasetEncryptionConfig() 
const {
+    return dataset_encryption_config_;
+  }
+  /// \brief A setter for DatasetEncryptionConfiguration
+  void SetDatasetEncryptionConfig(
+      std::shared_ptr<DatasetEncryptionConfiguration> 
dataset_encryption_config) {
+    dataset_encryption_config_ = std::move(dataset_encryption_config);
+  }
+
+ private:
+  // A configuration structure that provides per file encryption properties 
for a dataset
+  std::shared_ptr<DatasetEncryptionConfiguration> dataset_encryption_config_ = 
NULLPTR;

Review Comment:
   The user could (ideally) specify the write-options on each call to 
write_dataset this way:
   
   ```
   write_opts = ds.ParquetFileFormat().make_write_options(encryption=de)
   ds.write_dataset(..., file_options=write_opts, ...)
   ```
   
   This will require some additional code 
[here](https://github.com/apache/arrow/blob/apache-arrow-12.0.1/python/pyarrow/_dataset_parquet.pyx#L529-L581).
  These write options are then eventually accessible 
[here](https://github.com/apache/arrow/pull/34616/files#diff-c49cc60c1aa3a00a84b72c89ce2c8f5eb5d034ed8383741def1c4ae377d62854R655)
 (as `parquet_options`) and so I think you can even simplify that logic.
   
   It looks like we don't have any real way for the user to set the default 
write options today.  We could probably add logic in `_dataset_parquet.pyx` in 
`ParquetFileFormat.make_write_options` to check and see if the default read 
options have an encryption config and, if they do, to use that.  Longer term it 
might make more sense to have a `default_write_options` which is the same as 
the `default_fragment_scan_options`.  For this PR I think it would be ok if 
there was no default and the user has to supply `file_options` to each call to 
`write_dataset` if they want to use encryption?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to