etseidl commented on code in PR #6637:
URL: https://github.com/apache/arrow-rs/pull/6637#discussion_r1955020840
##########
parquet/src/arrow/async_reader/store.rs:
##########
@@ -163,15 +164,29 @@ impl AsyncFileReader for ParquetObjectReader {
// an `impl MetadataFetch` and calls those methods to get data from it.
Due to `Self`'s impl of
// `AsyncFileReader`, the calls to `MetadataFetch::fetch` are just
delegated to
// `Self::get_bytes`.
- fn get_metadata(&mut self) -> BoxFuture<'_, Result<Arc<ParquetMetaData>>> {
+ fn get_metadata<'a>(
+ &'a mut self,
+ #[cfg(feature = "encryption")] file_decryption_properties: Option<
+ &'a FileDecryptionProperties,
+ >,
+ ) -> BoxFuture<'a, Result<Arc<ParquetMetaData>>> {
Box::pin(async move {
let file_size = self.meta.size;
+ #[cfg(not(feature = "encryption"))]
Review Comment:
To reduce duplication, maybe instead do something like
```rust
let metadata = ParquetMetaDataReader::new()
.with_column_indexes(self.preload_column_index)
.with_offset_indexes(self.preload_offset_index)
.with_prefetch_hint(self.metadata_size_hint);
#[cfg(feature = "encryption")]
let metadata =
metadata.with_decryption_properties(file_decryption_properties);
let metadata = metadata.load_and_finish(self, file_size).await?;
```
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -532,6 +554,13 @@ impl<T: ChunkReader + 'static>
ParquetRecordBatchReaderBuilder<T> {
Ok(Self::new_with_metadata(reader, metadata))
}
+ /// Create a new [`ParquetRecordBatchReaderBuilder`] with
[`ArrowReaderOptions`] and [`FileDecryptionProperties`]
+ #[cfg(feature = "encryption")]
+ pub fn try_new_with_decryption(reader: T, options: ArrowReaderOptions) ->
Result<Self> {
Review Comment:
Is this needed? It seems to be identical to `try_new_with_options`.
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -788,6 +854,22 @@ impl ParquetRecordBatchReader {
.build()
}
+ /// Create a new [`ParquetRecordBatchReader`] from the provided chunk
reader and [`FileDecryptionProperties`]
+ ///
+ /// Note: this is needed when the parquet file is encrypted
+ #[cfg(feature = "encryption")]
+ pub fn try_new_with_decryption<T: ChunkReader + 'static>(
+ reader: T,
+ batch_size: usize,
+ file_decryption_properties: Option<&FileDecryptionProperties>,
+ ) -> Result<Self> {
+ let options = ArrowReaderOptions::default()
+
.with_file_decryption_properties(file_decryption_properties.cloned().unwrap());
+ ParquetRecordBatchReaderBuilder::try_new_with_decryption(reader,
options)?
Review Comment:
Same...this could just be `try_new_with_options`
##########
parquet/src/arrow/async_reader/mod.rs:
##########
@@ -343,8 +386,8 @@ impl<T: AsyncFileReader + Send + 'static>
ParquetRecordBatchStreamBuilder<T> {
Self::new_with_options(input, Default::default()).await
}
- /// Create a new [`ParquetRecordBatchStreamBuilder`] with the provided
async source
- /// and [`ArrowReaderOptions`]
+ /// Create a new [`ParquetRecordBatchStreamBuilder`] with the provided
async source,
+ /// [`ArrowReaderOptions`] and [`FileDecryptionProperties`] if the data is
encrypted.
Review Comment:
I'd revert to the old wording, but add a following sentence along the lines
of "If the data is encrypted, ArrowReaderOptions should have the
FileDecryptionProperties set".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]