etseidl commented on code in PR #6637: URL: https://github.com/apache/arrow-rs/pull/6637#discussion_r1955020840
########## parquet/src/arrow/async_reader/store.rs: ########## @@ -163,15 +164,29 @@ impl AsyncFileReader for ParquetObjectReader { // an `impl MetadataFetch` and calls those methods to get data from it. Due to `Self`'s impl of // `AsyncFileReader`, the calls to `MetadataFetch::fetch` are just delegated to // `Self::get_bytes`. - fn get_metadata(&mut self) -> BoxFuture<'_, Result<Arc<ParquetMetaData>>> { + fn get_metadata<'a>( + &'a mut self, + #[cfg(feature = "encryption")] file_decryption_properties: Option< + &'a FileDecryptionProperties, + >, + ) -> BoxFuture<'a, Result<Arc<ParquetMetaData>>> { Box::pin(async move { let file_size = self.meta.size; + #[cfg(not(feature = "encryption"))] Review Comment: To reduce duplication, maybe instead do something like ```rust let metadata = ParquetMetaDataReader::new() .with_column_indexes(self.preload_column_index) .with_offset_indexes(self.preload_offset_index) .with_prefetch_hint(self.metadata_size_hint); #[cfg(feature = "encryption")] let metadata = metadata.with_decryption_properties(file_decryption_properties); let metadata = metadata.load_and_finish(self, file_size).await?; ``` ########## parquet/src/arrow/arrow_reader/mod.rs: ########## @@ -532,6 +554,13 @@ impl<T: ChunkReader + 'static> ParquetRecordBatchReaderBuilder<T> { Ok(Self::new_with_metadata(reader, metadata)) } + /// Create a new [`ParquetRecordBatchReaderBuilder`] with [`ArrowReaderOptions`] and [`FileDecryptionProperties`] + #[cfg(feature = "encryption")] + pub fn try_new_with_decryption(reader: T, options: ArrowReaderOptions) -> Result<Self> { Review Comment: Is this needed? It seems to be identical to `try_new_with_options`. ########## parquet/src/arrow/arrow_reader/mod.rs: ########## @@ -788,6 +854,22 @@ impl ParquetRecordBatchReader { .build() } + /// Create a new [`ParquetRecordBatchReader`] from the provided chunk reader and [`FileDecryptionProperties`] + /// + /// Note: this is needed when the parquet file is encrypted + #[cfg(feature = "encryption")] + pub fn try_new_with_decryption<T: ChunkReader + 'static>( + reader: T, + batch_size: usize, + file_decryption_properties: Option<&FileDecryptionProperties>, + ) -> Result<Self> { + let options = ArrowReaderOptions::default() + .with_file_decryption_properties(file_decryption_properties.cloned().unwrap()); + ParquetRecordBatchReaderBuilder::try_new_with_decryption(reader, options)? Review Comment: Same...this could just be `try_new_with_options` ########## parquet/src/arrow/async_reader/mod.rs: ########## @@ -343,8 +386,8 @@ impl<T: AsyncFileReader + Send + 'static> ParquetRecordBatchStreamBuilder<T> { Self::new_with_options(input, Default::default()).await } - /// Create a new [`ParquetRecordBatchStreamBuilder`] with the provided async source - /// and [`ArrowReaderOptions`] + /// Create a new [`ParquetRecordBatchStreamBuilder`] with the provided async source, + /// [`ArrowReaderOptions`] and [`FileDecryptionProperties`] if the data is encrypted. Review Comment: I'd revert to the old wording, but add a following sentence along the lines of "If the data is encrypted, ArrowReaderOptions should have the FileDecryptionProperties set". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org