alamb commented on code in PR #1998:
URL: https://github.com/apache/arrow-rs/pull/1998#discussion_r915163698


##########
parquet/src/arrow/arrow_reader.rs:
##########
@@ -90,6 +121,20 @@ impl ArrowReaderOptions {
     pub fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self {
         Self {
             skip_arrow_metadata,
+            ..self
+        }
+    }
+
+    /// Scan rows from the parquet file according to the provided `selection`
+    ///
+    /// TODO: Make public once row selection fully implemented

Review Comment:
   perhaps worth a ticket?



##########
parquet/src/arrow/array_reader/byte_array.rs:
##########
@@ -210,6 +214,10 @@ impl<I: OffsetSizeTrait + ScalarValue> ColumnValueDecoder
 
         decoder.read(out, range.end - range.start, self.dict.as_ref())
     }
+
+    fn skip_values(&mut self, _num_values: usize) -> Result<usize> {
+        todo!()

Review Comment:
   I think adding a ticket reference here like
   `unimplemented!("See https://github.com/apache/arrow-rs/.....")` would help future readers
   
   Bonus points for returning `ArrowError::Unimplemented`
   
   This comment applies to everything below as well
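
   A minimal sketch of the suggestion above: return an error instead of panicking. `DecodeError`, `DecodeResult`, `SkipValues`, and `ByteArrayDecoder` are hypothetical stand-ins made up for illustration, not the crate's actual types, and the ticket URL is left out because it is elided in the comment.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's error type (an assumption for this sketch).
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    NotYetImplemented(String),
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::NotYetImplemented(msg) => write!(f, "not yet implemented: {}", msg),
        }
    }
}

impl std::error::Error for DecodeError {}

pub type DecodeResult<T> = std::result::Result<T, DecodeError>;

/// Simplified, hypothetical trait holding only the method under discussion.
pub trait SkipValues {
    fn skip_values(&mut self, num_values: usize) -> DecodeResult<usize>;
}

/// Hypothetical decoder with no state, just enough to show the pattern.
pub struct ByteArrayDecoder;

impl SkipValues for ByteArrayDecoder {
    fn skip_values(&mut self, _num_values: usize) -> DecodeResult<usize> {
        // Callers get a recoverable error rather than a panic from `todo!()`.
        // A real message would also name the tracking ticket.
        Err(DecodeError::NotYetImplemented(
            "skip_values for byte array columns".to_string(),
        ))
    }
}
```

   With this shape a caller can match on the error and fall back to reading and discarding values, which a `todo!()` panic would not allow.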



##########
parquet/src/file/serialized_reader.rs:
##########
@@ -555,6 +555,14 @@ impl<T: Read + Send> PageReader for SerializedPageReader<T> {
         // We are at the end of this column chunk and no more page left. Return None.
         Ok(None)
     }
+
+    fn peek_next_page(&self) -> Result<Option<PageMetadata>> {
+        todo!()

Review Comment:
   Ditto: returning a "not yet implemented" error would probably be nicer than panicking



##########
parquet/src/arrow/record_reader/definition_levels.rs:
##########
@@ -146,15 +146,15 @@ impl LevelsBufferSlice for DefinitionLevelBuffer {
     }
 }
 
-pub struct DefinitionLevelDecoder {
+pub struct DefinitionLevelBufferDecoder {

Review Comment:
   Is this rename a public API change as well? It does not appear in the docs
   
   https://docs.rs/parquet/17.0.0/parquet/?search=DefinitionLevelDecoder
   
   



##########
parquet/src/arrow/arrow_reader.rs:
##########
@@ -70,9 +71,39 @@ pub trait ArrowReader {
     ) -> Result<Self::RecordReader>;
 }
 
+/// [`RowSelection`] allows selecting or skipping a provided number of rows
+/// when scanning the parquet file
+#[derive(Debug, Clone, Copy)]
+pub(crate) struct RowSelection {

Review Comment:
   You probably have already thought about this, but I would expect that in certain scenarios, non-contiguous rows / skips would be desired
   
   Like "fetch the first 100 rows, skip the next 200, and then fetch the remaining"
   
   Would this interface handle that case?
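
   To make the question concrete, here is one possible shape for a non-contiguous selection: an ordered list of runs, each either read or skipped. `RowRun` and `selected_ranges` are hypothetical names invented for this sketch and are not the interface in the PR.

```rust
use std::ops::Range;

/// Hypothetical: a run of consecutive rows that is either read or skipped.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowRun {
    pub row_count: usize,
    pub skip: bool,
}

impl RowRun {
    /// A run of `row_count` rows to read.
    pub fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }

    /// A run of `row_count` rows to skip.
    pub fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }
}

/// Turn an ordered list of runs into the row index ranges that would be read.
/// Rows past the last run are not selected, so "fetch the remaining" has to
/// be spelled out as a final `select` run.
pub fn selected_ranges(runs: &[RowRun]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut offset = 0;
    for run in runs {
        let end = offset + run.row_count;
        if !run.skip && run.row_count > 0 {
            ranges.push(offset..end);
        }
        offset = end;
    }
    ranges
}
```

   For a 1000-row file, `[RowRun::select(100), RowRun::skip(200), RowRun::select(700)]` describes the example in the comment: rows 0..100 and 300..1000 are read.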



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
