tustvold commented on code in PR #4269:
URL: https://github.com/apache/arrow-rs/pull/4269#discussion_r1202761362


##########
parquet/src/file/writer.rs:
##########
@@ -475,28 +464,107 @@ impl<'a, W: Write> SerializedRowGroupWriter<'a, W> {
 
             Ok(())
         };
+        (self.buf, Box::new(on_close))
+    }
 
-        let column = self.descr.column(self.column_index);
-        self.column_index += 1;
-
-        Ok(Some(factory(
-            column,
-            &self.props,
-            page_writer,
-            Box::new(on_close),
-        )?))
+    /// Returns the next column writer, if available, using the factory function;
+    /// otherwise returns `None`.
+    pub(crate) fn next_column_with_factory<'b, F, C>(
+        &'b mut self,
+        factory: F,
+    ) -> Result<Option<C>>
+    where
+        F: FnOnce(
+            ColumnDescPtr,
+            WriterPropertiesPtr,
+            Box<dyn PageWriter + 'b>,
+            OnCloseColumnChunk<'b>,
+        ) -> Result<C>,
+    {
+        self.assert_previous_writer_closed()?;
+        Ok(match self.next_column_desc() {
+            Some(column) => {
+                let props = self.props.clone();
+                let (buf, on_close) = self.get_on_close();
+                let page_writer = Box::new(SerializedPageWriter::new(buf));
+                Some(factory(column, props, page_writer, Box::new(on_close))?)
+            }
+            None => None,
+        })
     }
 
     /// Returns the next column writer, if available; otherwise returns `None`.
     /// In case of any IO error or Thrift error, or if row group writer has already been
     /// closed returns `Err`.
     pub fn next_column(&mut self) -> Result<Option<SerializedColumnWriter<'_>>> {
         self.next_column_with_factory(|descr, props, page_writer, on_close| {
-            let column_writer = get_column_writer(descr, props.clone(), page_writer);
+            let column_writer = get_column_writer(descr, props, page_writer);
             Ok(SerializedColumnWriter::new(column_writer, Some(on_close)))
         })
     }
 
+    /// Append a column chunk from another source without decoding it
+    ///
+    /// This can be used for efficiently concatenating or projecting parquet data,
+    /// or encoding parquet data to temporary in-memory buffers
+    pub fn splice_column<R: ChunkReader>(

Review Comment:
   It is perhaps worth highlighting that if the reader's contents don't
correspond to the ColumnCloseResult, the resulting parquet file will contain
gibberish. Ultimately there is no way to prevent this: if the user really
wanted to, they could write whatever they liked to the underlying file anyway,
so I don't think this is actually an issue. The onus is on the read side to
tolerate broken files.


