jmfiaschi commented on a change in pull request #1468:
URL: https://github.com/apache/arrow-rs/pull/1468#discussion_r835578005



##########
File path: parquet/src/arrow/arrow_writer.rs
##########
@@ -198,6 +201,43 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
     }
 }
 
+impl<W: 'static + ParquetWriter> ArrowWriter<W> {
+    /// Try to create a new Arrow writer that appends data to an existing parquet file without reading the entire file.

Review comment:
       Okay, the name `from_chunk` is confusing. With this library it isn't possible to work on a chunk of the file, because the cursor `W` needs the entire file data in memory in order to work with the metadata (the metadata records the end position of the last row group). I found documentation indicating that, with Parquet, the term "chunk" is used more for the case where you have multiple files in a directory and each file is considered a chunk. It's better to use `from_file` in order to avoid any misunderstanding ^^.
   
   Yes, for example `SerializedFileWriter::from_chunk` reads from a `SliceableCursor` and writes into an `InMemoryWriteableCursor`, whereas `SerializedFileWriter::new` writes directly into an `InMemoryWriteableCursor`.
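   To make the difference concrete, here is a minimal usage sketch of the append path being discussed. The constructor `from_chunk` (or `from_file`) is the API proposed in this PR, so its name and signature below are assumptions rather than the final interface; `SliceableCursor`, `InMemoryWriteableCursor`, and `WriterProperties` are the existing parquet types mentioned above, though their exact constructors may differ between crate versions.

   ```rust
   use std::sync::Arc;

   use parquet::errors::Result;
   use parquet::file::properties::WriterProperties;
   use parquet::file::writer::SerializedFileWriter;
   use parquet::util::cursor::{InMemoryWriteableCursor, SliceableCursor};

   fn append_to_existing(existing_file_bytes: Vec<u8>) -> Result<()> {
       // The whole existing file must already be in memory: the footer metadata
       // records where the last row group ends, and the writer needs it to keep
       // appending row groups after that position.
       let source = SliceableCursor::new(Arc::new(existing_file_bytes));
       let sink = InMemoryWriteableCursor::default();
       let props = Arc::new(WriterProperties::builder().build());

       // Proposed constructor (name and signature still under discussion in this
       // PR): read the existing metadata from `source`, then continue writing
       // new row groups into `sink`.
       let mut writer = SerializedFileWriter::from_chunk(source, sink, props)?;

       // ... write additional row groups here ...

       writer.close()?;
       Ok(())
   }
   ```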



