[GitHub] [arrow-datafusion] mustafasrepo commented on pull request #6526: Add support for appending data to external tables - CSV

via GitHub Tue, 06 Jun 2023 01:35:06 -0700


mustafasrepo commented on PR #6526:
URL: 
https://github.com/apache/arrow-datafusion/pull/6526#issuecomment-1578189210


   > Had a quick review, I think this is probably fine, but type-erasing the 
writer mode seems a little peculiar to me.
   > 
   > This is because each of the methods has rather different characteristics, 
and imo warrants writing in a different manner.
   > 
   > ## Put
   > The write is completely synchronous (it is writing to memory) and is then 
atomically flushed, with no need for abort behaviour or async write. All file 
formats can support this mode
   > 
   > ## Put Multipart
   > The write is async with a final atomic close. Requires custom abort logic. 
All file formats can support this mode
   > 
   > ## Append
   > Abort is fatal (not even entirely sure how to surface this), only 
supported by row oriented file formats. Even then requires custom handling for 
things like CSV headers, etc...
   > 
   > ## Proposal
   > I guess my proposal would be to simply add a match block within `DataSink` 
for each of the various `FileWriterMode`. Over time I expect we will be able to 
extract common logic for each of the variants, e.g. a generic `Put` version 
using a `RecordBatchWriter`, etc... but I'm not sure that trying to unify all 
the writer modes is a good abstraction and certainly at this stage where we 
only have one impl seems a touch premature perhaps?
   
   You proposal makes sense to me. I have removed `FileWriterFactory`. Now 
writer is created in the `CsvSink`. I in the future need for `trait` arises we 
can do so. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] mustafasrepo commented on pull request #6526: Add support for appending data to external tables - CSV

Reply via email to