[GitHub] [arrow-datafusion] alamb commented on issue #5383: The output of write_csv and write_json methods is confusing.

via GitHub Sun, 26 Feb 2023 04:21:27 -0800


alamb commented on issue #5383:
URL: https://github.com/apache/arrow-datafusion/issues/5383#issuecomment-1445347526


   > Another thought that comes to mind regarding this is that, if this is done, 
there could be cases where the parts written out aren't in sequential/increasing 
order, which could cause confusion as well. e.g. if parts 2 and 4 are the only 
ones with data, then only those will appear on the filesystem like:
   
   This is an excellent point @Jefffrey  
   
   > I am not sure which is more desirable: having 'gaps' in the parts written, 
or having empty parts. Or somehow only write the parts with data first (which 
would break the parallel behaviour of the writes, unless a repartition is forced?).
   
   I agree that I don't know which is better. I don't really use the DataFrame 
API, so I don't know whether "write multiple files" is an important feature 
or whether it was just the most straightforward initial implementation.
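
   To make the options in the quoted text concrete, here is a rough sketch of the file listing each one would produce. It does not call DataFusion at all: the `part-{i}.csv` naming and all three helper functions are assumptions made up for illustration, and `row_counts[i]` stands for the number of rows in output partition `i`.

```rust
// Illustrative only: these helpers are hypothetical and are not part of DataFusion.
// The `part-{i}.csv` file naming is an assumption for this sketch.

/// Option 1: skip empty partitions, keeping each partition's own index.
/// This leaves 'gaps' in the numbering.
fn parts_skipping_empty(row_counts: &[usize]) -> Vec<String> {
    row_counts
        .iter()
        .enumerate()
        .filter_map(|(i, &rows)| (rows > 0).then(|| format!("part-{i}.csv")))
        .collect()
}

/// Option 2: write one file per partition, even when the partition is empty.
/// The numbering is contiguous, but some files hold no data.
fn parts_including_empty(row_counts: &[usize]) -> Vec<String> {
    (0..row_counts.len())
        .map(|i| format!("part-{i}.csv"))
        .collect()
}

/// Option 3: renumber so that only partitions with data get a file and the
/// numbering stays contiguous. Knowing the final index requires knowing which
/// earlier partitions are empty, which is why this is harder to do in parallel.
fn parts_renumbered(row_counts: &[usize]) -> Vec<String> {
    row_counts
        .iter()
        .filter(|&&rows| rows > 0)
        .enumerate()
        .map(|(i, _)| format!("part-{i}.csv"))
        .collect()
}
```

   With `row_counts = [0, 0, 10, 0, 5]` (only parts 2 and 4 have data), option 1 gives `part-2.csv` and `part-4.csv`, option 2 gives `part-0.csv` through `part-4.csv`, and option 3 gives `part-0.csv` and `part-1.csv`.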


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
