alamb commented on issue #5383: URL: https://github.com/apache/arrow-datafusion/issues/5383#issuecomment-1445347526
> Another thought that comes to mind regarding this, is if this is done then could have cases where the parts written out aren't in sequential/increasing order, which could cause confusion as well. e.g. if parts 2 and 4 are the only with data then only those will appear on the filesystem like:

This is an excellent point @Jefffrey

> Am not sure which is more desirable, having 'gaps' in the parts written, vs. having empty parts. Or somehow only write the parts with data first (which would break the parallel behaviour of the writes? unless force repartition).

I agree that I don't know which is better. I don't really use the DataFrame API, so I don't know whether "write multiple files" is an important feature or just the most straightforward initial implementation.
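To make the trade-off concrete, here is a minimal sketch of two possible naming strategies when empty partitions are skipped. The `part-{i}.parquet` pattern, zero-based indexing, and both helper functions are hypothetical illustrations, not DataFusion's actual API. Keeping each partition's own index leaves gaps (only `part-2` and `part-4`). Renumbering removes the gaps, but it requires knowing every partition's row count before naming any file, which is the tension with parallel writes raised above.

```rust
/// Hypothetical helper: skip empty partitions but keep each file's
/// original partition index, so gaps can appear in the numbering.
fn names_keep_index(row_counts: &[usize]) -> Vec<String> {
    row_counts
        .iter()
        .enumerate()
        .filter(|&(_, &rows)| rows > 0) // drop partitions with no rows
        .map(|(i, _)| format!("part-{}.parquet", i))
        .collect()
}

/// Hypothetical helper: skip empty partitions and renumber the rest
/// 0, 1, 2, ... without gaps. This needs every partition's row count up
/// front, so files cannot be named independently as partitions finish.
fn names_renumbered(row_counts: &[usize]) -> Vec<String> {
    row_counts
        .iter()
        .filter(|&&rows| rows > 0) // drop partitions with no rows first
        .enumerate()               // then number only the survivors
        .map(|(i, _)| format!("part-{}.parquet", i))
        .collect()
}

fn main() {
    // Five partitions where only partitions 2 and 4 produced rows.
    let row_counts = [0, 0, 10, 0, 7];
    println!("{:?}", names_keep_index(&row_counts));
    println!("{:?}", names_renumbered(&row_counts));
}
```

The third option from the quote, writing an empty file for every partition, would simply name all five partitions and needs no coordination between writers.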
