alamb commented on issue #9493: URL: https://github.com/apache/datafusion/issues/9493#issuecomment-3368113345
Hi @yeya24 -- I don't think there is any progress on this issue yet. However, in recent days I have found myself writing a parallel parquet writer several times as well. Depending on how much time I get, I may end up reworking the internals of the `ParquetSink` so they are easier to reuse and less tied to the DataFusion internals.

> We are looking for something similar to be able to reuse the parallel parquet writer. Our use case is to just parallelize single parquet file writing, not writing to multiple files/partitions. It would be nice to reuse ParquetSink so we don't have to build the same thing ourselves.

FWIW I built something like this in tpchgen-rs which might be helpful: https://github.com/clflushopt/tpchgen-rs/blob/482ee68404d61a5308ca1fd0095b347b3831a5b4/tpchgen-cli/src/parquet.rs
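The quoted use case (parallelizing the writing of a single file rather than many files) generally comes down to encoding row groups concurrently and then appending the results in their original order. The following is a minimal, std-only sketch of that pattern under stated assumptions. It is not the `ParquetSink` or tpchgen-rs code. `encode_row_group` and `write_single_file_parallel` are hypothetical names, and the "encoding" is plain little-endian bytes rather than real Parquet; a real writer would use the `parquet` crate's column writers in that step. It requires Rust 1.63+ for `std::thread::scope`.

```rust
use std::thread;

/// Hypothetical stand-in for encoding one row group.
/// A real writer would produce Parquet column chunks here; this just
/// serializes each value as 8 little-endian bytes so the sketch runs with std only.
fn encode_row_group(rows: &[i64]) -> Vec<u8> {
    rows.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Encodes row groups on scoped worker threads, then appends the encoded
/// bytes to a single output buffer in the original row-group order.
/// Panics if `rows_per_group` is 0 (as `slice::chunks` does).
fn write_single_file_parallel(rows: &[i64], rows_per_group: usize) -> Vec<u8> {
    thread::scope(|s| {
        // Spawn one encoder per row group; encoding runs concurrently.
        let handles: Vec<_> = rows
            .chunks(rows_per_group)
            .map(|group| s.spawn(move || encode_row_group(group)))
            .collect();

        // Join in spawn order so row groups land in the output in their
        // original order, regardless of which thread finished first.
        let mut file = Vec::new();
        for handle in handles {
            file.extend(handle.join().expect("encoder thread panicked"));
        }
        file
    })
}

fn main() {
    let rows: Vec<i64> = (0..10).collect();
    // 10 rows split into row groups of 4, 4 and 2.
    let file = write_single_file_parallel(&rows, 4);
    println!("wrote {} bytes", file.len());
}
```

The design choice worth noting is that only the CPU-heavy encoding is parallel; the final write is sequential and ordered, which is what keeps a single output file valid.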
