amoeba commented on issue #40576: URL: https://github.com/apache/arrow/issues/40576#issuecomment-2010974700
Can you share some code to show how you're approaching this? The ultimate thing you're trying to do is possible and part of the design of the Datasets module, see https://arrow.apache.org/docs/python/dataset.html#writing-large-amounts-of-data. Also be aware of the considerations in https://arrow.apache.org/docs/python/dataset.html#partitioning-performance-considerations.

What you most likely don't want to be doing here is converting one or more Parquet files into an Arrow Table, then converting that Arrow Table to one or more Parquet files.
