LouisClt opened a new issue, #14211: URL: https://github.com/apache/arrow/issues/14211
Hello,

I use Arrow as an intermediate format before writing Parquet or ORC. However, to reduce the memory footprint (the tables can be quite big), I need to write both formats incrementally. For the Parquet export, I managed to do so by using the "WriteColumnChunk" method. For the ORC format, however, there is no equivalent: the only method available is "Write(table)", which writes a whole table into the file. That forces me to allocate a whole Arrow table in memory beforehand, which I can't do.

Is there a plan to support this in the future? I looked at the code, and it seems fairly easy to add such a method, since the "write" method already writes small chunks internally.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
