Hello, I am a Parquet developer in the Bay Area, and I am writing to ask for help with writing Parquet files from Arrow.
My goal is to control the size (in bytes) of the output Parquet file when writing from an existing Arrow table. I saw a 2017 reply on this StackOverflow post (https://stackoverflow.com/questions/45572962/how-can-i-write-streaming-row-oriented-data-using-parquet-cpp-without-buffering) and am wondering whether the following approach is currently possible: feed data into an Arrow table until the buffered data reaches a target size (e.g. 256 MB, instead of a fixed number of rows), and then call WriteTable() to produce the Parquet file.

I saw that parquet-cpp recently introduced an API to control the column writer's buffered size in bytes in the low-level API, but it seems this is not yet exposed through the Arrow-to-Parquet API. Is this on the roadmap?
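For concreteness, here is a rough sketch of the buffering I have in mind, written against the Result-based Arrow C++ API. The names SizeBoundedWriter and EstimateBatchSize are just mine, and the estimate only sums top-level array buffer sizes, so it tracks the in-memory footprint rather than the encoded/compressed on-disk Parquet size (which will generally be smaller):

#include <arrow/api.h>
#include <arrow/io/file.h>
#include <parquet/arrow/writer.h>

#include <memory>
#include <string>
#include <vector>

// Rough in-memory footprint of a batch: sum the sizes of all top-level
// array buffers (validity bitmap, offsets, values). Nested children are
// not counted, so this is only an approximation.
int64_t EstimateBatchSize(const arrow::RecordBatch& batch) {
  int64_t total = 0;
  for (const auto& column : batch.columns()) {
    for (const auto& buffer : column->data()->buffers) {
      if (buffer != nullptr) total += buffer->size();
    }
  }
  return total;
}

class SizeBoundedWriter {
 public:
  SizeBoundedWriter(std::shared_ptr<arrow::Schema> schema, int64_t threshold)
      : schema_(std::move(schema)), threshold_(threshold) {}

  // Buffer a batch; once the estimated size crosses the threshold,
  // flush everything buffered so far into a new Parquet file.
  arrow::Status Append(std::shared_ptr<arrow::RecordBatch> batch) {
    buffered_bytes_ += EstimateBatchSize(*batch);
    batches_.push_back(std::move(batch));
    if (buffered_bytes_ >= threshold_) return Flush();
    return arrow::Status::OK();
  }

  arrow::Status Flush() {
    if (batches_.empty()) return arrow::Status::OK();
    ARROW_ASSIGN_OR_RAISE(auto table,
                          arrow::Table::FromRecordBatches(schema_, batches_));
    ARROW_ASSIGN_OR_RAISE(
        auto sink, arrow::io::FileOutputStream::Open(
                       "part-" + std::to_string(file_index_++) + ".parquet"));
    ARROW_RETURN_NOT_OK(parquet::arrow::WriteTable(
        *table, arrow::default_memory_pool(), sink, /*chunk_size=*/65536));
    batches_.clear();
    buffered_bytes_ = 0;
    return sink->Close();
  }

 private:
  std::shared_ptr<arrow::Schema> schema_;
  int64_t threshold_;          // e.g. 256 MB = 256LL << 20
  int64_t buffered_bytes_ = 0;
  int file_index_ = 0;
  std::vector<std::shared_ptr<arrow::RecordBatch>> batches_;
};

This works as a coarse approximation on the Arrow side, but without the column-writer byte accounting exposed through the Arrow-to-Parquet layer there is no way to target the actual encoded file size, which is why I am asking about the roadmap.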
Thanks,
Jiayuan