Hello,

I am a Parquet developer in the Bay Area, and I am writing to ask for help
with writing Parquet files from Arrow.

My goal is to control the size (in bytes) of the output Parquet file when
writing from an existing Arrow table. I saw a 2017 reply on this
StackOverflow post (
https://stackoverflow.com/questions/45572962/how-can-i-write-streaming-row-oriented-data-using-parquet-cpp-without-buffering)
and am wondering whether the following implementation is currently possible:
feed data into an Arrow table until the buffered data is large enough to be
converted to a Parquet file of a target size (e.g. 256 MB, rather than a
fixed number of rows), and then use WriteTable() to create that file.
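
To make the idea concrete, here is a rough sketch of what I have in mind,
assuming the Result-based Arrow C++ API (Table::FromRecordBatches,
FileOutputStream::Open, parquet::arrow::WriteTable). The names
SizeBoundedWriter and EstimatedBatchBytes are just placeholders of mine,
and the byte count sums the top-level Arrow buffers, so it tracks
uncompressed in-memory size rather than the final encoded Parquet size:

#include <memory>
#include <string>
#include <vector>

#include <arrow/api.h>
#include <arrow/io/file.h>
#include <parquet/arrow/writer.h>

// Rough footprint of a batch: the sum of its top-level buffer sizes.
// This measures uncompressed Arrow memory, not encoded Parquet bytes.
int64_t EstimatedBatchBytes(const arrow::RecordBatch& batch) {
  int64_t total = 0;
  for (const auto& column : batch.columns()) {
    for (const auto& buffer : column->data()->buffers) {
      if (buffer != nullptr) total += buffer->size();
    }
  }
  return total;
}

// Buffer incoming record batches and flush them to one Parquet file once
// the estimated buffered size crosses the threshold (e.g. 256 MB).
class SizeBoundedWriter {
 public:
  explicit SizeBoundedWriter(int64_t threshold_bytes)
      : threshold_bytes_(threshold_bytes) {}

  arrow::Status Append(std::shared_ptr<arrow::RecordBatch> batch,
                       const std::string& path) {
    buffered_bytes_ += EstimatedBatchBytes(*batch);
    batches_.push_back(std::move(batch));
    if (buffered_bytes_ >= threshold_bytes_) {
      ARROW_RETURN_NOT_OK(Flush(path));
    }
    return arrow::Status::OK();
  }

  // Convert the buffered batches to a Table and write it with WriteTable().
  arrow::Status Flush(const std::string& path) {
    ARROW_ASSIGN_OR_RAISE(auto table,
                          arrow::Table::FromRecordBatches(batches_));
    ARROW_ASSIGN_OR_RAISE(auto sink,
                          arrow::io::FileOutputStream::Open(path));
    ARROW_RETURN_NOT_OK(parquet::arrow::WriteTable(
        *table, arrow::default_memory_pool(), sink,
        /*chunk_size=*/table->num_rows()));
    batches_.clear();
    buffered_bytes_ = 0;
    return arrow::Status::OK();
  }

 private:
  int64_t threshold_bytes_;
  int64_t buffered_bytes_ = 0;
  std::vector<std::shared_ptr<arrow::RecordBatch>> batches_;
};

This still only bounds the input size, of course; after encoding and
compression the actual file may come out well under the threshold, which
is why byte-level control in the writer itself would be so useful.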

I saw that parquet-cpp recently introduced an API to control the column
writer's size in bytes in the low-level API, but it seems this is not yet
available in the arrow-parquet API. Is this on the roadmap?

Thanks,
Jiayuan
