Here's my comment describing how I'm generating 128 MB Parquet files. The 
approach takes into account file sizes after compression and dictionary encoding.

https://issues.apache.org/jira/browse/ARROW-3728?focusedCommentId=16703544&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16703544

It would be nice to have a merge() function for Parquet files that does 
something similar, producing Parquet files that match the HDFS block size.
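
As a rough illustration of the merge() idea, the grouping step could look like the sketch below. No such function exists in Arrow or parquet-cpp as far as this thread shows; `plan_merges`, `HDFS_BLOCK_BYTES`, and the 128 MB default are hypothetical names and values chosen for the example. It only decides which existing files to combine, based on their on-disk sizes, and leaves the actual rewriting out.

```python
from typing import List, Sequence

# Assumed HDFS block size for illustration; real clusters may use another value.
HDFS_BLOCK_BYTES = 128 * 1024 * 1024


def plan_merges(file_sizes: Sequence[int],
                target_bytes: int = HDFS_BLOCK_BYTES) -> List[List[int]]:
    """Group consecutive files so each group's total stays at or under target_bytes.

    Returns lists of indices into file_sizes. A single file that is already
    larger than target_bytes ends up in a group of its own.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, size in enumerate(file_sizes):
        # Close the current group if adding this file would overshoot the target.
        if current and current_bytes + size > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        groups.append(current)
    return groups
```

The size of a merged file is not exactly the sum of its inputs, since footers are consolidated and dictionaries may be re-encoded, so the sum is only an approximation of the final size.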


-----Original Message-----
From: Jiayuan Chen <[email protected]> 
Sent: Monday, December 10, 2018 2:30 PM
To: [email protected]
Subject: parquet-arrow estimate file size


Hello,

I am a Parquet developer in the Bay Area, and I am writing to ask for help 
with writing Parquet files from Arrow.

My goal is to control the size (in bytes) of the output Parquet file when 
writing from an existing Arrow table. I saw a 2017 reply on this StackOverflow 
post (
https://stackoverflow.com/questions/45572962/how-can-i-write-streaming-row-oriented-data-using-parquet-cpp-without-buffering)
and am wondering whether the following is currently possible: feed data 
into the Arrow table until the buffered data can be converted to a Parquet 
file of a target size (e.g. 256 MB, rather than a fixed number of rows), and 
then use WriteTable() to create that Parquet file.
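
To make the question concrete, here is a minimal sketch of the buffering loop described above, written in plain Python rather than against the Arrow or parquet-cpp APIs. The names `chunk_by_encoded_size`, `encoded_size`, and `check_every` are hypothetical, and JSON plus zlib is only a stand-in for Parquet encoding and compression. In real use, `size_of` would be replaced by something that measures the actual encoded size, for example by writing the buffered table to an in-memory stream with the same writer settings.

```python
import json
import zlib
from typing import Callable, Iterable, Iterator, List


def encoded_size(rows: List[dict]) -> int:
    """Stand-in size estimate: bytes after JSON serialization and zlib compression."""
    return len(zlib.compress(json.dumps(rows).encode("utf-8")))


def chunk_by_encoded_size(
    rows: Iterable[dict],
    target_bytes: int,
    check_every: int = 1000,
    size_of: Callable[[List[dict]], int] = encoded_size,
) -> Iterator[List[dict]]:
    """Buffer rows and yield a chunk once its estimated encoded size reaches target_bytes.

    The size is only measured every `check_every` rows, so a chunk can
    overshoot the target by up to that many rows.
    """
    buffer: List[dict] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) % check_every == 0 and size_of(buffer) >= target_bytes:
            yield buffer
            buffer = []
    if buffer:
        # Final partial chunk, smaller than the target.
        yield buffer
```

Each yielded chunk would then be handed to the writer as one output file. Re-measuring the whole buffer on every check is expensive; measuring only the newly added rows and keeping a running total is cheaper, at the cost of ignoring compression gains across the whole buffer.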

I saw that parquet-cpp recently introduced a low-level API for controlling the 
column writer's size in bytes, but it seems this is not yet available in the 
arrow-parquet API. Is this on the roadmap?

Thanks,
Jiayuan

