I think we create one file for each parquet block. If the underlying HDFS block size is 128 MB and the parquet block size is > 128 MB, it will create more blocks on HDFS. Can you let me know which HDFS API would allow us to do otherwise?
Thanks,
Padma

> On Mar 22, 2017, at 11:54 AM, François Méthot <fmetho...@gmail.com> wrote:
>
> Hi,
>
> Is there a way to force Drill to store a CTAS-generated parquet file as a
> single block when using HDFS? The Java HDFS API allows that: files could
> be created with the Parquet block size.
>
> We are using Drill on HDFS configured with a block size of 128 MB. Changing
> this size is not an option at this point.
>
> It would be ideal for us to have a single parquet file per HDFS block. Setting
> store.parquet.block-size to 128 MB would fix our issue, but we would end up
> with a lot more files to deal with.
>
> Thanks
> Francois
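For reference, the HDFS API François is alluding to is the `FileSystem.create` overload that accepts a per-file `blockSize` argument, which overrides the cluster-wide `dfs.blocksize` for that file. A minimal hedged sketch (the path and sizes are illustrative assumptions, and this is not how Drill's writer is wired today; it would need a running Hadoop client configuration to execute):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerFileBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Assumption: match the HDFS block size of this one file to the
        // parquet block size (e.g. 512 MB), regardless of dfs.blocksize.
        long parquetBlockSize = 512L * 1024 * 1024;

        // create(Path, overwrite, bufferSize, replication, blockSize)
        try (FSDataOutputStream out = fs.create(
                new Path("/tmp/example.parquet"), // hypothetical path
                true,                             // overwrite
                4096,                             // io buffer size
                (short) 3,                        // replication factor
                parquetBlockSize)) {              // per-file HDFS block size
            // The parquet writer would stream its bytes through `out` here.
        }
    }
}
```

With a call like this, a CTAS parquet file larger than `dfs.blocksize` could still occupy a single HDFS block, at the cost of uneven block sizes across the cluster.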