Yes, it seems it is possible to create files with different block sizes. We could potentially pass the configured store.parquet.block-size to the create call. I will try it out and let you know.
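Something along these lines is what I have in mind (just a rough sketch against the plain Hadoop FileSystem API, not the actual Drill writer code; the path, size, and buffer settings are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateWithBlockSize {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path out = new Path("/tmp/example.parquet");  // placeholder path
            long parquetBlockSize = 512L * 1024 * 1024;   // e.g. the configured store.parquet.block-size

            // The five-argument create() overload (the one in the links below)
            // takes a per-file block size, so a single file can get a block
            // size larger than the cluster default dfs.blocksize. HDFS requires
            // it to be a multiple of dfs.bytes-per-checksum (512 by default).
            FSDataOutputStream stream = fs.create(
                    out,
                    true,                                      // overwrite
                    conf.getInt("io.file.buffer.size", 4096),  // io buffer size
                    fs.getDefaultReplication(out),             // replication factor
                    parquetBlockSize);                         // per-file HDFS block size
            try {
                // ... write the parquet bytes here ...
            } finally {
                stream.close();
            }
        }
    }
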
Thanks,
Padma

> On Mar 22, 2017, at 4:16 PM, François Méthot <[email protected]> wrote:
>
> Here are 2 links I could find:
>
> http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
>
> http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
>
> Francois
>
> On Wed, Mar 22, 2017 at 4:29 PM, Padma Penumarthy <[email protected]> wrote:
>
>> I think we create one file for each parquet block.
>> If the underlying HDFS block size is 128 MB and the parquet block size is
>> greater than 128 MB, it will create more blocks on HDFS.
>> Can you let me know which HDFS API would allow you to do otherwise?
>>
>> Thanks,
>> Padma
>>
>>> On Mar 22, 2017, at 11:54 AM, François Méthot <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Is there a way to force Drill to store a CTAS-generated parquet file as a
>>> single block when using HDFS? The Java HDFS API allows this: files can be
>>> created with the Parquet block size.
>>>
>>> We are using Drill on HDFS configured with a block size of 128 MB. Changing
>>> this size is not an option at this point.
>>>
>>> It would be ideal for us to have a single parquet file per HDFS block.
>>> Setting store.parquet.block-size to 128 MB would fix our issue, but we would
>>> end up with a lot more files to deal with.
>>>
>>> Thanks
>>> Francois
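For what it's worth, whether a written file actually landed in a single block can be checked with the standard FileStatus / getFileBlockLocations calls; a quick sketch (the path to inspect comes from the command line):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockCheck {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path(args[0]));

            // Block size recorded for this particular file, which can differ
            // from the cluster-wide dfs.blocksize default.
            System.out.println("file block size: " + status.getBlockSize());

            // One BlockLocation per HDFS block; a single entry means the
            // whole file occupies one block.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            System.out.println("number of blocks: " + blocks.length);
        }
    }
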
