Yes, it seems to be possible to create files with different block sizes.
We could potentially pass the configured store.parquet.block-size to the
create call. I will try it out and let you know.
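
Roughly what I am thinking of trying (just a sketch, untested; the path,
buffer size and block size below are placeholders, in Drill the block size
would come from store.parquet.block-size), using the create() overload from
the Javadoc François linked:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(conf);

  long parquetBlockSize = 512L * 1024 * 1024;   // placeholder, e.g. 512 MB
  int bufferSize = conf.getInt("io.file.buffer.size", 4096);
  short replication = fs.getDefaultReplication(new Path("/"));

  // The last argument is the per-file HDFS block size, which can differ
  // from the cluster default (dfs.blocksize).
  FSDataOutputStream out = fs.create(
      new Path("/tmp/example.parquet"),   // placeholder path
      true,                               // overwrite
      bufferSize,
      replication,
      parquetBlockSize);
  out.close();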

Thanks,
Padma 


> On Mar 22, 2017, at 4:16 PM, François Méthot <[email protected]> wrote:
> 
> Here is the link I could find:
> 
> http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)
> 
> Francois
> 
> On Wed, Mar 22, 2017 at 4:29 PM, Padma Penumarthy <[email protected]>
> wrote:
> 
>> I think we create one file for each parquet block.
>> If the underlying HDFS block size is 128 MB and the parquet block size is
>> greater than 128 MB, it will create more blocks on HDFS.
>> Can you let me know which HDFS API would allow you to
>> do otherwise?
>> 
>> Thanks,
>> Padma
>> 
>> 
>>> On Mar 22, 2017, at 11:54 AM, François Méthot <[email protected]>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> Is there a way to force Drill to store CTAS-generated parquet files as a
>>> single block when using HDFS? The Java HDFS API allows this: files could
>>> be created with the Parquet block size.
>>> 
>>> We are using Drill on HDFS configured with a block size of 128 MB.
>>> Changing this size is not an option at this point.
>>> 
>>> It would be ideal for us to have a single parquet file per HDFS block.
>>> Setting store.parquet.block-size to 128 MB would fix our issue, but we
>>> would end up with a lot more files to deal with.
>>> 
>>> Thanks
>>> Francois
>> 
>> 
