Got it. Thanks a lot!

Tianqi

-----Original Message-----
From: Ryan Blue [mailto:[email protected]]
Sent: Monday, April 13, 2015 4:59 PM
To: [email protected]
Subject: Re: PARQUET_FILE_SIZE & parquet.block.size & dfs.blocksize

On 04/13/2015 03:47 PM, Tianqi Tong wrote:
> Hi Ryan,
> Then back to the original topic: it should be okay if I break a Parquet file
> into multiple HDFS blocks, right?
> Because when I was querying via Impala, there's a warning like: Parquet file
> should not be split into multiple hdfs-blocks.
>
> Thanks!
> Tianqi

It is fine to write data as multiple blocks, but Impala performance will be
better if you keep data in a single block for now. This is something that the
Impala team is working on.

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
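
A minimal sketch of the arithmetic behind the warning discussed in this thread, assuming a file is stored as consecutive fixed-size HDFS blocks. The function names and the example sizes are hypothetical illustrations, not part of Parquet, Impala, or HDFS; the block size argument stands in for whatever dfs.blocksize is configured on the cluster.

```python
def hdfs_blocks_spanned(file_size_bytes: int, dfs_block_size_bytes: int) -> int:
    """Return how many HDFS blocks a file of the given size occupies.

    Assumption: blocks are filled back to back, so the count is the
    file size divided by the block size, rounded up.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    if dfs_block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Integer ceiling division; avoids floating-point rounding on large sizes.
    return -(-file_size_bytes // dfs_block_size_bytes)


def fits_in_single_block(file_size_bytes: int, dfs_block_size_bytes: int) -> bool:
    """True when the file does not spill into a second HDFS block."""
    return hdfs_blocks_spanned(file_size_bytes, dfs_block_size_bytes) <= 1


MIB = 1024 * 1024

# Hypothetical sizes: a 256 MiB Parquet file on a cluster with 128 MiB blocks.
example_file_size = 256 * MIB
example_block_size = 128 * MIB

print(hdfs_blocks_spanned(example_file_size, example_block_size))
print(fits_in_single_block(example_file_size, example_block_size))
```

Under these assumptions, a file stays in a single block only when its size is no larger than the HDFS block size, which is the case the reply above recommends for Impala for now.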
