Github user kkhatua commented on the issue:

    https://github.com/apache/drill/pull/826
  
    @ppadma , Khurram [~khfaraaz] and I were looking at the details in the PR 
and it's not very clear what new behavior the PR allows. If you need to 
specify the block-size as described in the [comment
](https://issues.apache.org/jira/browse/DRILL-5379?focusedCommentId=15981366&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15981366) 
by @fmethot , doesn't Drill already do that? I thought Drill implicitly 
creates files with a single row-group anyway. 
    
    My understanding of the JIRA's problem statement was that if the Parquet 
block-size (i.e. the row-group size) is set to a value larger than the HDFS 
block size, using the flag would allow Drill to ignore the larger value in 
the options and write with a parquet block-size that matches the target HDFS 
location. So, I could have {{store.parquet.block-size=1073741824}} (i.e. 1GB), 
but when writing 512MB of output, instead of 1 file Drill would read the 
HDFS block-size (say 128MB), apply that as the parquet block-size, and 
write 4 files. 
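    To make the arithmetic concrete, here is a rough sketch of the scaling I 
have in mind (the function names are mine, purely illustrative, and I'm 
assuming a 128MB HDFS block size, which is what makes the 4-file example 
work out):
    
    ```python
    # Illustrative sketch (not Drill's actual code): cap the parquet
    # row-group size at the HDFS block size of the target location,
    # then count the files a given output size would produce, given
    # that Drill writes one row group per file.

    def effective_block_size(parquet_block_size: int, hdfs_block_size: int) -> int:
        """Use the smaller of the configured parquet block-size and
        the HDFS block size of the destination."""
        return min(parquet_block_size, hdfs_block_size)

    def files_written(output_bytes: int, parquet_block_size: int,
                      hdfs_block_size: int) -> int:
        """Ceiling-divide the output size by the effective block size."""
        block = effective_block_size(parquet_block_size, hdfs_block_size)
        return -(-output_bytes // block)  # ceiling division

    MB = 1 << 20
    # store.parquet.block-size = 1GB, output = 512MB, HDFS block = 128MB
    # -> the 1GB setting is scaled down to 128MB, yielding 4 files.
    print(files_written(512 * MB, 1024 * MB, 128 * MB))  # 4
    ```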
    
    @fmethot is that what you were looking for? An **automatic scaling down** 
of the parquet file's size to match (and be contained within) the HDFS block 
size?

