Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Jinfeng Ni Mon, 28 Sep 2015 14:22:11 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100855
-----------------------------------------------------------



I mainly looked through the change in planning side. You may have another 
person to look at the execution side change.

The planning side overall looks good to me. Have couple of comments below.


contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
 (line 89)
<https://reviews.apache.org/r/38796/#comment158101>

    You may consider put the checking of option when adding this rule into 
Drill's rule set. See [1].
    
    Doing this will save the overhead to matching this rule.
    
    [1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java#L229



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
 (line 102)
<https://reviews.apache.org/r/38796/#comment158102>

    partitions.size() ==0 would be covered by line 114 (the block 107-112 will 
be skipped)?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
 (line 128)
<https://reviews.apache.org/r/38796/#comment158104>

    the format seems to expect 3 inputs; is "e" unused?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
 (line 176)
<https://reviews.apache.org/r/38796/#comment158108>

    I feel the block of checking "*" seems unnecesary. If I understand 
correctly, Hive table is regarded as table with schema. Therefore, we should 
not see "*" in hiveScanRel.getRowType; * column should have been expanded into 
the list of columns in Hive table.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
 (line 195)
<https://reviews.apache.org/r/38796/#comment158110>

    Seems like during the convertion, you only maintain the column names. What 
about the column type? Do we need to maintain the column type as well? 
DrillFixedRelDataTypeImpl assume every column has "any" type; meaning the 
column type in hive table is lost?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
 (line 199)
<https://reviews.apache.org/r/38796/#comment158111>

    I feel probably there is a bug in existing code HiveScan.getColumn() should 
not contain "*", if Drill treats Hive table as schemaed table.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
 (line 259)
<https://reviews.apache.org/r/38796/#comment158123>

    Again, if the column type is maintained during the conversion, seems it's 
necessary to cast for every column; if the hive type happens to be same as 
parquet type, we do not have to cast.


- Jinfeng Ni


On Sept. 27, 2015, 7:50 a.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38796/
> -----------------------------------------------------------
> 
> (Updated Sept. 27, 2015, 7:50 a.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Please jira DRILL-3209 for details.
> 
> 
> Diffs
> -----
> 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
>  11c6455 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
>  PRE-CREATION 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
>  PRE-CREATION 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java
>  PRE-CREATION 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
>  PRE-CREATION 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java
>  9ada569 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java
>  23aa37f 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java
>  2181c2a 
>   
> contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java
>  b459ee4 
>   
> contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java
>  f0b4bdc 
>   
> contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java
>  6423a36 
>   
> contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java
>  9211af6 
>   
> contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java
>  6118be5 
>   
> contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java
>  34a7ed6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 
> 0f6a5bb 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
>  118f7ad 
> 
> Diff: https://reviews.apache.org/r/38796/diff/
> 
> 
> Testing
> -------
> 
> Added unittests to test reading all supported types, project pushdown and 
> partition pruning. Manually tested with Hive tables containing large amount 
> of data (these tests will become part of the regression suite).
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>

Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Reply via email to