> On Sept. 28, 2015, 10:18 p.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java, line 63
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085485#file1085485line63>
> >
> >     Since the RecordCount is the same regardless of the type of the reader, we should not divide it by the factor. Dividing the cpu cost and disk cost seems ok.
> 
> Venki Korukanti wrote:
>     If I understand correctly, we are using only the rowcount while calculating the self cost of the scan in ScanPrel.computeSelfCost. So we need to alter the rowcount here.
> 
> Aman Sinha wrote:
>     True.. the current cost model for Scans computes cpuCost as a function of rowCount and columnCount. I will open an enhancement JIRA to change that, so that two different scan methods (such as Hive scan vs. Drill native scan) that produce the same row count but differ in cpu cost and I/O cost can be modeled accurately.
> 
>     Given that, you don't have to change the cost here... my only other suggestion would be to use a static constant as a factor, e.g. HIVE_COST_FACTOR (or something similar).
Added HIVE_SERDE_SCAN_OVERHEAD_FACTOR constant.

- Venki


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100878
-----------------------------------------------------------


On Sept. 29, 2015, 9:23 a.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38796/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2015, 9:23 a.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Please see jira DRILL-3209 for details.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java 11c6455 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java 9ada569 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java 23aa37f 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java 2181c2a 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java b459ee4 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java f0b4bdc 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java 6423a36 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java 9211af6 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java 6118be5 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java 34a7ed6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 66f9f03 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 5838bd1 
> 
> Diff: https://reviews.apache.org/r/38796/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests covering reads of all supported types, project pushdown, and partition pruning. Manually tested with Hive tables containing large amounts of data (these tests will become part of the regression suite).
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
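The cost adjustment discussed in the thread above can be sketched roughly as follows. This is a minimal illustration, not Drill's actual code: only the constant name HIVE_SERDE_SCAN_OVERHEAD_FACTOR comes from the patch, and its value, the class name, and the method names here are hypothetical. Per Aman's comment, the idea is to leave the row count untouched and scale only the cpu and disk cost components so the planner prefers the native Parquet reader over the Hive SerDe reader:

```java
// Hypothetical sketch of the scan-cost adjustment under review.
// The factor value and all names except HIVE_SERDE_SCAN_OVERHEAD_FACTOR
// are illustrative assumptions, not Drill's real ScanStats API.
public class NativeScanCostSketch {

    // Assumed overhead of the Hive SerDe reader relative to the native
    // Parquet reader (the actual value in the patch may differ).
    static final int HIVE_SERDE_SCAN_OVERHEAD_FACTOR = 10;

    // Row count stays the same for both readers; only cost components
    // are divided by the factor for the native scan.
    static double nativeCpuCost(double serdeCpuCost) {
        return serdeCpuCost / HIVE_SERDE_SCAN_OVERHEAD_FACTOR;
    }

    static double nativeDiskCost(double serdeDiskCost) {
        return serdeDiskCost / HIVE_SERDE_SCAN_OVERHEAD_FACTOR;
    }

    public static void main(String[] args) {
        // The native scan's cpu and disk costs come out cheaper, so the
        // planner's cost comparison favors it over the SerDe-based scan.
        System.out.println("cpu=" + nativeCpuCost(1000.0)
                + " disk=" + nativeDiskCost(500.0));
        // prints: cpu=100.0 disk=50.0
    }
}
```

A named static constant (rather than a bare literal) is what Aman suggested, since the factor is a planner heuristic that may later move into a proper cost model where the two scan methods report identical row counts but distinct cpu and I/O costs.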
