[ 
https://issues.apache.org/jira/browse/DRILL-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225231#comment-14225231
 ] 

Deneche A. Hakim commented on DRILL-1742:
-----------------------------------------

Sure. Here are the tests I did on my laptop using a local installation of 
hadoop-0.20.2 and hive-0.12.0.

1. When a table is just created and no rows are added to it, the "numRows" 
property isn't available. In this case HiveScan.getScanStats() uses the size of 
the input splits to compute an estimated number of rows and finds 0. So the 
estimated number of rows is correct.

2.
a) Adding rows to the table using "LOAD DATA ..." does add a "numRows" property 
to the table (and its partitions, if any), but its value is still 0. 
HiveScan.getScanStats() uses the size of the input splits to estimate the 
number of rows. The estimate isn't accurate, but it's better than the value in 
the stats.
b) Running "ANALYZE TABLE table_name COMPUTE STATISTICS" in Hive updates the 
"numRows" property with the correct number of rows. This time 
HiveScan.getScanStats() uses this value rather than estimating one using the 
size of the input splits.

3. When the table has partitions, "numRows" is computed and available for each 
partition. HiveScan correctly computes the reduced row count when some of the 
partitions are pruned.
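
To make the three cases above concrete, here is a rough sketch of the decision 
logic in Python. It is not the actual HiveScan.getScanStats() code: the function 
names, the ASSUMED_BYTES_PER_ROW constant and the exact "numRows > 0" check are 
assumptions made for illustration only.

```python
from typing import Iterable, Mapping, Tuple

# Hypothetical average row width, used only to turn a byte count into a
# row estimate. The real planner may use a different heuristic.
ASSUMED_BYTES_PER_ROW = 100


def estimate_rows(properties: Mapping[str, str], split_bytes: int) -> int:
    """Estimate the row count for one table or one partition.

    Uses the "numRows" property when it holds a positive value, otherwise
    falls back to an estimate based on the total size of the input splits.
    """
    raw = properties.get("numRows")
    num_rows = int(raw) if raw is not None else 0
    if num_rows > 0:
        # Case 2b: stats were computed, trust them as-is.
        return num_rows
    # Cases 1 and 2a: property missing or still 0, estimate from split size.
    return split_bytes // ASSUMED_BYTES_PER_ROW


def estimate_rows_after_pruning(
    partitions: Iterable[Tuple[Mapping[str, str], int]]
) -> int:
    """Case 3: sum the per-partition estimates of the partitions that remain."""
    return sum(estimate_rows(props, size) for props, size in partitions)
```

With this sketch, a freshly created empty table gives 0, a table loaded with 
"LOAD DATA ..." falls back to the split-size estimate, and a table with 
computed statistics returns its "numRows" value unchanged.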

The only limitation is that HiveScan.getScanStats() assumes that when the 
statistics are available for a table, they are up to date. This may require the 
user to manually run "ANALYZE TABLE ... COMPUTE STATISTICS" after changing the 
table's data.

> Use Hive stats when planning queries on Hive data sources
> ---------------------------------------------------------
>
>                 Key: DRILL-1742
>                 URL: https://issues.apache.org/jira/browse/DRILL-1742
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization, Storage - Hive
>    Affects Versions: 0.6.0
>            Reporter: Venki Korukanti
>            Assignee: Deneche A. Hakim
>             Fix For: 0.7.0
>
>         Attachments: DRILL-1742.1.patch.txt, DRILL-1742.2.patch.txt, 
> DRILL-1742.3.patch.txt, DRILL-1742.4.patch.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
