[ https://issues.apache.org/jira/browse/DRILL-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225231#comment-14225231 ]
Deneche A. Hakim commented on DRILL-1742:
-----------------------------------------

Sure. Here are the tests I ran on my laptop using a local installation of hadoop-0.20.2 and hive-0.12.0.

1. When a table has just been created and no rows have been added to it, the "numRows" property isn't available. In this case HiveScan.getScanStats() uses the size of the input splits to estimate the number of rows and gets 0, so the estimated number of rows is correct.

2. a) Adding rows to the table with "LOAD DATA ..." does add a "numRows" property to the table (and to its partitions, if any), but its value is still 0. HiveScan.getScanStats() then uses the size of the input splits to estimate the number of rows. The estimate isn't accurate, but it's better than the value in the stats.
   b) Running "ANALYZE TABLE table_name COMPUTE STATISTICS" in Hive updates the "numRows" property with the correct number of rows. This time HiveScan.getScanStats() uses that value instead of estimating one from the size of the input splits.

3. When the table has partitions, "numRows" is computed and available for each partition. HiveScan correctly computes the reduced row count when some of the partitions are pruned.

The only limitation is that HiveScan.getScanStats() assumes that, when statistics are available for a table, they are up to date. This may require the user to manually run "analyze ... compute statistics".

> Use Hive stats when planning queries on Hive data sources
> ---------------------------------------------------------
>
>                 Key: DRILL-1742
>                 URL: https://issues.apache.org/jira/browse/DRILL-1742
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization, Storage - Hive
>    Affects Versions: 0.6.0
>            Reporter: Venki Korukanti
>            Assignee: Deneche A. Hakim
>             Fix For: 0.7.0
>
>         Attachments: DRILL-1742.1.patch.txt, DRILL-1742.2.patch.txt, DRILL-1742.3.patch.txt, DRILL-1742.4.patch.txt
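For readers following the test cases in the comment, here is a minimal sketch of the row-count selection they describe: trust a positive "numRows" value from the table or partition parameters, otherwise fall back to an estimate derived from input split size, and sum per-partition values over the partitions that survive pruning. This is not the code in the attached patches. The class name `RowCountEstimate`, its methods, and the `ASSUMED_BYTES_PER_ROW` constant are hypothetical, and the split-size formula is an assumption because the comment does not state how the estimate is computed.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the behaviour described in the DRILL-1742 comment;
// not Drill's actual HiveScan.getScanStats() implementation.
final class RowCountEstimate {

    // Assumption: average row width used only when no usable "numRows" stat exists.
    static final long ASSUMED_BYTES_PER_ROW = 1024;

    /** Returns the stored "numRows" value, or -1 if it is missing or not a number. */
    static long storedRowCount(Map<String, String> params) {
        String value = params.get("numRows");
        if (value == null) {
            return -1;
        }
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /**
     * Row estimate for one table or partition: use a positive "numRows"
     * (assumed up to date, as the comment notes), otherwise estimate from split size.
     */
    static long estimateRows(Map<String, String> params, long splitBytes) {
        long stored = storedRowCount(params);
        if (stored > 0) {
            return stored;
        }
        return splitBytes / ASSUMED_BYTES_PER_ROW;
    }

    /** Sums per-partition estimates over the partitions left after pruning. */
    static long estimatePrunedRows(List<Map<String, String>> partitionParams,
                                   List<Long> partitionSplitBytes) {
        long total = 0;
        for (int i = 0; i < partitionParams.size(); i++) {
            total += estimateRows(partitionParams.get(i), partitionSplitBytes.get(i));
        }
        return total;
    }

    public static void main(String[] args) {
        // Case 1: new table, no "numRows", no data.
        System.out.println(estimateRows(Map.of(), 0L));
        // Case 2a: LOAD DATA left "numRows" at 0, so fall back to split size.
        System.out.println(estimateRows(Map.of("numRows", "0"), 4096L));
        // Case 2b: after ANALYZE TABLE ... COMPUTE STATISTICS, the stored value wins.
        System.out.println(estimateRows(Map.of("numRows", "37"), 4096L));
        // Case 3: two partitions remain after pruning.
        System.out.println(estimatePrunedRows(
                List.of(Map.of("numRows", "10"), Map.of("numRows", "25")),
                List.of(100L, 200L)));
    }
}
```

Running `main` prints 0, 4, 37 and 35, one per line, mirroring cases 1, 2a, 2b and 3. Note that a stale but positive "numRows" is returned unchanged, which is exactly the limitation the comment points out.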