> On Nov. 25, 2014, 1:23 a.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java, lines 298-300
> > <https://reviews.apache.org/r/28417/diff/3/?file=775081#file775081line298>
> >
> > This is not necessarily true; if you have empty tables, the rowcount
> > will be 0. So you need to distinguish the case where the stats are
> > not available (maybe use -1 as an indicator) from the case where they
> > are available and the rowcount is 0.
>
> abdelhakim deneche wrote:
>     The problem is that numRows=0 in the stats can actually mean that the
>     stats have not been computed yet, so we still need to estimate the row
>     count using the size of the input splits.
>     I ran some tests using empty tables, and the estimated row count for
>     those tables is 0 too, so it's correct.

In Hive 0.13, numRows will contain -1 when the stats were never computed.


- abdelhakim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28417/#review62916
-----------------------------------------------------------


On Nov. 25, 2014, 7:46 p.m., abdelhakim deneche wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28417/
> -----------------------------------------------------------
>
> (Updated Nov. 25, 2014, 7:46 p.m.)
>
>
> Review request for drill.
>
>
> Bugs: DRILL-1742
>     https://issues.apache.org/jira/browse/DRILL-1742
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> HiveScan.getSplits() already gets the table and partition metadata using
> MetaStoreUtils.
> We compute the total number of rows using the numRows property and store the
> computed number of rows in the rowCount attribute, which is later returned by
> getScanStats().
>
>
> Diffs
> -----
>
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java ddbc100
>
> Diff: https://reviews.apache.org/r/28417/diff/
>
>
> Testing
> -------
>
> Created several partitioned and non-partitioned tables and loaded data in Hive.
>
> Used explain plan to check the number of rows when the whole table is queried
> and also when specific partitions are queried (to make sure the row count
> takes Hive partition pruning into account).
>
>
> Thanks,
>
> abdelhakim deneche
>
>
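
For readers following the thread, the row-count logic under discussion can be sketched roughly as below. This is only an illustration and not the code in the patch: the class HiveRowCountSketch, its methods, and the ASSUMED_BYTES_PER_ROW constant are hypothetical, and the table/partition parameters are modeled as plain string maps rather than Hive metastore objects. The only name taken from the review is the numRows property. The sketch treats a missing, unparsable, -1, or 0 value as "stats not available" and falls back to an estimate based on the total size of the input splits, so an empty table (0 bytes) still yields 0 rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual HiveScan implementation.
final class HiveRowCountSketch {
    // Assumption: a fixed average row width used only for the fallback estimate.
    static final long ASSUMED_BYTES_PER_ROW = 100L;
    // Marker meaning "no usable row-count statistic".
    static final long STATS_UNAVAILABLE = -1L;

    // Reads the numRows property from one table's or partition's parameters.
    // Missing, unparsable, negative, or zero values are all reported as
    // unavailable, because 0 may simply mean the stats were never computed.
    static long statsRowCount(Map<String, String> params) {
        String value = params.get("numRows");
        if (value == null) {
            return STATS_UNAVAILABLE;
        }
        try {
            long n = Long.parseLong(value.trim());
            return n > 0 ? n : STATS_UNAVAILABLE;
        } catch (NumberFormatException e) {
            return STATS_UNAVAILABLE;
        }
    }

    // Sums numRows over all scanned partitions. If any partition lacks usable
    // stats (or there are no partitions), estimates from the split size instead.
    static long estimateRowCount(List<Map<String, String>> partitionParams,
                                 long totalSplitBytes) {
        if (partitionParams.isEmpty()) {
            return totalSplitBytes / ASSUMED_BYTES_PER_ROW;
        }
        long total = 0;
        for (Map<String, String> params : partitionParams) {
            long n = statsRowCount(params);
            if (n == STATS_UNAVAILABLE) {
                return totalSplitBytes / ASSUMED_BYTES_PER_ROW;
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> p1 = new HashMap<>();
        p1.put("numRows", "1000");
        Map<String, String> p2 = new HashMap<>();
        p2.put("numRows", "-1"); // stats never computed
        List<Map<String, String>> parts = new ArrayList<>();
        parts.add(p1);
        parts.add(p2);
        // One partition has no stats, so the split-size estimate is used.
        System.out.println(estimateRowCount(parts, 50_000L));
    }
}
```

Passing only the partitions that survive pruning to estimateRowCount is what would make the result reflect Hive partition pruning, as checked in the Testing section of the review request.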
