> On Sept. 16, 2015, 9:26 p.m., Sergey Shelukhin wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java, > > line 317 > > <https://reviews.apache.org/r/38429/diff/1/?file=1075548#file1075548line317> > > > > that may have perf impact... why not change the queries on metastore > > instead
ColStatistics in hiveColStats returned from StatsUtils.getTableColumnStats(hiveTblMetadata, hiveNonPartitionCols, nonPartColNamesThatRqrStats) (line 300) might not match the colNames in nonPartColNamesThatRqrStats. The order of column names in nonPartColNamesThatRqrStats is sometimes more likely the order of columns specified in the table DDL, and could neither be ascending nor descending, so we could not leverage sql or jdo query in HMS to do this ordering. We can enhance the implementation for APIs like "public List<ColumnStatisticsObj> getTableColumnStatistics(String dbName, String tableName, List<String> colNames)" to make them to return the ColumnStatisticsObj in List<ColumnStatisticsObj> to match the column name in passed-in List<String> colNames by post-ordering the sql/jdo returns. But in the cases where some columns do not have returned stats, we are still not able to match the returned stats to the requested columns. For simplicity, I left this ordering burden to the API cl ient whoever cares about the order like that in RelOptHiveTable.updateColStats(..). Without this re-order, the line 366 code (hiveColStatsMap.put(nonPartColIndxsThatRqrStats.get(i), hiveColStats.get(i));) is not right. Does it make sense? > On Sept. 16, 2015, 9:26 p.m., Sergey Shelukhin wrote: > > metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java, > > line 61 > > <https://reviews.apache.org/r/38429/diff/1/?file=1075546#file1075546line61> > > > > what if some code uses this? seems like not setting it is better, the > > bugs would be more obvious. > > Or for now you could set it from table/db fields but deprecate the > > getter... as is it's not proper deprecation cause it may break things. Yes, good point. I removed the getter methods for db/table/partition names in MTableColumnStatistics/MPartitionColumnStatistics. I still set these columns to value "Deprecated" in case some one goes to peek at these tables in backend DB and is confused by the possible mismatched names. I can not set them to null (or not setting them) since they could not be nullable as specified in their table DDL. - Chaoyu ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38429/#review99307 ----------------------------------------------------------- On Sept. 16, 2015, 12:37 p.m., Chaoyu Tang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/38429/ > ----------------------------------------------------------- > > (Updated Sept. 16, 2015, 12:37 p.m.) > > > Review request for hive, Ashutosh Chauhan, Sergey Shelukhin, and Xuefu Zhang. > > > Bugs: HIVE-11786 > https://issues.apache.org/jira/browse/HIVE-11786 > > > Repository: hive-git > > > Description > ------- > > The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns > such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have > foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. But > these columns are currently used in fetching column stats (e.g. > getTableStats/getPartitionStats) so any Hive operation involved in > db/table/partition name change has to update these columnn, which is not > necessary and sometimes quite difficult in implementation given the > limitations from DN and RawStore APIs. > This patch is to remove the use of these redundant columns at HMS code level. > The changes include: > 1. Instead of directly using these columns in TAB_COL_STATS, PART_COL_STATS, > use these in their referenced tables. > 2. currently the CBO code assumes that the column stats returned from HMS are > in the same order as that passed in column request. It is not gurantteed and > has been changed. > 3. The deprecated redundant columns are now temorarily populated with value > "Deprecated". They will be removed in a followed up JIRA. > > > Diffs > ----- > > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > 1f89b7c > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java > 4d6bfcc > > metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java > b3ceff1 > > metastore/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java > 7e46523 > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java > 6c0bd25 > > Diff: https://reviews.apache.org/r/38429/diff/ > > > Testing > ------- > > 1. Manually tested some cases against MySQL/PostgreSQL/Oracle. > 2. Is running precommit test. > > > Thanks, > > Chaoyu Tang > >