> On Sept. 16, 2015, 9:26 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java,
> >  line 317
> > <https://reviews.apache.org/r/38429/diff/1/?file=1075548#file1075548line317>
> >
> >     that may have perf impact... why not change the queries on metastore 
> > instead

ColStatistics in hiveColStats returned from 
StatsUtils.getTableColumnStats(hiveTblMetadata, hiveNonPartitionCols, 
nonPartColNamesThatRqrStats) (line 300) might not match the colNames in 
nonPartColNamesThatRqrStats. The order of column names in 
nonPartColNamesThatRqrStats is sometimes more likely the order of columns 
specified in the table DDL, and could neither be ascending nor descending, so 
we could not leverage sql or jdo query in HMS to do this ordering. We can 
enhance the implementation for APIs like "public List<ColumnStatisticsObj> 
getTableColumnStatistics(String dbName, String tableName, List<String> 
colNames)" to make them to return the ColumnStatisticsObj in 
List<ColumnStatisticsObj> to match the column name in passed-in List<String> 
colNames by post-ordering the sql/jdo returns. But in the cases where some 
columns do not have returned stats, we are still not able to match the returned 
stats to the requested columns. For simplicity, I left this ordering burden to 
the API cl
 ient whoever cares about the order like that in 
RelOptHiveTable.updateColStats(..). Without this re-order, the line 366 code 
(hiveColStatsMap.put(nonPartColIndxsThatRqrStats.get(i), hiveColStats.get(i));) 
is not right. Does it make sense?


> On Sept. 16, 2015, 9:26 p.m., Sergey Shelukhin wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java,
> >  line 61
> > <https://reviews.apache.org/r/38429/diff/1/?file=1075546#file1075546line61>
> >
> >     what if some code uses this? seems like not setting it is better, the 
> > bugs would be more obvious.
> >     Or for now you could set it from table/db fields but deprecate the 
> > getter... as is it's not proper deprecation cause it may break things.

Yes, good point. I removed the getter methods for db/table/partition names in 
MTableColumnStatistics/MPartitionColumnStatistics. I still set these columns to 
value "Deprecated" in case some one goes to peek at these tables in backend DB 
and is confused by the possible mismatched names. I can not set them to null 
(or not setting them) since they could not be nullable as specified in their 
table DDL.


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38429/#review99307
-----------------------------------------------------------


On Sept. 16, 2015, 12:37 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38429/
> -----------------------------------------------------------
> 
> (Updated Sept. 16, 2015, 12:37 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Sergey Shelukhin, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-11786
>     https://issues.apache.org/jira/browse/HIVE-11786
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns 
> such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have 
> foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. But 
> these columns are currently used in fetching column stats (e.g. 
> getTableStats/getPartitionStats) so any Hive operation involved in 
> db/table/partition name change has to update these columnn, which is not 
> necessary and sometimes quite difficult in implementation given the 
> limitations from DN and RawStore APIs.
> This patch is to remove the use of these redundant columns at HMS code level. 
> The changes include:
> 1. Instead of directly using these columns in TAB_COL_STATS, PART_COL_STATS, 
> use these in their referenced tables.
> 2. currently the CBO code assumes that the column stats returned from HMS are 
> in the same order as that passed in column request. It is not gurantteed and 
> has been changed.
> 3. The deprecated redundant columns are now temorarily populated with value 
> "Deprecated". They will be removed in a followed up JIRA.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> 1f89b7c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 4d6bfcc 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 
> b3ceff1 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java 
> 7e46523 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 
> 6c0bd25 
> 
> Diff: https://reviews.apache.org/r/38429/diff/
> 
> 
> Testing
> -------
> 
> 1. Manually tested some cases against MySQL/PostgreSQL/Oracle.
> 2. Is running precommit test.
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>

Reply via email to