Qifan Chen created HIVE-24885:
---------------------------------

Summary: The state of unset low or high value in LongColumnStatsData can not be retrieved
Key: HIVE-24885
URL: https://issues.apache.org/jira/browse/HIVE-24885
Project: Hive
Issue Type: Improvement
Components: API
Reporter: Qifan Chen
During work to improve Impala column stats so that they compute min/max values for columns, it was found that the unset state of the low or high value in LongColumnStatsData cannot be retrieved. The following Impala test case, added to MetastoreEventsProcessorTest, illustrates the problem.

    /**
     * Unset the low and the high value first and then check.
     */
    @Test
    public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {
      try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
        List<String> colNames = new ArrayList<String>();
        colNames.add("id");
        colNames.add("int_col");
        colNames.add("bigint_col");

        // First loop: unset the low and high values in each column's stats.
        List<ColumnStatisticsObj> colStatsObjs =
            msClient.getHiveClient().getTableColumnStatistics(
                "unique_database", "alltypes", colNames, "impala");
        for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
          ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
          LongColumnStatsData longColStatsData = colStatsData.getLongStats();
          longColStatsData.unsetLowValue();
          longColStatsData.unsetHighValue();
          colStatsData.setLongStats(longColStatsData);
        }

        // Second loop: fetch the stats again and check that both values are unset.
        colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(
            "unique_database", "alltypes", colNames, "impala");
        for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
          ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
          LongColumnStatsData longColStatsData = colStatsData.getLongStats();
          assertFalse("isSetLowValue() should be false", longColStatsData.isSetLowValue());
          assertFalse("isSetHighValue() should be false", longColStatsData.isSetHighValue());
        }
      } catch (NoSuchObjectException e) {
        fail(String.format("No such object exception: %s", e));
      } catch (MetaException e) {
        fail(String.format("Metadata exception: %s", e));
      } catch (TException e) {
        fail(String.format("TException: %s", e));
      }
    }

isSetLowValue() and isSetHighValue() are expected to return false in the second loop, since unsetLowValue() and unsetHighValue() are called in the first loop.

To build and run the test:

    mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff -Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue

Table unique_database.alltypes contains several rows and is defined as follows:

    Query: show create table unique_database.alltypes
    CREATE EXTERNAL TABLE unique_database.alltypes (
      id INT,
      bool_col BOOLEAN,
      tinyint_col TINYINT,
      smallint_col SMALLINT,
      int_col INT,
      bigint_col BIGINT,
      float_col FLOAT,
      double_col DOUBLE,
      date_string_col STRING,
      string_col STRING,
      timestamp_col TIMESTAMP,
      year INT
    )
    PARTITIONED BY (
      month INT
    )
    STORED AS PARQUET
    LOCATION 'hdfs://localhost:20500/test-warehouse/unique_database.db/alltypes'
    TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'OBJCAPABILITIES'='EXTREAD,EXTWRITE', 'STATS_GENERATED'='TASK', 'external.table.purge'='TRUE', 'impala.lastComputeStatsTime'='1615492819', 'numRows'='0', 'totalSize'='0')

It can be created in an Impala environment with:

    create database if not exists unique_database;
    use unique_database;
    drop table if exists alltypes;
    CREATE TABLE alltypes partitioned by (month) STORED AS PARQUET as select * from functional_parquet.alltypes;
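In the test, the first loop modifies only the ColumnStatisticsObj instances returned by getTableColumnStatistics, while the second call fetches fresh objects from the metastore. For the second loop to observe the unset state, the modified statistics would presumably also have to be written back (for example via IMetaStoreClient.updateTableColumnStatistics), and the metastore would then have to preserve that state on read. The Java sketch below illustrates this round trip under explicit assumptions. LongStatsSketch is a hypothetical, simplified stand-in for a Thrift struct, not the real LongColumnStatsData. Like Thrift-generated Java code, it tracks optional long fields with an "isset" bit field. StatsStoreSketch is a hypothetical in-memory store that hands out copies, as a remote client would. Neither class is Hive or Impala code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the unset / write-back / read-back round trip. All class names
 * here are hypothetical; none of this is Hive or Impala code.
 */
public class UnsetRoundTripSketch {

  /**
   * Hypothetical stand-in for a Thrift struct with two optional i64 fields,
   * loosely modelled on LongColumnStatsData's lowValue and highValue. A Java
   * long cannot be null, so presence is tracked in a separate bit field.
   */
  public static final class LongStatsSketch {
    private static final int LOW_VALUE_ISSET_ID = 0;
    private static final int HIGH_VALUE_ISSET_ID = 1;

    private long lowValue;
    private long highValue;
    private byte issetBitfield = 0;

    public LongStatsSketch() {}

    /** Copy constructor: copies both values and the "is set" bits. */
    public LongStatsSketch(LongStatsSketch other) {
      this.lowValue = other.lowValue;
      this.highValue = other.highValue;
      this.issetBitfield = other.issetBitfield;
    }

    public long getLowValue() { return lowValue; }

    public void setLowValue(long value) {
      lowValue = value;
      issetBitfield |= (1 << LOW_VALUE_ISSET_ID);
    }

    /** Clears only the "is set" bit; the stored number is left untouched. */
    public void unsetLowValue() { issetBitfield &= ~(1 << LOW_VALUE_ISSET_ID); }

    public boolean isSetLowValue() { return (issetBitfield & (1 << LOW_VALUE_ISSET_ID)) != 0; }

    public long getHighValue() { return highValue; }

    public void setHighValue(long value) {
      highValue = value;
      issetBitfield |= (1 << HIGH_VALUE_ISSET_ID);
    }

    public void unsetHighValue() { issetBitfield &= ~(1 << HIGH_VALUE_ISSET_ID); }

    public boolean isSetHighValue() { return (issetBitfield & (1 << HIGH_VALUE_ISSET_ID)) != 0; }
  }

  /** Hypothetical in-memory store that, like a remote client, hands out copies. */
  public static final class StatsStoreSketch {
    private final Map<String, LongStatsSketch> stored = new HashMap<>();

    public void put(String column, LongStatsSketch stats) {
      stored.put(column, new LongStatsSketch(stats));
    }

    /** Returns a copy of the stored stats, or null if the column is unknown. */
    public LongStatsSketch get(String column) {
      LongStatsSketch stats = stored.get(column);
      return stats == null ? null : new LongStatsSketch(stats);
    }
  }

  public static void main(String[] args) {
    StatsStoreSketch store = new StatsStoreSketch();
    LongStatsSketch initial = new LongStatsSketch();
    initial.setLowValue(1L);    // arbitrary example values
    initial.setHighValue(100L);
    store.put("id", initial);

    // Unsetting on a fetched copy does not change what the store holds.
    LongStatsSketch fetched = store.get("id");
    fetched.unsetLowValue();
    fetched.unsetHighValue();
    System.out.println(store.get("id").isSetLowValue());  // true

    // After writing the modified copy back, a fresh read sees the unset state.
    store.put("id", fetched);
    LongStatsSketch refetched = store.get("id");
    System.out.println(refetched.isSetLowValue());   // false
    System.out.println(refetched.isSetHighValue());  // false
    System.out.println(refetched.getLowValue());     // 1 (value kept, bit cleared)
  }
}
```

In this sketch, unsetLowValue() clears only the presence bit, so getLowValue() still returns the old number. Callers therefore have to consult isSetLowValue() rather than the value itself to tell whether a low value is present.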