[
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061462#comment-16061462
]
Hive QA commented on HIVE-16937:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874300/HIVE-16937.1.patch
{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10845 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[jdbc_handler] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] (batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=178)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5753/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5753/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5753/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12874300 - PreCommit-HIVE-Build
> INFORMATION_SCHEMA usability: everything is currently a string
> --------------------------------------------------------------
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Carter Shanklin
> Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
> `cs_id` string COMMENT 'from deserializer',
> `db_name` string COMMENT 'from deserializer',
> `table_name` string COMMENT 'from deserializer',
> `column_name` string COMMENT 'from deserializer',
> `column_type` string COMMENT 'from deserializer',
> `tbl_id` string COMMENT 'from deserializer',
> `long_low_value` string COMMENT 'from deserializer',
> `long_high_value` string COMMENT 'from deserializer',
> `double_high_value` string COMMENT 'from deserializer',
> `double_low_value` string COMMENT 'from deserializer',
> `big_decimal_low_value` string COMMENT 'from deserializer',
> `big_decimal_high_value` string COMMENT 'from deserializer',
> `num_nulls` string COMMENT 'from deserializer',
> `num_distincts` string COMMENT 'from deserializer',
> `avg_col_len` string COMMENT 'from deserializer',
> `max_col_len` string COMMENT 'from deserializer',
> `num_trues` string COMMENT 'from deserializer',
> `num_falses` string COMMENT 'from deserializer',
> `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
> 'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try to find the column(s) with the most
> distinct values.
> {code}
> select
> db_name, table_name, column_name
> from
> sys.tab_col_stats
> where
> num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is computed by string comparison (for example, '9'
> sorts above '10'), so it is unlikely to be what you really want.
> It would be better to use numeric types where appropriate, such as for all the
> numbers in tab_col_stats; bigint is most likely the right type for stats such
> as row counts.
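
The string-ordering problem described above can be reproduced outside Hive. The following is a minimal sketch, not Hive code: it assumes SQLite (through Python's sqlite3 module) as a stand-in for Hive, a cut-down tab_col_stats table whose num_distincts column is text, and made-up sample rows. The helper top_column is hypothetical and exists only for this illustration. It runs the query from the description twice, once on the raw string column and once with a numeric cast.

```python
import sqlite3

# Hypothetical sample rows. SQLite stands in for Hive here, and the TEXT
# column mimics the string-typed num_distincts in sys.tab_col_stats.
ROWS = [
    ("default", "t1", "a", "9"),
    ("default", "t1", "b", "10"),
    ("default", "t2", "c", "250"),
    ("default", "t2", "d", "1000"),
]


def top_column(conn, expr):
    """Return the column_name whose `expr` equals the maximum of `expr`."""
    sql = (
        "select column_name from tab_col_stats "
        f"where {expr} = (select max({expr}) from tab_col_stats)"
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tab_col_stats ("
    "db_name text, table_name text, column_name text, num_distincts text)"
)
conn.executemany("insert into tab_col_stats values (?, ?, ?, ?)", ROWS)

# Same shape as the query in the description: max() compares strings.
string_result = top_column(conn, "num_distincts")

# With a numeric cast, max() compares numbers instead.
numeric_result = top_column(conn, "cast(num_distincts as integer)")

print(string_result, numeric_result)  # prints "a d"
```

With these rows the string comparison picks column a (value '9') while the numeric comparison picks column d (value 1000). In Hive, a comparable interim workaround would presumably be to wrap the column in cast(num_distincts as bigint) on both sides of the comparison; the change requested in this issue is to make the columns numeric so that no cast is needed.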