[https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468274#comment-16468274]
slim bouguerra commented on HIVE-19474:
---------------------------------------
Results of the posted DDLs:
{code}
PREHOOK: query: create table test_base_table(`timecolumn` timestamp,
`interval_marker` string, `num_l` DECIMAL(10,2))
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@test_base_table
POSTHOOK: query: create table test_base_table(`timecolumn` timestamp,
`interval_marker` string, `num_l` DECIMAL(10,2))
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@test_base_table
PREHOOK: query: insert into test_base_table values ('2015-03-08 00:00:00',
'i1-start', 4.5)
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@test_base_table
POSTHOOK: query: insert into test_base_table values ('2015-03-08 00:00:00',
'i1-start', 4.5)
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@test_base_table
POSTHOOK: Lineage: test_base_table.interval_marker SCRIPT []
POSTHOOK: Lineage: test_base_table.num_l SCRIPT []
POSTHOOK: Lineage: test_base_table.timecolumn SCRIPT []
PREHOOK: query: CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`,
`interval_marker`, cast(`num_l` as double)
FROM test_base_table
PREHOOK: type: CREATETABLE_AS_SELECT
PREHOOK: Input: default@test_base_table
PREHOOK: Output: database:default
PREHOOK: Output: default@druid_test_table
POSTHOOK: query: CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`,
`interval_marker`, cast(`num_l` as double)
FROM test_base_table
POSTHOOK: type: CREATETABLE_AS_SELECT
POSTHOOK: Input: default@test_base_table
POSTHOOK: Output: database:default
POSTHOOK: Output: default@druid_test_table
POSTHOOK: Lineage: druid_test_table.__time EXPRESSION [(test_base_table)test_base_table.FieldSchema(name:timecolumn, type:timestamp, comment:null), ]
POSTHOOK: Lineage: druid_test_table.interval_marker SIMPLE [(test_base_table)test_base_table.FieldSchema(name:interval_marker, type:string, comment:null), ]
POSTHOOK: Lineage: druid_test_table.num_l EXPRESSION [(test_base_table)test_base_table.FieldSchema(name:num_l, type:decimal(10,2), comment:null), ]
PREHOOK: query: describe druid_test_table
PREHOOK: type: DESCTABLE
PREHOOK: Input: default@druid_test_table
POSTHOOK: query: describe druid_test_table
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@druid_test_table
__time                  timestamp with local time zone  from deserializer
interval_marker         string                          from deserializer
num_l                   double                          from deserializer
PREHOOK: query: explain select sum(num_l), min(num_l) FROM druid_test_table
PREHOOK: type: QUERY
POSTHOOK: query: explain select sum(num_l), min(num_l) FROM druid_test_table
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: druid_test_table
          properties:
            druid.fieldNames $f0,$f1
            druid.fieldTypes double,double
            druid.query.json {"queryType":"timeseries","dataSource":"default.druid_test_table","descending":false,"granularity":"all","aggregations":[{"type":"doubleSum","name":"$f0","fieldName":"num_l"},{"type":"doubleMin","name":"$f1","fieldName":"num_l"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"],"context":{"skipEmptyBuckets":true}}
            druid.query.type timeseries
          Select Operator
            expressions: $f0 (type: double), $f1 (type: double)
            outputColumnNames: _col0, _col1
            ListSink
PREHOOK: query: CREATE TABLE druid_test_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`,
`interval_marker`, `num_l`
FROM test_base_table
PREHOOK: type: CREATETABLE_AS_SELECT
PREHOOK: Input: default@test_base_table
PREHOOK: Output: database:default
PREHOOK: Output: default@druid_test_table_2
POSTHOOK: query: CREATE TABLE druid_test_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`,
`interval_marker`, `num_l`
FROM test_base_table
POSTHOOK: type: CREATETABLE_AS_SELECT
POSTHOOK: Input: default@test_base_table
POSTHOOK: Output: database:default
POSTHOOK: Output: default@druid_test_table_2
POSTHOOK: Lineage: druid_test_table_2.__time EXPRESSION [(test_base_table)test_base_table.FieldSchema(name:timecolumn, type:timestamp, comment:null), ]
POSTHOOK: Lineage: druid_test_table_2.interval_marker SIMPLE [(test_base_table)test_base_table.FieldSchema(name:interval_marker, type:string, comment:null), ]
POSTHOOK: Lineage: druid_test_table_2.num_l SIMPLE [(test_base_table)test_base_table.FieldSchema(name:num_l, type:decimal(10,2), comment:null), ]
PREHOOK: query: describe druid_test_table_2
PREHOOK: type: DESCTABLE
PREHOOK: Input: default@druid_test_table_2
POSTHOOK: query: describe druid_test_table_2
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@druid_test_table_2
__time                  timestamp with local time zone  from deserializer
interval_marker         string                          from deserializer
num_l                   decimal(10,2)                   from deserializer
PREHOOK: query: explain select sum(num_l), min(num_l) FROM druid_test_table_2
PREHOOK: type: QUERY
POSTHOOK: query: explain select sum(num_l), min(num_l) FROM druid_test_table_2
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: druid_test_table_2
                  properties:
                    druid.fieldNames __time,interval_marker,num_l
                    druid.fieldTypes timestamp with local time zone,string,decimal(10,2)
                    druid.query.json {"queryType":"scan","dataSource":"default.druid_test_table_2","intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"],"columns":["__time","interval_marker","num_l"],"resultFormat":"compactedList"}
                    druid.query.type scan
                  Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: num_l (type: decimal(10,2))
                    outputColumnNames: num_l
                    Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: sum(num_l), min(num_l)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        sort order:
                        Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col0 (type: decimal(20,2)), _col1 (type: decimal(10,2))
            Execution mode: llap
            LLAP IO: no inputs
        Reducer 2
            Execution mode: llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0), min(VALUE._col1)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
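To make the contrast explicit: the only difference between the two tables is the cast of `num_l`, and it decides whether the aggregation is pushed to Druid. A quick way to check this (reusing only the statements already posted above) is to look at druid.query.type in the explain output:

{code}
-- druid_test_table stores num_l as double (explicit cast in the CTAS):
-- the plan is a single Fetch stage with druid.query.type = timeseries,
-- i.e. sum/min are computed inside Druid.
explain select sum(num_l), min(num_l) FROM druid_test_table;

-- druid_test_table_2 keeps num_l as decimal(10,2) in the Hive metadata:
-- only a plain scan is pushed (druid.query.type = scan), and sum/min run
-- in an extra Tez stage on the Hive side.
explain select sum(num_l), min(num_l) FROM druid_test_table_2;
{code}

So with the cast the query never leaves Druid, while without it Hive materializes the scan and aggregates the rows itself.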
> Decimal type should be cast as part of the CTAS or INSERT clause.
> ------------------------------------------------------------------
>
> Key: HIVE-19474
> URL: https://issues.apache.org/jira/browse/HIVE-19474
> Project: Hive
> Issue Type: Bug
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Priority: Major
>
> HIVE-18569 introduced a runtime config variable to allow indexing Decimal
> columns as Double. This leads to a messy state: the Hive metadata still
> thinks the column is decimal while it is actually stored as double. And
> because the Hive metadata type of the column is Decimal, the logical
> optimizer will not push down aggregates. I tried to fix this by adding some
> logic to the application, but it makes the code very clumsy, with lots of
> branches. Instead, I propose to revert this patch and let the user add an
> explicit cast. This is better: the metadata then reflects the actual storage
> type, aggregate push down kicks in, and no config is needed.
> cc [~ashutoshc] and [~nishantbangarwa]
> You can see the difference with the following DDL:
> {code}
> create table test_base_table(`timecolumn` timestamp, `interval_marker`
> string, `num_l` DECIMAL(10,2));
> insert into test_base_table values ('2015-03-08 00:00:00', 'i1-start', 4.5);
> set hive.druid.approx.result=true;
> CREATE TABLE druid_test_table
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.segment.granularity" = "DAY")
> AS
> select cast(`timecolumn` as timestamp with local time zone) as `__time`,
> `interval_marker`, cast(`num_l` as double)
> FROM test_base_table;
> describe druid_test_table;
> explain select sum(num_l), min(num_l) FROM druid_test_table;
> CREATE TABLE druid_test_table_2
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.segment.granularity" = "DAY")
> AS
> select cast(`timecolumn` as timestamp with local time zone) as `__time`,
> `interval_marker`, `num_l`
> FROM test_base_table;
> describe druid_test_table_2;
> explain select sum(num_l), min(num_l) FROM druid_test_table_2;
> {code}