Alexandre Fouché created HIVE-3919:
--------------------------------------
Summary: Partition query refused when PPD is disabled (due to
HIVE-3699)
Key: HIVE-3919
URL: https://issues.apache.org/jira/browse/HIVE-3919
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6):
hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
Reporter: Alexandre Fouché
The query below was working perfectly when hive.optimize.ppd=true. But i had to
disable this property because my results are impacted by unresolved HIVE-3699.
But now that hive.optimize.ppd=false, the query from table1 partitionned by
field 'day' is refused, even if the query has the partition predicate.
>From HIVE-830 it says that the partition pruner depends on it, so that may
>explain why my queries are refused when Predictive Push Down is disabled.
The query is accepted and run if i set hive.mapred.mode=nonstrict. But from
Hive-server2 logs and input path lists and also processing, all dataset
regardless of partitions is deserialized !
set hive.optimize.ppd = true;
set hive.mapred.mode=strict;
FROM table1
INSERT OVERWRITE TABLE general1
SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) WHERE qs_cs_s_cat='PRINT'
AND (day>=20121015 and day<=20130103) GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
-> Parsed fine, but wrong results when i add multiple 'insert overwrite'
set hive.optimize.ppd = false;
set hive.mapred.mode=strict;
FROM table1
INSERT OVERWRITE TABLE general1
SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) WHERE qs_cs_s_cat='PRINT'
AND (day>=20121015 and day<=20130103) GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
Error: Error while processing statement: FAILED: SemanticException [Error
10041]: No partition predicate found for Alias "table1" Table "table1"
(state=42000,code=10041)
Error: Error while processing statement: FAILED: SemanticException [Error
10041]: No partition predicate found for Alias "table1" Table "table1"
(state=42000,code=10041)
The table 'table1' was created like this, it is partitionned by 'day':
CREATE TABLE table1
PARTITIONED BY (day BIGINT)
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
"type": "record",
"name": "Log",
"fields" : [
{"name": "time_stamp", "type": ["null", "int"], "default": null},
(...)
]
}')
;
set hive.optimize.ppd = false;
set hive.mapred.mode=strict;
FROM table1
INSERT OVERWRITE TABLE general1
SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) WHERE qs_cs_s_cat='PRINT'
AND day=20121015 GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
-> FAILS TOO
set hive.optimize.ppd = false;
INSERT OVERWRITE TABLE general1
SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) FROM table1 WHERE
qs_cs_s_cat='PRINT' AND day=20121015 GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
-> FAILS TOO
set hive.optimize.ppd = false;
set hive.mapred.mode=nonstrict;
FROM table1
INSERT OVERWRITE TABLE general1
SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) WHERE qs_cs_s_cat='PRINT'
AND day=20121015 GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
-> WORKS, but from Hive-server2 logs and input path lists and also
processing, all dataset regardless of partitions is deserialized !
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira