Carter Shanklin created HIVE-14053:
--------------------------------------
Summary: Hive should report that primary keys can't be null.
Key: HIVE-14053
URL: https://issues.apache.org/jira/browse/HIVE-14053
Project: Hive
Issue Type: Bug
Reporter: Carter Shanklin
HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. With
the right driver in place, tools like Tableau can do join elimination and
queries can run much faster.
Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't
work quite right for keys. In particular, primary keys by definition are not
null and the metadata should reflect this for improved join elimination.
In this example that uses the TPC-H schema and its constraints, we sum
l_extendedprice and group by l_shipmode. This query should not use more than
just the lineitem table.
With all the constraints in place, Tableau generates this query:
{code}
SELECT `lineitem`.`l_shipmode` AS `l_shipmode`,
SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok`
FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem`
JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` =
`orders`.`o_orderkey`)
JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` =
`customer`.`c_custkey`)
JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` =
`nation`.`n_nationkey`)
WHERE ((((NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT
(`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS NULL))
AND (NOT (`lineitem`.`l_suppkey` IS NULL)))) AND (NOT (`nation`.`n_regionkey`
IS NULL)))
{code}
Since these are the primary keys the denormalization and the where condition is
unnecessary and this sort of query can be a lot faster by just accessing the
lineitem table.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)