Sergio Peña created HIVE-11763:
----------------------------------
Summary: Use * instead of sum(hash(*)) on Parquet predicate (PPD) integration tests
Key: HIVE-11763
URL: https://issues.apache.org/jira/browse/HIVE-11763
Project: Hive
Issue Type: Sub-task
Reporter: Sergio Peña
The integration tests for Parquet predicate push down (PPD) use the following
query to validate the filtered values:
{noformat}
select sum(hash(*)) from ...
{noformat}
It would be better to use {{select * from ...}} instead, so that we can see
that the returned values are correct. From the hash alone, it is difficult to
tell whether a value was filtered.
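For illustration, here is a sketch of what a validation query could look like after the change. The column name {{b}} is a hypothetical stand-in for the boolean column of {{newtypestbl}}. {{hive.optimize.index.filter}} is the Hive setting that turns on predicate push down into the file format, so running the same query with it off and then on should print the same rows, and a wrongly filtered row would be visible directly:
{noformat}
-- Hypothetical column: b is assumed to be the BOOLEAN column of newtypestbl.
-- Reference result, with predicate push down disabled:
SET hive.optimize.index.filter=false;
select * from newtypestbl where b = true;

-- Same query with predicate push down enabled; should print the same rows:
SET hive.optimize.index.filter=true;
select * from newtypestbl where b = true;

-- Current style, for comparison: prints a single number per query
select sum(hash(*)) from newtypestbl where b = true;
{noformat}
Unlike {{sum(hash(*))}}, the output of {{select *}} depends on row order, so the q-file may need Hive's {{-- SORT_QUERY_RESULTS}} directive to keep the expected output stable.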
Also, we can limit the number of rows produced by the INSERT ... SELECT
statement to avoid displaying many rows when validating the data. I think a
LIMIT 2 on each SELECT would be enough.
For example, the parquet_ppd_boolean.q test contains:
{noformat}
insert overwrite table newtypestbl select * from (
  select cast("apple" as char(10)), cast("bee" as varchar(10)), 0.22, true from src src1
  union all
  select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, false from src src2
) uniontbl;
{noformat}
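If we add LIMIT 2 to each SELECT, the table will hold only 4 rows. Hive accepts
a LIMIT on an individual branch of a UNION only when that branch is placed in a
subquery, so each limited SELECT is wrapped in parentheses:
{noformat}
insert overwrite table newtypestbl select * from (
  select * from (select cast("apple" as char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 LIMIT 2) s1
  union all
  select * from (select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, false from src src2 LIMIT 2) s2
) uniontbl;
{noformat}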
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)