[ https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958304#comment-15958304 ]
Yibing Shi commented on HIVE-16291:
-----------------------------------
[~aihuaxu]
Sorry for the delay! I was stuck on other problems and didn't get a
chance to check this.
I submitted my patch trying to minimize the scope of my changes (touching as
few lines as possible). Yes, I agree that the logic is a bit confusing. Your
suggestions look great! I have a slightly modified version below. What do you
think?
{code}
// Join the non-blank entries of id and old with commas, skipping blank ones.
String newConfStr = null;
for (String s : Arrays.asList(id, old)) {
  if (org.apache.commons.lang.StringUtils.isNotBlank(s)) {
    newConfStr = newConfStr == null ? s : newConfStr + StringUtils.COMMA_STR + s;
  }
}
{code}
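For anyone who wants to try the logic outside of Hive, here is a minimal self-contained sketch of the same idea using only the JDK. The class name `ReadColumnIdJoiner`, the method `joinNonBlank`, and the local `isNotBlank` helper are hypothetical stand-ins for illustration (the helper only approximates the commons-lang one), and a literal "," replaces StringUtils.COMMA_STR; this is not the patch itself.

```java
import java.util.Arrays;

public class ReadColumnIdJoiner {
    // Hypothetical stand-in for commons-lang StringUtils.isNotBlank:
    // treats null, empty, and whitespace-only strings as blank.
    static boolean isNotBlank(String s) {
        return s != null && !s.trim().isEmpty();
    }

    // Joins the non-blank inputs with commas; returns null when both are blank.
    static String joinNonBlank(String id, String old) {
        String newConfStr = null;
        for (String s : Arrays.asList(id, old)) {
            if (isNotBlank(s)) {
                newConfStr = newConfStr == null ? s : newConfStr + "," + s;
            }
        }
        return newConfStr;
    }

    public static void main(String[] args) {
        System.out.println(joinNonBlank("0", ""));   // only the new id is kept
        System.out.println(joinNonBlank("", "0"));   // only the old value is kept
        System.out.println(joinNonBlank("0", "1"));  // both joined with one comma
        System.out.println(joinNonBlank("", null));  // nothing to join
    }
}
```

The point of skipping blank entries is that the result never contains an empty token between commas, whichever of the two inputs is empty.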
> Hive fails when unions a parquet table with itself
> --------------------------------------------------
>
> Key: HIVE-16291
> URL: https://issues.apache.org/jira/browse/HIVE-16291
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-16291.1.patch
>
>
> Reproduce commands:
> {code:sql}
> create table tst_unin (col1 int) partitioned by (p_tdate int) stored as parquet;
> insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310);
> insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410);
> select count(*) from (
>   select tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302
>   union all
>   select tst_unin.p_tdate from tst_unin
> ) t1;
> {code}
> The table is stored in Parquet format, which is a columnar file format. Hive
> tries to push the column projection down to the table scan operators so that
> only the needed columns are read. This is done by adding the needed column IDs
> to the job configuration under the property "hive.io.file.readcolumn.ids".
> In the case above, the query unions the results of 2 subqueries that select
> data from the same table. The subquery without a filter doesn't need any
> column from the Parquet file (p_tdate is a partition column), while the
> subquery that filters on col1 needs the column "col1". Hive has a bug here:
> it ends up setting "hive.io.file.readcolumn.ids" to a value like "0,,0", which
> the method ColumnProjectionUtils.getReadColumnIDs cannot parse.
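To illustrate why a value like "0,,0" is a problem, here is a small sketch of a comma-separated ID parser. It assumes the parser splits on commas and converts every token with Integer.parseInt; the class `ReadColumnIdParseDemo` and the method `parseIds` are hypothetical and are not the actual ColumnProjectionUtils.getReadColumnIDs implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadColumnIdParseDemo {
    // Hypothetical parser (assumption): split on commas, parse each token as an int.
    static List<Integer> parseIds(String conf) {
        List<Integer> ids = new ArrayList<>();
        // A limit of -1 keeps empty tokens, so "0,,0" yields "0", "", "0".
        for (String token : conf.split(",", -1)) {
            ids.add(Integer.parseInt(token)); // NumberFormatException on ""
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(parseIds("0,0")); // well-formed value parses fine
        try {
            parseIds("0,,0"); // the empty middle token cannot be parsed
        } catch (NumberFormatException e) {
            System.out.println("cannot parse: " + e.getMessage());
        }
    }
}
```

Under that assumption, the empty token produced by the doubled comma is what breaks parsing, which is why the fix needs to skip blank entries when the property value is built.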
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)