[ https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584819#comment-17584819 ]
ASF subversion and git services commented on IMPALA-11344:
----------------------------------------------------------
Commit 44dc157a2c10578b82518012aa2e9aa9288dc6e5 in impala's branch
refs/heads/branch-4.1.1 from ttttttz
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=44dc157a2 ]
IMPALA-11344: Missing slots in all cases should be allowed to be read
When selecting only the missing fields of ORC files, and the missing fields
contain non-partition fields, the query fails with `Parse error in
possibly corrupt ORC file: '$filename'. No columns found for this scan`.
We should allow reading missing slots in all cases.
Testing:
- Added a test to test_scanners.py that ensures the query can be
executed successfully when selecting only the missing fields of
ORC files.
Change-Id: I15dca47ba5f7a93bfd5fcba3cab4ac6d64459023
Reviewed-on: http://gerrit.cloudera.org:8080/18652
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-on: http://gerrit.cloudera.org:8080/18907
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
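A toy Python model (Python because the regression test lives in test_scanners.py) can make the fix concrete. It is a sketch under stated assumptions, not Impala's scanner: `scan`, `ScanError`, and the `allow_all_missing` switch are hypothetical, columns are matched by name only for simplicity, and partition columns (which are also not stored in the data files) are left out. What it illustrates is that when none of the selected columns exist in the file, the scan still has to emit one row per file row, so in this model the row count comes from file metadata rather than from decoded column data.

```python
# Toy model of the ORC scan decision described in the commit message.
# All names are hypothetical; this is not Impala's scanner code.
from typing import Dict, List, Optional, Sequence, Tuple

Value = Optional[int]


class ScanError(Exception):
    """Stands in for "No columns found for this scan"."""


def scan(num_rows: int,
         file_columns: Dict[str, List[Value]],
         selected: Sequence[str],
         allow_all_missing: bool = True) -> List[Tuple[Value, ...]]:
    """Materialize the `selected` slots for one file, one tuple per row.

    num_rows     -- row count taken from file metadata, not from column data
    file_columns -- decoded column data, keyed by name, for columns the file has
    selected     -- slots the query needs; names absent from the file are missing
    """
    present = [c for c in selected if c in file_columns]
    if not present and not allow_all_missing:
        # Pre-fix behaviour: no file column to read, so the scan is rejected.
        raise ScanError("No columns found for this scan")
    # Post-fix behaviour: the row count still comes from the file, and every
    # missing slot is filled with NULL (None) in each of those rows.
    return [tuple(file_columns[c][i] if c in file_columns else None
                  for c in selected)
            for i in range(num_rows)]


# The repro: the file was written with only f0, then f1 was added to the table.
data = {"f0": [1]}
print(scan(1, data, ["f1"]))        # [(None,)]
print(scan(1, data, ["f0", "f1"]))  # [(1, None)]
```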
> Selecting only the missing fields of ORC files should return NULLs
> ------------------------------------------------------------------
>
> Key: IMPALA-11344
> URL: https://issues.apache.org/jira/browse/IMPALA-11344
> Project: IMPALA
> Issue Type: Bug
> Reporter: Quanlong Huang
> Assignee: zhi tang
> Priority: Critical
> Labels: newbie, ramp-up
> Fix For: Impala 4.2.0
>
>
> While looking into IMPALA-11296, I found a bug in the same scenario
> (scanning only the missing columns of ORC files) on the current master
> branch.
> Create an ORC table whose underlying files are missing some fields:
> {code:sql}
> hive> create external table missing_field_orc (f0 int) stored as orc;
> hive> insert into table missing_field_orc select 1;
> hive> alter table missing_field_orc add columns (f1 int);
> hive> select f1 from missing_field_orc;
> +-------+
> |  f1   |
> +-------+
> | NULL  |
> +-------+
> hive> select f0, f1 from missing_field_orc;
> +-----+-------+
> | f0  |  f1   |
> +-----+-------+
> | 1   | NULL  |
> +-----+-------+
> {code}
> Run the same queries in Impala:
> {code:sql}
> impala> VERSION;
> Shell version: impala shell build version not available
> Server version: impalad version 4.2.0-SNAPSHOT DEBUG (build 7273cfdfb901b9ef564c2737cf00c7a8abb57f07)
> impala> invalidate metadata missing_field_orc;
> impala> select f1 from missing_field_orc;
> ERROR: Parse error in possibly corrupt ORC file: 'hdfs://localhost:20500/test-warehouse/missing_field_orc/000000_0'. No columns found for this scan.
> impala> select f0, f1 from missing_field_orc;
> +----+------+
> | f0 | f1   |
> +----+------+
> | 1  | NULL |
> +----+------+
> {code}
> When only column 'f1' is selected, the query fails with an error. It should
> return NULL.
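Whatever the scanner's internals, the expected output of the repro can be stated as a small oracle: each row in the data file yields exactly one result row, and a column the file does not contain reads as NULL. The sketch below is hypothetical (`expected_result` is not an Impala or Hive API) and assumes table columns are matched to file columns by position, which fits this repro because f1 was appended after f0. It reproduces the Hive output in the description.

```python
# Hypothetical oracle for the expected result of the repro; not part of
# Impala's test suite. Assumes table columns map to file columns by position.
from typing import List, Optional, Sequence, Tuple

Value = Optional[int]


def expected_result(table_columns: Sequence[str],
                    file_rows: List[Tuple[Value, ...]],
                    selected: Sequence[str]) -> List[Tuple[Value, ...]]:
    """Rows a correct engine should return when projecting `selected`."""
    # Raises ValueError for a name that is not in the table schema.
    positions = [list(table_columns).index(c) for c in selected]
    # Every file row yields exactly one result row, even if all selected
    # columns are missing. A position beyond the row's width means the column
    # was added to the table after this file was written, so it reads as NULL.
    return [tuple(row[p] if p < len(row) else None for p in positions)
            for row in file_rows]


table = ["f0", "f1"]  # schema after `alter table ... add columns (f1 int)`
rows = [(1,)]         # the data file still holds only f0
print(expected_result(table, rows, ["f1"]))        # [(None,)]
print(expected_result(table, rows, ["f0", "f1"]))  # [(1, None)]
```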
--
This message was sent by Atlassian Jira
(v8.20.10#820010)