[ https://issues.apache.org/jira/browse/IMPALA-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631366#comment-17631366 ]
ASF subversion and git services commented on IMPALA-11711:
----------------------------------------------------------
Commit f617e3648734ffaff655382f911d256424bcda7b in impala's branch
refs/heads/master from LPL
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f617e3648 ]
IMPALA-11711: Virtual columns should be skipped in
'FileMetadataUtils::AddIcebergColumns'
In the 'FileMetadataUtils::AddIcebergColumns' method, a slot that is
a virtual column should be skipped directly. Otherwise, querying an
Iceberg v2 table whose first column is a partition column of bool type
can return wrong position-delete results.
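A minimal sketch of the skip described above. The types and names here
(Slot, SlotsToFillFromMetadata, "file_position") are hypothetical
simplifications for illustration, not Impala's actual classes or the
code in this commit:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a slot descriptor.
struct Slot {
  std::string name;
  bool is_virtual;        // column synthesized by the scanner, not stored in the table
  bool is_partition_col;  // value comes from Iceberg partition metadata
};

// Returns the names of the slots whose values would be filled from
// partition metadata. Virtual columns are skipped before any other check,
// so they can never be mistaken for a table column.
std::vector<std::string> SlotsToFillFromMetadata(const std::vector<Slot>& slots) {
  std::vector<std::string> result;
  for (const Slot& slot : slots) {
    if (slot.is_virtual) continue;  // the skip this change is about
    if (slot.is_partition_col) result.push_back(slot.name);
  }
  return result;
}
```

With a virtual slot placed ahead of col_boolean, only col_boolean is
selected for filling.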
Testing:
- Add e2e tests
- Locally tested the results of the position-based Iceberg tables
Change-Id: I58faf3df6ae8a5bcabb1d2ac9f11a6fbcd74bc24
Reviewed-on: http://gerrit.cloudera.org:8080/19223
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Wrong results when querying an Iceberg v2 table (the first column is the
> partition column of bool type)
> ------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-11711
> URL: https://issues.apache.org/jira/browse/IMPALA-11711
> Project: IMPALA
> Issue Type: Bug
> Reporter: LiPenglin
> Assignee: LiPenglin
> Priority: Major
> Labels: impala-iceberg
>
> Query: describe formatted ice_v2_partitioned_position_deletes
> +-----------------------------------+----------------+---------+
> | name                              | type           | comment |
> +-----------------------------------+----------------+---------+
> | # col_name                        | data_type      | comment |
> |                                   | NULL           | NULL    |
> | col_boolean                       | boolean        | NULL    |
> | col_int                           | int            | NULL    |
> | col_long                          | bigint         | NULL    |
> | col_float                         | float          | NULL    |
> | col_double                        | double         | NULL    |
> | col_decimal_9                     | decimal(2,1)   | NULL    |
> | col_decimal_18                    | decimal(13,3)  | NULL    |
> | col_decimal_38                    | decimal(22,3)  | NULL    |
> | col_date                          | date           | NULL    |
> | col_timestamp                     | timestamp      | NULL    |
> | col_string                        | string         | NULL    |
> |                                   | NULL           | NULL    |
> | # Partition Transform Information | NULL           | NULL    |
> | # col_name                        | transform_type | NULL    |
> |                                   | NULL           | NULL    |
> | col_boolean                       | IDENTITY       | NULL    |
> |                                   | NULL           | NULL    |
>
> The correct result should be:
> select * from ice_v2_partitioned_position_deletes;
> +-------------+---------+----------+---------------+------------+---------------+----------------+-------------------------+------------+---------------------+------------+
> | col_boolean | col_int | col_long | col_float     | col_double | col_decimal_9 | col_decimal_18 | col_decimal_38          | col_date   | col_timestamp       | col_string |
> +-------------+---------+----------+---------------+------------+---------------+----------------+-------------------------+------------+---------------------+------------+
> | false       | 0       | 123      | 11.0100002289 | 110.001    | 1.1           | 1234567891.321 | 1234567890987654321.123 | 2001-01-01 | 2001-01-01 01:01:00 | aaa        |
> | false       | 1       | 123      | 11.0100002289 | 110.001    | 1.1           | 1234567891.321 | 1234567890987654321.123 | 2001-01-02 | 2001-01-02 01:01:00 | aaa        |
> | false       | 2       | 123      | 11.0100002289 | 110.001    | 1.1           | 1234567891.321 | 1234567890987654321.123 | 2001-01-03 | 2001-01-03 01:01:00 | aaa        |
>
> But the query actually returned wrong results:
> +-------------+---------+----------+---------------+------------+---------------+----------------+-------------------------+------------+---------------------+------------+
> | col_boolean | col_int | col_long | col_float     | col_double | col_decimal_9 | col_decimal_18 | col_decimal_38          | col_date   | col_timestamp       | col_string |
> +-------------+---------+----------+---------------+------------+---------------+----------------+-------------------------+------------+---------------------+------------+
> | false       | 0       | 123      | 10.0100002289 | 100.001    | 1.0           | 1234567890.321 | 1234567890987654320.123 | 2001-01-01 | 2001-01-01 00:01:00 | aaa        |
> | false       | 0       | 123      | 11.0100002289 | 110.001    | 1.1           | 1234567891.321 | 1234567890987654321.123 | 2001-01-01 | 2001-01-01 01:01:00 | aaa        |
> | false       | 1       | 123      | 10.0100002289 | 100.001    | 1.0           | 1234567890.321 | 1234567890987654320.123 | 2001-01-02 | 2001-01-02 00:01:00 | aaa        |
> | false       | 1       | 123      | 11.0100002289 | 110.001    | 1.1           | 1234567891.321 | 1234567890987654321.123 | 2001-01-02 | 2001-01-02 01:01:00 | aaa        |
> | false       | 2       | 123      | 10.0100002289 | 100.001    | 1.0           | 1234567890.321 | 1234567890987654320.123 | 2001-01-03 | 2001-01-03 00:01:00 | aaa        |
> | false       | 2       | 123      | 11.0100002289 | 110.001    | 1.1           | 1234567891.321 | 1234567890987654321.123 | 2001-01-03 | 2001-01-03 01:01:00 | aaa        |
> +-------------+---------+----------+---------------+------------+---------------+----------------+-------------------------+------------+---------------------+------------+
> WARNINGS: Could not parse partition value for column 'col_boolean' in file
> 'hdfs://localhost:20500/ice_v2_partitioned_position_deletes/data/col_boolean=false/00001-1-0eac5f52-629c-46a1-baa2-258aad366df5-00001.parquet'.
> Partition string is 'false' NULL Partition key value is
> '__HIVE_DEFAULT_PARTITION__'
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)