[ https://issues.apache.org/jira/browse/IMPALA-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805753#comment-17805753 ]
ASF subversion and git services commented on IMPALA-12356:
----------------------------------------------------------
Commit 32b29ff36fb3e05fd620a6714de88805052d0117 in impala's branch
refs/heads/master from Venu Reddy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=32b29ff36 ]
IMPALA-12356: Fix first ALTER_PARTITION event from Hive could be
treated as self event
The self-event check for ADD_PARTITION events is done only for
transactional tables since IMPALA-10502 (commit id: 7f7a631). But
when a new partition is added (with an INSERT statement), the catalog
service id and version number are added to the partition params of the
partition irrespective of whether the table is transactional or not.
Thus the version number was added to the partition's inFlightEvents_
and remained there until the next ALTER_PARTITION event from Hive,
which was then wrongly detected as a self-event.
This commit ensures the catalog service id and version number are not
added to partition params if the partition is added to a
non-transactional table.
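
The mechanism above can be illustrated with a small Python model. This
is only a sketch under simplifying assumptions, not the catalogd code:
the class and function names are hypothetical, the two parameter keys
are the ones shown in the issue description, and it assumes that a Hive
ALTER_PARTITION event carries the existing partition params unchanged.

```python
from dataclasses import dataclass, field

# Parameter keys as shown in the issue description.
SERVICE_ID_KEY = "impala.events.catalogServiceId"
VERSION_KEY = "impala.events.catalogVersion"


@dataclass
class Partition:
    """Hypothetical stand-in for a catalog partition."""
    params: dict = field(default_factory=dict)
    in_flight_versions: list = field(default_factory=list)


def add_partition(part, service_id, version, is_transactional, fixed):
    """Model an INSERT that creates a new partition."""
    if fixed and not is_transactional:
        # After the fix: no self-event markers for non-transactional tables.
        return
    part.params[SERVICE_ID_KEY] = service_id
    part.params[VERSION_KEY] = str(version)
    part.in_flight_versions.append(version)


def is_self_event(part, event_params, service_id):
    """An event is a self-event if it carries our service id and a version
    that is still in flight; a matching version is consumed."""
    if event_params.get(SERVICE_ID_KEY) != service_id:
        return False
    version = int(event_params.get(VERSION_KEY, -1))
    if version in part.in_flight_versions:
        part.in_flight_versions.remove(version)
        return True
    return False


def process_add_partition_event(part, event_params, service_id, is_transactional):
    """Per the commit message, this check runs only for transactional tables."""
    if is_transactional:
        return is_self_event(part, event_params, service_id)
    return False


def hive_alter_skipped(fixed):
    """Return True if the first Hive ALTER_PARTITION event would be skipped."""
    service_id = "svc-1"  # hypothetical catalog service id
    part = Partition()
    add_partition(part, service_id, 1882, is_transactional=False, fixed=fixed)
    # The ADD_PARTITION event is not checked, so the version stays in flight.
    process_add_partition_event(part, dict(part.params), service_id, False)
    # Assumption: Hive's ALTER_PARTITION event carries the same params.
    return is_self_event(part, dict(part.params), service_id)
```

In this model, hive_alter_skipped(fixed=False) reproduces the bug (the
Hive event is treated as a self-event), while hive_alter_skipped(fixed=True)
lets the event through.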
This commit also fixes another bug in the reload event. The reload
event's self-event check fails with the above fix because it expects
the catalog service id and version number in the partition params. It
now uses the last refreshed event id to skip self reload events.
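
A rough sketch of the reload-event idea, again in Python with
hypothetical names. The commit message only says that the last
refreshed event id is used to skip self reload events; the exact
comparison below (skip when the event id is not newer than the last
refreshed event id) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """Hypothetical record of the last event id a partition was refreshed at."""
    last_refreshed_event_id: int = -1


def should_skip_reload(part, reload_event_id):
    # Assumption: a RELOAD event that is not newer than the last refresh
    # was already reflected by that refresh and can be skipped.
    return reload_event_id <= part.last_refreshed_event_id
```

Unlike the params-based check, this does not depend on the catalog
service id or version number being present in the partition params.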
Testing:
- Manually tested in cluster and added testcases
Change-Id: I23c2affa3fe32c0b3843bff5e4c0018dce9060d3
Reviewed-on: http://gerrit.cloudera.org:8080/20486
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Partition created by INSERT will make the next ALTER_PARTITION event on it
> always treated as self-event
> -------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-12356
> URL: https://issues.apache.org/jira/browse/IMPALA-12356
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Venugopal Reddy K
> Priority: Critical
> Labels: ramp-up
>
> In Impala, create a partitioned table and create one partition in it using
> INSERT:
> {noformat}
> create table my_part (i int) partitioned by (p int) stored as parquet;
> insert into my_part partition(p=0) values (0),(1),(2);
> show partitions my_part
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+---------------------------------------------------+-----------+
> | p     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                          | EC Policy |
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+---------------------------------------------------+-----------+
> | 0     | -1    | 1      | 358B | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/my_part/p=0 | NONE      |
> | Total | -1    | 1      | 358B | 0B           |                   |         |                   |                                                   |           |
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+---------------------------------------------------+-----------+
> {noformat}
> In Hive, describe the partition. We can see parameters of
> "impala.events.catalogServiceId" and "impala.events.catalogVersion" added by
> Impala. This is ok.
> {noformat}
> hive> desc formatted my_part partition(p=0);
> +-----------------------------------+----------------------------------------------------+-----------------------------------+
> | col_name                          | data_type                                          | comment                           |
> +-----------------------------------+----------------------------------------------------+-----------------------------------+
> | i                                 | int                                                |                                   |
> |                                   | NULL                                               | NULL                              |
> | # Partition Information           | NULL                                               | NULL                              |
> | # col_name                        | data_type                                          | comment                           |
> | p                                 | int                                                |                                   |
> |                                   | NULL                                               | NULL                              |
> | # Detailed Partition Information  | NULL                                               | NULL                              |
> | Partition Value:                  | [0]                                                | NULL                              |
> | Database:                         | default                                            | NULL                              |
> | Table:                            | my_part                                            | NULL                              |
> | CreateTime:                       | Wed Aug 09 15:24:50 CST 2023                       | NULL                              |
> | LastAccessTime:                   | UNKNOWN                                            | NULL                              |
> | Location:                         | hdfs://localhost:20500/test-warehouse/my_part/p=0  | NULL                              |
> | Partition Parameters:             | NULL                                               | NULL                              |
> |                                   | impala.events.catalogServiceId                     | eab33ebb8a14cfd:8b2bdc12df3568df  |
> |                                   | impala.events.catalogVersion                       | 1882                              |
> |                                   | numFiles                                           | 1                                 |
> |                                   | totalSize                                          | 358                               |
> |                                   | transient_lastDdlTime                              | 1691565890                        |
> |                                   | NULL                                               | NULL                              |
> | # Storage Information             | NULL                                               | NULL                              |
> | SerDe Library:                    | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL                              |
> | InputFormat:                      | org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | NULL                              |
> | OutputFormat:                     | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL                              |
> | Compressed:                       | No                                                 | NULL                              |
> | Num Buckets:                      | 0                                                  | NULL                              |
> | Bucket Columns:                   | []                                                 | NULL                              |
> | Sort Columns:                     | []                                                 | NULL                              |
> +-----------------------------------+----------------------------------------------------+-----------------------------------+
> {noformat}
> Now run an ALTER statement on the partition in Hive, e.g. changing the
> location:
> {code:sql}
> alter table my_part partition(p=0) set location '/tmp';{code}
> Impala will skip the ALTER_PARTITION event since it's considered as a
> self-event. In catalogd logs:
> {noformat}
> I0809 15:30:19.628449 29844 MetastoreEvents.java:628] EventId: 8351549 EventType: ALTER_PARTITION Incremented events skipped counter to 12
> I0809 15:30:19.628616 29844 MetastoreEvents.java:628] EventId: 8351549 EventType: ALTER_PARTITION Not processing the event as it is a self-event
> {noformat}