[
https://issues.apache.org/jira/browse/IMPALA-12277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869995#comment-17869995
]
ASF subversion and git services commented on IMPALA-12277:
----------------------------------------------------------
Commit f5a15ad6055dc4cd1baae2a11c56a969c5d30a31 in impala's branch
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f5a15ad60 ]
IMPALA-12277: Fix NullPointerException for partitioned inserts when
partition list is stale
When the event processor is turned off, inserting values into a partitioned
table can lead to a NullPointerException if the partition was dropped
outside Impala (e.g. in HMS). Since the event processor is turned off, Impala
is unaware of the metadata changes to the table.
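
To make the failure mode concrete, here is a minimal, self-contained sketch of the reload sequence described in the quoted issue: the cache still holds a partition that the metastore has dropped, the reload removes it, and the subsequent lookup by name fails. The class and method names are hypothetical and this is not Impala's HdfsTable code; java.util.Objects.requireNonNull stands in for the Guava Preconditions.checkNotNull call visible in the stack trace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Toy model of a catalog cache that has drifted from the metastore.
// All names are hypothetical; this is not Impala's implementation.
final class StaleCacheDemo {
    // Partition name -> partition location, as remembered by the cache.
    private final Map<String, String> cached = new HashMap<>();

    void addCached(String name, String location) {
        cached.put(name, location);
    }

    // Reload step 1: drop cached partitions the metastore no longer lists.
    void removeDropped(Set<String> hmsPartitionNames) {
        cached.keySet().retainAll(hmsPartitionNames);
    }

    // Reload step 2: look up the partitions touched by the INSERT.
    // Throws NullPointerException when a name is no longer cached.
    List<String> locationsFor(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(Objects.requireNonNull(
                cached.get(name), "Invalid partition name: " + name));
        }
        return result;
    }
}
```

In this model, caching p=0, calling removeDropped with an empty set (the metastore no longer has p=0), and then calling locationsFor for p=0 throws the NullPointerException.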
Currently, Impala always loads the partitions from the cached table.
This can lead to data inconsistency, especially when the event
processor is turned off or lagging behind. This patch addresses the
issue by always verifying the target partitions against HMS, without
loading their file metadata, regardless of the state of the event
processor. This approach ensures that partition metadata is
always consistent with the metastore.
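
The idea of checking the target partitions against the metastore rather than the cache can be sketched as follows. This is only an illustration under stated assumptions: the commit message does not show how the patch performs the check, so the class, the method, and the use of plain partition-name sets are all hypothetical. The sketch compares names only, mirroring the point that no file metadata needs to be loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Impala.
final class TargetPartitionCheck {
    // Returns the INSERT's target partitions that the metastore does not
    // list, in the order they appear in targetNames. These are the ones
    // that would have to be created before the insert is committed.
    static List<String> missingInMetastore(
            List<String> targetNames, Set<String> hmsPartitionNames) {
        List<String> missing = new ArrayList<>();
        for (String name : targetNames) {
            if (!hmsPartitionNames.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

With targets p=0 and p=1 and a metastore that lists only p=1, the helper reports p=0 as missing, regardless of what a stale cache believes.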
The issue can be seen with the following steps:
- Turn off the event processor
- Create a partitioned table and add a partition from Impala
- Drop the same partition from Hive
- From Impala, insert values into the partition (the expectation is that
if the partition does not exist, a new one is created).
Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
managed tables.
Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Reviewed-on: http://gerrit.cloudera.org:8080/21437
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> metadata reload of INSERT failed by NullPointerException: Invalid partition
> name
> --------------------------------------------------------------------------------
>
> Key: IMPALA-12277
> URL: https://issues.apache.org/jira/browse/IMPALA-12277
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
> INSERT into a partition that exists in catalogd but doesn't exist in HMS will
> fail in metadata reloading on the partition. The cause is that updateCatalog
> doesn't create the partition in HMS (since catalogd is not aware of the
> non-existence of the partition in HMS):
> [https://github.com/apache/impala/blob/d0fe4c604f72d41019832513ebf65cfe8f469953/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L6697-L6699]
> When reloading the partition, catalogd first removes it since it doesn't
> exist in HMS:
> [https://github.com/apache/impala/blob/d0fe4c604f72d41019832513ebf65cfe8f469953/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1530-L1531]
> It then tries to reload it, which hits a NullPointerException at:
> [https://github.com/apache/impala/blob/d0fe4c604f72d41019832513ebf65cfe8f469953/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1566]
> To reproduce the issue, launch Impala with event processing disabled so
> that catalogd can fall out of sync with HMS. Create a partitioned table in
> Impala with one partition:
> {noformat}
> bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=0
> impala> create table my_part2 (id int) partitioned by (p int) stored as textfile;
> impala> insert into my_part2 partition(p=0) values (0);
> {noformat}
> Drop the partition in Hive:
> {code:sql}
> hive> alter table my_part2 drop partition (p=0);{code}
> Then insert into the partition again in Impala:
> {code:sql}
> impala> insert into my_part2 partition(p=0) values (1);
> ERROR: TableLoadingException: Failed to load metadata for table: default.my_part2
> CAUSED BY: NullPointerException: Invalid partition name: p=0
> {code}
> The exception:
> {noformat}
> E0710 19:34:43.569339 4413 JniUtil.java:183] bb4452d18eafe116:eaf16c4000000000] Error in Update catalog for default.my_part2. Time spent: 1s186ms
> I0710 19:34:43.569918 4413 jni-util.cc:288] bb4452d18eafe116:eaf16c4000000000] org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.my_part2
>     at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1308)
>     at org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:1521)
>     at org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:6863)
>     at org.apache.impala.service.JniCatalog.lambda$updateCatalog$16(JniCatalog.java:471)
>     at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>     at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>     at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>     at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
>     at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:230)
>     at org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:470)
> Caused by: java.lang.NullPointerException: Invalid partition name: p=0
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:907)
>     at org.apache.impala.catalog.HdfsTable.getPartitionsForNames(HdfsTable.java:1766)
>     at org.apache.impala.catalog.HdfsTable$PartitionDeltaUpdater.apply(HdfsTable.java:1566)
>     at org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(HdfsTable.java:1447)
>     at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1282)
>     ... 9 more
> {noformat}