Quanlong Huang created IMPALA-12855:
---------------------------------------
Summary: NullPointerException in firing RELOAD events if the
partition is just dropped
Key: IMPALA-12855
URL: https://issues.apache.org/jira/browse/IMPALA-12855
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
REFRESH <table> PARTITION can fail while firing RELOAD events (when
--enable_reload_events=true) if the partition is dropped by a concurrent DDL.
The failure is a NullPointerException:
{noformat}
E0229 15:04:25.578933 7381 JniUtil.java:183] 824a23c46a6f71de:78a2f3dc00000000] Error in REFRESH TABLE default.part_tbl PARTITIONS issued by quanlong. Time spent: 1s061ms
I0229 15:04:25.579373 7381 jni-util.cc:302] 824a23c46a6f71de:78a2f3dc00000000] java.lang.NullPointerException
    at org.apache.impala.catalog.HdfsPartition.access$500(HdfsPartition.java:101)
    at org.apache.impala.catalog.HdfsPartition$Builder.<init>(HdfsPartition.java:1314)
    at org.apache.impala.service.CatalogOpExecutor.fireReloadEventAndUpdateRefreshEventId(CatalogOpExecutor.java:6810)
    at org.apache.impala.service.CatalogOpExecutor.execResetMetadataImpl(CatalogOpExecutor.java:6744)
    at org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6612)
    at org.apache.impala.service.JniCatalog.lambda$resetMetadata$4(JniCatalog.java:327)
    at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
    at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:243)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:257)
    at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:326){noformat}
The problem is that execResetMetadataImpl() does not hold the table lock for
the whole operation. Instead, the lock is held while reloading the metadata,
released, and then acquired again when RELOAD events need to be fired. In the
window between the two, the partition can be dropped by a concurrent DDL.
Firing the RELOAD events then fails because the partition can no longer be
found.
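The race described above can be modelled with a small, self-contained Java sketch. All names here (ReloadRaceSketch, reloadPartition, fireReloadEvent, the string-valued partition map) are hypothetical and are not Impala's real classes or methods. The sketch only illustrates the lock/release/re-lock pattern and one possible way to tolerate it, by re-checking for the partition after the lock is re-acquired. It is not the actual fix for this issue; holding the lock across both steps would be another option.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the race; none of these names exist in Impala. */
public class ReloadRaceSketch {
  private final ReentrantLock tableLock = new ReentrantLock();
  // Partition name -> metadata placeholder, guarded by tableLock.
  private final Map<String, String> partitions = new HashMap<>();

  public void addPartition(String name) {
    tableLock.lock();
    try {
      partitions.put(name, "metadata-of-" + name);
    } finally {
      tableLock.unlock();
    }
  }

  public void dropPartition(String name) {
    tableLock.lock();
    try {
      partitions.remove(name);
    } finally {
      tableLock.unlock();
    }
  }

  /** Step 1: reload metadata under the lock, then release the lock. */
  public void reloadPartition(String name) {
    tableLock.lock();
    try {
      partitions.computeIfPresent(name, (k, v) -> v + "-reloaded");
    } finally {
      tableLock.unlock();
    }
  }

  /**
   * Step 2: re-acquire the lock and fire the event. The partition may have
   * been dropped since step 1, so look it up again and skip the event
   * instead of dereferencing a null partition.
   */
  public Optional<String> fireReloadEvent(String name) {
    tableLock.lock();
    try {
      String partition = partitions.get(name);
      if (partition == null) {
        return Optional.empty(); // dropped by a concurrent DDL
      }
      return Optional.of("RELOAD " + name);
    } finally {
      tableLock.unlock();
    }
  }

  public static void main(String[] args) {
    ReloadRaceSketch table = new ReloadRaceSketch();
    table.addPartition("p=1");
    table.reloadPartition("p=1");
    table.dropPartition("p=1"); // simulates the concurrent DROP PARTITION
    System.out.println(table.fireReloadEvent("p=1").isPresent());
  }
}
```

In this model the drop lands between reloadPartition() and fireReloadEvent(), which is the same window the stack trace points at; without the null check the second step would dereference a missing partition.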
*Reproducing the issue*
To reproduce the issue, start catalogd with --enable_reload_events=true:
{code:bash}
bin/start-impala-cluster.py --catalogd_args="--enable_reload_events=true"{code}
Create a partitioned table:
{code:sql}
create table part_tbl (i int) partitioned by (p int);{code}
Run a loop that adds and drops a partition on this table:
{code:bash}
while true; do impala-shell.sh --quiet -B -q "ALTER TABLE part_tbl ADD PARTITION (p=1); ALTER TABLE part_tbl DROP PARTITION (p=1);" > /dev/null; done{code}
In another session, run a loop to REFRESH the partition:
{code:bash}
while true; do impala-shell.sh --quiet -B -q "REFRESH part_tbl PARTITION (p=1)" > /dev/null; done{code}
After a while, some of the REFRESH statements fail:
{noformat}
Could not execute command: REFRESH part_tbl PARTITION (p=1)
ERROR: NullPointerException: null{noformat}