[ https://issues.apache.org/jira/browse/IMPALA-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824222#comment-17824222 ]

ASF subversion and git services commented on IMPALA-12855:
----------------------------------------------------------

Commit b9c2e00a6ba1eaf1ed3f63013be60408409152b6 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b9c2e00a6 ]

IMPALA-12855: Fix NPE in firing RELOAD events when the partition doesn't exist

When --enable_reload_events is set to true, catalogd will fire RELOAD
events for INVALIDATE/REFRESH statements. When the RELOAD event is fired
successfully for a REFRESH statement, we also update lastRefreshEventId
of the table/partition. This step could hit a NullPointerException when
the partition has been dropped by a concurrent DDL.

This patch skips updating lastRefreshEventId if the partition doesn't
exist. Note that ideally we should hold the table lock of the REFRESH until
we finish firing the RELOAD events and updating lastRefreshEventId, so that
no concurrent operation can drop the partition. However, when the table is
loaded from scratch, we don't actually hold the table write lock. We
just load the table and take a read lock to get the Thrift object. The
partition could still be dropped concurrently after the load and before
the read lock is taken. So ignoring missing partitions is the simpler
solution.
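A minimal Java sketch of the fix described above. PartitionSketch, TableSketch and ReloadEventSketch are hypothetical stand-ins, not Impala's catalog classes. It assumes a plain ReentrantReadWriteLock guards the table and that one RELOAD event id is recorded per refreshed partition. Missing partitions are skipped instead of dereferenced, and the method returns early without taking the lock when no events were fired:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a catalog partition (not Impala's HdfsPartition).
class PartitionSketch {
  long lastRefreshEventId = -1;
}

// Hypothetical stand-in for a catalog table guarded by a read/write lock.
class TableSketch {
  final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  final Map<String, PartitionSketch> partitions = new HashMap<>();
  int writeLockAcquisitions = 0;  // only to make the early return observable
}

class ReloadEventSketch {
  /**
   * Records the RELOAD event id fired for each partition name. Partitions that
   * were dropped concurrently are skipped instead of dereferenced.
   * Returns how many partitions were updated.
   */
  static int updateRefreshEventIds(TableSketch tbl, Map<String, Long> firedEventIds) {
    // No events fired: skip taking the table lock entirely.
    if (firedEventIds.isEmpty()) return 0;
    tbl.lock.writeLock().lock();
    tbl.writeLockAcquisitions++;
    try {
      int updated = 0;
      for (Map.Entry<String, Long> e : firedEventIds.entrySet()) {
        PartitionSketch part = tbl.partitions.get(e.getKey());
        if (part == null) continue;  // dropped by a concurrent DDL: ignore it
        part.lastRefreshEventId = e.getValue();
        updated++;
      }
      return updated;
    } finally {
      tbl.lock.writeLock().unlock();
    }
  }
}
```

Skipping rather than failing follows the reasoning above. The partition can disappear even when the table was just loaded, so a null lookup is an expected outcome, not a bug.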

Refactors some code of fireReloadEventAndUpdateRefreshEventId to reduce
indentation and to avoid acquiring the table lock if no events are fired.
Adds error messages to some Precondition checks in methods used by this
feature. Also refactors Table.getFullName() to avoid constructing the
result on every call. Improves the logs emitted when a partition is not
reloaded for an event.
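The getFullName() change can be illustrated with a sketch that builds the qualified name once instead of concatenating it on every call. NamedTableSketch is hypothetical, and the sketch assumes the database and table names never change for the object's lifetime. The actual patch may cache or compute the name differently:

```java
// Hypothetical table class; assumes dbName and tableName are immutable.
class NamedTableSketch {
  private final String dbName;
  private final String tableName;
  // Built once in the constructor, so getFullName() does no concatenation.
  private final String fullName;

  NamedTableSketch(String dbName, String tableName) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.fullName = (dbName != null ? dbName + "." : "") + tableName;
  }

  String getDbName() { return dbName; }
  String getTableName() { return tableName; }
  String getFullName() { return fullName; }
}
```

Precomputing costs one extra field per table and removes a string allocation from every call.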

Tests:
 - Add e2e test

Change-Id: I01af3624bf7cf5cd69935cffa28d54f6a6807504
Reviewed-on: http://gerrit.cloudera.org:8080/21096
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> NullPointerException in firing RELOAD events if the partition is just dropped
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-12855
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12855
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> REFRESH <table> PARTITION could fail while firing RELOAD events (when
> --enable_reload_events=true) if the partition is dropped by a concurrent DDL.
> The failure is a NullPointerException:
> {noformat}
> E0229 15:04:25.578933  7381 JniUtil.java:183] 824a23c46a6f71de:78a2f3dc00000000] Error in REFRESH TABLE default.part_tbl PARTITIONS issued by quanlong. Time spent: 1s061ms
> I0229 15:04:25.579373  7381 jni-util.cc:302] 824a23c46a6f71de:78a2f3dc00000000] java.lang.NullPointerException
>         at org.apache.impala.catalog.HdfsPartition.access$500(HdfsPartition.java:101)
>         at org.apache.impala.catalog.HdfsPartition$Builder.<init>(HdfsPartition.java:1314)
>         at org.apache.impala.service.CatalogOpExecutor.fireReloadEventAndUpdateRefreshEventId(CatalogOpExecutor.java:6810)
>         at org.apache.impala.service.CatalogOpExecutor.execResetMetadataImpl(CatalogOpExecutor.java:6744)
>         at org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6612)
>         at org.apache.impala.service.JniCatalog.lambda$resetMetadata$4(JniCatalog.java:327)
>         at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>         at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>         at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>         at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
>         at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:243)
>         at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:257)
>         at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:326){noformat}
> The problem is that in the implementation of execResetMetadataImpl(), the
> table lock is not held throughout. Instead, it is held while reloading the
> metadata, then released, and acquired again when we need to fire RELOAD
> events. In the time between these, the partition could be dropped by a
> concurrent DDL. Firing the RELOAD events then fails because the partition
> can no longer be found.
> *Reproducing the issue*
> To reproduce the issue, start catalogd with --enable_reload_events=true:
> {code:bash}
> bin/start-impala-cluster.py --catalogd_args="--enable_reload_events=true"{code}
> Create a partitioned table:
> {code:sql}
> create table part_tbl (i int) partitioned by (p int);{code}
> Run a loop that adds and drops a partition of this table:
> {code:bash}
> while true; do impala-shell.sh --quiet -B -q "ALTER TABLE part_tbl ADD PARTITION (p=1); ALTER TABLE part_tbl DROP PARTITION (p=1);" > /dev/null; done{code}
> In another session, run a loop that refreshes the partition:
> {code:bash}
> while true; do impala-shell.sh --quiet -B -q "REFRESH part_tbl PARTITION (p=1)" > /dev/null; done{code}
> After a while, some REFRESH statements fail:
> {noformat}
> Could not execute command: REFRESH part_tbl PARTITION (p=1)
> ERROR: NullPointerException: null{noformat}
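The race described in the issue can be reproduced deterministically with a small single-threaded simulation. The lock is taken for the reload and released, the partition is dropped while no lock is held, and the second lookup returns null. RaceWindowSketch and its methods are hypothetical and only model the locking pattern, not execResetMetadataImpl() itself:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a REFRESH that releases the table lock between phases.
class RaceWindowSketch {
  final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
  final Map<String, String> partitions = new HashMap<>();  // name -> location

  /** Phase 1: reload under the write lock, then release it. */
  boolean reload(String part) {
    tableLock.writeLock().lock();
    try {
      return partitions.containsKey(part);
    } finally {
      tableLock.writeLock().unlock();
    }
  }

  /** A concurrent DDL that can run in the gap between the two phases. */
  void dropPartition(String part) {
    tableLock.writeLock().lock();
    try {
      partitions.remove(part);
    } finally {
      tableLock.writeLock().unlock();
    }
  }

  /** Phase 2: re-acquire the lock to fire RELOAD events; the lookup may be null. */
  String lookupForReloadEvent(String part) {
    tableLock.readLock().lock();
    try {
      return partitions.get(part);
    } finally {
      tableLock.readLock().unlock();
    }
  }
}
```

Passing that null result on without a check corresponds to the NullPointerException in the stack trace above, where HdfsPartition$Builder is constructed during fireReloadEventAndUpdateRefreshEventId.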



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
