[ https://issues.apache.org/jira/browse/IMPALA-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824222#comment-17824222 ]
ASF subversion and git services commented on IMPALA-12855:
----------------------------------------------------------

Commit b9c2e00a6ba1eaf1ed3f63013be60408409152b6 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b9c2e00a6 ]

IMPALA-12855: Fix NPE in firing RELOAD events when the partition doesn't exist

When --enable_reload_events is set to true, catalogd will fire RELOAD
events for INVALIDATE/REFRESH statements. When the RELOAD event is fired
successfully for a REFRESH statement, we also update lastRefreshEventId
of the table/partition. This part could hit a NullPointerException when
the partition is dropped by concurrent DDLs.

This patch skips updating lastRefreshEventId if the partition doesn't
exist. Note that ideally we should hold the table lock of REFRESH until
we finish firing the RELOAD events and updating lastRefreshEventId, so
that no concurrent operation can drop the partition. However, when the
table is loaded from scratch, we don't actually hold the table write
lock. We just load the table and take a read lock to get the thrift
object. The partition could still be dropped concurrently after the load
and before taking the read lock. So ignoring missing partitions is a
simpler solution.

Refactors some code in fireReloadEventAndUpdateRefreshEventId to reduce
indentation and to avoid acquiring the table lock if no events are fired.
Adds error messages to some Precondition checks in methods used by this
feature. Also refactors Table.getFullName() so that it does not always
construct the result. Improves the logs emitted when a partition is not
reloaded for an event.
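
The guard described above can be sketched roughly as follows. This is a minimal illustration in Java, not the code from the patch: TableSketch, PartitionSketch and updateLastRefreshEventId are hypothetical names, and a plain map stands in for the catalog's partition metadata. It only shows the idea of skipping the lastRefreshEventId update when the partition lookup finds nothing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration only; these are not Impala's classes.
final class PartitionSketch {
    private final String name;
    private long lastRefreshEventId = -1;

    PartitionSketch(String name) { this.name = name; }

    String name() { return name; }
    long lastRefreshEventId() { return lastRefreshEventId; }
    void setLastRefreshEventId(long eventId) { this.lastRefreshEventId = eventId; }
}

final class TableSketch {
    // Assumption: partitions are keyed by a name such as "p=1".
    private final Map<String, PartitionSketch> partitions = new ConcurrentHashMap<>();

    void addPartition(String name) { partitions.put(name, new PartitionSketch(name)); }
    void dropPartition(String name) { partitions.remove(name); }
    PartitionSketch getPartition(String name) { return partitions.get(name); }

    /**
     * Records the id of a fired RELOAD event on the named partition.
     * Returns false and changes nothing if the partition no longer exists,
     * for example because a concurrent DDL dropped it after the reload.
     */
    boolean updateLastRefreshEventId(String partitionName, long eventId) {
        PartitionSketch partition = partitions.get(partitionName);
        if (partition == null) {
            // The partition vanished between the reload and this update:
            // skip it instead of dereferencing null.
            System.out.println("Skipping missing partition " + partitionName);
            return false;
        }
        partition.setLastRefreshEventId(eventId);
        return true;
    }
}
```

With this shape, a caller that has just fired a RELOAD event can call updateLastRefreshEventId for each refreshed partition and treat a false return as "nothing to update" rather than as an error.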

Tests:
- Add e2e test

Change-Id: I01af3624bf7cf5cd69935cffa28d54f6a6807504
Reviewed-on: http://gerrit.cloudera.org:8080/21096
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> NullPointerException in firing RELOAD events if the partition is just dropped
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-12855
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12855
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> REFRESH <table> PARTITION could fail in firing RELOAD events (when
> --enable_reload_events=true) if the partition is dropped by a concurrent DDL.
> The failure is a NullPointerException:
> {noformat}
> E0229 15:04:25.578933 7381 JniUtil.java:183] 824a23c46a6f71de:78a2f3dc00000000] Error in REFRESH TABLE default.part_tbl PARTITIONS issued by quanlong. Time spent: 1s061ms
> I0229 15:04:25.579373 7381 jni-util.cc:302] 824a23c46a6f71de:78a2f3dc00000000] java.lang.NullPointerException
>     at org.apache.impala.catalog.HdfsPartition.access$500(HdfsPartition.java:101)
>     at org.apache.impala.catalog.HdfsPartition$Builder.<init>(HdfsPartition.java:1314)
>     at org.apache.impala.service.CatalogOpExecutor.fireReloadEventAndUpdateRefreshEventId(CatalogOpExecutor.java:6810)
>     at org.apache.impala.service.CatalogOpExecutor.execResetMetadataImpl(CatalogOpExecutor.java:6744)
>     at org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6612)
>     at org.apache.impala.service.JniCatalog.lambda$resetMetadata$4(JniCatalog.java:327)
>     at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>     at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>     at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>     at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
>     at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:243)
>     at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:257)
>     at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:326)
> {noformat}
> The problem is that in the implementation of execResetMetadataImpl(), the
> table lock is not held for the whole operation. Instead, it is held while
> reloading the metadata, then released, and acquired again when we need to
> fire RELOAD events. In the time between these, the partition could be
> dropped by a concurrent DDL. Firing the RELOAD events then fails because
> the partition cannot be found.
>
> *Reproducing the issue*
>
> To reproduce the issue, start catalogd with --enable_reload_events=true:
> {code:bash}
> bin/start-impala-cluster.py --catalogd_args="--enable_reload_events=true"
> {code}
> Create a partitioned table:
> {code:sql}
> create table part_tbl (i int) partitioned by (p int);
> {code}
> Run a loop to ADD+DROP a partition on this table:
> {code:bash}
> while true; do impala-shell.sh --quiet -B -q "ALTER TABLE part_tbl ADD PARTITION (p=1); ALTER TABLE part_tbl DROP PARTITION (p=1);" > /dev/null; done
> {code}
> In another session, run a loop to REFRESH the partition:
> {code:bash}
> while true; do impala-shell.sh --quiet -B -q "REFRESH part_tbl PARTITION (p=1)" > /dev/null; done
> {code}
> After a while, some REFRESH statements fail:
> {noformat}
> Could not execute command: REFRESH part_tbl PARTITION (p=1)
> ERROR: NullPointerException: null
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org