Quanlong Huang created IMPALA-10875:
---------------------------------------
Summary: Transient stale catalog if catalogd is restarted more than once shortly
Key: IMPALA-10875
URL: https://issues.apache.org/jira/browse/IMPALA-10875
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
This is a follow-up to IMPALA-5476. Though it is rare in practice, there is
still a bug where a client can see a stale catalog in the following scenario:
* Catalogd is restarted twice within one statestore catalog update cycle.
* A DDL finishes its execDdl RPC on the second restarted catalogd. The response
carries a new catalog service id that differs from the coordinator's local one,
so the DDL execution thread waits until the local one is updated.
* The coordinator receives a catalog update from the first restarted catalogd.
The local catalog service id changes, which wakes up the DDL execution thread.
* The DDL execution thread finds that the local catalog service id still differs
from that of the catalogd that executed the DDL, so it ignores the DDL result
and returns.
The client will see a stale catalog until the next catalog topic update arrives.
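The race above can be illustrated with a small standalone Python model. This is
only a sketch under stated assumptions, not Impala's actual coordinator code:
the class CoordinatorSketch, its methods, and the ids "catalogd-0/1/2" are all
hypothetical names made up for illustration. It assumes the DDL thread waits
for a single change of the local catalog service id and then gives up if the
id still does not match.

```python
import threading


class CoordinatorSketch:
    """Toy model of a coordinator's local catalog state (hypothetical names)."""

    def __init__(self, service_id, tables):
        self.service_id = service_id
        self.tables = set(tables)
        self._cv = threading.Condition()

    def on_topic_update(self, service_id, tables):
        # A catalog topic update replaces the local view and wakes any waiters.
        with self._cv:
            self.service_id = service_id
            self.tables = set(tables)
            self._cv.notify_all()

    def apply_drop_result(self, ddl_service_id, table, timeout_s=0.5):
        """Returns True if the DROP result was applied to the local cache."""
        with self._cv:
            if self.service_id != ddl_service_id:
                # Wait for one change of the local service id (or a timeout).
                seen = self.service_id
                self._cv.wait_for(lambda: self.service_id != seen,
                                  timeout=timeout_s)
            if self.service_id != ddl_service_id:
                # Still a different catalogd instance: the result is ignored.
                return False
            self.tables.discard(table)
            return True


def run_scenario():
    # The coordinator still knows the catalogd from before both restarts.
    coord = CoordinatorSketch("catalogd-0", {"join_aa"})
    outcome = {}

    def ddl_thread():
        # The DROP TABLE was executed by the second restarted catalogd.
        outcome["applied"] = coord.apply_drop_result("catalogd-2", "join_aa")

    t = threading.Thread(target=ddl_thread)
    t.start()
    # Late update from the first restarted catalogd, which never saw the drop.
    coord.on_topic_update("catalogd-1", {"join_aa"})
    t.join()
    stale_view = set(coord.tables)
    # The next topic update, from the second restarted catalogd, fixes the view.
    coord.on_topic_update("catalogd-2", set())
    return outcome["applied"], stale_view, set(coord.tables)


if __name__ == "__main__":
    print(run_scenario())
```

In this model the DDL result is dropped whether the late update arrives before
or after the DDL thread starts waiting, so the table stays visible until the
update from the second restarted catalogd is processed.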
The following test reproduces this bug (add it to
tests/custom_cluster/test_restart_services.py):
{code:python}
  UPDATE_FREQUENCY_S = 10

  @pytest.mark.execute_serially
  @CustomClusterTestSuite.with_args(
    statestored_args="--statestore_update_frequency_ms={frequency_ms}"
    .format(frequency_ms=(UPDATE_FREQUENCY_S * 1000)))
  def test_restart_catalogd_twice2(self):
    self.execute_query_expect_success(self.client, "drop table if exists join_aa")
    self.execute_query_expect_success(self.client, "create table join_aa(id int)")
    # Make the catalog object version grow large enough
    self.execute_query_expect_success(self.client, "invalidate metadata")

    # No need to care whether the DDL is executed successfully, it is just to make
    # the local catalog cache of impalad out of sync
    for i in range(0, 10):
      try:
        query = "alter table join_aa add columns (age" + str(i) + " int)"
        self.execute_query_async(query)
      except Exception as e:
        LOG.info(str(e))

    self.cluster.catalogd.restart()
    sleep(self.UPDATE_FREQUENCY_S * 2)
    self.cluster.catalogd.restart()
    self.execute_query_expect_success(self.client, "drop table join_aa")
    # Should not see stale metadata on 'join_aa'
    result = self.execute_query_expect_success(self.client, "show tables")
    assert 'join_aa' not in result.data
{code}