Ashwani Raina created KUDU-3484:
-----------------------------------
Summary: Kudu C++ client needs to invalidate cache if Java client
issued a DDL op on same partition
Key: KUDU-3484
URL: https://issues.apache.org/jira/browse/KUDU-3484
Project: Kudu
Issue Type: Improvement
Components: client
Reporter: Ashwani Raina
Assignee: Ashwani Raina
This Jira tracks the work to improve the Kudu C++ client's metacache integrity
when both the C++ and Java clients are used on the same partition in a
particular sequence of steps that leaves a cache entry stale in the Kudu C++
client.
Here is a detailed step-wise execution of a test that could result in such a
situation:
+++
The metacache in the Kudu client does not clean up the old tablet from its
cache after a new partition with the same range is created.
The new tablet id, returned in the server response, is the valid one and
should be used everywhere from that point on.
Steps to reproduce:
1. First, create a table with the following query:
+++
/** 1. Create table **/
drop table if exists impala_crash;
create table if not exists impala_crash (
  dt string,
  col string,
  primary key(dt)
)
partition by range(dt) (
  partition values <= '00000000'
)
stored as kudu;
+++
2. Then, the table is altered by adding a partition with range 20230301 and a
row is inserted:
+++
alter table impala_crash drop if exists range partition value='20230301';
alter table impala_crash add if not exists range partition value='20230301';
insert into impala_crash values ('20230301','abc');
+++
3. Then, the table is altered again: the old partition is dropped, a partition
with the same range is added, and another row is inserted:
+++
alter table impala_crash drop if exists range partition value='20230301';
alter table impala_crash add if not exists range partition value='20230301';
insert into impala_crash values ('20230301','abc');
+++
Even though the old partition is dropped and a new one is added, a reference
to the old cache entry (with the old tablet id) remains in the Kudu client
metacache, although it is marked as stale.
When the client tries to write a new value to the same range, it first looks
up the entry (by tablet id) in the metacache and finds it stale. As a result,
an RPC lookup is issued, which contacts the server and fetches a response
containing the new tablet id, since the old tablet no longer exists on the
server. The new tablet id is recorded in the client metacache. When PickLeader
resumes, it goes through the RPC lookup cycle again, which now does a
successful fast-path lookup because the latest entry is present in the cache.
But when its callback is invoked, it resumes work with the old tablet id it
already holds, which never gets updated.
+++
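For context, the insert in step 3 reaches Kudu through the C++ client roughly
as sketched below (Impala drives this path internally). This is a minimal
sketch: the master address and the Impala-prefixed table name are assumptions.
The Apply/Flush call is where the client resolves the target tablet through
its metacache, which is the entry that goes stale as described above.
+++
#include <memory>
#include <string>

#include "kudu/client/client.h"
#include "kudu/client/stubs.h"
#include "kudu/common/partial_row.h"

using kudu::client::KuduClient;
using kudu::client::KuduClientBuilder;
using kudu::client::KuduInsert;
using kudu::client::KuduSession;
using kudu::client::KuduTable;
using kudu::client::sp::shared_ptr;

int main() {
  // Connect to the cluster; the master address is an assumption.
  shared_ptr<KuduClient> client;
  KUDU_CHECK_OK(KuduClientBuilder()
                    .add_master_server_addr("master-host:7051")
                    .Build(&client));

  // Open the table created in step 1. Impala stores Kudu tables with an
  // "impala::<db>." prefix by default; the exact name is an assumption.
  shared_ptr<KuduTable> table;
  KUDU_CHECK_OK(client->OpenTable("impala::default.impala_crash", &table));

  shared_ptr<KuduSession> session = client->NewSession();
  KUDU_CHECK_OK(session->SetFlushMode(KuduSession::MANUAL_FLUSH));

  // This is effectively the insert from step 3. To send it, the client
  // resolves the covering tablet through its metacache; an in-flight op
  // that still references the tablet of the dropped partition keeps
  // retrying with that old tablet id.
  std::unique_ptr<KuduInsert> insert(table->NewInsert());
  KUDU_CHECK_OK(insert->mutable_row()->SetString("dt", "20230301"));
  KUDU_CHECK_OK(insert->mutable_row()->SetString("col", "abc"));
  KUDU_CHECK_OK(session->Apply(insert.release()));
  KUDU_CHECK_OK(session->Flush());

  return 0;
}
+++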
Different approaches were discussed to address this. Some of them are captured
here for posterity:
+++
1. Maintain a context in Impala that can be shared among the different
clients. The same context can be used to notify the C++ client to discard its
cache when a set of operations that could invalidate it has occurred. Simply
passing the tablet id may not work, because that alone may not be enough for
the client to take the decision.
2. Impala sends a hint to the C++ client to remove the cache entry after a DDL
operation (invoked via the Java client) and to perform a remote lookup instead
of relying on the local cache. (A rough sketch of such a hint follows this
list.)
3. Kudu detects the problem internally and propagates it up to the RPC layer,
where the RPC structure is overwritten with the new tablet object and the
operation is retried. This is a tricky and unclean approach with the potential
to introduce bugs.
4. Change the tablet id in the RPC itself. This is a non-trivial and
error-prone approach, as the tablet id is defined const and the RPC, batcher
and client implementations assume the tablet id becomes read-only once the RPC
is registered for an incoming op.
+++
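To make approach #2 a bit more concrete, here is a rough sketch of the kind of
hint the C++ client could expose. This is illustrative only: neither the class
nor the method names below exist in the current Kudu client API.
+++
#include <string>

#include "kudu/client/client.h"  // pulls in kudu::Status

// Illustrative only: a hint surface the C++ client could expose so that a
// coordinator such as Impala can tell it that a DDL operation (issued via
// the Java client) has invalidated cached tablet locations. Neither this
// class nor its methods exist in the Kudu client today.
class MetaCacheInvalidationHint {
 public:
  virtual ~MetaCacheInvalidationHint() = default;

  // Drop every cached tablet location for 'table_id' so that the next
  // write performs a remote lookup at the master instead of trusting the
  // local metacache.
  virtual kudu::Status InvalidateTable(const std::string& table_id) = 0;

  // Finer-grained variant: drop only cached locations whose partition key
  // range intersects ['range_start', 'range_end').
  virtual kudu::Status InvalidateRange(const std::string& table_id,
                                       const std::string& range_start,
                                       const std::string& range_end) = 0;
};
+++
Approach #1 would keep the bookkeeping of which tables or ranges were touched
by DDL on the Impala side, with the shared context driving calls into a hint
of this shape.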
Approach #1 or #2 is the most likely to be finalised, compared to the rest.
Hence, two Jiras may be required to track the Kudu work and the Impala work
separately.