Ashwani Raina created KUDU-3484:
-----------------------------------

             Summary: Kudu C++ client needs to invalidate cache if Java client 
issued a DDL op on same partition
                 Key: KUDU-3484
                 URL: https://issues.apache.org/jira/browse/KUDU-3484
             Project: Kudu
          Issue Type: Improvement
          Components: client
            Reporter: Ashwani Raina
            Assignee: Ashwani Raina


This Jira is created to track the work to improve the Kudu C++ client's metacache
integrity when both the C++ and Java clients are used on the same partition in a
particular sequence of steps that results in a cache entry becoming stale in the
Kudu C++ client.

Here is a detailed step-wise execution of a test that could result in such a 
situation:
+++

The metacache in the Kudu C++ client doesn't clean up the old tablet from its cache
after a new partition with the same range is created.
The new tablet id is the valid one, and since it is received in the server response,
it should be used everywhere from that point onwards.

If we look at the steps to repro:
1. We create a table first with the following query:
+++
/** 1. Create table **/
drop table if exists impala_crash;
create table if not exists impala_crash ( dt string, col string, primary key(dt) )
  partition by range(dt) ( partition values <= '00000000' ) stored as kudu;
+++

2. Then, the table is altered by adding a partition with range 20230301:
+++
alter table impala_crash drop if exists range partition value='20230301';
alter table impala_crash add if not exists range partition value='20230301';
insert into impala_crash values ('20230301','abc');
+++

3. Then, we alter the table again, adding a partition with the same range after
dropping the old partition:
+++
alter table impala_crash drop if exists range partition value='20230301';
alter table impala_crash add if not exists range partition value='20230301';
insert into impala_crash values ('20230301','abc');
+++
Even though the old partition is dropped and a new one is added, the old cache entry
(with the old tablet id) still remains in the Kudu client metacache, although it is
marked as stale.
When we try to write a new value to the same range, the client first searches for
the entry (using the tablet id) in the metacache and finds it to be stale. As a
result, an RPC lookup is issued, which contacts the server and fetches a response
payload containing the new tablet id, since the old tablet no longer exists on the
server. This new tablet id is recorded in the client metacache. When PickLeader
resumes again, it goes into the RPC lookup cycle, which now does a successful
fast-path lookup because the latest entry is present in the cache. But when its
callback is invoked, it resumes work with the old tablet id at hand, which never
gets updated.
+++
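
For context, here is a minimal sketch of the C++-client write that exercises this
path, using the public Kudu C++ client API. The master address, exact table name,
and error handling are assumptions made for illustration and are not taken verbatim
from the test:
+++
#include <memory>
#include <string>

#include <kudu/client/client.h>
#include <kudu/common/partial_row.h>

using kudu::KuduPartialRow;
using kudu::Status;
using kudu::client::KuduClient;
using kudu::client::KuduClientBuilder;
using kudu::client::KuduInsert;
using kudu::client::KuduSession;
using kudu::client::KuduTable;
using kudu::client::sp::shared_ptr;

// Write to the range that was dropped and re-added through the Java client.
// The metacache still holds the old tablet id for this range, marked stale.
static Status WriteAfterDdl() {
  shared_ptr<KuduClient> client;
  Status s = KuduClientBuilder()
      .add_master_server_addr("kudu-master:7051")  // assumed address
      .Build(&client);
  if (!s.ok()) return s;

  shared_ptr<KuduTable> table;
  s = client->OpenTable("impala_crash", &table);   // assumed Kudu table name
  if (!s.ok()) return s;

  shared_ptr<KuduSession> session = client->NewSession();
  s = session->SetFlushMode(KuduSession::MANUAL_FLUSH);
  if (!s.ok()) return s;

  std::unique_ptr<KuduInsert> insert(table->NewInsert());
  KuduPartialRow* row = insert->mutable_row();
  s = row->SetString("dt", "20230301");
  if (!s.ok()) return s;
  s = row->SetString("col", "abc");
  if (!s.ok()) return s;
  s = session->Apply(insert.release());
  if (!s.ok()) return s;

  // Flush drives the tablet lookup / PickLeader cycle described above.
  return session->Flush();
}
+++
With the stale entry still referenced, the Flush() above is where the batcher
resumes with the old tablet id even after the remote lookup has recorded the new
one in the metacache.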

Different approaches were discussed to address this. The following approaches are
captured here for posterity:
+++

1. Maintain a context in Impala that can be shared among different clients. The
same context can be used to notify the C++ client to drop its cache if there has
been a set of operations that could invalidate it. Simply passing the tablet id may
not work, because that alone may not be enough for a client to take the decision.

2. Impala sends a hint to the C++ client to remove the cache entry after a DDL
operation (invoked via the Java client) and perform a remote lookup instead of
relying on the local cache (a rough sketch of this idea follows after this list).

3. Kudu detects the problem internally and propagates it up to the RPC layer, where
the RPC structure is overwritten with the new tablet object and the operation is
retried. This is a tricky and unclean approach and has the potential of introducing
bugs.

4. Change the tablet id in the RPC itself. This is a non-trivial and error-prone
approach, as the tablet id is defined const and the implementation of the RPC,
batcher, and client assumes that the tablet id becomes read-only once the RPC is
registered for an incoming op.
+++
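
To make approach #2 more concrete, below is a rough, purely hypothetical sketch of
the client-side shape such a hint could take. None of these classes or methods exist
in the Kudu C++ client today; names like TableMetaCache and InvalidateRange are
invented for illustration only:
+++
// Hypothetical illustration of approach #2 (not part of the Kudu client API):
// a table-scoped metacache keyed by range start key, with an eviction hook that
// Impala could invoke after a DDL performed through the Java client.
#include <iostream>
#include <map>
#include <string>

struct CachedTablet {
  std::string tablet_id;
  bool stale = false;
};

class TableMetaCache {
 public:
  void Put(const std::string& range_start_key, CachedTablet tablet) {
    entries_[range_start_key] = std::move(tablet);
  }

  // Hint from the upper layer (e.g. Impala): drop the entry outright so the next
  // write is forced through a remote lookup and records the new tablet id,
  // instead of resuming with the stale one.
  void InvalidateRange(const std::string& range_start_key) {
    entries_.erase(range_start_key);
  }

  const CachedTablet* Lookup(const std::string& range_start_key) const {
    auto it = entries_.find(range_start_key);
    return it == entries_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, CachedTablet> entries_;
};

int main() {
  TableMetaCache cache;
  cache.Put("20230301", {"old-tablet-id", /*stale=*/true});

  // DDL hint arrives after "drop + add range partition" via the Java client.
  cache.InvalidateRange("20230301");

  // The next write misses the cache and must do a remote lookup.
  std::cout << (cache.Lookup("20230301") ? "cache hit" : "remote lookup needed")
            << std::endl;
  return 0;
}
+++
The key point is that the hint removes the entry entirely, so the stale tablet id
can never be picked up again by a resumed write.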

The likelihood of finalising approach #1 or #2 is high compared to the rest. Hence,
two Jiras may be required to track the Kudu work and the Impala work separately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
