[ https://issues.apache.org/jira/browse/KUDU-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727684#comment-17727684 ]
Joe McDonnell commented on KUDU-3484:
-------------------------------------
[~araina] From Impala's point of view, we want the Kudu C++ client to behave
transparently as if the cache doesn't exist. I think #3 is closest to that
approach, so here is a sketch of #3:
# Operation arrives
# Take a timestamp before accessing the cache
# Try with the cache
## If success, then everything is the same
# If failure, look up the tablet
# If the tablet is older than the timestamp from before the cache access, then
the tablet existed before the operation started. Retry with the new tablet.
# If the tablet is newer than the timestamp from before the cache access,
throw an error (tablet not found)
If necessary, Impala can pass in a timestamp for when the operation starts.
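To make the control flow concrete, here is a minimal C++ sketch of the above. Every name in it (Tablet, FakeClient, the lookup and write helpers) is a hypothetical stand-in for illustration, not the real Kudu client API:
+++
#include <chrono>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>

using Clock = std::chrono::steady_clock;

struct Tablet {
  std::string id;
  Clock::time_point created_at;  // when the tablet came into existence
};

struct FakeClient {
  std::optional<Tablet> cached;  // possibly-stale metacache entry
  Tablet authoritative;          // what the master would return

  Tablet LookupRemotely() { return authoritative; }

  bool TryWrite(const Tablet& t) {
    // A write succeeds only against the tablet the server still knows about.
    return t.id == authoritative.id;
  }

  bool WriteWithCacheValidation() {
    // Steps 1-2: the operation arrives; take a timestamp before the cache
    // access.
    const Clock::time_point before_cache = Clock::now();

    // Step 3: try with the cache; on success everything is the same.
    if (cached && TryWrite(*cached)) return true;

    // Step 4: on failure, look up the tablet remotely.
    const Tablet fresh = LookupRemotely();

    // Step 5: the tablet predates the cache access, so it existed before the
    // operation started; retry with the new tablet.
    if (fresh.created_at < before_cache) {
      cached = fresh;
      return TryWrite(fresh);
    }

    // Step 6: the tablet is newer than the cache access, so the partition was
    // replaced mid-operation; report an error.
    throw std::runtime_error("tablet not found");
  }
};

int main() {
  FakeClient c;
  c.cached = Tablet{"old-tablet", Clock::now()};
  // A DDL through the Java client replaced the partition before our write.
  c.authoritative = Tablet{"new-tablet", Clock::now()};

  // Fails against the stale entry, then retries against new-tablet.
  std::cout << (c.WriteWithCacheValidation() ? "write ok" : "write failed")
            << "\n";
  return 0;
}
+++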
> Kudu C++ client needs to invalidate cache if Java client issued a DDL op on
> same partition
> ------------------------------------------------------------------------------------------
>
> Key: KUDU-3484
> URL: https://issues.apache.org/jira/browse/KUDU-3484
> Project: Kudu
> Issue Type: Improvement
> Components: client
> Reporter: Ashwani Raina
> Assignee: Ashwani Raina
> Priority: Major
>
> This Jira is created to track the work to improve the Kudu C++ client's
> metacache integrity when both the C++ and Java clients are used on the same
> partition in a particular sequence of steps that results in a cache entry
> becoming stale in the Kudu C++ client.
> Here is a detailed step-wise execution of a test that could result in such a
> situation:
> +++
> The metacache in the Kudu client doesn't clean up the old tablet from its
> cache after a new partition with the same range is created.
> The new tablet id is the valid one, since it is received in the server
> response, and should be used everywhere from then on.
> If we look at the steps to repro:
> 1. We first create a table with the following query:
> +++
> /** 1. Create table **/
> drop table if exists impala_crash;
> create table if not exists impala_crash (
>   dt string,
>   col string,
>   primary key(dt)
> )
> partition by range(dt) (
>   partition values <= '00000000'
> )
> stored as kudu;
> +++
> 2. Then, the table is altered by adding a partition with range 20230301:
> +++
> alter table impala_crash drop if exists range partition value='20230301';
> alter table impala_crash add if not exists range partition value='20230301';
> insert into impala_crash values ('20230301','abc');
> +++
> 3. Then, we alter the table again by adding a partition with the same range
> after dropping the old partition:
> +++
> alter table impala_crash drop if exists range partition value='20230301';
> alter table impala_crash add if not exists range partition value='20230301';
> insert into impala_crash values ('20230301','abc');
> +++
> Even though the old partition is dropped and a new one is added, a reference
> to the old cache entry (with the old tablet id) still remains in the Kudu
> client metacache, although it is marked as stale.
> When we try to write a new value to the same range, the client first looks
> up the entry (by tablet id) in the metacache and finds it to be stale. As a
> result, an RPC lookup is issued, which contacts the server and fetches a
> response containing the new tablet id, since the old tablet no longer exists
> on the server. The new tablet id is recorded in the client metacache. When
> PickLeader resumes, it goes through the RPC lookup cycle again, which now
> succeeds via the fast-path lookup because the latest entry is present in the
> cache. But when its callback is invoked, it resumes work with the old tablet
> id at hand, which never gets updated.
> +++
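> The sequence above boils down to a callback holding a tablet handle that was
> captured before the remote lookup refreshed the cache. A minimal
> illustration under assumed, simplified types (not the real metacache
> classes in the Kudu client):
> +++
> #include <iostream>
> #include <memory>
> #include <string>
>
> // Simplified stand-ins for the client's metacache types.
> struct RemoteTablet {
>   std::string tablet_id;
>   bool stale = false;
> };
>
> struct MetaCache {
>   std::shared_ptr<RemoteTablet> entry;
>   std::shared_ptr<RemoteTablet> Lookup() { return entry; }
> };
>
> int main() {
>   MetaCache cache;
>   cache.entry = std::make_shared<RemoteTablet>(RemoteTablet{"old-tablet-id"});
>
>   // The in-flight write captures the tablet handle up front.
>   std::shared_ptr<RemoteTablet> captured = cache.Lookup();
>
>   // DDL via the Java client replaces the partition: the old entry is marked
>   // stale, and the remote lookup records the new tablet id in the cache.
>   captured->stale = true;
>   cache.entry = std::make_shared<RemoteTablet>(RemoteTablet{"new-tablet-id"});
>
>   // The retry's fast-path lookup sees the fresh entry...
>   std::cout << "cache now holds:     " << cache.Lookup()->tablet_id << "\n";
>   // ...but the resumed callback still works with the handle it captured,
>   // which never gets updated.
>   std::cout << "callback still uses: " << captured->tablet_id << "\n";
>   return 0;
> }
> +++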
> Different approaches were discussed to address this. The following
> approaches are captured here for posterity:
> +++
> 1. Maintain a context in Impala that can be shared among different clients.
> The same context can be used to notify the C++ client to discard its cache
> when a set of operations could have invalidated it. Simply passing the
> tablet id may not work, because that alone may not be enough for a client to
> take the decision.
> 2. Impala sends a hint to the C++ client to remove the cache entry after a
> DDL operation (invoked via the Java client) and to perform a remote lookup
> instead of relying on the local cache (see the sketch after this list).
> 3. Kudu detects the problem internally and propagates it up to the RPC
> layer, where the RPC structure is overwritten with the new tablet object and
> the operation is retried. This is a tricky and unclean approach with the
> potential to introduce bugs.
> 4. Change the tablet id in the RPC itself. This is a non-trivial and
> error-prone approach, as the tablet id is declared const and the RPC,
> batcher, and client implementations assume that the tablet id becomes
> read-only once an RPC is registered for an incoming op.
> +++
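> For approach #2, the hint could be as small as an explicit invalidation hook
> on the metacache. A minimal sketch under an assumed API; the real Kudu C++
> client does not currently expose such a method:
> +++
> #include <iostream>
> #include <map>
> #include <string>
>
> // Simplified stand-in for the client's metacache, keyed by range.
> struct MetaCache {
>   std::map<std::string, std::string> tablet_by_range;  // range -> tablet id
>
>   // Hypothetical hint from Impala after a DDL via the Java client: forget
>   // this range so the next operation is forced through a remote lookup.
>   void InvalidateRange(const std::string& range_key) {
>     tablet_by_range.erase(range_key);
>   }
> };
>
> int main() {
>   MetaCache cache;
>   cache.tablet_by_range["20230301"] = "old-tablet-id";
>
>   // Impala drops and re-adds the partition through the Java client, then
>   // hints the C++ client to invalidate its entry for that range.
>   cache.InvalidateRange("20230301");
>
>   // The next write misses the cache and performs a remote lookup, which
>   // records the new tablet id.
>   std::cout << (cache.tablet_by_range.count("20230301")
>                     ? "cache hit"
>                     : "cache miss -> remote lookup")
>             << "\n";
>   return 0;
> }
> +++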
> Approach #1 or #2 is more likely to be finalised than the rest. Hence, two
> Jiras may be required to track the Kudu work and the Impala work separately.
> This Jira will be used to track the Kudu-side work.
> For the Impala-side work, the following Jira will be used:
> https://issues.apache.org/jira/browse/IMPALA-12172
>