[
https://issues.apache.org/jira/browse/HADOOP-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847817#comment-15847817
]
Xiao Chen commented on HADOOP-14044:
------------------------------------
Thanks [~hgadre] for reporting the issue and providing a fix.
As discussed offline, Hadoop's current behavior here is non-deterministic.
I think it makes sense to give the caller a way to know whether the
cancellation succeeded.
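To make the non-determinism concrete, here is a toy model of the race. It is
only a sketch: the {{Server}} class, the shared set standing in for ZooKeeper,
and {{syncFromStore()}} standing in for the ZK watch are all hypothetical and
are not the ZKDTSM API. It assumes each server decides the cancel result from
its own in-memory cache and only afterwards removes the znode.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CancelRaceSketch {

    /** One server with its own in-memory token cache (hypothetical model). */
    public static class Server {
        private final Set<String> sharedStore;   // stand-in for ZooKeeper
        private final Set<String> localCache = ConcurrentHashMap.newKeySet();

        public Server(Set<String> sharedStore) {
            this.sharedStore = sharedStore;
        }

        public void createToken(String id) {
            localCache.add(id);
            sharedStore.add(id);
        }

        /** Stand-in for the watch firing: the cache catches up with the store. */
        public void syncFromStore() {
            localCache.clear();
            localCache.addAll(sharedStore);
        }

        /** Returns an HTTP-like status decided from the local cache only. */
        public int cancel(String id) {
            if (!localCache.remove(id)) {
                return 404;                      // token unknown to this server
            }
            if (!sharedStore.remove(id)) {
                // Mirrors the ERROR line quoted in the issue description.
                System.err.println("Attempted to remove a non-existing znode " + id);
            }
            return 200;
        }
    }

    /** Replays the four steps; returns the status of the final cancel on S1. */
    public static int runStep4(boolean watchFiresFirst) {
        Set<String> store = ConcurrentHashMap.newKeySet();
        Server s1 = new Server(store);
        Server s2 = new Server(store);

        s1.createToken("DT");   // step 1
        s2.syncFromStore();     // S2 learns about DT
        s2.cancel("DT");        // step 2: 200
        s2.cancel("DT");        // step 3: 404
        if (watchFiresFirst) {
            s1.syncFromStore(); // watch on S1 fires before step 4
        }
        return s1.cancel("DT"); // step 4
    }

    public static void main(String[] args) {
        System.out.println("watch first: " + runStep4(true));
        System.out.println("watch late:  " + runStep4(false));
    }
}
```

In this model the last cancel returns 404 when the watch has already fired and
200 when it has not, which is the pair of outcomes reported in the issue.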
The only problem is that AbstractDelegationTokenSecretManager is {{Public
Evolving}}. I can't find an explicit statement in [Hadoop's compatibility
guideline|http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-common/Compatibility.html]
regarding {{protected}} methods. But the change could also break binary
compatibility if outside jars were compiled against the old version (thanks
[~andrew.wang] for pointing this out!). So I'm afraid the fix in patch 1 can't
be committed, because we want to preserve binary compatibility even across
major releases.
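One way to see the binary-compatibility concern, under an assumption this
thread does not spell out: suppose a patch changes the signature (for example
the return type) of a protected method. The JVM links methods by name plus
descriptor, and the descriptor includes the return type, so classes compiled
against the old descriptor would no longer override or resolve the method
(typically surfacing as {{NoSuchMethodError}}). The sketch below only prints
the two descriptors for a hypothetical one-argument method; it does not
reproduce the actual patch.

```java
import java.lang.invoke.MethodType;

public class DescriptorSketch {

    /** Builds the JVM method descriptor for the given return and parameter types. */
    public static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Hypothetical "before" signature: void method(String)
        System.out.println(descriptor(void.class, String.class));    // (Ljava/lang/String;)V
        // Hypothetical "after" signature: boolean method(String)
        System.out.println(descriptor(boolean.class, String.class)); // (Ljava/lang/String;)Z
    }
}
```

Because the two descriptors differ, the JVM treats them as different methods
even though the source-level name is unchanged.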
Other options I can think of:
- Change ZKDTSM to throw instead of returning false. This would be an
incompatible behavior change.
- Find a way to guarantee that all peer ZKDTSMs see the removal before
returning success for the cancellation. This goes against the current ZKDTSM
architecture, in which each ZKDTSM isn't peer-aware.
So it seems there's no good way to satisfy your request that 'when 2 callers
are cancelling, exactly 1 should see success'. I guess this may end up in line
with ZooKeeper's
[documentation|https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_WatchRememberThese]:
the client has to handle it.
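As a rough sketch of what "the client has to handle it" could look like, a
caller might treat cancellation as idempotent and accept both status codes
from the issue description (200 and 404) as "the token is no longer valid".
The {{Outcome}} enum, the {{interpret}} helper, and the handling of other
status codes are assumptions for illustration, not part of Hadoop or Solr.

```java
public class CancelOutcomeSketch {

    /** Hypothetical client-side view of a cancel response. */
    public enum Outcome { TOKEN_GONE, RETRYABLE_FAILURE, FAILED }

    /** Maps an HTTP status from a cancel request to a client-side outcome. */
    public static Outcome interpret(int httpStatus) {
        if (httpStatus == 200 || httpStatus == 404) {
            // Either this call cancelled the token or someone else already did.
            return Outcome.TOKEN_GONE;
        }
        if (httpStatus >= 500) {
            // Assumed policy: server-side errors are worth retrying.
            return Outcome.RETRYABLE_FAILURE;
        }
        return Outcome.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(interpret(200));
        System.out.println(interpret(404));
        System.out.println(interpret(503));
    }
}
```

With this policy the caller cannot tell which of two concurrent cancels
"won", only that the token is gone afterwards.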
Please feel free to share your thoughts... thanks.
> Synchronization issue in delegation token cancel functionality
> --------------------------------------------------------------
>
> Key: HADOOP-14044
> URL: https://issues.apache.org/jira/browse/HADOOP-14044
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Hrishikesh Gadre
> Assignee: Hrishikesh Gadre
> Attachments: dt_fail.log, dt_success.log, HADOOP-14044-001.patch
>
>
> We are using Hadoop's delegation token authentication functionality in Apache
> Solr. During integration testing, I found the following issue with the
> delegation token cancelation functionality.
> Consider a setup with 2 Solr servers (S1 and S2) that are configured to use
> delegation tokens backed by ZooKeeper. Now perform the following
> steps:
> [Step 1] Send a request to S1 to create a delegation token.
> (Delegation token DT is created successfully)
> [Step 2] Send a request to cancel DT to S2
> (DT is canceled successfully. client receives HTTP 200 response)
> [Step 3] Send a request to cancel DT to S2 again
> (DT cancelation fails. client receives HTTP 404 response)
> [Step 4] Send a request to cancel DT to S1
> At this point we get one of two different responses:
> - DT cancelation fails. client receives HTTP 404 response
> - DT cancelation succeeds. client receives HTTP 200 response
> In the current implementation, each server maintains an in-memory
> cache of current tokens, which is updated using the ZK watch mechanism. For
> example, the ZK watch on S1 ensures that its in-memory cache is synchronized
> after step 2.
> After investigation, I found that the root cause of this behavior is a
> race condition between step 4 and the firing of the ZK watch on S1. When the
> watch fires before step 4, we get an HTTP 404 response (as expected). When
> it does not, we get an HTTP 200 response along with the following ERROR
> message in the log:
> {noformat}
> Attempted to remove a non-existing znode /ZKDTSMTokensRoot/DT_XYZ
> {noformat}
> From the client's perspective, the server *should* return an HTTP 404 error
> when a cancel request is sent for an invalid token.
> For reference, here is the relevant Solr unit test:
> https://github.com/apache/lucene-solr/blob/746786636404cdb8ce505ed0ed02b8d9144ab6c4/solr/core/src/test/org/apache/solr/cloud/TestSolrCloudWithDelegationTokens.java#L285