[ 
https://issues.apache.org/jira/browse/HDFS-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922724#comment-16922724
 ] 

Chen Liang commented on HDFS-14822:
-----------------------------------

Effectively there are three cases getLastSeenStateId() gets called currently.

a. GlobalStatIdContext.receiveRequestState()
b. Handler in Server.run()
c. GlobalStateIdContext.updateResponseState()

(a): The use case here is when server side receives an RPC request, it reads 
the client state id field and builds the in-memory call object. To do this it 
gets the server id (which acquires the lock currently). The server id has two 
use cases here:
1. if current NN is active, and client state is larger than ANN id (which is an 
invalid case), the code sets the call id to server id overwriting client id. 
Since this is an invalid case, I don't think it worth protecting with a lock. 
Also, removing the lock means overwriting client state id with a potentially 
older server id, which is still fine.
2. checks if server id is too far behind client id, if yes, throws exception. 
For this case, removing the lock means getting inaccurate server id. But if 
it's just for the purpose of evaluating if NN is too far behind, the impact of 
this inaccuracy should be insignificant.

(b): The use case here is when server side needs to check if a call should be 
reinserted to the queue. It needs to check if the call is ahead or behind the 
state id of the NN itself. If we remove the lock here, we may get a potentially 
not so current, older NN id. For Observer, this just means there is higher 
chance client id can be larger than server id, and the call just gets 
reinserted, which is fine. For ANN, because of the check in (a), if client id 
is larger than ANN id, it would have been overwritten by ANN state by that 
time. So by the time here, it could not be larger than ANN id.

(c): This is where NN replies the RPC call, it sets its own state id in the RPC 
response. For ANN, removing the locking here means "There would be no guarantee 
client will always get the most recent state of ANN" (this is what the comments 
are trying to say). But thinking further, even with current locking, it could 
happen that while the response is on the fly, ANN state has already proceeded. 
There is just no way of getting THE current state of ANN. But, this should not 
matter anyway. All that matters is that client sees consistent view. Due to the 
fact that here, get state id is called on replying to client, even without the 
locking, the txnid of this particular client operation is guaranteed to be 
updated at this point (also because txnid is a volatile) which further 
ganrantees that client always sees his/her operation being done with it 
returned.

> [SBN read] Revisit GlobalStateIdContext locking when getting server state id
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-14822
>                 URL: https://issues.apache.org/jira/browse/HDFS-14822
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>
> As mentioned under HDFS-14277. One potential performance issue of Observer 
> read is that {{GlobalStateIdContext#getLastSeenStateId}} calls 
> getCorrectLastAppliedOrWrittenTxId which ends up acquiring lock on txnid. We 
> internally had some discussion and analysis, we believe this lock can be 
> avoided, by calling the non-locking version method 
> {{getLastAppliedOrWrittenTxId.}}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to