[GitHub] [hbase] Apache9 edited a comment on pull request #2574: HBASE-25212 Optionally abort requests in progress after deciding a region should close

GitBox Thu, 29 Oct 2020 17:35:31 -0700


Apache9 edited a comment on pull request #2574:
URL: https://github.com/apache/hbase/pull/2574#issuecomment-719100163



   > @Apache9 I am having trouble responding to your comment above.
   
   I'm not sure what does this mean so I will just ignore it and focus on the 
tech part below. If this is indeed something important please tell me if I'm 
doing something wrong. Thanks.
   
   > 
   > > After reading this discussion several times, I think the reason we do 
not want to niterrupte a WAL sync is that it may lead to a region server abort?
   > > I would say this is not the case here. I checked the code again, the 
actual sync is done in the disruptor thread, in the rpc thread we just block on 
a SyncFuture(as Andrew mentioned above), the interruption on the rpc thread 
will just lead to an IOException tp client, the actual sync operation will not 
be interrupted so we are safe.
   > > So I do not think we need to disable interrupts here?
   > 
   > You are correct about the SyncFuture.
   > 
   > Initially my thinking was the same as yours.
   > 
   > At some point I became concerned about this case, though:
   > 
   > * We start the WAL append
   > * SyncFuture is interrupted, client gets an exception, client thinks the 
mutation failed
   > * WALedit is actually applied to the WAL by the disruptor, so the mutation 
is included in the WAL, and so we are at risk of the failed from client 
perspective mutation being applied during WAL replay for some reason
   > 
   > So we have a case where the client's understanding of what happened is 
incorrect. What happens in the WAL and the client's understanding of what 
happend should be in sync. If the WAL commit fails the client should get an 
exception. If the WAL commit succeeds the client should see success. Whether we 
interrupt or not is our option. disableInterrupt/enableInterrupt here is 
consideration for the client.
   > 
   > I can add the above as a code comment next to the disableInterrupt() call, 
would that help?
   > 
   > Here is another case of concern:
   > 
   > * We start the WAL append
   > * SyncFuture is interrupted, memstore is not updated, client gets an 
exception. Unless the WAL is replayed the local cluster A will not have the 
mutation applied.
   > * WALedit is actually applied to the WAL by the disruptor, so the mutation 
is included in the WAL, and so it is shipped to the remote cluster, and is 
applied at the remote cluster B.
   > * Now the data in cluster A and B are out of sync.
   > 
   > The disableInterrupt/enableInterrupt pair combines the WAL append and 
memstore update into a single protected operation , so this error case will not 
happen because of the interrupts I am adding with this patch.
   
   I think this is a valid point. We should not return an exception to client 
unless we can make sure the (update) operation is failed. And you are right 
about the replication part, if the WAL sync succeeds then we should consider 
the operation succeeds, that's why we will abort the RS if a WAL sync fails, as 
we do not know if it succeeds so the only safe way is to abort the RS, close 
the ongoing WAL file, and reconstruct the memstore by reading the WAL file.
   
   But then, I do not think we can interrupt the rpc handler after we issue the 
WAL sync right? So let's just remove the rpc handler from the Map before we 
issuing the WAL sync?
   
   And think more, for a put operation, usually there are 3 places where we 
could block, first is acquiring row lock, second is WAL sync, third is 
completing MVCC. If we consider the argumenta above, it seems that we can 
interrupt the first acquiring row lock operation? Then is it still worth to add 
so many interruption checks in our write path?
   
   Just my thoughts. Thanks.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hbase] Apache9 edited a comment on pull request #2574: HBASE-25212 Optionally abort requests in progress after deciding a region should close

Reply via email to