[ 
https://issues.apache.org/jira/browse/GEODE-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193815#comment-17193815
 ] 

ASF subversion and git services commented on GEODE-8339:
--------------------------------------------------------

Commit 6b79dab953089979657bc9763321765f45c0f37e in geode's branch 
refs/heads/develop from Ray Ingles
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6b79dab ]

GEODE-8339: fix Redis Rename hang (#5501)

The hang was caused by a thread holding a read lock, the rebalance waiting for 
that thread so it could get the write lock, and then another thread waiting to 
get the same read lock, a request that is now blocked behind the pending write 
lock. This other thread needs to complete before the first thread will release 
its read lock, so we ended up deadlocked.
Now the second thread is told that the read lock is already held on its behalf 
so it does not try to obtain it again.

Co-authored-by: Ray Ingles <ring...@vmware.com>
Co-authored-by: Sarah <sab...@pivotal.io>
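
The locking pattern described in the commit message can be reproduced with a plain java.util.concurrent.locks.ReentrantReadWriteLock. The following is a minimal sketch under stated assumptions, not Geode's actual code: the class RenameLockSketch, the methods rename and putOrDestroy, and the boolean flag are hypothetical stand-ins for the bucket lock, the Rename operation, and the callback argument. A timed tryLock is used in place of a blocking lock() so that the sketch reports the stall instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RenameLockSketch {
  // Hypothetical stand-in for the lock that guards a primary bucket.
  static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Stand-in for the region put/destroy that runs on a second thread.
  // Returns true if it completed, false if it gave up waiting for the lock.
  static boolean putOrDestroy(boolean lockAlreadyHeld) throws InterruptedException {
    if (lockAlreadyHeld) {
      return true; // the caller holds the read lock on our behalf
    }
    // A timed tryLock queues behind a waiting writer; the real code would block here.
    if (!lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
      return false;
    }
    try {
      return true;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Stand-in for Rename: takes the read lock, then waits for the second thread.
  static boolean rename(boolean passLockHeldFlag) throws Exception {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    lock.readLock().lock();
    // Stand-in for the rebalance, which needs the write lock.
    Thread rebalance = new Thread(() -> {
      lock.writeLock().lock();
      lock.writeLock().unlock();
    });
    try {
      rebalance.start();
      // Wait until the writer is actually queued behind our read lock.
      while (!lock.hasQueuedThread(rebalance)) {
        Thread.sleep(1);
      }
      return worker.submit(() -> putOrDestroy(passLockHeldFlag)).get();
    } finally {
      lock.readLock().unlock();
      rebalance.join();
      worker.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("without flag, put/destroy completed: " + rename(false));
    System.out.println("with flag, put/destroy completed: " + rename(true));
  }
}
```

Without the flag, the second thread queues behind the waiting writer and cannot finish while the first thread still holds its read lock, which is the cycle described above. With the flag, the second thread skips the lock and the cycle never forms.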

> Redis rename hangs when servers are killed and revived
> ------------------------------------------------------
>
>                 Key: GEODE-8339
>                 URL: https://issues.apache.org/jira/browse/GEODE-8339
>             Project: Geode
>          Issue Type: Bug
>          Components: redis
>            Reporter: Sarah Abbey
>            Priority: Major
>              Labels: pull-request-available
>
> See ignored test associated with this ticket.  
> {noformat}
> [vm2] [warn 2020/07/07 17:12:48.216 EDT <ResourceManagerRecoveryThread 1> 
> tid=0x46] 15 seconds have elapsed while waiting for replies: 
> <DeposePrimaryBucketMessage$DeposePrimaryBucketResponse 1171 waiting for 1 
> replies from [192.168.0.104(server-1:27717)<v1>:41001]> on 
> 192.168.0.104(server-2:27730)<v9>:41002 whose current membership list is: 
> [[192.168.0.104(locator-0:27716:locator)<ec><v0>:41000, 
> 192.168.0.104(server-1:27717)<v1>:41001, 
> 192.168.0.104(server-2:27730)<v9>:41002, 
> 192.168.0.104(server-3:27731)<v11>:41003]]
> [vm2] [warn 2020/07/07 17:12:48.216 EDT <Function Execution Processor3> 
> tid=0x5c] 15 seconds have elapsed while waiting for replies: 
> <PRFunctionStreamingResultCollector 1170 waiting for 1 replies from 
> [192.168.0.104(server-1:27717)<v1>:41001]> on 
> 192.168.0.104(server-2:27730)<v9>:41002 whose current membership list is: 
> [[192.168.0.104(locator-0:27716:locator)<ec><v0>:41000, 
> 192.168.0.104(server-1:27717)<v1>:41001, 
> 192.168.0.104(server-2:27730)<v9>:41002, 
> 192.168.0.104(server-3:27731)<v11>:41003]]
> {noformat}
> The Redis RENAME command could hang during a rebalance, if the old key was 
> stored in a bucket on one server, and the new key was in a different bucket 
> on a separate server. Rename would read-lock the buckets, but rebalance would 
> wait for a write lock, and the Rename region put/destroy would then wait on 
> the write lock.
> Now Rename passes a callback argument that indicates it has already locked 
> the primary, so the put/destroy does not attempt to lock the primary again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
