[ https://issues.apache.org/jira/browse/COUCHDB-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978952#comment-15978952 ]

ASF subversion and git services commented on COUCHDB-3376:
----------------------------------------------------------

Commit e4c3705def6021a6b801c0bc0ceaac4abbc7c0d8 in couchdb's branch 
refs/heads/COUCHDB-3376-fix-mem3-shards from [~paul.joseph.davis]
[ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=e4c3705 ]

Fix stale shards cache

There's a race condition in mem3_shards that can result in having shards
in the cache for a database that's been deleted. This results in a
confused cluster that thinks a database exists until you attempt to open
it.

The fix is to ignore any cache insert requests that come from an older
version of the dbs db than the one the mem3_shards cache already knows about.

Big thanks to @jdoane for the identification and original patch.

COUCHDB-3376
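
For illustration, here is a minimal Erlang sketch of the guard described
above, assuming a gen_server-style cache. Module, function, and state names
are made up for this sketch and are not the actual mem3_shards code; the
point is only the Seq >= Cur check that drops inserts from stale readers.

    -module(shard_cache_sketch).
    -behaviour(gen_server).

    -export([start_link/0, maybe_insert/3, update_seq/1]).
    -export([init/1, handle_call/3, handle_cast/2]).

    %% In a real system this would be started by a supervisor.
    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    %% A client read Shards from the dbs db at update_seq Seq and asks
    %% the cache to store them.
    maybe_insert(DbName, Shards, Seq) ->
        gen_server:cast(?MODULE, {insert, DbName, Shards, Seq}).

    %% The changes feed listener reports every update_seq it processes.
    update_seq(Seq) ->
        gen_server:cast(?MODULE, {seq, Seq}).

    init([]) ->
        {ok, #{current_seq => 0, cache => #{}}}.

    %% Insert only if the client's snapshot is at least as recent as the
    %% newest update_seq the cache has seen from the changes feed.
    handle_cast({insert, DbName, Shards, Seq},
                #{current_seq := Cur, cache := Cache} = St) when Seq >= Cur ->
        {noreply, St#{cache := Cache#{DbName => Shards}}};
    handle_cast({insert, _DbName, _Shards, _Seq}, St) ->
        %% Stale snapshot: the database may have been deleted since; drop it.
        {noreply, St};
    handle_cast({seq, Seq}, St) ->
        {noreply, St#{current_seq := Seq}}.

    handle_call(_Msg, _From, St) ->
        {reply, ok, St}.

With this shape of guard, the CREATE/DELETE/GET race described in the issue
resolves itself: a reader holding a pre-deletion snapshot carries an older
Seq, so its insert is silently ignored.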


> Fix mem3_shards under load
> --------------------------
>
>                 Key: COUCHDB-3376
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3376
>             Project: CouchDB
>          Issue Type: Bug
>            Reporter: Paul Joseph Davis
>
> There were two issues with mem3_shards that were fixed while I was testing 
> the PSE code.
> The first issue was found by [~jaydoane]: a database can have its shards 
> inserted into the cache after it's been deleted. This can happen if a client 
> does a rapid CREATE/DELETE/GET cycle on a database. The fix for this is to 
> track the update sequence from the changes feed listener and only insert 
> shard maps that come from a client that has read an update_seq at least as 
> recent as the one mem3_shards knows about.
> The second issue, found during heavy benchmarking, was that large shard maps 
> (in the Q>=128 range) can quite easily cause mem3_shards to back up when 
> there's a thundering herd attempting to open the database. There's no 
> coordination among workers trying to add a shard map to the cache, so if a 
> bunch of independent clients all send the shard map at once (say, at the 
> beginning of a benchmark) then mem3_shards can get overwhelmed. The fix for 
> this was twofold. First, rather than send the shard map directly to 
> mem3_shards, we copy it into a spawned process, and when/if mem3_shards wants 
> to write it, it tells this writer process to do its business. The second 
> optimization is to create an ets table to track these writer processes. Then 
> independent clients can check whether a shard map is already en route to 
> mem3_shards by using ets:insert_new and cancel their writer if that returns 
> false. (A rough sketch of this coordination follows after the description.)
> PR incoming.
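
To make the second fix concrete, here is a minimal Erlang sketch of the
writer-process coordination described above. The module, table, and helper
names are hypothetical; the real logic lives in mem3_shards and its callers.
The ets:insert_new/2 call is the dedup step the description refers to.

    -module(shard_writer_sketch).
    -export([ensure_table/0, maybe_spawn_writer/2]).

    -define(WRITERS, shard_writer_registry).

    %% In a real system the table would be owned by a long-lived process.
    ensure_table() ->
        case ets:whereis(?WRITERS) of
            undefined -> ets:new(?WRITERS, [named_table, public]);
            _Tid -> ok
        end.

    %% Park the (possibly large) shard map in a dedicated writer process so
    %% the cache server never receives big messages it may not need.
    %% ets:insert_new/2 makes registration atomic: only the first client for
    %% a given DbName keeps its writer; everyone else cancels theirs.
    maybe_spawn_writer(DbName, ShardMap) ->
        Writer = spawn(fun() -> writer_loop(DbName, ShardMap) end),
        case ets:insert_new(?WRITERS, {DbName, Writer}) of
            true ->
                %% We won the race; the cache can later tell this pid to write.
                {ok, Writer};
            false ->
                %% A writer for this database is already en route; cancel ours.
                Writer ! cancel,
                already_pending
        end.

    writer_loop(DbName, ShardMap) ->
        receive
            write ->
                %% The cache server decided it wants the shard map.
                cache_insert(DbName, ShardMap),
                ets:delete_object(?WRITERS, {DbName, self()});
            cancel ->
                ok
        after 30000 ->
            %% Safety valve so abandoned writers do not linger forever.
            ets:delete_object(?WRITERS, {DbName, self()})
        end.

    %% Stand-in for the real cache insert done by mem3_shards.
    cache_insert(_DbName, _ShardMap) ->
        ok.

The spawned writer holds the large binary off the cache server's heap and
message queue, and the insert_new check means a thundering herd of clients
collapses to a single pending write per database.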



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)