[ https://issues.apache.org/jira/browse/COUCHDB-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978952#comment-15978952 ]
ASF subversion and git services commented on COUCHDB-3376:
-----------------------------------------------------------

Commit e4c3705def6021a6b801c0bc0ceaac4abbc7c0d8 in couchdb's branch refs/heads/COUCHDB-3376-fix-mem3-shards from [~paul.joseph.davis]
[ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=e4c3705 ]

Fix stale shards cache

There's a race condition in mem3_shards that can result in having shards in the cache for a database that's been deleted. This results in a confused cluster that thinks a database exists until you attempt to open it.

The fix is to ignore any cache insert requests that come from an older version of the dbs db than the mem3_shards cache knows about.

Big thanks to @jdoane for the identification and original patch.

COUCHDB-3376

> Fix mem3_shards under load
> --------------------------
>
>                 Key: COUCHDB-3376
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3376
>             Project: CouchDB
>          Issue Type: Bug
>            Reporter: Paul Joseph Davis
>
> There were two issues with mem3_shards that were fixed while I've been testing the PSE code.
>
> The first issue was found by [~jaydoane]: a database can have its shards inserted into the cache after it's been deleted. This can happen if a client does a rapid CREATE/DELETE/GET cycle on a database. The fix is to track the update sequence seen by the changes feed listener and only insert shard maps that come from a client that has read an update_seq at least as recent as the one mem3_shards has seen.
>
> The second issue, found during heavy benchmarking, is that large shard maps (in the Q>=128 range) can quite easily cause mem3_shards to back up when there's a thundering herd attempting to open the database. There's no coordination among workers trying to add a shard map to the cache, so if a bunch of independent clients all send the shard map at once (say, at the beginning of a benchmark), mem3_shards can get overwhelmed. The fix for this is twofold. First, rather than send the shard map directly to mem3_shards, we copy it into a spawned process, and when/if mem3_shards wants to write it, it tells this writer process to do its business. Second, we create an ets table to track these writer processes. Independent clients can then check whether a shard map is already en route to mem3_shards by using ets:insert_new and cancel their writer if that returns false.
>
> PR incoming.
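
For illustration, here is a minimal Erlang sketch of the update_seq guard from the first fix. The module, record, and table names here are hypothetical (the real change is in the mem3_shards module); the point is only the guard clause: an insert request read at an older dbs-db sequence than the changes listener has already processed is dropped rather than applied.

    -module(shards_cache_sketch).
    -behaviour(gen_server).

    -export([start_link/0]).
    -export([init/1, handle_call/3, handle_cast/2]).

    -record(st, {cur_seq = 0}).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    init([]) ->
        %% Hypothetical cache table; the real module has its own tables.
        shards_by_db = ets:new(shards_by_db, [named_table, set]),
        {ok, #st{}}.

    %% The changes feed listener reports every dbs-db sequence it
    %% processes, including deletions, so cur_seq records how fresh
    %% the cache's view of the dbs db is.
    handle_cast({seen_seq, Seq}, #st{cur_seq = Cur} = St) ->
        {noreply, St#st{cur_seq = max(Seq, Cur)}};

    %% A client read this shard map at ReadSeq. If that read predates
    %% what the listener has already seen, the database may have been
    %% deleted since, so ignore the insert rather than resurrect it.
    handle_cast({insert_shards, _DbName, _Shards, ReadSeq},
                #st{cur_seq = Cur} = St) when ReadSeq < Cur ->
        {noreply, St};
    handle_cast({insert_shards, DbName, Shards, _ReadSeq}, St) ->
        true = ets:insert(shards_by_db, {DbName, Shards}),
        {noreply, St}.

    handle_call(_Msg, _From, St) ->
        {reply, ok, St}.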
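
And a sketch of the writer-process handoff from the second fix, again with hypothetical names: each client parks its copy of the shard map in a spawned writer, and ets:insert_new/2 guarantees only one writer per database registers, so a thundering herd collapses to a single pending write. It assumes a public named table created elsewhere, e.g. ets:new(shard_writers, [named_table, set, public]).

    -module(shard_writer_sketch).
    -export([send_shards/2, flush/1]).

    %% Called by each client that wants the cache to learn a shard map.
    %% (The real code would also carry the read sequence from the first
    %% fix; it is omitted here to keep the sketch small.)
    send_shards(DbName, Shards) ->
        Writer = spawn(fun() ->
            receive
                write ->
                    %% Only now does the (hypothetical) cache server
                    %% receive the full shard map, from one process.
                    gen_server:cast(shards_cache_sketch,
                                    {insert_shards, DbName, Shards}),
                    ets:delete_object(shard_writers, {DbName, self()});
                cancel ->
                    ok
            after 30000 ->
                %% Give up if the cache never asks us to write.
                ets:delete_object(shard_writers, {DbName, self()})
            end
        end),
        %% insert_new/2 returns false when a writer for this database
        %% is already registered; the loser cancels its redundant copy.
        case ets:insert_new(shard_writers, {DbName, Writer}) of
            true -> ok;
            false -> Writer ! cancel, duplicate
        end.

    %% Called by the cache when it decides the shard map should be
    %% written.
    flush(DbName) ->
        case ets:lookup(shard_writers, DbName) of
            [{DbName, Writer}] -> Writer ! write;
            [] -> ok
        end.

The effect of the handoff is that mem3_shards never holds a large shard map in its own mailbox unless it actually intends to write it, and at most one copy per database is in flight at a time.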