nickva opened a new issue, #5166:
URL: https://github.com/apache/couchdb/issues/5166

   Saw this in the logs:
   
   ```
   exit:{{function_clause,[
   {gb_trees,delete_1,[1395375226,nil],[{file,"gb_trees.erl"},{line,408}]},
   {gb_trees,delete_1,2,[{file,"gb_trees.erl"},{line,412}]},
   {gb_trees,delete_1,2,[{file,"gb_trees.erl"},{line,409}]},
   {gb_trees,delete_1,2,[{file,"gb_trees.erl"},{line,412}]},
   {gb_trees,delete,2,[{file,"gb_trees.erl"},{line,404}]},
   {couch_lru,close_int,2,[{file,"src/couch_lru.erl"},{line,56}]},
   {couch_server,maybe_close_lru_db,1,[{file,"src/couch_server.erl"},{line,455}]},
   {couch_server,handle_call,3, [{file,"src/couch_server.erl"},{line,609}]}]},
   {gen_server,call,[couch_server_10,{open,<<"shards/00000000-3fffffff ...
   
   [{gen_server,call,3,[{file,"gen_server.erl"},{line,385}]},
    {couch_server,open_int,2,[{file,"src/couch_server.erl"},{line,130}]},
   {couch_server,open,2,[{file,"src/couch_server.erl"},{line,113}]},
   {mem3_util,get_or_create_db_int,2,[{file,"src/mem3_util.erl"},{line,619}]},
   {fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,356}]},
   {rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,138}]}]
   ```
   
   Our LRU has a bug in it. While we're traversing the tree we're also
   updating/deleting entries. In fact, we keep two separate views of the tree
   going: one in the `Iter`, created when we start `close_int`
   https://github.com/apache/couchdb/blob/83658d06d12447b7d1abccc1a64b84e020899ba4/src/couch/src/couch_lru.erl#L39
   and one in the `{Tree, _} = Cache` variable.
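
   To make the two-views problem concrete, here is a minimal standalone
   sketch. It is not the `couch_lru` code; the module name and the sample
   keys/values are made up for illustration. It shows that a `gb_trees`
   iterator keeps yielding a key that has since been deleted from the tree,
   and that deleting that key again fails with the same `function_clause` as
   in the trace above:

   ```erlang
   -module(lru_stale_iter_demo).
   -export([run/0]).

   %% Hypothetical demo, not CouchDB code.
   run() ->
       Tree0 = gb_trees:from_orddict([{1, db_a}, {2, db_b}, {3, db_c}]),
       %% The iterator is a snapshot of Tree0.
       Iter0 = gb_trees:iterator(Tree0),
       %% Delete key 1 from the "live" tree; the iterator is unaffected.
       Tree1 = gb_trees:delete(1, Tree0),
       {Key, _Val, _Iter1} = gb_trees:next(Iter0),
       %% The iterator still hands back key 1, which Tree1 no longer has.
       false = gb_trees:is_defined(Key, Tree1),
       %% gb_trees:delete/2 assumes the key is present, so this crashes.
       Result =
           try gb_trees:delete(Key, Tree1) of
               _ -> no_crash
           catch
               error:function_clause -> function_clause
           end,
       {Key, Result}.
   ```

   `gb_trees:delete_any/2` tolerates a missing key, but using it would only
   hide the mismatch between the two views rather than remove it.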
   
   When we iterate over the tree, we do one of three things:
      1. We try to lock the entry. If that returns `false` (the entry is not
         found or is locked), we delete the entry from the cache and restart
         the iteration from the beginning.
      2. If we find and lock the entry, and it's idle, we evict it and return.
         That stops the iteration: `gb_trees:delete(Lru, Tree)...`
      3. If we find and lock the entry, and it's not idle, we delete/re-insert
         it and continue iterating.
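
   The three branches can be sketched roughly as below. This is a simplified
   model, not the actual `couch_lru` code: the module name, `StatusFun` (a
   stand-in for the lock attempt plus the idle check) and the way a new stamp
   is picked are all assumptions made for illustration.

   ```erlang
   -module(lru_close_sketch).
   -export([close/2, demo/0]).

   %% Simplified model; NOT the couch_lru code.
   %% Tree maps an Lru stamp to a DbName. StatusFun is hypothetical and
   %% returns unavailable | idle | busy for a DbName.
   close(Tree, StatusFun) ->
       Iter = gb_trees:iterator(Tree),
       close_int(gb_trees:next(Iter), Tree, StatusFun).

   close_int(none, Tree, _StatusFun) ->
       %% Nothing could be evicted.
       {false, Tree};
   close_int({Lru, DbName, Iter}, Tree, StatusFun) ->
       case StatusFun(DbName) of
           unavailable ->
               %% Case 1: drop the entry and restart with a fresh iterator.
               NewTree = gb_trees:delete(Lru, Tree),
               NewIter = gb_trees:iterator(NewTree),
               close_int(gb_trees:next(NewIter), NewTree, StatusFun);
           idle ->
               %% Case 2: evict the entry and stop.
               {true, gb_trees:delete(Lru, Tree)};
           busy ->
               %% Case 3: re-insert under a newer stamp, but keep walking
               %% the OLD iterator while the tree itself changes.
               {Largest, _} = gb_trees:largest(Tree),
               Tree1 = gb_trees:delete(Lru, Tree),
               NewTree = gb_trees:insert(Largest + 1, DbName, Tree1),
               close_int(gb_trees:next(Iter), NewTree, StatusFun)
       end.

   %% With every entry busy, all three entries end up under new stamps.
   demo() ->
       Tree = gb_trees:from_orddict([{1, <<"a">>}, {2, <<"b">>}, {3, <<"c">>}]),
       AllBusy = fun(_DbName) -> busy end,
       {false, Tree1} = close(Tree, AllBusy),
       gb_trees:to_list(Tree1).
   ```

   In this model `demo()` evicts nothing, yet every entry is deleted and
   re-inserted once, so a close call that finds no idle entry rewrites the
   whole tree.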
     
   There are a few odd things that jump out:
     * We manage two views of the tree: one in the iterator while traversing
       and one in the `Cache` variable.
     * We treat a missing entry and a locked entry the same way in the result
       of the `try_lock`.
     * We always restart the iteration when the entry is missing/locked.
     * If we can't find any idle entries, we always end up re-inserting all the
       entries in the cache. So a single close call ends up traversing and
       rewriting the whole cache. If there are thousands of concurrent calls,
       they all end up doing that work.

