wohali commented on issue #549: [Jenkins] timeout triggered by all_dbs_active
URL: https://github.com/apache/couchdb/issues/549#issuecomment-312941204
 
 
   This is a really odd failure. My guess is we've hit a race condition where 
an attempt to open the database happens before the database has actually been 
created, but I'm not comfortable enough with the `couch_server` code to be 
sure.
   
   We see in the couch.log file that the PUT to create the eunit test database 
returns with a 201, so presumably the database has been created. But just a few 
hundred milliseconds later, we also see:
   
   ```
   [error] 2017-05-29T02:02:35.181959Z nonode@nohost <0.5150.1> -------- Could not open file /tmp/tmp.c3FrN4iLsk/apache-couchdb-2.0.0-79067c9/tmp/data/shards/40000000-5fffffff/eunit-test-db-149623354967440.1496023354.couch: no such file or directory
   ```
   
   **Question 1**: Is it normal for a PUT to get back a 201 before all the 
shards have been created on disk?
   
   During the open call, we hit `couch_server:handle_call({open...})` > 
`couch_server:make_room/2` > `couch_server:maybe_close_lru_db/1`, which is the 
only place in the code that returns `all_dbs_active`. We must be walking this 
code path. What's *especially* odd is that, because we get `all_dbs_active` 
back, we **must** be failing this guard:
   
   ```erlang
   maybe_close_lru_db(#server{dbs_open=NumOpen, max_dbs_open=MaxOpen}=Server)
           when NumOpen < MaxOpen ->
       {ok, Server};
   ```
   
   **Question 2**: Why did we fail that guard?
   
   So the guard fails and we fall through to `couch_lru:close/1`, which 
attempts `close_int(gb_trees:next(gb_trees:iterator(Tree)), Cache)`. Inside of 
`close_int/2` I see:
   
   ```erlang
   close_int(none, _) ->
       false;
   ```
   
   In this particular failure, we can't possibly have >100 dbs open already. I 
wonder if `gb_trees:iterator(Tree)` is returning an empty iterator, possibly 
because nothing is actually open yet, so that `gb_trees:next/1` yields `none` 
and this bubbles back up as `false`?
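   
   As a quick sanity check of that theory, here is a minimal sketch that only 
exercises the stdlib `gb_trees` calls on an empty tree. The module and function 
names (`empty_lru_check`, `next_on_empty/0`) are made up for illustration and 
are not part of CouchDB; the sketch says nothing about whether the LRU tree is 
actually empty in the failing run.
   
   ```erlang
   %% Hypothetical standalone module, not CouchDB code.
   -module(empty_lru_check).
   -export([next_on_empty/0]).

   %% Build an empty gb_tree and ask its iterator for the first entry,
   %% mirroring the gb_trees:next(gb_trees:iterator(Tree)) call above.
   next_on_empty() ->
       Tree = gb_trees:empty(),
       gb_trees:next(gb_trees:iterator(Tree)).
   ```
   
   On an empty tree `gb_trees:next/1` returns the atom `none`, which is exactly 
what the `close_int(none, _)` clause matches on.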
   
   **Question 3**: Should we handle the special case of `couch_lru:close/1` 
being called on an empty LRU to not return `false`?
   
   -----
   
   Also: I noticed
   
   ```erlang
   couch/src/couch_server.erl:-define(MAX_DBS_OPEN, 100).
   ```
   
   but `default.ini` defines `max_dbs_open = 500`. Should we bump the default 
in `couch_server.erl`? 500 seems like a more reasonable default to me. I'll 
open a new issue for this.
 
