nickva opened a new issue, #5130:
URL: https://github.com/apache/couchdb/issues/5130
Noticed that the `_dbs` database on one of the production nodes was throwing this
error:
```
with exit value:#012{epoch_order,[
{couch_db,validate_epochs,1,[{file,"src/couch_db.erl"},{line,1758}]},
{couch_db,get_epochs,1,[{file,"src/couch_db.erl"},{line,584}]},
{mem3_rep,compare_epochs,2,[{file,"src/mem3_rep.erl"},{line,557}]},
... {mem3_rep,go,1,[{file,"src/mem3_rep.erl"},{line,122}]}]}
```
Examining the db handle more closely, noticed it has `nonode@nohost` epoch entries.
```
> (element(2,Db#db.engine))#st.header#db_header.epochs, couch_db:close(Db)
[{'[email protected]',23436889},
{nonode@nohost,23353728},
{'[email protected]',842473},
{nonode@nohost,830216},
{'[email protected]',18423167},
{nonode@nohost,18401097},
{'[email protected]',2227879},
{nonode@nohost,2204657},
{'[email protected]',10924197},
{nonode@nohost,10924195},
{'[email protected]',4548304},
{nonode@nohost,4534552},
{'[email protected]',0}]
```
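To illustrate why a list like this can trip an ordering check, here is a minimal sketch. It assumes the validation requires update sequences to strictly decrease from the newest epoch entry to the oldest, which is what the `epoch_order` error name suggests; it is not the actual `couch_db:validate_epochs/1` code. The module name, function names and the `'couchdb@example'` node name are hypothetical stand-ins.

```erlang
-module(epoch_order_sketch).
-export([is_descending/1, demo/0]).

%% True when update sequences strictly decrease from the newest
%% entry (head of the list) to the oldest entry (tail of the list).
is_descending([{_, A}, {_, B} = Next | Rest]) when A > B ->
    is_descending([Next | Rest]);
is_descending([_, _ | _]) ->
    %% Two adjacent entries are out of order (or equal).
    false;
is_descending(_) ->
    %% Empty or single-entry lists are trivially ordered.
    true.

demo() ->
    %% First five entries from the listing above;
    %% 'couchdb@example' is a hypothetical placeholder node name.
    Epochs = [{'couchdb@example', 23436889},
              {nonode@nohost, 23353728},
              {'couchdb@example', 842473},
              {nonode@nohost, 830216},
              {'couchdb@example', 18423167}],
    is_descending(Epochs).
```

Under that assumption, `demo/0` returns `false`, because 830216 is followed by the larger sequence 18423167.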
`nonode@nohost` is what we expect `node()` to return if the node doesn't
have a `-name`/`-sname` defined, i.e. it is standalone.
All the nodes we have in production have a `-name
[email protected]` in the vm.args file, so the expectation is that
`node()` will always return that node name, which clearly doesn't always happen.
The fact that this occurred with the dbs metadata db may indicate that it has
something to do with a startup or shutdown state: there is a period of time
when the network dist sub-system either hasn't started or is shut down and we're
opening the couch file and updating it, and so `node()` would return
`nonode@nohost`.
Another idea is perhaps net_kernel crashes and restarts, but in that case we
might expect a full node restart.
If this is indeed an expected state, and our previous assumption was wrong,
we'd have to handle it. Perhaps when we update/check the epoch we also have to check
the init state and arguments, and if the user did specify a node name we either
ignore, crash or patch up the state.
Maybe a combination of `node()` and these results can be useful:
```
([email protected])5> init:get_status().
{started,started}
([email protected])6> proplists:get_value(name, init:get_arguments()).
["[email protected]"]
([email protected])7> proplists:get_value(sname, init:get_arguments()).
undefined
```
We also can't blindly always ignore a `nonode@nohost` epoch change, in case
the user moved a couch file from a stand-alone setup to a clustered setup.
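A rough sketch of how `node()` and the start arguments could be combined to tell these cases apart. The module name, function names and return atoms are hypothetical and nothing here exists in CouchDB; it only uses `node/0`, `init:get_arguments/0` and `proplists:is_defined/2`.

```erlang
-module(node_name_sketch).
-export([classify/0, classify/2]).

%% Classify the running VM from node() and the emulator start arguments.
classify() ->
    Args = init:get_arguments(),
    %% A -name or -sname flag shows up as a {name, _} or {sname, _} entry.
    Configured = (proplists:is_defined(name, Args)
                  orelse proplists:is_defined(sname, Args)),
    classify(node(), Configured).

%% classify(CurrentNode, NameConfigured) ->
%%     named | standalone | dist_not_ready
classify(nonode@nohost, true) ->
    %% A node name was requested but distribution is not (yet, or
    %% no longer) running: the suspicious startup/shutdown window.
    dist_not_ready;
classify(nonode@nohost, false) ->
    %% No node name was requested: a genuine stand-alone VM.
    standalone;
classify(_Node, _NameConfigured) ->
    named.
```

With something like this, an epoch update could be skipped or rejected only in the `dist_not_ready` case, while a genuine `standalone` file would still get its `nonode@nohost` epoch. `init:get_status()` could be consulted as well to see whether the system is still starting or stopping.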