It happened again today: I got a "111 connection refused" error. So I
fired up tcptrack on my database server to watch TCP traffic on port 8529
(tcptrack -i eth1 port 8529).
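For anyone wanting to take the same snapshot without tcptrack, ss from
iproute2 gives a quick per-state count (the port is the one above,
everything else is stock):

```shell
# Summarise TCP socket states touching port 8529
# (ESTAB, TIME-WAIT, SYN-SENT, ...), one count per state.
ss -tan '( sport = :8529 or dport = :8529 )' | awk 'NR > 1 {print $1}' | sort | uniq -c
```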

There was not a single connection waiting to be closed! Instead,
connections were popping up and closing constantly with a 3-second timeout,
no different from any other day.

I tried raising file-max to 1000000 and even set open-file limits for the
arangodb user (hard 10024, soft 4096), but it didn't help.
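For completeness, the two knobs I touched look roughly like this (the file
paths and the arangodb user name reflect my setup; the values are the ones
from the attempt above):

```shell
# /etc/sysctl.d/99-file-max.conf -- system-wide open-file cap
fs.file-max = 1000000

# /etc/security/limits.d/arangodb.conf -- per-user open-file limits
arangodb  soft  nofile  4096
arangodb  hard  nofile  10024
```

Applied with sysctl --system and checked with
sudo -u arangodb sh -c 'ulimit -n' after a fresh login. In hindsight a
soft limit of 4096 is still fairly low for a busy server, so that alone
was unlikely to be the fix.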

I then changed the connection's persistence setting to "close" to force
the app to open a new connection on every page refresh. Still nothing.

All operations are reads and updates; there is not a single delete (unless
it is performed from the web UI).
Memory consumption and CPU usage are by no means excessive (4-core CPU,
8 GB RAM), and we unload unused collections at regular intervals to save
RAM, so I can't really understand which resource is being depleted.

The only thing that keeps doing the trick is restarting the service, which
is very risky: ArangoDB can fail to stop, and after a "111" error a WAL
file will almost always throw a segmentation fault, so my only options are
to delete it or ignore it, resulting in complete data loss.

Also, judging from our use case, there is a high probability that these
errors are triggered after multiple update operations.

I have switched all new collections to use synchronous writes, to make
sure data is actually written to disk, and I am still losing data.
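Assuming "sync" above refers to ArangoDB's waitForSync flag, here is a
sketch of how it can be enabled on an existing collection through the HTTP
API (localhost:8529 and the collection name "mycollection" are
placeholders; authentication is omitted):

```shell
# Turn on waitForSync so each write is only acknowledged
# after the data has been synced to disk.
payload='{"waitForSync": true}'
curl -s -X PUT \
     -d "$payload" \
     http://localhost:8529/_db/_system/_api/collection/mycollection/properties
```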

Right now it feels like a domino of disasters: more update operations lead
to refused connections, which lead to data loss, which then requires more
update operations on our side.

My only hope now is to look into the TCP TIME_WAIT angle.
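The first check there is simply counting sockets stuck in TIME-WAIT on the
ArangoDB port; if short-lived connections are exhausting local ports, this
number will be in the tens of thousands:

```shell
# Count sockets in TIME-WAIT on port 8529
ss -tan state time-wait '( sport = :8529 or dport = :8529 )' | tail -n +2 | wc -l
```

If the count turns out to be huge, net.ipv4.tcp_tw_reuse and a wider
net.ipv4.ip_local_port_range are the usual knobs for the client side, but
I haven't verified that this is the problem yet.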

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.