Hi everyone,

I wanted to share my recent experience troubleshooting some strange 
problems.

Background: this project implements most of its app logic in Foxx. From 
Foxx functions, I used the request module to post events to Azure Table 
Storage.

Everything was working fine until ~2 weeks ago, when I started to notice 
that my ArangoDB instances would sometimes go through periods of "apnea" 
with:
- requests taking a long time to run (many minutes!)
- lock timeouts in Foxx transactions
- general performance degradation, with the web dashboard not available
Those issues would last for 10 to 15 minutes, and then everything would get 
back to normal.

I first suspected my code to be at fault and spent a lot of time trying to 
figure out what triggered those problems. But then I found out that:
- both staging and production environments were impacted, even though they 
were not running the same version of my app (the prod version was >1 week 
older)
- during those episodes, I would sometimes get error logs about SSL 
handshakes
- (not confirmed) issues in prod and staging would happen at approximately 
the same time
- (not confirmed) issues would happen when Azure Table Storage had higher 
response times

I asked on Slack about the SSL handshake errors, and someone answered that 
a bug had been introduced with TLS support (which I guess was in 3.1). Then 
it hit me that I had upgraded my instances from 3.0.10 to 3.1.15 not too 
long ago.

So I decided to change the flow of events within the system (not a small 
change!) so that Arango no longer uses the request module. This was 
deployed nearly a week ago, and I haven't had any problems since then!

Cheers,
Thomas
