[GitHub] [couchdb] zdravko123 opened a new issue #2930: Daily IOPs spike on AZURE VMs Cluster causing delays in replications

GitBox Fri, 05 Jun 2020 21:35:12 -0700


zdravko123 opened a new issue #2930:
URL: https://github.com/apache/couchdb/issues/2930



   Every day we get a spike in IOPS lasting about 3-5 minutes, which causes 
delays in the replications.
   This is a problem for us, as we need the databases to synchronize almost 
instantly, which they generally do.  I have thought about upgrading the disk 
IOPS to 5x higher, but I suspect we would still get some spikes, and it's 
expensive; we could go up to 10x disk IOPS if we needed to.  I have also 
considered splitting the data out from the views to eliminate some issues.  
I suspect the cause is related to the BEAM queuing technology: it might be 
crashing, or garbage collecting and dumping things to disk, or perhaps writing 
log files.  Is there anything I can do to investigate?  My Linux skills aren't 
as good as my Windows skills, but I'm happy to have a poke around along with 
another developer.  I have read on other forums that this causes high CPU, but 
for us it seems to be high IOPS, which in turn causes outages.  It happens on 
all 4 nodes in the cluster at the same time.
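
   One way to start investigating on the Linux side might be to log per-device 
IOPS with timestamps on each node, so the daily spike can be lined up against 
the CouchDB logs and any scheduled jobs.  The following is only a rough sketch, 
not something taken from our setup: it assumes a Linux host exposing 
/proc/diskstats, and the function names (parse_diskstats, iops, watch) and the 
5-second interval are made up for illustration.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads_completed, writes_completed).

    In /proc/diskstats the whitespace-separated fields are:
    major, minor, device name, reads completed, ..., writes completed (8th field).
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def iops(before, after, interval_s):
    """Return device -> (read ops/sec, write ops/sec) between two snapshots."""
    rates = {}
    for dev, (reads_after, writes_after) in after.items():
        if dev not in before:
            continue  # device appeared between the two snapshots
        reads_before, writes_before = before[dev]
        rates[dev] = (
            (reads_after - reads_before) / interval_s,
            (writes_after - writes_before) / interval_s,
        )
    return rates


def watch(interval_s=5.0, path="/proc/diskstats"):
    """Print one timestamped line per device every interval_s seconds, forever."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    while True:
        time.sleep(interval_s)
        with open(path) as f:
            after = parse_diskstats(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for dev, (r, w) in sorted(iops(before, after, interval_s).items()):
            print(f"{stamp} {dev} read_iops={r:.1f} write_iops={w:.1f}", flush=True)
        before = after
```

   Calling watch() on each of the 4 nodes and redirecting the output to a file 
would give a timeline showing whether reads or writes dominate during the 
spike and which disk is affected.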
   
   More information can be found about our set up here: 
https://github.com/apache/couchdb/issues/2298
   
   We are in the process of eliminating the need for the synchronizations, as we 
have a FULLDB --> MiniDB setup.  We are working on changing our application to 
write directly to the mini and not the full database, so it doesn't need the 
real-time replication.  That is about 1 month away, and we need an interim fix 
until then.  We cannot guarantee that this won't affect other things, because 
we also had a 2-3 minute outage when an IOPS spike caused the whole cluster to 
become unresponsive, i.e. Bad Gateway.
   
   So I guess I'm looking for assistance.  What could cause this on a daily 
cycle?  What can we investigate that might be related to it, and what can we do 
to fix it, apart from increasing the disk IOPS?  We do realize the tech isn't 
suitable for our implementation, where we are using it in a transactional ACID 
way rather than an eventually consistent one; that is where a fundamental 
design flaw/assumption was made, which we are trying to rectify.  It was just 
the trade-off at the time for using PouchDB client-side offline storage and 
replication.
   
   Any help would be much appreciated!
   
   Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
