A few months ago a bunch of code landed on master around IO QoS and 
prioritization. I think we need to have a conversation about the defaults for 
that system and what we want to allow users to enable.

First topic - there are actually two different generations of the IOQ system: 
IOQ and IOQ2. Only one can be active at a given time, and the configurations 
are not compatible. The best use case for this queueing system is to 
de-prioritize IO for bookkeeping tasks like internal replication and compaction 
in favor of IO to respond to client requests.

The original and currently default IOQ system primarily works by classifying 
the IO based on whether it’s serving an interactive read or write request, an 
index build, a compaction job, etc. It builds queues for each of these IO 
classes and allows for relative prioritization of the different classes of IO. 
The main downside of this system is that it can only sustain a total throughput 
of about 20,000 operations/sec/node. Heavily-loaded systems frequently have to 
configure “bypasses” for certain classes of IO to keep latencies low.
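
To make the class-based scheme above concrete, here is a rough Python sketch of the idea. It is not the actual IOQ code (which is Erlang); the ClassQueue name, the weight values, and the weighted-random pick are all my own assumptions, chosen only to illustrate per-class queues, relative priorities, and bypasses.

```python
import random
from collections import deque


class ClassQueue:
    """Toy model of per-class IO queues with relative priorities.

    Hypothetical illustration only; names and policy are assumptions,
    not the real IOQ implementation.
    """

    def __init__(self, weights, bypass=(), seed=None):
        # weights: IO class name -> positive relative priority
        self.weights = dict(weights)
        # classes listed here skip the queues entirely
        self.bypass = set(bypass)
        self.queues = {name: deque() for name in self.weights}
        self.rng = random.Random(seed)

    def submit(self, io_class, request):
        """Enqueue a request, or hand it straight back if its class is bypassed."""
        if io_class in self.bypass:
            return request  # caller issues the IO immediately, no QoS applied
        self.queues[io_class].append(request)  # KeyError for unknown classes
        return None

    def next_request(self):
        """Pick the next request to issue, favouring higher-weight classes."""
        ready = [name for name, queue in self.queues.items() if queue]
        if not ready:
            return None
        ready_weights = [self.weights[name] for name in ready]
        chosen = self.rng.choices(ready, weights=ready_weights)[0]
        return self.queues[chosen].popleft()


ioq = ClassQueue({"interactive": 10.0, "compaction": 1.0}, bypass={"view_update"})
ioq.submit("interactive", "read doc A")
ioq.submit("compaction", "copy block 17")
print(ioq.next_request())
```

Every queued request passes through a single dispatcher here, which hints at why a design like this has a throughput ceiling and why bypassing a class trades away its prioritization.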

IOQ2 was conceived to deliver higher throughput without resorting to bypasses 
and thus defeating the QoS. It’s a significantly more complex system. Tenants 
are a first-class concept in IOQ2, but of course they’re not in the rest of 
CouchDB, so some of the code in there that computes per-user priorities will 
not work correctly. As far as I can tell it will fail gracefully (i.e., it will 
bucket every database as belonging to the same “user”), but I doubt this has 
been tested. IOQ2 definitely can sustain higher throughputs, though it has been 
known to enqueue so many more IO requests than it could issue that it 
effectively caused an outage anyway. It also still adds material overhead 
compared to bypassing the QoS entirely.

I think there are a few possible paths forward:

1) Switch to IOQ2 and only document that one.
2) Document IOQ, installing bypasses across the board by default to avoid a big 
performance regression on upgrade.
3) Just bypass the whole thing and don’t document it, to avoid introducing a 
big new admin capability in 3.0 and then removing it in 4.0.

Personally I think I’m leaning towards 3) at this point, but could be convinced 
otherwise.

Regards,

Adam
