[GitHub] couchdb-couch issue #185: 3061 adaptive header search

jaydoane Wed, 21 Sep 2016 18:18:42 -0700

Github user jaydoane commented on the issue:

https://github.com/apache/couchdb-couch/pull/185

The following image adds vmpage_io.memory.in and vmpage_io.memory.out to the
experiment. All experiments searched 4 files in parallel. The first run
starts around 23:48 and uses the current algorithm. It is followed by three
runs, around 23:57, 23:59 and 00:03, that use vectored reads.

<img width="1244" alt="screen shot 2016-09-21 at 5 06 22 pm"
src="https://cloud.githubusercontent.com/assets/51209/18733383/5a83b6e8-801f-11e6-84f1-3a4fc074f9e1.png">
The most notable aspect of the graph to me is the consistently high
vmpage_io.memory.in for the vectored read. Just eyeballing the graphs, it looks
like the area under the vmpage_io.memory.in curve is similar for both
algorithms, which I think is what @theburge was expecting to see.

As for a more realistic MT scenario, I want to clarify something. It's my
understanding that under normal circumstances when opening a couch file, the
header is found at the end of the file. In such cases, the existing algorithm
will be used (since it's been micro-optimized for this case by reading the
entire remainder of the block in a single disk read operation). Only when the
existing algorithm fails to find a header do we employ the vectored read
algorithm.
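
To make that two-stage lookup concrete, here is a rough Python sketch. It is
not the couch_file code (which is Erlang), just an illustration under stated
assumptions: 4096-byte blocks, a header block marked by a leading 0x01 byte,
and a fast path that only checks the last block. The names find_header,
read_at and batch are made up for this example, and the per-offset reads in
the fallback loop stand in for what would be one vectored read call.

```python
import io

BLOCK_SIZE = 4096         # assumed block size for this sketch
HEADER_MARKER = b"\x01"   # assumed first byte of a header block


def read_at(f, offset, length):
    """Hypothetical helper: positional read on a seekable file object."""
    f.seek(offset)
    return f.read(length)


def find_header(f, file_size, batch=64):
    """Return the offset of the last header block, or None if there is none.

    Fast path: check the final block, where the header normally lives.
    Fallback: walk backwards, probing `batch` block boundaries per round.
    """
    if file_size <= 0:
        return None
    last = (file_size - 1) // BLOCK_SIZE

    # Fast path: the common case of a header at the end of the file.
    if read_at(f, last * BLOCK_SIZE, 1) == HEADER_MARKER:
        return last * BLOCK_SIZE

    # Fallback: batched probing of earlier block boundaries.
    block = last - 1
    while block >= 0:
        first = max(block - batch + 1, 0)
        candidates = list(range(block, first - 1, -1))
        # Stand-in for a single vectored read over many (offset, 1) locations.
        prefixes = [read_at(f, b * BLOCK_SIZE, 1) for b in candidates]
        for b, prefix in zip(candidates, prefixes):
            if prefix == HEADER_MARKER:
                return b * BLOCK_SIZE
        block = first - 1
    return None
```

With a deeply buried header (the .compact.meta case), the fallback needs
roughly 1/batch as many read rounds as a block-at-a-time walk, which is where
I'd expect the speedup to come from.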

The only scenario I know of for which we have deeply buried headers is that
of .compact.meta files, and the number of those presumably is limited to the
number of simultaneous compactions that occur at any time. My understanding is
that concurrency is governed by smoosh, and typical numbers are on the order of
10. If all of those assumptions are true, a realistic scenario probably
wouldn't have more than a handful of vectored searches happening at one time on
any given node, and so my test case of 4 is not terribly unrealistic.

That said, the image below shows a series of 3 experiments using 8 parallel
searches; the first with the current algorithm, and the other 2 using vectored
reads. The main thing to note is that the speed improvement drops to "only" 4x
the current algorithm.

@davisp, I'm all for getting this wrapped up. What are some final tweaks
you had in mind? Clearly, it should be squashed into a single commit. Are there
other problems you'd like to see addressed?



