GitHub user jaydoane commented on the issue:

https://github.com/apache/couchdb-couch/pull/185

The following image adds vmpage_io.memory.in and vmpage_io.memory.out to the experiment. All experiments searched 4 files in parallel. The first starts around 23:48 and uses the current algorithm; it is followed by runs around 23:57, 23:59 and 00:03 that use vectored reads.

<img width="1244" alt="screen shot 2016-09-21 at 5 06 22 pm" src="https://cloud.githubusercontent.com/assets/51209/18733383/5a83b6e8-801f-11e6-84f1-3a4fc074f9e1.png">

The most notable aspect of the graph to me is the consistently high vmpage_io.memory.in for the vectored read. Just eyeballing the graphs, it looks like the area under the vmpage_io.memory.in curve is similar for both algorithms, which I think is what @theburge was expecting to see.

As for a more realistic MT scenario, I want to clarify something. It's my understanding that under normal circumstances when opening a couch file, the header is found at the end of the file. In such cases, the existing algorithm will be used (since it's been micro-optimized for this case by reading the entire remainder of the block in a single disk read operation). Only when the existing algorithm fails to find a header do we employ the vectored read algorithm.

The only scenario I know of for which we have deeply buried headers is that of .compact.meta files, and the number of those is presumably limited to the number of simultaneous compactions occurring at any time. My understanding is that compaction concurrency is governed by smoosh, and typical numbers are on the order of 10. If all of those assumptions are true, a realistic scenario probably wouldn't have more than a handful of vectored searches happening at one time on any given node, so my test case of 4 is not terribly unrealistic.

That said, the image below shows a series of 3 experiments using 8 parallel searches: the first with the current algorithm, and the other 2 using vectored reads. The main thing to note is that the speed improvement drops to "only" 4x the current algorithm.

<img width="1246" alt="screen shot 2016-09-21 at 6 11 04 pm" src="https://cloud.githubusercontent.com/assets/51209/18734223/076489da-8027-11e6-9517-844e741cd40b.png">

@davisp, I'm all for getting this wrapped up. What final tweaks did you have in mind? Clearly, it should be squashed into a single commit. Are there other problems you'd like to see addressed?
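
To make the fallback order described above concrete, here is a rough Python sketch. It is not the Erlang code in this PR, only an illustration of "check the tail first, then scan backward in batches". The 4096-byte block size, the one-byte `\x01` marker at the start of a header block, the batch size, and the names `read_markers` and `find_header_block` are all assumptions made for the example; the loop inside `read_markers` merely stands in for a single vectored read request.

```python
import io

BLOCK_SIZE = 4096         # assumed block size
HEADER_MARKER = b"\x01"   # assumed marker byte at the start of a header block


def read_markers(f, block_numbers):
    """Read the first byte of each listed block.

    Stand-in for one vectored read that fetches many offsets per request.
    """
    markers = []
    for n in block_numbers:
        f.seek(n * BLOCK_SIZE)
        markers.append(f.read(1))
    return markers


def find_header_block(f, file_size, batch=1024):
    """Return the number of the last block starting with HEADER_MARKER, or None."""
    if file_size <= 0:
        return None
    last = (file_size - 1) // BLOCK_SIZE
    # Fast path: the header is normally in the final block of the file.
    if read_markers(f, [last]) == [HEADER_MARKER]:
        return last
    # Slow path: walk backward, checking a batch of block prefixes per request.
    hi = last - 1
    while hi >= 0:
        lo = max(0, hi - batch + 1)
        blocks = list(range(hi, lo - 1, -1))
        for n, marker in zip(blocks, read_markers(f, blocks)):
            if marker == HEADER_MARKER:
                return n
        hi = lo - 1
    return None


# Usage: a 10-block in-memory "file" whose only header is buried in block 3.
data = bytearray(BLOCK_SIZE * 10)
data[3 * BLOCK_SIZE] = 1
buried = find_header_block(io.BytesIO(bytes(data)), len(data), batch=4)
```

With the header in the final block, only the fast path runs; the batched scan is reached only for buried headers such as the .compact.meta case mentioned above.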