Github user jaydoane commented on the issue:

    https://github.com/apache/couchdb-couch/pull/185
  
    The following image adds vmpage_io.memory.in and vmpage_io.memory.out to the experiment. All experiments searched 4 files in parallel. The first run starts around 23:48 using the current algorithm, followed by runs around 23:57, 23:59, and 00:03 using vectored reads.
    
    <img width="1244" alt="screen shot 2016-09-21 at 5 06 22 pm" src="https://cloud.githubusercontent.com/assets/51209/18733383/5a83b6e8-801f-11e6-84f1-3a4fc074f9e1.png">
    
    The most notable aspect of the graph, to me, is the consistently high vmpage_io.memory.in for the vectored read. Eyeballing the graphs, the areas under the vmpage_io.memory.in curves look similar for both algorithms, which I think is what @theburge was expecting to see.
    
    As for a more realistic MT scenario, I want to clarify something. My understanding is that under normal circumstances, when a couch file is opened, the header is found at the end of the file. In such cases the existing algorithm is used, since it has been micro-optimized for this case by reading the entire remainder of the block in a single disk read. Only when the existing algorithm fails to find a header do we fall back to the vectored read algorithm.
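    To make the two strategies concrete, here is a minimal Python sketch of the backwards header search. This is a simplified model, not the actual couch_file implementation (which is Erlang): the 4096-byte block size, the 1-byte header tag, and the batch size are assumptions for illustration, and the "vectored" variant simply fetches a contiguous batch of blocks in one read rather than issuing a true scatter-gather request.

```python
BLOCK_SIZE = 4096  # block size assumed for illustration
BATCH = 64         # blocks fetched per batched read (hypothetical tuning knob)

def find_header_single(f, file_size):
    """One disk read per block, scanning backwards (models the current algorithm)."""
    block = (file_size - 1) // BLOCK_SIZE
    while block >= 0:
        f.seek(block * BLOCK_SIZE)
        if f.read(1) == b"\x01":  # assume header blocks carry a leading 1-byte tag
            return block
        block -= 1
    return None

def find_header_vectored(f, file_size):
    """One read per BATCH of contiguous blocks (models the vectored read)."""
    block = (file_size - 1) // BLOCK_SIZE
    while block >= 0:
        first = max(block - BATCH + 1, 0)
        f.seek(first * BLOCK_SIZE)
        chunk = f.read((block - first + 1) * BLOCK_SIZE)
        # the newest header wins, so scan the batch from its last block backwards
        for b in range(block, first - 1, -1):
            off = (b - first) * BLOCK_SIZE
            if chunk[off:off + 1] == b"\x01":
                return b
        block = first - 1
    return None
```

    Both functions return the same block; the difference is that the batched version issues roughly BATCH times fewer read calls when the header is deeply buried, which is where the speedup in the experiments would come from.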
    
    The only scenario I know of that produces deeply buried headers is .compact.meta files, and the number of those is presumably limited to the number of simultaneous compactions at any given time. My understanding is that compaction concurrency is governed by smoosh, with typical values on the order of 10. If all of those assumptions hold, a realistic scenario probably wouldn't have more than a handful of vectored searches running at once on any given node, so my test case of 4 is not terribly unrealistic.
    
    That said, the image below shows a series of 3 experiments using 8 parallel searches; the first uses the current algorithm, and the other 2 use vectored reads. The main thing to note is that the speed improvement drops to "only" 4x over the current algorithm.
    
    <img width="1246" alt="screen shot 2016-09-21 at 6 11 04 pm" src="https://cloud.githubusercontent.com/assets/51209/18734223/076489da-8027-11e6-9517-844e741cd40b.png">
    
    @davisp, I'm all for getting this wrapped up. What are some of the final tweaks you had in mind? Clearly it should be squashed into a single commit. Are there other problems you'd like to see addressed?

