nickimho commented on issue #2329: Add option to enforce fetching data from 
local shards, instead of from shards on remote nodes (if data present on 
local node)
URL: https://github.com/apache/couchdb/issues/2329#issuecomment-573271004
 
 
   Thanks @kocolosk, for the detailed info!
   
   I just want to say that I am not a coder, and we have been looking at this 
from a black-box point of view so far. So please forgive me if some of the 
discussion is not very precise or I am not using the right terms =) .
   
   At a high level, here is how we set up the environment:
   1. 15 CouchDB nodes, 5 in each zone (couchdb{1,2,3,4,5}-zone{a,b,c})
   1.a. q=1, n=3
   2. Each zone has its own IP subnet
   3. We use Netropy (a network impairment tool) as the default gateway for 
those subnets, and impair the links between them with 60ms RTD.
   4. In each subnet, we set up a client host (client-zone{a,b,c}) to generate 
the queries using curl.
   5. For each test, we basically run tcpdump on all CouchDB nodes (especially 
QUERY_NODE) to analyze where the delays are coming from. In general, we are 
focusing on the potential network delays.
   5.a. We saw some I/O delays too if mydb is cold on the Couchdb_DATA_NODES 
(I suspect mydb not having an open file descriptor could also cause this); 
this is not a concern for us, as the databases our service cares about are 
expected to be hit often.
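   For reference, a database with that layout can be created through the 
standard CouchDB HTTP API. This is only a sketch of our setup: the hostname 
follows the couchdb{N}-zone{a,b,c} convention above, and the "_shards" 
endpoint (available in recent CouchDB 2.x releases) is just one way to 
confirm the layout.

```shell
# Create mydb with 1 shard range and 3 replicas (q=1, n=3 as above).
curl -X PUT 'http://couchdb1-zonea:5984/mydb?q=1&n=3'

# Inspect the resulting shard layout; with zone-aware placement configured,
# each of the three copies should land in a different zone.
curl -s 'http://couchdb1-zonea:5984/mydb/_shards'
```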
   
   For the data points I provided above:
   1. We basically start tcpdump on all CouchDB nodes.
   2. Run the curl test case (e.g. "time curl 
CouchDB2-zonea:5984/mydb/mydoc").
   3. Stop the tcpdump captures.
   4. We start by looking at Couchdb_QUERY_NODE's pcap in Wireshark; the rest 
is just following the packets to understand the network interactions between 
all the nodes.
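   The capture procedure above boils down to something like the following 
runbook fragment. The port range for the internal Erlang distribution traffic 
is deployment-specific (9100-9200 here is a placeholder), and the hostnames 
are the ones from our naming scheme.

```shell
# On each CouchDB node: capture the clustered API port plus the (assumed)
# internal Erlang distribution port range for later analysis in Wireshark.
tcpdump -i any -w "/tmp/couchdb-$(hostname).pcap" \
    port 5984 or portrange 9100-9200 &
TCPDUMP_PID=$!

# From the client host in the same zone: time the document read.
time curl 'http://couchdb2-zonea:5984/mydb/mydoc'

# Stop the capture and collect the pcap files from all nodes.
kill "$TCPDUMP_PID"
```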
   
   For 2.b.2, we ran the view tests without any parameters specified. We 
noticed that for a given mydb, one zone always performs better (by almost 
60ms). This becomes clearer if we use "time curl 
Couchdb_QUERY_NODE:5984/mydb/mydoc?r=1" from a client in the same zone (r=1 
eliminates the delay from 2.a.iii before returning the result). The preferred 
node (the first entry in "by_range", if I remember correctly) will always get 
picked as Couchdb_METALOOKUP_NODE, resulting in almost no network delay for 
the query, whereas the other two zones will always have the added 60ms. We 
then tweaked the :5986/dbs/mydb doc to re-arrange the "by_range" array and 
saw that Couchdb_METALOOKUP_NODE follows the first entry in the array.
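   The shard-map tweak described above looks roughly like this. Editing the 
shard map by hand is risky and the exact port/path depends on the CouchDB 
version; this sketch just uses the node-local :5986/dbs/mydb document 
mentioned earlier.

```shell
# Fetch the shard map document for mydb from the node-local port.
curl -s 'http://couchdb2-zonea:5986/dbs/mydb' > mydb-shards.json

# Hand-edit mydb-shards.json to re-order the node lists in "by_range",
# then write the document back; the _rev captured in the file keeps the
# update valid.
curl -X PUT 'http://couchdb2-zonea:5986/dbs/mydb' -d @mydb-shards.json
```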
   
   Actually, I should check my notes on this later; there might be an 
exception:
   1. If Couchdb_QUERY_NODE is one of the Couchdb_DATA_NODES in 2.b.2, I 
think there is some difference in how the process works. I will provide this 
info after I find the notes.
   
   And thanks for the insight on attachments! This is good to know. For what 
we do now, we should be OK (as long as this behavior also exists in BigCouch, 
which is what we are upgrading from). I will have my team run the analysis 
against BigCouch and CouchDB next week and will provide the results here 
later. In general, we see attachments as bigger data retrievals, and the 
higher-layer application should have logic to handle additional delay and 
caching; we are also moving attachments to external storage in general and 
just using the doc as a pointer to those external resources. We do want doc 
and view queries to be as optimized as possible (or at least consistent 
across all zones) for a better user experience.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
