nickimho edited a comment on issue #2329: Add option to enforce fetching data 
from local shards, instead of from shards on remote nodes (if data present on 
local node)
URL: https://github.com/apache/couchdb/issues/2329#issuecomment-573249534
 
 
   We tested some behavior on a 3-zone cluster (5 nodes per zone, n=3, q=1, 
with placement of one copy in each of zones a, b, and c). We used network 
impairment tools to introduce a 60ms RTD between zones. We used CouchDB 
2.3.1.
   
        1. Terms/Definitions
                   a. Client – This is the host that initiates the query to 
couchdb's port 5984
                   b. Couchdb_QUERY_NODE – This is the couchdb node in the cluster 
that receives the database query from Client on port 5984. This node may or may 
NOT be a node that holds a shard for the database. 
                   c. Couchdb_METALOOKUP_NODE – This is the couchdb node that 
Couchdb_QUERY_NODE queries for some meta info (not sure what it is). 
Couchdb_METALOOKUP_NODE is one of the nodes in Couchdb_DATA_NODES.
                                    i. The selection of Couchdb_METALOOKUP_NODE 
is based on the "by_range" key in couchdb:5986/dbs/mydb. The first node in the 
array is picked.
                   d. Couchdb_DATA_NODES – This is the set of couchdb nodes 
that actually hold a copy of the database requested by the query.
        2.  General data flow we observed:
                   a.  General data flow for doc query:
                                        i.  Client -> Couchdb_QUERY_NODE:5984
                                        ii. If Couchdb_QUERY_NODE is NOT one of the 
Couchdb_DATA_NODES,  Couchdb_QUERY_NODE -> Couchdb_METALOOKUP_NODE:11500  
                                                1. This selection is 
deterministic, based on 1.c.i. Suppose the Couchdb_DATA_NODES member in zonea is 
first in the "by_range" key for mydb; it will then always be queried in this 
phase. As a result, queries into mydb from zonec and zoneb incur an additional 
60ms RTD of network delay compared to zonea.  
                                        iii. Couchdb_QUERY_NODE -> “three 
Couchdb_DATA_NODES”:11500
                                                1. Once enough 
Couchdb_DATA_NODES (default read quorum is 2 when n=3) have returned data, this 
phase stops
                                        iv.  Couchdb_QUERY_NODE->Client with 
query result
                   b. A view query largely follows the same flow as a doc query, 
except for the following:
                                        i. Couchdb_QUERY_NODE seems to cache 
the view definition/metadata
                                                1. During the first query to 
/mydb/_view/myview, it retrieves the view doc following the 2.a process
                                                        a. Subsequent queries to 
/mydb/_view/myview bypass this. 
                                        2. When Couchdb_QUERY_NODE actually 
retrieves the myview result, it seems to ONLY query the Couchdb_DATA_NODES in 
the SAME zone as itself. This is good, as it saves bandwidth between zones for 
large results.
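
   To make the latency effect of 2.a.ii.1 concrete, here is a rough Python 
model of the doc read flow in 2.a. It is only a sketch of what we observed, not 
CouchDB's actual implementation: the function and constant names are made up 
for illustration, same-zone latency is assumed to be 0ms, and the lookup and 
data phases are assumed to run one after the other.

```python
INTER_ZONE_RTD_MS = 60  # from our test setup
INTRA_ZONE_RTD_MS = 0   # assumption: same-zone latency is negligible


def rtd_ms(zone_x, zone_y):
    """Round-trip delay between two zones in this simplified model."""
    return INTRA_ZONE_RTD_MS if zone_x == zone_y else INTER_ZONE_RTD_MS


def doc_read_latency_ms(query_zone, data_zones, holds_shard=False, read_quorum=2):
    """Estimate the network delay of one doc read.

    data_zones lists the zones of Couchdb_DATA_NODES in "by_range" order,
    so data_zones[0] is the zone of Couchdb_METALOOKUP_NODE (see 1.c.i).
    """
    # Phase 2.a.ii: skipped when the query node itself holds a shard.
    lookup = 0 if holds_shard else rtd_ms(query_zone, data_zones[0])
    # Phase 2.a.iii: all data nodes are asked; the phase ends once
    # read_quorum of them have answered, i.e. at the quorum-th fastest reply.
    replies = sorted(rtd_ms(query_zone, zone) for zone in data_zones)
    return lookup + replies[read_quorum - 1]


by_range_zones = ["zonea", "zoneb", "zonec"]
for zone in by_range_zones:
    print(zone, doc_read_latency_ms(zone, by_range_zones))
```

   Under these assumptions a query entering in zonea costs one inter-zone RTD, 
while the same query entering in zoneb or zonec costs two.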
   
   We haven't tested attachment retrieval yet, but it seems to me that it should 
follow the same logic as the view query in 2.b.2, if it does not already. We 
will try to test this some time next week. 
   
   Also, not sure if this needs to be a different ticket, but we would really 
like to see 2.a.ii.1 optimized so that Couchdb_QUERY_NODE queries a 
Couchdb_METALOOKUP_NODE in its local zone first. Currently, we plan to work 
around this by changing the "by_range" order in 5986/dbs/mydb to favor the 
primary zone for our service. 
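
   The reordering step of that workaround could look roughly like the Python 
sketch below. It only rewrites an in-memory copy of the shard map document; 
fetching the document from 5986/dbs/mydb and writing it back are left out. The 
helper name `prefer_nodes` and the node names are hypothetical, and whether the 
new order actually changes which node is picked is exactly what we still need 
to confirm.

```python
import copy


def prefer_nodes(shard_map, preferred):
    """Return a copy of shard_map whose "by_range" lists start with the
    preferred nodes. The relative order of all other nodes is kept."""
    updated = copy.deepcopy(shard_map)
    for shard_range, nodes in updated["by_range"].items():
        # sorted() is stable and False sorts before True, so preferred
        # nodes move to the front without reshuffling the rest.
        updated["by_range"][shard_range] = sorted(
            nodes, key=lambda node: node not in preferred
        )
    return updated


# Hypothetical shard map for q=1, n=3 with one copy per zone.
shard_map = {
    "_id": "mydb",
    "by_range": {
        "00000000-ffffffff": ["couchdb@a1", "couchdb@b1", "couchdb@c1"],
    },
}

print(prefer_nodes(shard_map, {"couchdb@c1"})["by_range"])
```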
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
