[
https://issues.apache.org/jira/browse/COUCHDB-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875356#comment-15875356
]
Nick Vatamaniuc commented on COUCHDB-3302:
------------------------------------------
I think I know what is happening. The MP parser is created on the coordinating
node. Then we send its pid to all the other nodes via a regular update_docs
fabric request. Then each node talks directly to the MP parser (via the dist
protocol!) to get bytes. While that happens, no messages flow at the fabric
level, so fabric's update_docs request eventually times out.
Now fabric_doc_update does handle an {{ attachment_chunk_received }} message,
I'm guessing in an attempt to keep the request alive after each chunk is
consumed, but nobody sends it.
So crossing fingers and hoping attachments can be fixed just by having the
nodes reply with that message periodically from their MP consumer callbacks.
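
To make the suspected failure mode concrete, here is a rough Python model of
it. This is only a sketch and not CouchDB code: the function name, the 64 KiB
chunk size and the 60 second timeout are assumptions picked for illustration.
The attachment size and the 2 Mbit/s link come from the report. It assumes the
coordinator gives up once it has heard nothing for longer than the timeout,
and that any per-chunk message resets that timer.

```python
# Hypothetical model, not CouchDB code: the coordinator gives up if no
# message arrives within `timeout_s` of the previous one.

def request_survives(chunk_durations_s, timeout_s, send_keepalive):
    """Return True if every chunk is consumed before the coordinator times out.

    chunk_durations_s: seconds a node spends pulling each chunk from the parser.
    send_keepalive: whether the node reports each consumed chunk upstream.
    """
    silent_for = 0.0  # seconds since the coordinator last heard anything
    for duration in chunk_durations_s:
        silent_for += duration
        if silent_for > timeout_s:
            return False  # timed out while the node was still reading bytes
        if send_keepalive:
            silent_for = 0.0  # a per-chunk message resets the inactivity timer
    return True  # all chunks consumed; the node can send its final reply


# Figures from the report: a 33193656-byte attachment over a 2 Mbit/s link.
ATT_BYTES = 33193656
LINK_BYTES_PER_S = 2_000_000 / 8
CHUNK_BYTES = 64 * 1024  # assumed chunk size, illustration only
TIMEOUT_S = 60.0         # assumed coordinator timeout, illustration only

n_chunks = -(-ATT_BYTES // CHUNK_BYTES)  # ceiling division
# Approximation: the last, partial chunk is charged as a full one.
durations = [CHUNK_BYTES / LINK_BYTES_PER_S] * n_chunks

total_s = sum(durations)
without_keepalive = request_survives(durations, TIMEOUT_S, send_keepalive=False)
with_keepalive = request_survives(durations, TIMEOUT_S, send_keepalive=True)

print(f"transfer takes roughly {total_s:.0f}s")
print("survives without per-chunk messages:", without_keepalive)
print("survives with per-chunk messages:", with_keepalive)
```

In this model the whole transfer takes much longer than the assumed timeout,
so the request only survives when the nodes report each consumed chunk.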
> Attachment replication over low bandwidth network connections
> -------------------------------------------------------------
>
> Key: COUCHDB-3302
> URL: https://issues.apache.org/jira/browse/COUCHDB-3302
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Reporter: Jan Lehnardt
> Attachments: attach_large.py, replication-failure.log,
> replication-failure-target.log
>
>
> Setup:
> Two CouchDB instances `source` (5981) and `target` (5983) with a 2MBit
> network connection (simulated locally with traffic shaping, see way below for
> an example).
> {noformat}
> git clone https://github.com/apache/couchdb.git
> cd couchdb
> ./configure --disable-docs --disable-fauxton
> make release
> cd ..
> cp -r couchdb/rel/couchdb source
> cp -r couchdb/rel/couchdb target
> # set up local ini: chttpd / port: 5981 / 5983
> # set up vm.args: [email protected] / [email protected]
> # no admins
> # start both CouchDB instances in their own terminal windows: ./bin/couchdb
> # create all required databases, and our `t` test database
> curl -X PUT http://127.0.0.1:598{1,3}/{_users,_replicator,_global_changes,t}
> # create 64MB attachments
> dd if=/dev/urandom of=att-64 bs=1024 count=65536
> # create doc on source
> curl -X PUT http://127.0.0.1:5981/t/doc1/att_64 -H 'Content-Type: application/octet-stream' -d @att-64
> # replicate to target
> curl -X POST http://127.0.0.1:5981/_replicate -Hcontent-type:application/json -d '{"source":"http://127.0.0.1:5981/t","target":"http://127.0.0.1:5983/t"}'
> {noformat}
> With the traffic shaping in place, the replication call doesn’t return, and
> eventually CouchDB fails with:
> {noformat}
> [error] 2017-02-16T17:37:30.488990Z [email protected] emulator --------
> Error in process <0.15811.0> on node '[email protected]' with exit value:
> {{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,591}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,642}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},{couch_httpd_multipart,atts_to_mp,4,[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]}
> [error] 2017-02-16T17:37:30.490610Z [email protected] <0.8721.0> --------
> Replicator, request PUT to "http://127.0.0.1:5983/t/doc1?new_edits=false"
> failed due to error {error,
> {'EXIT',
> {{{nocatch,{mp_parser_died,noproc}},
> [{couch_att,'-foldl/4-fun-0-',3,
> [{file,"src/couch_att.erl"},{line,591}]},
> {couch_att,fold_streamed_data,4,
> [{file,"src/couch_att.erl"},{line,642}]},
> {couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,595}]},
> {couch_httpd_multipart,atts_to_mp,4,
> [{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
> {gen_server,call,
> [<0.15778.0>,
> {send_req,
> {{url,"http://127.0.0.1:5983/t/doc1?new_edits=false",
> "127.0.0.1",5983,undefined,undefined,
> "/t/doc1?new_edits=false",http,ipv4_address},
> [{"Accept","application/json"},
> {"Content-Length",33194202},
> {"Content-Type",
> "multipart/related;
> boundary=\"0dea87076009b928b191e0b456375c93\""},
> {"User-Agent","CouchDB-Replicator/2.0.0"}],
> put,
> {#Fun<couch_replicator_api_wrap.11.59841038>,
>
> {<<"{\"_id\":\"doc1\",\"_rev\":\"1-15ae43c5b53de894b936c08db31d537c\",\"_revisions\":{\"start\":1,\"ids\":[\"15ae43c5b53de894b936c08db31d537c\"]},\"_attachments\":{\"att_64\":{\"content_type\":\"application/octet-stream\",\"revpos\":1,\"digest\":\"md5-s3AA0cYvwOzrSFTaALGh8g==\",\"length\":33193656,\"follows\":true}}}">>,
> [{att,<<"att_64">>,<<"application/octet-stream">>,
> 33193656,33193656,
> <<179,112,0,209,198,47,192,236,235,72,84,218,0,177,
> 161,242>>,
> 1,
> {follows,<0.8720.0>,#Ref<0.0.1.23804>},
> identity}],
> <<"0dea87076009b928b191e0b456375c93">>,33194202}},
> [{response_format,binary},
> {inactivity_timeout,30000},
> {socket_options,[{keepalive,true},{nodelay,false}]}],
> infinity}},
> infinity]}}}}
> {noformat}
> Expected Behaviour:
> Replication eventually succeeds.
> Appendix:
> Set up Traffic Shaping on a Mac:
> {noformat}
> (cat /etc/pf.conf && echo "dummynet-anchor \"reptest\"" && echo "anchor \"reptest\"") | sudo pfctl -f -
> echo "dummynet in quick proto tcp from any to any port 5983 pipe 3" | sudo pfctl -a reptest -f -
> echo "dummynet out quick proto tcp from any to any port 5983 pipe 4" | sudo pfctl -a reptest -f -
> sudo dnctl pipe 3 config bw 2Mbit/s
> sudo dnctl pipe 4 config bw 2Mbit/s
> sudo pfctl -E
> {noformat}
> Reset with:
> {noformat}
> sudo pfctl -f /etc/pf.conf
> sudo dnctl flush
> {noformat}