> On 26 Jul 2015, at 14:47, Jan Lehnardt <[email protected]> wrote:
> 
> Hey all,
> 
> I’m trying to upgrade a database from 1.6.1 to 2.0.0/master/0c579b98 and I’m 
> seeing a number of issues.
> 
> Any help is greatly appreciated. Since this is our official upgrade path for 
> 2.0, this has to be rock-solid.
> 
> Feel free to break out individual issues into new threads if it helps keep 
> things organised.
> 
> Scroll down for detailed information about the database and machine 
> configurations.
> 
> 
> ## The Scenario
> 
> Replication is running on 2.0, pulling from 1.6.1 over the EC2 internal ip 
> address.
> 
> ## The Issues
> 
> 1. Repeated log entries for “write quorum for <targetdb> failed”. I’ve seen 
> this in other contexts as well. Why is this happening, and is it expected?
> 
> 
> 2. getting a lot of “cassim_metadata_cache changes listener died” from all 
> nodes about every 5 seconds. What’s up with these?
> 
> - 2015-07-26 08:30:34.400 [error] Undefined emulator Error in process 
> <0.14633.26> on node '[email protected]' with exit value: 
> {function_clause,[{cassim_metadata_cache,changes_callback,[waiting_for_updates,"0"]},{fabric_view_changes,keep_sending_changes,8},{fabric_view_changes,go,5}]}
> 
> - 2015-07-26 08:30:39.401 [notice] [email protected] <0.314.0> 
> cassim_metadata_cache changes listener died 
> {function_clause,[{cassim_metadata_cache,changes_callback,[waiting_for_updates,"0"]},{fabric_view_changes,keep_sending_changes,8},{fabric_view_changes,go,5}]}

Alexander pointed to 
https://github.com/apache/couchdb-fabric/commit/b6659c8344c9a028b5ab451be41a991801c2ab3d#diff-2af86e058b4e7a4a99a7c5a12da6debdR96
which is part of Adam’s recent work on COUCHDB-2724.

Adam, any insights? :)

Best
Jan
--



> 
> 
> 3. A number of “Replicator, request PUT to 
> "http://0.0.0.0:15984/<target>/edbef049aae9c8828f336534984e5e4f" failed due 
> to error {error,req_timedout}” log entries. This happens for regular docs, 
> local docs, and _bulk_docs. The machine is basically idle (see below for 
> details): the three beam.smp processes are at 200-250% CPU each, and io is 
> 98% idle (it’s mostly logs being written).
> 
> 
> 4. Two issues from couch_replicator_api_wrap.erl:
> 
> - 2015-07-26 08:22:49.849 [error] Undefined <0.3546.0> gen_server <0.3546.0> 
> terminated with reason: no function clause matching 
> couch_replicator_api_wrap:'-update_docs/4-fun-2-'(400, 
> [{"Server","MochiWeb/1.0 (Any of you quaids got a smint?)"},{"Date","Sun, 26 
> Jul 2015 08:22:49 G..."},...], null, 
> [<<"{\"_id\":\"12345678\",\"_rev\":\"1050-ee6c7d54276b43bc937470e44e0283f2\",...
> 
> - 2015-07-26 08:30:08.514 [notice] [email protected] <0.6360.26> Retrying GET 
> to 
> http://172.31.10.115:5984/generic_db_name/12348765?revs=true&open_revs=%5B%228-b2826209867a286c76e6a2762f10b1e0%22%5D&latest=true
>  in 1.0 seconds due to error 
> {function_clause,[{couch_replicator_api_wrap,run_user_fun,4},{couch_replicator_api_wrap,receive_docs,4},{couch_replicator_api_wrap,receive_docs_loop,6},{couch_replicator_api_wrap,'-open_doc_revs/6-fun-4-',7}]}
> 
> 
> 
> 5. Eventually, replication reliably stops with an “invalid_ejson” error, but 
> I don’t yet know if that’s because of the api_wrap issue or something else.
> 
> 
> 
> 6. Replication stopped numerous times before I got this far. I haven’t had 
> time to look into why that happened, but I have all the logs. They are 
> 130MB total, so it’ll be a while.
> 
> 
> 7. When replication ran, it replicated at a rate of about 1000 docs/s, which 
> felt a little slow, but I have no experience there, yet.
> 
> 
> ## Source Database Info
> 
> {
>  "db_name": "generic_db_name",
>  "doc_count": 6808004,
>  "doc_del_count": 18856,
>  "update_seq": 8044450,
>  "purge_seq": 0,
>  "compact_running": false,
>  "disk_size": 16293904519,
>  "data_size": 11711402577,
>  "instance_start_time": "1437834202967309",
>  "disk_format_version": 6,
>  "committed_update_seq": 8044450
> }
> 
> Mostly small-ish docs, no big outliers, no attachments.
> 
> Source machine info:
> 
> Amazon EC2 m3.xlarge 4 cores, 64bit, 16GB RAM, 100GB SSD, 3000 provisioned 
> iops. FFM Availability Zone.
> 
> Standard EC2 Ubuntu, Erlang R16B03 (I know, but that’s not the problem here, 
> this couch behaves fine).
> 
> Target machine info:
> 
> Amazon EC2 m4.10xlarge, 40 cores, 64bit, 160GB RAM, 100GB SSD, 3000 iops (not 
> provisioned), 10GigE networking, FFM AZ.
> 
> The latency between both instances is very small and the network throughput 
> is high (copying a file runs at between 100 and 200MB/s).
> 
> Standard EC2 Amazon Linux (Redhat/Fedora derivative), Erlang R14B04. CouchDB 
> 2.0 is running via dev/run.
> 
> 
> Thanks!
> Jan
> -- 
> 
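
For a rough sense of scale on item 7, here is a back-of-the-envelope estimate of how long one uninterrupted replication pass would take. It uses only the doc_count and doc_del_count figures from the source database info and the approximate 1000 docs/s rate quoted above. It assumes a steady rate, no restarts, and one transfer per document (conflicting revisions are ignored); the helper name is made up for this sketch.

```python
# Back-of-the-envelope estimate only. The rate is the rough figure from
# item 7, not a measurement, and the helper name is hypothetical.
DOC_COUNT = 6808004       # "doc_count" from the source database info
DOC_DEL_COUNT = 18856     # "doc_del_count"; deleted docs are replicated too
DOCS_PER_SECOND = 1000    # approximate observed replication rate


def estimate_replication_seconds(doc_count, doc_del_count, docs_per_second):
    """Return the estimated wall-clock seconds for one uninterrupted pass."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return (doc_count + doc_del_count) / docs_per_second


seconds = estimate_replication_seconds(DOC_COUNT, DOC_DEL_COUNT, DOCS_PER_SECOND)
print(f"~{seconds / 3600:.1f} hours")
```

Under those assumptions the full database would take a little under two hours, so any run that stops well short of that is worth checking against the logs.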

-- 
Professional Support for Apache CouchDB:
http://www.neighbourhood.ie/couchdb-support/
