Hey all,

I’m trying to upgrade a database from 1.6.1 to 2.0.0/master/0c579b98 and I’m 
seeing a number of issues.

Any help is greatly appreciated. Since this is our official upgrade path for 
2.0, this has to be rock-solid.

Feel free to break out individual issues into new threads if it helps keep 
things organised.

Scroll down for detailed information about the database and the machine 
configurations.


## The Scenario

Replication is running on 2.0, pulling from 1.6.1 over the EC2 internal ip 
address.
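
For reference, a replication like this is typically started by POSTing to the `_replicate` endpoint on the 2.0 node. The snippet below is only a minimal sketch of building that request: the source URL and port 15984 are taken from the log lines further down, while the target host `127.0.0.1` and the database name `target_db` are placeholders, and the helper name `build_replication_request` is made up for illustration.

```python
import json
import urllib.request


def build_replication_request(couch_url, source, target):
    """Build (but do not send) a POST /_replicate request.

    couch_url: base URL of the node that runs the replication (the 2.0 side).
    source/target: database URLs; the target value used below is a placeholder.
    """
    payload = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url=couch_url.rstrip("/") + "/_replicate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: only the source URL and the port 15984 appear in the logs.
request = build_replication_request(
    "http://127.0.0.1:15984",
    "http://172.31.10.115:5984/generic_db_name",
    "http://127.0.0.1:15984/target_db",
)
# To actually start the replication:
# urllib.request.urlopen(request)
```
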

## The Issues

1. Repeated log entries for “write quorum for <targetdb> failed”. I’ve seen 
this in other contexts as well. Why is this happening, and should it?


2. I’m getting a lot of “cassim_metadata_cache changes listener died” messages 
from all nodes, about every 5 seconds. What’s up with these?

 - 2015-07-26 08:30:34.400 [error] Undefined emulator Error in process 
<0.14633.26> on node '[email protected]' with exit value: 
{function_clause,[{cassim_metadata_cache,changes_callback,[waiting_for_updates,"0"]},{fabric_view_changes,keep_sending_changes,8},{fabric_view_changes,go,5}]}

 - 2015-07-26 08:30:39.401 [notice] [email protected] <0.314.0> 
cassim_metadata_cache changes listener died 
{function_clause,[{cassim_metadata_cache,changes_callback,[waiting_for_updates,"0"]},{fabric_view_changes,keep_sending_changes,8},{fabric_view_changes,go,5}]}


3. A number of entries like: Replicator, request PUT to 
"http://0.0.0.0:15984/<target>/edbef049aae9c8828f336534984e5e4f" failed due to 
error {error,req_timedout}. This happens for regular docs, local docs, and 
_bulk_docs. The machine is basically idle (see below for details): the three 
beam.smp processes are at 200-250% CPU each, and io is 98% idle (it’s mostly 
logs being written).


4. Two issues from couch_replicator_api_wrap.erl:

 - 2015-07-26 08:22:49.849 [error] Undefined <0.3546.0> gen_server <0.3546.0> 
terminated with reason: no function clause matching 
couch_replicator_api_wrap:'-update_docs/4-fun-2-'(400, [{"Server","MochiWeb/1.0 
(Any of you quaids got a smint?)"},{"Date","Sun, 26 Jul 2015 08:22:49 
G..."},...], null, 
[<<"{\"_id\":\"12345678\",\"_rev\":\"1050-ee6c7d54276b43bc937470e44e0283f2\",...

 - 2015-07-26 08:30:08.514 [notice] [email protected] <0.6360.26> Retrying GET to 
http://172.31.10.115:5984/generic_db_name/12348765?revs=true&open_revs=%5B%228-b2826209867a286c76e6a2762f10b1e0%22%5D&latest=true
 in 1.0 seconds due to error 
{function_clause,[{couch_replicator_api_wrap,run_user_fun,4},{couch_replicator_api_wrap,receive_docs,4},{couch_replicator_api_wrap,receive_docs_loop,6},{couch_replicator_api_wrap,'-open_doc_revs/6-fun-4-',7}]}



5. Eventually, replication reliably stops with an “invalid_ejson” error, but I 
don’t yet know if that’s because of the api_wrap issue or something else.



6. Replication stopped numerous times before I got here. I didn’t have time 
to look into why that happened, but I have all the logs. They are 130MB in 
total, so it’ll be a while.


7. When replication ran, it replicated at a rate of about 1000 docs/s, which 
felt a little slow, but I have no experience there, yet.
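
For the 130MB of logs mentioned in item 6, a first pass could simply count how often each of the error signatures listed above shows up. This is a rough sketch, not a proper log parser: the substrings are copied from the messages quoted in this mail, the labels and the helper name `tally_log_lines` are made up, and it assumes each log entry sits on a single line in the actual log files.

```python
from collections import Counter

# Substrings taken from the log messages quoted above; labels are arbitrary.
PATTERNS = {
    "write_quorum": "write quorum for",
    "cassim_listener": "cassim_metadata_cache changes listener died",
    "req_timedout": "req_timedout",
    "api_wrap": "couch_replicator_api_wrap",
    "invalid_ejson": "invalid_ejson",
}


def tally_log_lines(lines):
    """Count, per label, how many lines contain the matching substring."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


# Usage sketch (the log path is a placeholder):
# with open("path/to/couch.log", errors="replace") as handle:
#     print(tally_log_lines(handle).most_common())
```
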


## Source Database Info

{
  "db_name": "generic_db_name",
  "doc_count": 6808004,
  "doc_del_count": 18856,
  "update_seq": 8044450,
  "purge_seq": 0,
  "compact_running": false,
  "disk_size": 16293904519,
  "data_size": 11711402577,
  "instance_start_time": "1437834202967309",
  "disk_format_version": 6,
  "committed_update_seq": 8044450
}

Mostly small-ish docs, no big outliers, no attachments.
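
To put the roughly 1000 docs/s from item 7 into perspective, here is a back-of-the-envelope calculation from the numbers above. It is only an estimate: it uses `data_size` as a stand-in for the amount of document data, which ignores deleted docs, old revisions and HTTP/JSON overhead, and the function name `estimate` is made up.

```python
DOC_COUNT = 6808004            # "doc_count" from the database info above
DATA_SIZE_BYTES = 11711402577  # "data_size" from the database info above
DOCS_PER_SECOND = 1000         # observed replication rate (item 7)


def estimate(doc_count, data_size_bytes, docs_per_second):
    """Return (average doc size in bytes, run time in seconds, bytes per second)."""
    avg_doc_bytes = data_size_bytes / doc_count
    seconds = doc_count / docs_per_second
    bytes_per_second = avg_doc_bytes * docs_per_second
    return avg_doc_bytes, seconds, bytes_per_second


avg_doc_bytes, seconds, bytes_per_second = estimate(
    DOC_COUNT, DATA_SIZE_BYTES, DOCS_PER_SECOND
)
print(f"average doc size: {avg_doc_bytes:.0f} bytes")
print(f"full replication: {seconds / 3600:.1f} hours")
print(f"data rate:        {bytes_per_second / 1e6:.1f} MB/s")
```

That works out to roughly 1.7KB per doc, a bit under two hours for a full pass, and about 1.7MB/s of document data.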

Source machine info:

Amazon EC2 m3.xlarge 4 cores, 64bit, 16GB RAM, 100GB SSD, 3000 provisioned 
iops. FFM Availability Zone.

Standard EC2 Ubuntu, Erlang R16B03 (I know, but that’s not the problem here, 
this couch behaves fine).

Target machine info:

Amazon EC2 m4.10xlarge, 40 cores, 64bit, 160GB RAM, 100GB SSD, 3000 iops (not 
provisioned), 10GigE networking, FFM AZ.

The latency between both instances is very small, and copying a file between 
them runs at between 100 and 200MB/s.

Standard EC2 Amazon Linux (Redhat/Fedora derivative), Erlang R14B04. CouchDB 
2.0 is running via dev/run.


Thanks!
Jan