> On 26 Jul 2015, at 14:47, Jan Lehnardt <[email protected]> wrote:
>
> Hey all,
>
> I’m trying to upgrade a database from 1.6.1 to 2.0.0/master/0c579b98 and I’m
> seeing a number of issues.
>
> Any help is greatly appreciated. Since this is our official upgrade path for
> 2.0, this has to be rock-solid.
>
> Feel free to break out individual issues into new threads if it helps keep
> things organised.
>
> Scroll down for detailed information about the database and machine
> configurations.
>
>
> ## The Scenario
>
> Replication is running on 2.0, pulling from 1.6.1 over the EC2 internal IP
> address.
>
> ## The Issues
>
> 1. Repeated log entries for “write quorum for <targetdb> failed”. I’ve seen
> this in other contexts as well. Why is this happening, and should it?
>
>
> 2. Getting a lot of “cassim_metadata_cache changes listener died” from all
> nodes about every 5 seconds. What’s up with these?
>
> - 2015-07-26 08:30:34.400 [error] Undefined emulator Error in process
> <0.14633.26> on node '[email protected]' with exit value:
> {function_clause,[{cassim_metadata_cache,changes_callback,[waiting_for_updates,"0"]},{fabric_view_changes,keep_sending_changes,8},{fabric_view_changes,go,5}]}
>
> - 2015-07-26 08:30:39.401 [notice] [email protected] <0.314.0>
> cassim_metadata_cache changes listener died
> {function_clause,[{cassim_metadata_cache,changes_callback,[waiting_for_updates,"0"]},{fabric_view_changes,keep_sending_changes,8},{fabric_view_changes,go,5}]}
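The function_clause in the item 2 logs can be read straight from its arguments: cassim_metadata_cache:changes_callback/2 was called with waiting_for_updates and the accumulator "0", and none of its clauses matched. Below is a Python analogue of that failure mode, not CouchDB code. FunctionClauseError, strict_changes_callback, tolerant_changes_callback and feed are hypothetical names, and every event shape except waiting_for_updates is an assumption. It shows how a callback written before a new event existed dies on that event, and how a single extra clause that passes the accumulator through avoids the crash.

```python
class FunctionClauseError(Exception):
    """Stand-in for Erlang's function_clause error: no clause matched the arguments."""


def strict_changes_callback(event, acc):
    # Hypothetical callback that only handles the events it was written for.
    if event == "start":
        return acc
    if isinstance(event, tuple) and len(event) == 2 and event[0] in ("change", "stop"):
        return acc
    # Anything else, e.g. "waiting_for_updates", matches no clause.
    raise FunctionClauseError(f"changes_callback({event!r}, {acc!r})")


def tolerant_changes_callback(event, acc):
    # Same callback plus one clause that acknowledges the new event
    # and returns the accumulator unchanged.
    if event == "waiting_for_updates":
        return acc
    return strict_changes_callback(event, acc)


def feed(callback, events, acc):
    # Hypothetical driver standing in for the changes feed: it folds every
    # event through the callback, so one unmatched event aborts the whole run.
    for event in events:
        acc = callback(event, acc)
    return acc


if __name__ == "__main__":
    events = ["start", ("change", {"id": "12345678"}), "waiting_for_updates"]
    print(feed(tolerant_changes_callback, events, "0"))  # prints 0
    try:
        feed(strict_changes_callback, events, "0")
    except FunctionClauseError as err:
        print("listener died:", err)
```

In the real system the supervisor restarts the listener, which would match the “listener died” notice arriving about five seconds after each error.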
Alexander pointed to https://github.com/apache/couchdb-fabric/commit/b6659c8344c9a028b5ab451be41a991801c2ab3d#diff-2af86e058b4e7a4a99a7c5a12da6debdR96 which is part of Adam’s recent work on COUCHDB-2724. Adam, any insights? :)

Best
Jan

>
>
> 3. A number of “Replicator, request PUT to
> "http://0.0.0.0:15984/<target>/edbef049aae9c8828f336534984e5e4f" failed due
> to error {error,req_timedout}” errors. This happens for regular docs, local
> docs, and _bulk_docs. The machine is basically idle (see below for details):
> the three beam.smp processes hover at 200-250% CPU each, and IO is 98% idle
> (it’s mostly logs being written).
>
>
> 4. Two issues from couch_replicator_api_wrap.erl:
>
> - 2015-07-26 08:22:49.849 [error] Undefined <0.3546.0> gen_server <0.3546.0>
> terminated with reason: no function clause matching
> couch_replicator_api_wrap:'-update_docs/4-fun-2-'(400,
> [{"Server","MochiWeb/1.0 (Any of you quaids got a smint?)"},{"Date","Sun, 26
> Jul 2015 08:22:49 G..."},...], null,
> [<<"{\"_id\":\"12345678\",\"_rev\":\"1050-ee6c7d54276b43bc937470e44e0283f2\",...
>
> - 2015-07-26 08:30:08.514 [notice] [email protected] <0.6360.26> Retrying GET
> to
> http://172.31.10.115:5984/generic_db_name/12348765?revs=true&open_revs=%5B%228-b2826209867a286c76e6a2762f10b1e0%22%5D&latest=true
> in 1.0 seconds due to error
> {function_clause,[{couch_replicator_api_wrap,run_user_fun,4},{couch_replicator_api_wrap,receive_docs,4},{couch_replicator_api_wrap,receive_docs_loop,6},{couch_replicator_api_wrap,'-open_doc_revs/6-fun-4-',7}]}
>
>
> 5. Eventually, replication reliably stops with an “invalid_ejson” error, but
> I don’t yet know if that’s because of the api_wrap issue or something else.
>
>
> 6. Replication stopped numerous times before I got this far. I haven’t had
> time to look into why that happened. I have all the logs, but they total
> 130MB, so it’ll be a while.
>
>
> 7. When replication ran, it replicated at a rate of about 1000 docs/s, which
> felt a little slow, but I have no experience with this yet.
>
>
> ## Source Database Info
>
> {
>   "db_name": "generic_db_name",
>   "doc_count": 6808004,
>   "doc_del_count": 18856,
>   "update_seq": 8044450,
>   "purge_seq": 0,
>   "compact_running": false,
>   "disk_size": 16293904519,
>   "data_size": 11711402577,
>   "instance_start_time": "1437834202967309",
>   "disk_format_version": 6,
>   "committed_update_seq": 8044450
> }
>
> Mostly small-ish docs, no big outliers, no attachments.
>
> Source machine info:
>
> Amazon EC2 m3.xlarge: 4 cores, 64-bit, 16GB RAM, 100GB SSD, 3000 provisioned
> IOPS. FFM Availability Zone.
>
> Standard EC2 Ubuntu, Erlang R16B03 (I know, but that’s not the problem here;
> this couch behaves fine).
>
> Target machine info:
>
> Amazon EC2 m4.10xlarge: 40 cores, 64-bit, 160GB RAM, 100GB SSD, 3000 IOPS (not
> provisioned), 10GigE networking, FFM AZ.
>
> The latency between the two instances is very small, and network throughput,
> measured by copying a file, is between 100 and 200MB/s.
>
> Standard EC2 Amazon Linux (Red Hat/Fedora derivative), Erlang R14B04. CouchDB
> 2.0 running as dev/run.
>
>
> Thanks!
> Jan
> --
> Professional Support for Apache CouchDB:
> http://www.neighbourhood.ie/couchdb-support/
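A rough sizing of this replication from the source database info, written as a Python sketch. It assumes that the replicator sees one changes-feed row per document id (by default _changes lists each document once, live or deleted), that the ~1000 docs/s from item 7 holds steadily, and that data_size divided by doc_count is a fair per-document size, even though data_size also covers tombstones and tree overhead. The function name estimate is hypothetical.

```python
"""Back-of-the-envelope sizing for the 1.6.1 -> 2.0 replication (rough estimate)."""

# Figures copied from the source database info.
DOC_COUNT = 6_808_004
DOC_DEL_COUNT = 18_856
DATA_SIZE = 11_711_402_577  # bytes of live data reported by the database
DISK_SIZE = 16_293_904_519  # bytes the database file occupies on disk

# Observed replication rate from item 7 (assumed steady).
DOCS_PER_SECOND = 1_000


def estimate(docs_per_second=DOCS_PER_SECOND):
    # Approximate bytes per document; data_size also covers tombstones and tree overhead.
    avg_doc_bytes = DATA_SIZE / DOC_COUNT
    # Share of the file that is live data; roughly, the rest is reclaimable by compaction.
    live_ratio = DATA_SIZE / DISK_SIZE
    # _changes lists each document id once, live or deleted.
    rows = DOC_COUNT + DOC_DEL_COUNT
    hours = rows / docs_per_second / 3600
    # Document payload moved per second at that rate, in MB/s.
    payload_mb_per_s = docs_per_second * avg_doc_bytes / 1e6
    return {
        "avg_doc_bytes": avg_doc_bytes,
        "live_ratio": live_ratio,
        "rows": rows,
        "hours": hours,
        "payload_mb_per_s": payload_mb_per_s,
    }


if __name__ == "__main__":
    for key, value in estimate().items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions the documents average about 1.7KB, a full pass is roughly 6.83 million rows taking just under two hours, and the implied payload is only about 1.7MB/s, far below the 100-200MB/s file-copy figure. That is consistent with neither the network nor the disks being the limiting factor.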
