> On 27 Jul 2015, at 13:46, Jan Lehnardt <[email protected]> wrote:
> 
> 
>> On 26 Jul 2015, at 19:03, Jan Lehnardt <[email protected]> wrote:
>> 
>> 
>>> On 26 Jul 2015, at 14:47, Jan Lehnardt <[email protected]> wrote:
>>> 
>>> Hey all,
>>> 
>>> I’m trying to upgrade a database from 1.6.1 to 2.0.0/master/0c579b98 and 
>>> I’m seeing a number of issues.
>>> 
>>> Any help is greatly appreciated. Since this is our official upgrade path 
>>> for 2.0, this has to be rock-solid.
>>> 
>>> Feel free to break out individual issues into new threads, if it helps 
>>> keep things organised.
>>> 
>>> Scroll down for detailed information about the database, and machine 
>>> configurations.
>>> 
>>> 
>>> ## The Scenario
>>> 
>>> Replication is running on 2.0, pulling from 1.6.1 over the EC2 internal ip 
>>> address.
>>> 
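
For anyone who wants to reproduce the scenario, a pull replication of this shape can be requested through the `_replicate` endpoint. The following is a minimal sketch, not necessarily how the replication in this report was started. The source URL is taken from the log lines in this message. The localhost address for the 2.0 node and the `target_db` name are assumptions for illustration, and authentication is left out.

```python
import json
import urllib.request

# Source URL as it appears in the replicator log lines in this message.
SOURCE_URL = "http://172.31.10.115:5984/generic_db_name"
# Assumptions: the 2.0 node is reachable on localhost port 15984 (the port
# shown in the PUT log lines) and "target_db" is a placeholder database name.
TARGET_NODE = "http://127.0.0.1:15984"
TARGET_DB = "target_db"


def build_pull_replication(source_url, target_node, target_db):
    """Build (but do not send) a POST /_replicate request for a pull replication."""
    body = {
        "source": source_url,                     # remote 1.6.1 database
        "target": f"{target_node}/{target_db}",   # database on the 2.0 node
    }
    return urllib.request.Request(
        url=f"{target_node}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_pull_replication(SOURCE_URL, TARGET_NODE, TARGET_DB)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)
```

Because the request is posted to the 2.0 node and names the 1.6.1 database as the source, the 2.0 side does the pulling, which matches the setup described above.
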
>>> ## The Issues
>>> 
>>> 1. Repeated log entries for “write quorum for <targetdb> failed”. I’ve seen 
>>> this in other contexts as well. Why is this happening, and should it?
>>> 
>>> 
>>> 2. getting a lot of “cassim_metadata_cache changes listener died” from all 
>>> nodes about every 5 seconds. What’s up with these?
>>> 
>>> - 2015-07-26 08:30:34.400 [error] Undefined emulator Error in process 
>>> <0.14633.26> on node '[email protected]' with exit value: 
>>> {function_clause,[{cassim_metadata_cache,changes_callback,[waiting_for_updates,"0"]},{fabric_view_changes,keep_sending_changes,8},{fabric_view_changes,go,5}]}
>>> 
>>> - 2015-07-26 08:30:39.401 [notice] [email protected] <0.314.0> 
>>> cassim_metadata_cache changes listener died 
>>> {function_clause,[{cassim_metadata_cache,changes_callback,[waiting_for_updates,"0"]},{fabric_view_changes,keep_sending_changes,8},{fabric_view_changes,go,5}]}
>> 
>> Alexander pointed to 
>> https://github.com/apache/couchdb-fabric/commit/b6659c8344c9a028b5ab451be41a991801c2ab3d#diff-2af86e058b4e7a4a99a7c5a12da6debdR96
>>  which is part of Adam’s recent work on COUCHDB-2724.
>> 
>> Adam, any insights? :)
> 
> Bob says this should fix it: 
> https://gist.github.com/rnewson/b9efd4f45e1c62315816
> 
> In the meantime, I reverted the changes optimisation commit on fabric. Now, 
> once replication has caught up with the existing update sequence and starts 
> replicating newer documents, I’m getting this:
> 
> https://gist.github.com/janl/75804904dad73d17ed0e
> 
> While looking into this, I found out that there *are* a few small attachments 
> in the source database. Sorry for the confusion about this earlier.
> 
> I still see function_clause errors after the revert. Bob suggests waiting for 
> Adam to comment.

Bob’s latest commits fixed the replication issue, but I’d love to hear about 
the other things I mentioned.

Best
Jan
--
 
> 
> Best
> Jan
> --
> 
> 
>> 
>> Best
>> Jan
>> --
>> 
>> 
>> 
>>> 
>>> 
>>> 3. A number of “Replicator, request PUT to 
>>> "http://0.0.0.0:15984/<target>/edbef049aae9c8828f336534984e5e4f" failed due 
>>> to error {error,req_timedout}” entries. This happens for regular docs, local 
>>> docs, and _bulk_docs. The machine is basically idle (see below for details): 
>>> the three beam.smp processes are at 200-250% CPU each and io is 98% idle 
>>> (it’s mostly logs being written).
>>> 
>>> 
>>> 4. Two issues from couch_replicator_api_wrap.erl:
>>> 
>>> - 2015-07-26 08:22:49.849 [error] Undefined <0.3546.0> gen_server 
>>> <0.3546.0> terminated with reason: no function clause matching 
>>> couch_replicator_api_wrap:'-update_docs/4-fun-2-'(400, 
>>> [{"Server","MochiWeb/1.0 (Any of you quaids got a smint?)"},{"Date","Sun, 
>>> 26 Jul 2015 08:22:49 G..."},...], null, 
>>> [<<"{\"_id\":\"12345678\",\"_rev\":\"1050-ee6c7d54276b43bc937470e44e0283f2\",...
>>> 
>>> - 2015-07-26 08:30:08.514 [notice] [email protected] <0.6360.26> Retrying GET 
>>> to 
>>> http://172.31.10.115:5984/generic_db_name/12348765?revs=true&open_revs=%5B%228-b2826209867a286c76e6a2762f10b1e0%22%5D&latest=true
>>>  in 1.0 seconds due to error 
>>> {function_clause,[{couch_replicator_api_wrap,run_user_fun,4},{couch_replicator_api_wrap,receive_docs,4},{couch_replicator_api_wrap,receive_docs_loop,6},{couch_replicator_api_wrap,'-open_doc_revs/6-fun-4-',7}]}
>>> 
>>> 
>>> 
>>> 5. Eventually, replication reliably stops with an “invalid_ejson” error, 
>>> but I don’t yet know if that’s because of the api_wrap issue or something 
>>> else.
>>> 
>>> 
>>> 
>>> 6. Replication stopped numerous times before I got this far. I haven’t had 
>>> time to look into why, but I have all the logs. They are 130MB in total, so 
>>> going through them will take a while.
>>> 
>>> 
>>> 7. When replication ran, it replicated at a rate of about 1000 docs/s, 
>>> which felt a little slow, but I have no experience there, yet.
>>> 
>>> 
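
As an aid for reading the second log line in issue 4, the retried GET URL can be unpacked with the Python standard library. This is only a sketch for inspecting the request. The helper name `describe_open_revs_request` is made up for illustration, and nothing here touches CouchDB itself.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The URL exactly as it appears in the "Retrying GET" log line.
RETRIED_URL = (
    "http://172.31.10.115:5984/generic_db_name/12348765"
    "?revs=true&open_revs=%5B%228-b2826209867a286c76e6a2762f10b1e0%22%5D"
    "&latest=true"
)


def describe_open_revs_request(url):
    """Split a replicator GET URL into host, database, doc id and query values."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # also undoes the percent-encoding
    db_name, doc_id = parts.path.strip("/").split("/", 1)
    return {
        "host": parts.netloc,
        "db": db_name,
        "doc_id": doc_id,
        # open_revs is a JSON array of revision ids once decoded
        "open_revs": json.loads(query["open_revs"][0]),
        "revs": query["revs"][0],
        "latest": query["latest"][0],
    }


info = describe_open_revs_request(RETRIED_URL)
for key, value in info.items():
    print(f"{key}: {value}")
```

The decoded host is the 1.6.1 source on port 5984, so the retried request is a document fetch from the source, and the function_clause in the trace is raised in couch_replicator_api_wrap while those documents are being received.
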
>>> ## Source Database Info
>>> 
>>> {
>>> "db_name": "generic_db_name",
>>> "doc_count": 6808004,
>>> "doc_del_count": 18856,
>>> "update_seq": 8044450,
>>> "purge_seq": 0,
>>> "compact_running": false,
>>> "disk_size": 16293904519,
>>> "data_size": 11711402577,
>>> "instance_start_time": "1437834202967309",
>>> "disk_format_version": 6,
>>> "committed_update_seq": 8044450
>>> }
>>> 
>>> Mostly small-ish docs, no big outliers, no attachments.
>>> 
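
To put the numbers above in perspective, here is a rough back-of-the-envelope calculation. It is a sketch under stated assumptions: the 1000 docs/s figure is the rate reported in issue 7, deleted documents are counted because their tombstones are replicated too, and `data_size / doc_count` is treated as an approximate average document size.

```python
# Values copied from the source database info in this message.
db_info = {
    "doc_count": 6808004,
    "doc_del_count": 18856,
    "disk_size": 16293904519,
    "data_size": 11711402577,
}

# Assumption: the replication rate observed in issue 7 stays constant.
OBSERVED_DOCS_PER_SECOND = 1000

# Share of the database file that holds live data (the rest is reclaimable
# by compaction).
live_ratio = db_info["data_size"] / db_info["disk_size"]

# Approximate average size of a document, in bytes.
avg_doc_bytes = db_info["data_size"] / db_info["doc_count"]

# Documents the replicator has to move, including deletion tombstones.
docs_to_replicate = db_info["doc_count"] + db_info["doc_del_count"]

# Time for one full pass at the observed rate.
estimated_hours = docs_to_replicate / OBSERVED_DOCS_PER_SECOND / 3600

print(f"live data share of file: {live_ratio:.1%}")
print(f"average doc size:        {avg_doc_bytes:.0f} bytes")
print(f"docs to replicate:       {docs_to_replicate}")
print(f"estimated duration:      {estimated_hours:.1f} hours")
```

At the observed rate a full pass over the database comes out at roughly two hours, which is the baseline any improvement in replication speed would be measured against.
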
>>> Source machine info:
>>> 
>>> Amazon EC2 m3.xlarge 4 cores, 64bit, 16GB RAM, 100GB SSD, 3000 provisioned 
>>> iops. FFM Availability Zone.
>>> 
>>> Standard EC2 Ubuntu, Erlang R16B03 (I know, but that’s not the problem 
>>> here, this couch behaves fine).
>>> 
>>> Target machine info:
>>> 
>>> Amazon EC2 m4.10xlarge, 40 cores, 64bit, 160GB RAM, 100GB SSD, 3000 iops 
>>> (not provisioned), 10GigE networking, FFM AZ.
>>> 
>>> The latency between both instances is very small and the network throughput 
>>> is good (copying a file runs at between 100 and 200MB/s).
>>> 
>>> Standard EC2 Amazon Linux (Redhat/Fedora derivative), Erlang R14B04. 
>>> CouchDB 2.0 running as dev/run
>>> 
>>> 
>>> Thanks!
>>> Jan
>>> -- 
>>> 
>> 
>> -- 
>> Professional Support for Apache CouchDB:
>> http://www.neighbourhood.ie/couchdb-support/
>> 
> 
> 

