[jira] [Updated] (COUCHDB-2484) replication crashes

Gunther Gruber (JIRA) Mon, 01 Dec 2014 11:09:07 -0800

     [ 
https://issues.apache.org/jira/browse/COUCHDB-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gunther Gruber updated COUCHDB-2484:
------------------------------------
    Description: 
We are using CouchDB version 1.6 with 8.3T of data; the biggest database is 2.1T.
We are currently switching to new hardware with more storage space. We copied
the files with rsync and started the replication.

One system is already in sync, the other is doing the replication.

I appreciate that, despite the errors in the log, the first system is now in
sync.

The log looks like the following:

Retrying POST request to http://replication:XXXX/database/_revs_diff in 0.5 
seconds due to error req_timedout


and then

[Mon, 01 Dec 2014 13:00:28 GMT] [error] [<0.27044.1>] ** Generic server
<0.27044.1> terminating 
** Last message in was {'EXIT',<0.26965.1>,killed}
** When Server state == {state,<0.26965.1>,<0.27045.1>,40,
                            {httpdb,
                                "http://replication:[email protected]/sm_chemie/",
                                nil,
                                [{"Accept","application/json"},
                                 {"User-Agent","CouchDB/1.2.0"}],
                                30000,
                                [{socket_options,
                                     [{recbuf,262144},
                                      {sndbuf,262144},
                                      {nodelay,true},
                                      {keepalive,true}]}],
                                10,250,<0.26966.1>,40},
                            {httpdb,
                                "http://replication:XXX@XXX:5984/sm_chemie/",
                                nil,
                                [{"Accept","application/json"},
                                 {"User-Agent","CouchDB/1.2.0"}],
                                30000,
                                [{socket_options,
                                     [{recbuf,262144},
                                      {sndbuf,262144},
                                      {nodelay,true},
                                      {keepalive,true}]}],
                                10,250,<0.26968.1>,40},
                            [],nil,nil,nil,
                            {rep_stats,0,0,0,0,0},
                            nil,nil,
                            {batch,[],0}}
** Reason for termination == 
** killed

[Mon, 01 Dec 2014 13:00:28 GMT] [error] [<0.27042.1>] {error_report,<0.31.0>,
                       {<0.27042.1>,crash_report,
                        [[{initial_call,
                           {couch_replicator_worker,init,['Argument__1']}},
                          {pid,<0.27042.1>},
                          {registered_name,[]},
                          {error_info,
                           {exit,killed,
                            [{gen_server,terminate,6,
                              [{file,"gen_server.erl"},{line,747}]},
                             {proc_lib,init_p_do_apply,3,
                              [{file,"proc_lib.erl"},{line,227}]}]}},
                          {ancestors,
                           [<0.26965.1>,couch_rep_sup,couch_primary_services,
                            couch_server_sup,<0.32.0>]},
                          {messages,[]},
                          {links,[<0.27043.1>]},
                          {dictionary,
                           [{last_stats_report,{1417,438797,704976}}]},
                          {trap_exit,true},
                          {status,running},
                          {heap_size,377},
                          {stack_size,24},
                          {reductions,372}],
                         []]}}

It seems to me like a timeout, after which the replication task exits. I have
already played around with the configuration settings with no success. I can
provide more information if needed.
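
To make the timeout reading more concrete, here is a minimal Python sketch of a doubling retry schedule. It rests on assumptions the report does not confirm: that the `30000`, `10` and `250` in the `httpdb` tuples of the dump are the request timeout in milliseconds, the retry count, and the initial retry wait in milliseconds, and that the wait doubles after each failed attempt (which would fit the 0.5 second delay in the retry message). Any upper cap on the wait is ignored. `retry_waits` and `worst_case_seconds` are hypothetical helper names.

```python
def retry_waits(initial_wait_ms=250, retries=10):
    """Return the wait before each retry, doubling every time (assumed policy)."""
    waits = []
    wait = initial_wait_ms
    for _ in range(retries):
        waits.append(wait)
        wait *= 2
    return waits


def worst_case_seconds(timeout_ms=30000, initial_wait_ms=250, retries=10):
    """Upper bound on time spent on one request if every attempt times out."""
    waits = retry_waits(initial_wait_ms, retries)
    attempts = retries + 1  # the first try plus each retry
    return (attempts * timeout_ms + sum(waits)) / 1000.0


if __name__ == "__main__":
    print(retry_waits())          # waits in milliseconds
    print(worst_case_seconds())   # seconds before the request is given up
```

Under these assumptions a single request that never gets an answer would occupy a worker for several minutes before it fails for good, so one slow `_revs_diff` call could plausibly stall a replication for a long time before the task exits.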

/etc/couchdb/local.d/001-user_config.ini
[couchdb]
file_compression = snappy
max_dbs_open = 400

[httpd]
bind_address = ::
server_options = [{backlog, 128}, {acceptor_pool_size, 16}]
socket_options = [{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}, {keepalive, true}]

[couch_httpd_auth]
secret = 

[log_level_by_module]
couch_httpd = warning
couch_replicator = debug
couch_query_servers = warning 

[daemons]
httpsd = {couch_httpd, start_link, [https]}

[ssl]
cert_file = /etc/couchdb/ssl/certs/couchdb-couch1.prime.adns.de.pem
key_file =  /etc/couchdb/ssl/private/couchdb-couch1.prime.adns.de.pem
verify_ssl_certificates = false

[replicator]
worker_batch_size = 2000
worker_processes = 40
http_connections = 40
socket_options = [{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}, {keepalive, true}]
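
As a rough way to reason about the `[replicator]` values above, the following Python sketch multiplies them out. It assumes, as a simplification the report does not confirm, that every worker process can have one full batch of `worker_batch_size` revision IDs outstanding in a `_revs_diff` request at the same time. `ReplicatorSettings` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicatorSettings:
    worker_batch_size: int
    worker_processes: int
    http_connections: int

    def max_ids_in_flight(self) -> int:
        # Assumed upper bound: every worker has one full batch outstanding.
        return self.worker_batch_size * self.worker_processes

    def workers_per_connection(self) -> float:
        # How many workers share one HTTP connection on average.
        return self.worker_processes / self.http_connections


# Values taken from the [replicator] section of the report.
reported = ReplicatorSettings(
    worker_batch_size=2000, worker_processes=40, http_connections=40
)

if __name__ == "__main__":
    print(reported.max_ids_in_flight())
    print(reported.workers_per_connection())
```

If the assumption holds, the target may have to answer for tens of thousands of revision IDs within the request timeout, which is one possible reason for `req_timedout` under load.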


/etc/default/couchdb
# Sourced by init script for configuration.

COUCHDB_USER=couchdb
COUCHDB_STDOUT_FILE=/dev/null
COUCHDB_STDERR_FILE=/dev/null
COUCHDB_RESPAWN_TIMEOUT=5
COUCHDB_OPTIONS=

# 32 Threads to handle I/O
export ERL_FLAGS="+A 32"
# 8192 open files
export ERL_MAX_PORTS=8192
ulimit -n 8192
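
The `ulimit -n 8192` line applies to the shell that sources this file and to the processes it starts. A minimal Python sketch for checking the open-file limit from inside a process on a Unix system, using the standard `resource` module; the 8192 threshold comes from the settings above, and `has_enough_fds` is a hypothetical helper name.

```python
import resource


def has_enough_fds(required=8192):
    """Return True if the soft open-file limit is at least `required`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means no limit is enforced.
    return soft == resource.RLIM_INFINITY or soft >= required


if __name__ == "__main__":
    print(resource.getrlimit(resource.RLIMIT_NOFILE))
    print(has_enough_fds())
```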

The current workaround is to restart CouchDB every other hour.



> replication crashes
> -------------------
>
>                 Key: COUCHDB-2484
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2484
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>          Components: Database Core
>    Affects Versions: 1.6.0
>            Reporter: Gunther Gruber
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
