Anyone? On Thu, Jun 20, 2019 at 5:59 PM Tiemen Ruiten <t.rui...@tech-lab.io> wrote:
> Hello, > > I'm trying to do a rolling upgrade of a MariaDB 10.3 Galera cluster to to > 10.4, following the instructions here > <https://mariadb.com/kb/en/library/upgrading-from-mariadb-103-to-mariadb-104-with-galera-cluster/>. > (I had to adjust the wsrep_provider variable for Galera 4 > to /usr/lib64/galera-4/libgalera_smm.so, not mentioned in the > instructions). After I start the mariadb service, one mysqld thread uses > 100% of one core for about 45 minutes, then I see the following in the log > and mariadb restarts, repeating the same process: > > Jun 20 16:24:44 rubidium mysqld: 2019-06-20 16:24:44 0 [Note] WSREP: > Trying to continue unpaused monitor > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: REPL > Protocols: 9 (4, 2) > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: > ####### My UUID: ddb4f4da-935f-11e9-bcff-8209857266f0 > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: > Server mariadb4 connected to cluster at position > 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 with ID > ddb4f4da-935f-11e9-bcff-8209857266f0 > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: > Server status change disconnected -> connected > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: > wsrep_notify_cmd is not defined, skipping notification. > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: State > transfer required: > Jun 20 16:24:47 rubidium mysqld: Group state: > 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 > Jun 20 16:24:47 rubidium mysqld: Local state: > 6a1fd0aa-4739-11e5-af88-42756e861245:7403368279 > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: > Server status change connected -> joiner > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: > wsrep_notify_cmd is not defined, skipping notification. > Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 0 [Note] WSREP: > Running: 'wsrep_sst_rsync --role 'joiner' --address '10.100.130.24' > --datadir '/var/lib/mysql/' --parent '21283' '' ''' > Jun 20 16:24:48 rubidium rsyncd[26235]: rsyncd version 3.1.2 starting, > listening on port 4444 > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: > Prepared SST request: rsync|10.100.130.24:4444/rsync_sst > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Cert > index reset to 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 (proto: 9), > state transfer needed: yes > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: > Service thread queue flushed. > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: > ####### Assign initial position for certification: > 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644, protocol version: 4 > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: > ####### IST uuid:6a1fd0aa-4739-11e5-af88-42756e861245 f: 7403368280, l: > 7403466644, STRv: 2 > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: IST > receiver addr using tcp://10.100.130.24:4568 > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: > Prepared IST receiver for 7403368280-7403466644, listening at: tcp:// > 10.100.130.24:4568 > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: > Member 2.0 (mariadb4) requested state transfer from 'mariadb2'. Selected > 1.0 (mariadb2)(SYNCED) as donor. > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: > Shifting PRIMARY -> JOINER (TO: 7403946566) > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP: > Slave queue grew too long while trying to request state transfer 1 time(s). > Please make sure that there is at least one fully synced member in the > group. Application must be restarted. > Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP: > State transfer request failed unrecoverably: 35 (Resource deadlock > avoided). Most likely it is due to inability to communicate with the > cluster primary component. Restart required. > > On the donor, I see: > > 2019-06-20 16:24:44 0 [Note] WSREP: Trying to continue unpaused monitor > 2019-06-20 16:24:48 0 [Note] WSREP: Member 2.0 (mariadb4) requested state > transfer from 'mariadb2'. Selected 1.0 (mariadb2)(SYNCED) as donor. > 2019-06-20 16:24:48 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: > 7403946566) > 2019-06-20 16:24:48 18 [Note] WSREP: IST request: > 6a1fd0aa-4739-11e5-af88-42756e861245:7403368279-7403466644|tcp:// > 10.100.130.24:4568 > 2019-06-20 16:24:48 18 [Note] WSREP: wsrep_notify_cmd is not defined, > skipping notification. > 2019-06-20 16:24:48 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role > 'donor' --address '10.100.130.24:4444/rsync_sst' --socket > '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid > '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' > --bypass' > 2019-06-20 16:24:48 18 [Note] WSREP: sst_donor_thread signaled with 0 > 2019-06-20 16:24:48 0 [Note] WSREP: async IST sender starting to serve > tcp://10.100.130.24:4568 sending 7403368280-7403466644 > 2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567') > turning message relay requesting on, nonlive peers: tcp:// > 10.100.130.24:4567 > 2019-06-20 16:24:48 0 [Note] WSREP: declaring 116c0b4c at tcp:// > 10.100.130.22:4567 stable > 2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp:// > 10.100.130.24:4567) > 2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567') > turning message relay requesting off > 2019-06-20 16:24:48 0 [Note] WSREP: Node 116c0b4c state prim > 2019-06-20 16:24:48 0 [Note] WSREP: view(view_id(PRIM,116c0b4c,297) memb { > 116c0b4c,0 > 700b363f,0 > } joined { > } left { > } partitioned { > ddb4f4da,0 > }) > 2019-06-20 16:24:48 0 [Note] WSREP: save pc into disk > 2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp:// > 10.100.130.24:4567) > 2019-06-20 16:24:48 0 [Note] WSREP: New COMPONENT: primary = yes, > bootstrap = no, my_idx = 1, memb_num = 2 > 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. > 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: sent state msg: > 286f5ce4-9367-11e9-acee-4e4dec61d7c1 > 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg: > 286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 0 (mariadb1) > 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg: > 286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 1 (mariadb2) > 2019-06-20 16:24:48 0 [Note] WSREP: Quorum results: > version = 4, > component = PRIMARY, > conf_id = 269, > members = 2/2 (joined/total), > act_id = 7403946566, > last_appl. = 7403946444, > protocols = 0/9/3 (gcs/repl/appl), > group UUID = 6a1fd0aa-4739-11e5-af88-42756e861245 > 2019-06-20 16:24:48 0 [Note] WSREP: Flow-control interval: [23, 23] > 2019-06-20 16:24:48 0 [Note] WSREP: Trying to continue unpaused monitor > 2019-06-20 16:24:48 14 [Note] WSREP: New cluster view: global state: > 6a1fd0aa-4739-11e5-af88-42756e861245:7403946566, view# 270: Primary, number > of nodes: 2, my index: 1, protocol version 3 > 2019-06-20 16:24:48 14 [Note] WSREP: wsrep_notify_cmd is not defined, > skipping notification. > 2019-06-20 16:24:48 14 [Note] WSREP: REPL Protocols: 9 (4, 2) > 2019-06-20 16:24:48 14 [Note] WSREP: Assign initial position for > certification: 7403946566, protocol version: 4 > 2019-06-20 16:24:48 0 [Note] WSREP: Service thread queue flushed. > 2019-06-20 16:24:48 0 [ERROR] WSREP: async IST sender failed to serve > tcp://10.100.130.24:4568: ist send failed: asio.system:104', asio error > 'read: Connection reset by peer': 104 (Connection reset by peer) > at galera/src/ist.cpp:send():790 > 2019-06-20 16:24:48 0 [Note] WSREP: async IST sender served > WSREP_SST: [INFO] Bypassing state dump. (20190620 16:24:48.149) > rsync: safe_read failed to read 1 bytes [sender]: Connection reset by peer > (104) > rsync error: error in rsync protocol data stream (code 12) at io.c(276) > [sender=3.1.2] > 2019-06-20 16:24:48 0 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync > --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket > '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid > '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' > --bypass > 2019-06-20 16:24:48 0 [ERROR] WSREP: Process completed with error: > wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' > --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' > --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id > '0' --bypass: 12 (Cannot allocate memory) > 2019-06-20 16:24:48 0 [ERROR] WSREP: Command did not run: wsrep_sst_rsync > --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket > '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid > '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' > --bypass > 2019-06-20 16:24:48 0 [Warning] WSREP: Could not find peer: > ddb4f4da-935f-11e9-bcff-8209857266f0 > 2019-06-20 16:24:48 0 [Warning] WSREP: 1.0 (mariadb2): State transfer to > -1.-1 (left the group) failed: -12 (Cannot allocate memory) > > What is going on? > > > > >
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp