2013/7/4 "Артём Н." <[email protected]> > Конфиги DRBD и Pacemaker во вложении. > > Split brain: > version: 8.3.11 (api:88/proto:86-96) > srcversion: F937DCB2E5D83C6CCE4A6C9 > 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----- > ns:2796 nr:1832 dw:4628 dr:124530 al:6 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 > wo:f oos:0 > 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----- > ns:0 nr:0 dw:0 dr:784 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 > > 10: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----- > ns:104696 nr:1136 dw:105832 dr:186265 al:82 bm:3 lo:0 pe:0 ua:0 ap:0 > ep:1 > wo:f oos:0 > > > Периодически всё работает нормально: > 0:system_data Connected Primary/Primary UpToDate/UpToDate C r----- > 1:vm_volumes Connected Primary/Primary UpToDate/UpToDate C r----- > 10:repository SyncTarget Primary/Primary Inconsistent/UpToDate C r----- > [=========>..........] sync'ed: 54.2% (432504/933888)K > ... > 0:system_data Connected Primary/Primary UpToDate/UpToDate C r----- > /mnt/system ocfs2 90G 968M 90G 2% > 1:vm_volumes Connected Primary/Primary UpToDate/UpToDate C r----- > 10:repository Connected Primary/Primary UpToDate/UpToDate C r----- > /mnt/repo > ocfs2 250G 82G 169G 33% > ... 
> > Master/Slave Set: ms_drbd_repo [p_drbd_repo] > Masters: [ cluster-data-1 cluster-data-2 ] > Master/Slave Set: ms_drbd_system_data [p_drbd_system_data] > Masters: [ cluster-data-1 cluster-data-2 ] > Master/Slave Set: ms_drbd_vm_volumes [p_drbd_vm_volumes] > Masters: [ cluster-data-1 cluster-data-2 ] > Clone Set: ce_ocfs2mgmt [g_ocfs2mgmt] > Started: [ cluster-data-1 cluster-data-2 ] > Clone Set: ce_mysql [p_mysql] > Started: [ cluster-data-1 cluster-data-2 ] > Clone Set: ce_system_fs [p_system_fs] > Started: [ cluster-data-1 cluster-data-2 ] > Clone Set: ce_rabbitmq [p_rabbitmq] > Started: [ cluster-data-1 cluster-data-2 ] > Clone Set: ce_repo_fs [p_repo_fs] > Started: [ cluster-data-1 cluster-data-2 ] > Clone Set: ce_data_ip [p_data_ip] > Started: [ cluster-data-1 cluster-data-2 ] > Clone Set: ce_webserver [p_webserver] > Started: [ cluster-data-1 cluster-data-2 ] > > > В Primary/Primary заработало после перевода в Primay/Secondary и обратно: > > master-max у ms_drbd_system_data был 1. > > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > [>....................] sync'ed: 1.7% (729260/741688)Mfinish: > 1:44:23 > speed: 119,200 (112,632) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > > root@cluster-data-1:~# drbd-overview > ^C > > root@cluster-data-1:~# crm resource stop ms_drbd_system_data > root@cluster-data-1:~# crm configure edit ms_drbd_system_data > > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > [>....................] 
sync'ed: 3.0% (719600/741688)Mfinish: > 1:46:58 > speed: 114,784 (113,096) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > [>....................] sync'ed: 3.3% (717228/741688)Mfinish: > 1:45:44 > speed: 115,744 (113,340) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > > root@cluster-data-1:~# crm resource manage ms_drbd_system_data > root@cluster-data-1:~# drbd-overview > 0:system_data Unconfigured . . . . > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > [>....................] sync'ed: 4.2% (710848/741688)Mfinish: > 1:45:17 > speed: 115,212 (113,600) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > > root@cluster-data-1:~# crm resource start ms_drbd_system_data > root@cluster-data-1:~# drbd-overview > 0:system_data WFBitMapS Primary/Secondary UpToDate/Consistent C > r----- > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > [>....................] sync'ed: 4.8% (706760/741688)Mfinish: > 1:41:44 > speed: 118,540 (113,908) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > root@cluster-data-1:~# drbd-overview > 0:system_data Connected Primary/Secondary UpToDate/UpToDate C > r----- > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > [>...................] 
sync'ed: 5.2% (703360/741688)Mfinish: > 1:44:21 > speed: 115,008 (113,764) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > > root@cluster-data-1:~# crm configure edit ms_drbd_system_data > root@cluster-data-1:~# crm configure show ms_drbd_system_data > ms ms_drbd_system_data p_drbd_system_data \ > meta notify="true" clone-max="2" master-max="2" > target-role="Started" > is-managed="true" > root@cluster-data-1:~# drbd-overview > 0:system_data Connected Primary/Primary UpToDate/UpToDate C r----- > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C r----- > [>...................] sync'ed: 7.0% (690308/741688)Mfinish: > 1:40:21 > speed: 117,388 (114,128) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C r----- > > Иногда получается так, что вообще не останавливается ресурс: > root@cluster-data-1:~# cat /proc/drbd > version: 8.3.11 (api:88/proto:86-96) > srcversion: F937DCB2E5D83C6CCE4A6C9 > 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----- > ns:0 nr:0 dw:600 dr:263205 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f > oos:696 > 1: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r----- > ns:0 nr:5327932 dw:5327616 dr:32 al:0 bm:325 lo:2 pe:7494 ua:2 ap:1 > ep:1 > wo:f oos:853695940 > [>....................] sync'ed: 0.7% (833684/838888)Mfinish: > 2:03:44 > speed: 114,984 (108,724) want: 1,000,001 K/sec > > 10: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----- > ns:84 nr:12 dw:44 dr:263073 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f > oos:0 > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > /mnt/system ocfs2 90G 862M 90G 1% > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > > [>....................] 
sync'ed: 0.7% (833252/838888)Mfinish: > 1:58:12 > speed: 120,292 (108,924) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > /mnt/repo ocfs2 250G 82G 169G 33% > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > /mnt/system ocfs2 90G 862M 90G 1% > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > > [>....................] sync'ed: 1.0% (830752/838888)Mfinish: > 2:01:35 > speed: 116,584 (109,628) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > /mnt/repo ocfs2 250G 82G 169G 33% > root@cluster-data-1:~# crm resource restart ms_drbd_system_data > INFO: ordering ms_drbd_system_data to stop > waiting for stop to finish > > ...................................................................................................................................................................................................................................................................................................................................................................................... > done > INFO: ordering ms_drbd_system_data to start > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > /mnt/system ocfs2 90G 862M 90G 1% > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > > [>....................] 
sync'ed: 4.3% (803584/838888)Mfinish: > 1:58:20 > speed: 115,884 (112,964) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > /mnt/repo ocfs2 250G 82G 169G 33% > root@cluster-data-1:~# crm configure show ms_drbd_system_data > ms ms_drbd_system_data p_drbd_system_data \ > meta notify="true" clone-max="2" master-max="2" > target-role="Started" > root@cluster-data-1:~# crm configure edit ms_drbd_system_data > root@cluster-data-1:~# crm configure show ms_drbd_system_data > ms ms_drbd_system_data p_drbd_system_data \ > meta notify="true" clone-max="2" master-max="1" > target-role="Started" > root@cluster-data-1:~# crm resource restart ms_drbd_system_data > INFO: ordering ms_drbd_system_data to stop > waiting for stop to finish > > .................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................^CCtrl-C, > leaving > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > /mnt/system ocfs2 90G 862M 90G 1% > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > > [=>..................] 
sync'ed: 10.2% (754088/838888)Mfinish: > 1:52:40 > speed: 114,200 (113,804) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > /mnt/repo ocfs2 250G 82G 169G 33% > root@cluster-data-1:~# crm resource stop ms_drbd_system_data > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > /mnt/system ocfs2 90G 862M 90G 1% > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > > [=>..................] sync'ed: 10.5% (750956/838888)Mfinish: > 1:51:34 > speed: 114,840 (113,688) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > /mnt/repo ocfs2 250G 82G 169G 33% > root@cluster-data-1:~# crm resource stop ms_drbd_system_data > root@cluster-data-1:~# drbd-overview > 0:system_data StandAlone Primary/Unknown UpToDate/DUnknown r----- > /mnt/system ocfs2 90G 862M 90G 1% > 1:vm_volumes SyncTarget Primary/Primary Inconsistent/UpToDate C > r----- > > [=>..................] sync'ed: 10.9% (748088/838888)Mfinish: > 1:46:49 > speed: 119,500 (113,804) want: 1,000,001 K/sec > 10:repository Connected Primary/Primary UpToDate/UpToDate C > r----- > /mnt/repo ocfs2 250G 82G 169G 33% > >
The settings look reasonable, but one thing bothers me: all of your handlers are disabled. Is that intentional? As for the periodic drops to Unknown: does it look the same from both nodes? What does dmesg show around the time of the split brain? I have two theories: a network problem (unlikely) or the disks cannot keep up (quite likely).
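
To make the two checks above easier to repeat on both nodes, here is a rough sketch of two shell helpers. The function names `drbd_kernel_events` and `standalone_minors` are made up for illustration and are not part of DRBD or Pacemaker. The sketch only assumes that DRBD kernel messages contain the string "drbd" and that /proc/drbd has the layout shown in the quoted output.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): keep only kernel log lines
# that mention DRBD. Reads stdin, so it works with `dmesg` output or a
# saved log file.
drbd_kernel_events() {
    grep -i 'drbd'
}

# Hypothetical helper (name is an assumption): print the minor numbers of
# devices whose connection state is StandAlone. Expects /proc/drbd style
# input, e.g. " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----".
standalone_minors() {
    awk '$2 == "cs:StandAlone" { sub(":", "", $1); print $1 }'
}

# Possible usage, to be run on each node and compared:
#   dmesg | drbd_kernel_events
#   standalone_minors < /proc/drbd
```

If `standalone_minors` reports a device on one node only, or the kernel messages differ between the nodes at the moment the connection drops, that would help narrow down whether the network or the disks are at fault.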

