Hi,

All of a sudden, a SAN pair which had been running without any problems for six months has started falling over every couple of hours.
The logs I have to go on are below:

Oct 29 19:09:23 iscsi2cl6 last message repeated 12 times
Oct 29 19:09:23 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:844424967684608 (Function Complete)
Oct 29 19:09:24 iscsi2cl6 lrmd: [4677]: info: RA output: (ClusterIP:monitor:stderr) Converted dotted-quad netmask to CIDR as: 24
Oct 29 19:09:49 iscsi2cl6 last message repeated 24 times
Oct 29 19:09:49 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:1125899927618048 (Function Complete)
Oct 29 19:09:49 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:1407374904328704 (Function Complete)
Oct 29 19:09:49 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:281474997486080 (Function Complete)
Oct 29 19:09:50 iscsi2cl6 lrmd: [4677]: info: RA output: (ClusterIP:monitor:stderr) Converted dotted-quad netmask to CIDR as: 24
Oct 29 19:09:50 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:562949974196736 (Function Complete)
Oct 29 19:09:51 iscsi2cl6 lrmd: [4677]: info: RA output: (ClusterIP:monitor:stderr) Converted dotted-quad netmask to CIDR as: 24
Oct 29 19:09:53 iscsi2cl6 last message repeated 2 times
Oct 29 19:09:53 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:844424967684608 (Function Complete)
Oct 29 19:09:53 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:844424967684608 (Function Complete)
Oct 29 19:09:54 iscsi2cl6 lrmd: [4677]: info: RA output: (ClusterIP:monitor:stderr) Converted dotted-quad netmask to CIDR as: 24
Oct 29 19:10:05 iscsi2cl6 last message repeated 11 times
Oct 29 19:10:06 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:1407374904328704 (Function Complete)
Oct 29 19:10:06 iscsi2cl6 last message repeated 4 times
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 2077806177s +3584; pending: 2077806177s +3584
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 2077806184s +512; pending: 2077806184s +512
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 1693425337s +3584; pending: 1693425337s +3584
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 1693425344s +512; pending: 1693425344s +512
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 1693425321s +3584; pending: 1693425321s +3584
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 1693425328s +512; pending: 1693425328s +512
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 1693425313s +3584; pending: 1693425313s +3584
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 1693425320s +512; pending: 1693425320s +512
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 1743088585s +3584; pending: 1743088585s +3584
Oct 29 19:10:06 iscsi2cl6 kernel: block drbd0: istiod1[4695] Concurrent local write detected! [DISCARD L] new: 1743088592s +512; pending: 1743088592s +512
Oct 29 19:10:06 iscsi2cl6 lrmd: [4677]: info: RA output: (ClusterIP:monitor:stderr) Converted dotted-quad netmask to CIDR as: 24

After this event, both members of the SAN pair reboot. It is very disruptive, as it kills the VMs using this SAN, which then need fscks after the failure. The load on the SAN doesn't need to be very high for this to happen.
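From what I've read, the "Concurrent local write detected" messages show up when both sides end up writing the same sectors, which can happen with dual-primary DRBD. For reference, a dual-primary resource would have settings along these lines in drbd.conf (the resource name r0 is a placeholder, not necessarily our actual config):

```
resource r0 {
  net {
    allow-two-primaries;
  }
  startup {
    become-primary-on both;
  }
}
```

`drbdadm dump <resource>` shows the effective settings on each node.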
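To see whether the aborts are concentrated on one initiator session or spread across all of them, I've been counting the "Abort Task" lines per sid. A quick sketch (here against an inline sample; the real input would be /var/log/messages):

```shell
# Extract the sid from each "Abort Task" line and count occurrences per sid.
# Sample data pasted inline for illustration.
cat > /tmp/abort_sample.log <<'EOF'
Oct 29 19:09:23 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:844424967684608 (Function Complete)
Oct 29 19:09:49 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:1125899927618048 (Function Complete)
Oct 29 19:09:53 iscsi2cl6 kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:6 by sid:844424967684608 (Function Complete)
EOF

grep 'Abort Task' /tmp/abort_sample.log \
  | sed 's/.*by sid:\([0-9]*\).*/\1/' \
  | sort | uniq -c | sort -rn
```

On the real logs every session seems to be affected, not just one.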
Running the following:

CentOS 5 with kernel 2.6.18-274.7.1.el5
IET 1.4.20.2
Pacemaker 1.0.11-1.2.el5
DRBD 8.3.11

Googling appears to reveal many possible reasons for these Abort Tasks; any help appreciated :(

Regards,
James

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
