I know about the "issue" with quorum and ZFS when given an entire disk. The quorum was added *after* pool was created. Then even if it was not the case (it was) then it would not cause these additional checksum errors within the pool.
When it comes to SCSI-2 reservations - the cluster is connected to Sun 6540 disk arrays which have other clusters connected to them as well, working fine. Because the corruption happened within the pool as well, I'm pretty sure there is some issue with a driver and/or firmware, or possibly an FC switch (less likely).

Re the cmm buffer - I actually intercepted writes to it while this was happening, and it was complaining about a wrong checksum, as you can see in my original email. Unfortunately, after the last changes (adding a different quorum device and then re-adding the original one, etc.) the old entries are already gone...

Thanks for looking into it.

ps. and no, the pool has not been imported simultaneously on both nodes

On 04/02/2010 18:20, Ellard Roush wrote:
> Hi Robert,
>
> There is a common problem that affects people with
> your configuration.
>
> ZFS recommends that you place an entire disk in the zpool.
> When that happens ZFS formats the disk.
> The format operation destroys any quorum information on the disk.
> If you configured the disk as a quorum device and then added
> that disk to a ZFS zpool, this sequence of operations would
> cause the quorum information to be destroyed.
>
> You can use a disk in a zpool as a quorum device.
> The Sun Cluster documentation states that you must
> add the disk to the ZFS zpool first and then
> configure the disk as a quorum device.
>
> There are other possible issues. Sometimes the vendor of the storage
> device does not properly test the SCSI-2 Reservations or SCSI-3 PGR.
> Sun Cluster has an audit trail of all changes to information on the
> quorum device (plus other membership subsystem operations).
> Use mdb to look at the following memory-resident debug print buffer:
>
> mdb> *cmm_dbg_buf/s
>
> This will provide more information.
>
> Regards,
> Ellard
>
> On 02/04/10 08:33, Robert Milkowski wrote:
>> Hi,
>>
>> S10, SC3.2 + patches, Generic_142900-03, 2x T5220 with QLE2462
>> connected to 6540s.
>>
>> We started to observe the below messages yesterday at both nodes at the
>> same time, after several weeks of running:
>>
>> <pre>
>> XXX cl_runtime: [ID 856360 kern.warning] WARNING: QUORUM_GENERIC:
>> quorum_read_keys error: Reading the registration keys failed on
>> quorum device /dev/did/rdsk/d7s2 with error 22.
>> XXX cl_runtime: [ID 868277 kern.warning] WARNING: CMM: Erstwhile
>> online quorum device /dev/did/rdsk/d7s2 (qid 1) is inaccessible now.
>>
>> d7 is a quorum device and it was marked by the cluster as offline:
>>
>> # clq status
>>
>> === Cluster Quorum ===
>>
>> --- Quorum Votes Summary from latest node reconfiguration ---
>>
>>             Needed   Present   Possible
>>             ------   -------   --------
>>             2        3         3
>>
>>
>> --- Quorum Votes by Node (current status) ---
>>
>> Node Name            Present   Possible   Status
>> ---------            -------   --------   ------
>> XXXXXXXXXXXXXXX      1         1          Online
>> YYYYYYYYYYYYYYY      1         1          Online
>>
>>
>> --- Quorum Votes by Device (current status) ---
>>
>> Device Name   Present   Possible   Status
>> -----------   -------   --------   ------
>> d7            0         1          Offline
>>
>>
>>
>> By looking at the source code I found that the above message is
>> printed from within quorum_device_generic_impl::quorum_read_keys()
>> and it will only happen if quorum_pgre_key_read() returns with a return
>> code of 22 (actually any code other than 0 or EACCES, but we already
>> know that the rc is 22 from the syslog message).
>>
>> Now quorum_pgre_key_read() calls quorum_scsi_sector_read() and passes
>> its return code on as its own.
>> quorum_scsi_sector_read() can return an error if
>> quorum_ioctl_with_retries() returns an error or if there is a
>> checksum mismatch.
>>
>> This is the relevant source code:
>>
>> 406 int
>> 407 quorum_scsi_sector_read(
>> [...]
>> 449         error = quorum_ioctl_with_retries(vnode_ptr, USCSICMD, (intptr_t)&ucmd,
>> 450             &retval);
>> 451         if (error != 0) {
>> 452                 CMM_TRACE(("quorum_scsi_sector_read: ioctl USCSICMD "
>> 453                     "returned error (%d).\n", error));
>> 454                 kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>> 455                 return (error);
>> 456         }
>> 457
>> 458         //
>> 459         // Calculate and compare the checksum if check_data is true.
>> 460         // Also, validate the pgres_id string at the beg of the sector.
>> 461         //
>> 462         if (check_data) {
>> 463                 PGRE_CALCCHKSUM(chksum, sector, iptr);
>> 464
>> 465                 // Compare the checksum.
>> 466                 if (PGRE_GETCHKSUM(sector) != chksum) {
>> 467                         CMM_TRACE(("quorum_scsi_sector_read: "
>> 468                             "checksum mismatch.\n"));
>> 469                         kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>> 470                         return (EINVAL);
>> 471                 }
>> 472
>> 473                 //
>> 474                 // Validate the PGRE string at the beg of the sector.
>> 475                 // It should contain PGRE_ID_LEAD_STRING[1|2].
>> 476                 //
>> 477                 if ((os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING1,
>> 478                     strlen(PGRE_ID_LEAD_STRING1)) != 0) &&
>> 479                     (os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING2,
>> 480                     strlen(PGRE_ID_LEAD_STRING2)) != 0)) {
>> 481                         CMM_TRACE(("quorum_scsi_sector_read: pgre id "
>> 482                             "mismatch. The sector id is %s.\n",
>> 483                             sector->pgres_id));
>> 484                         kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>> 485                         return (EINVAL);
>> 486                 }
>> 487
>> 488         }
>> 489         kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>> 490
>> 491         return (error);
>> 492 }
>>
>>
>>
>> 56 -> __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555744942019 enter
>> 56   -> __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555744957176 enter
>> 56   <- __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555745089857 rc: 0
>> 56   -> __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745108310 enter
>> 56     -> __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745120941 enter
>> 56       -> __1cCosHsprintf6FpcpkcE_v_ 6308555745134231 enter
>> 56       <- __1cCosHsprintf6FpcpkcE_v_ 6308555745148729 rc: 2890607504684
>> 56     <- __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745162898 rc: 1886718112
>> 56   <- __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745175529 rc: 1886718112
>> 56 <- __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555745188599 rc: 22
>>
>> From the above output we know that quorum_ioctl_with_retries()
>> returns 0, so it must be a checksum mismatch!
>> As CMM_TRACE() is called above, and there are two calls to it in the
>> code, let's check which one it is:
>>
>> 21 -> __1cNdbg_print_bufIdbprintf6MpcE_v_ 6309628794339298 CMM_DEBUG: quorum_scsi_sector_read: checksum mismatch.
>>
>>
>> So this is where it fails:
>>
>> 462         if (check_data) {
>> 463                 PGRE_CALCCHKSUM(chksum, sector, iptr);
>> 464
>> 465                 // Compare the checksum.
>> 466                 if (PGRE_GETCHKSUM(sector) != chksum) {
>> 467                         CMM_TRACE(("quorum_scsi_sector_read: "
>> 468                             "checksum mismatch.\n"));
>> 469                         kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>> 470                         return (EINVAL);
>> 471                 }
>>
>>
>>
>> By adding another quorum device, then removing d7 and adding it again
>> (and removing the extra one), everything came back to normal. However,
>> I wonder how we ended up there? HBA? Firmware? The 6540's firmware? An
>> SC bug?
>>
>> # fcinfo hba-port -l
>> HBA Port WWN: 2100001b3291014c
>>         OS Device Name: /dev/cfg/c2
>>         Manufacturer: QLogic Corp.
>>         Model: 375-3356-02
>>         Firmware Version: 05.01.00
>>         FCode/BIOS Version: BIOS: 2.10; fcode: 2.4; EFI: 2.4;
>>         Serial Number: 0402R00-0927731201
>>         Driver Name: qlc
>>         Driver Version: 20090519-2.31
>>         Type: N-port
>>         State: online
>>         Supported Speeds: 1Gb 2Gb 4Gb
>>         Current Speed: 4Gb
>>         Node WWN: 2000001b3291014c
>>         Link Error Statistics:
>>                 Link Failure Count: 0
>>                 Loss of Sync Count: 0
>>                 Loss of Signal Count: 0
>>                 Primitive Seq Protocol Error Count: 0
>>                 Invalid Tx Word Count: 0
>>                 Invalid CRC Count: 0
>> HBA Port WWN: 2101001b32b1014c
>>         OS Device Name: /dev/cfg/c3
>>         Manufacturer: QLogic Corp.
>>         Model: 375-3356-02
>>         Firmware Version: 05.01.00
>>         FCode/BIOS Version: BIOS: 2.10; fcode: 2.4; EFI: 2.4;
>>         Serial Number: 0402R00-0927731201
>>         Driver Name: qlc
>>         Driver Version: 20090519-2.31
>>         Type: N-port
>>         State: online
>>         Supported Speeds: 1Gb 2Gb 4Gb
>>         Current Speed: 4Gb
>>         Node WWN: 2001001b32b1014c
>>         Link Error Statistics:
>>                 Link Failure Count: 0
>>                 Loss of Sync Count: 0
>>                 Loss of Signal Count: 0
>>                 Primitive Seq Protocol Error Count: 0
>>                 Invalid Tx Word Count: 0
>>                 Invalid CRC Count: 0
>>
>>
>> 142084-02 is applied, and at a quick glance I can't see anything
>> related to the above which might be addressed by 142084-03.
>>
>> Each 6540 presents one 2TB LUN and we are using ZFS to mirror between
>> them. One of the LUNs is used as the quorum device as well.
>> Since it looks like data was corrupted for the quorum device, the pool
>> itself might be affected as well, so I ran a scrub; after a couple of
>> hours I have got this far:
>>
>> # zpool status -v XXXX
>>   pool: XXXX
>>  state: DEGRADED
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>  scrub: scrub in progress for 2h29m, 56.94% done, 1h52m to go
>> config:
>>
>>         NAME                                       STATE     READ WRITE CKSUM
>>         XXXX                                       DEGRADED     0     0    14
>>           mirror                                   DEGRADED     0     0    28
>>             c4t600A0B800029AF0000006CD4486B3B05d0  DEGRADED     0     0    28  too many errors
>>             c4t600A0B800029B74600004255486B6A4Fd0  DEGRADED     0     0    28  too many errors
>>
>> errors: Permanent errors have been detected in the following files:
>>
>>         /XXXX/XXXX/XXXXXXXX/YYYYYY.dbf
>>
>>
>> I can't see any other errors in the system, in the logs, or from FMA.
>> The HBA firmware seems to be the latest version as well.
>>
>> Because of the corruption within the ZFS pool, I think that while the
>> issue first manifested itself as a problem with the quorum device, it
>> likely has nothing to do with SC itself; the data corruption is
>> happening somewhere else. The other interesting thing is that, so far,
>> all the corrupted blocks detected by ZFS were corrupted on both sides
>> of the mirror. Since each side is a separate disk array, I think the
>> corruption most probably originated on the server itself rather than
>> on the SAN or the disk arrays. The HBA is a dual-ported card and both
>> paths are used (MPxIO). The issue is also unlikely to be caused by ZFS
>> itself, as ZFS shouldn't have affected the SC keys on the quorum device.
>>
>>
>> Any ideas?
>> </pre>
>>
> _______________________________________________
> ha-clusters-discuss mailing list
> ha-clusters-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss