CC'ing storage-discuss@ and zfs-discuss@ as well.

On 04/02/2010 16:33, Robert Milkowski wrote:
> Hi,
>
> S10, SC3.2 + patches, Generic_142900-03, 2x T5220 with QLE2462 connected to 
> 6540s.
>
> We started to observe below messages yesterday at both nodes at the same time 
> after several weeks of running:
>
> <pre>
> XXX cl_runtime: [ID 856360 kern.warning] WARNING: QUORUM_GENERIC: quorum_read_keys error: Reading the registration keys failed on quorum device /dev/did/rdsk/d7s2 with error 22.
> XXX cl_runtime: [ID 868277 kern.warning] WARNING: CMM: Erstwhile online quorum device /dev/did/rdsk/d7s2 (qid 1) is inaccessible now.
>
> d7 is a quorum device and it was marked by cluster as offline:
>
> # clq status
>
> === Cluster Quorum ===
>
> --- Quorum Votes Summary from latest node reconfiguration ---
>
>              Needed   Present   Possible
>              ------   -------   --------
>              2        3         3
>
>
> --- Quorum Votes by Node (current status) ---
>
> Node Name             Present     Possible     Status
> ---------             -------     --------     ------
> XXXXXXXXXXXXXXX     1           1            Online
> YYYYYYYYYYYYYYY     1           1            Online
>
>
> --- Quorum Votes by Device (current status) ---
>
> Device Name       Present      Possible      Status
> -----------       -------      --------      ------
> d7                0            1             Offline
>
>
>
> By looking at the source code I found that the above message is printed from 
> within quorum_device_generic_impl::quorum_read_keys(), and that only happens 
> if quorum_pgre_key_read() returns a code other than 0 or EACCES (and we 
> already know from the syslog message that the rc is 22).
>
> Now quorum_pgre_key_read() calls quorum_scsi_sector_read() and passes its 
> return code through as its own.
> quorum_scsi_sector_read() can only return an error if 
> quorum_ioctl_with_retries() fails or if there is a checksum mismatch.
>
> This is the relevant source code:
>      406 int
>      407 quorum_scsi_sector_read(
> [...]
>      449      error = quorum_ioctl_with_retries(vnode_ptr, USCSICMD, (intptr_t)&ucmd,
>      450          &retval);
>      451      if (error != 0) {
>      452              CMM_TRACE(("quorum_scsi_sector_read: ioctl USCSICMD "
>      453                  "returned error (%d).\n", error));
>      454              kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>      455              return (error);
>      456      }
>      457
>      458      //
>      459      // Calculate and compare the checksum if check_data is true.
>      460      // Also, validate the pgres_id string at the beg of the sector.
>      461      //
>      462      if (check_data) {
>      463              PGRE_CALCCHKSUM(chksum, sector, iptr);
>      464
>      465              // Compare the checksum.
>      466              if (PGRE_GETCHKSUM(sector) != chksum) {
>      467                      CMM_TRACE(("quorum_scsi_sector_read: "
>      468                          "checksum mismatch.\n"));
>      469                      kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>      470                      return (EINVAL);
>      471              }
>      472
>      473              //
>      474              // Validate the PGRE string at the beg of the sector.
>      475              // It should contain PGRE_ID_LEAD_STRING[1|2].
>      476              //
>      477              if ((os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING1,
>      478                  strlen(PGRE_ID_LEAD_STRING1)) != 0) &&
>      479                  (os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING2,
>      480                  strlen(PGRE_ID_LEAD_STRING2)) != 0)) {
>      481                      CMM_TRACE(("quorum_scsi_sector_read: pgre id "
>      482                          "mismatch. The sector id is %s.\n",
>      483                          sector->pgres_id));
>      484                      kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>      485                      return (EINVAL);
>      486              }
>      487
>      488      }
>      489      kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>      490
>      491      return (error);
>      492 }
>
>
>
>   56  ->  __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555744942019 enter
>   56    ->  __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555744957176 enter
>   56    <-  __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555745089857 rc: 0
>   56    ->  __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745108310 enter
>   56      ->  __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745120941 enter
>   56        ->  __1cCosHsprintf6FpcpkcE_v_      6308555745134231 enter
>   56        <-  __1cCosHsprintf6FpcpkcE_v_      6308555745148729 rc: 2890607504684
>   56      <-  __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745162898 rc: 1886718112
>   56    <-  __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745175529 rc: 1886718112
>   56  <-  __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555745188599 rc: 22
>
>  From the above output we know that quorum_ioctl_with_retries() returned 0, 
> so it must be a checksum mismatch!
> Since CMM_TRACE() was called and there are two calls to it in this code 
> path, let's check which one it is:
>
>   21  ->  __1cNdbg_print_bufIdbprintf6MpcE_v_   6309628794339298 CMM_DEBUG: quorum_scsi_sector_read: checksum mismatch.
>
>
> So this is where it fails:
>
>      462      if (check_data) {
>      463              PGRE_CALCCHKSUM(chksum, sector, iptr);
>      464
>      465              // Compare the checksum.
>      466              if (PGRE_GETCHKSUM(sector) != chksum) {
>      467                      CMM_TRACE(("quorum_scsi_sector_read: "
>      468                          "checksum mismatch.\n"));
>      469                      kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
>      470                      return (EINVAL);
>      471              }
>
>
>
> By adding another quorum device, then removing d7 and adding it again (and 
> removing the extra one), everything came back to normal. However, I wonder 
> how we ended up there. HBA? HBA firmware? 6540 firmware? SC bug?
>
> # fcinfo hba-port -l
> HBA Port WWN: 2100001b3291014c
>       OS Device Name: /dev/cfg/c2
>       Manufacturer: QLogic Corp.
>       Model: 375-3356-02
>       Firmware Version: 05.01.00
>       FCode/BIOS Version:  BIOS: 2.10; fcode: 2.4; EFI: 2.4;
>       Serial Number: 0402R00-0927731201
>       Driver Name: qlc
>       Driver Version: 20090519-2.31
>       Type: N-port
>       State: online
>       Supported Speeds: 1Gb 2Gb 4Gb
>       Current Speed: 4Gb
>       Node WWN: 2000001b3291014c
>       Link Error Statistics:
>               Link Failure Count: 0
>               Loss of Sync Count: 0
>               Loss of Signal Count: 0
>               Primitive Seq Protocol Error Count: 0
>               Invalid Tx Word Count: 0
>               Invalid CRC Count: 0
> HBA Port WWN: 2101001b32b1014c
>       OS Device Name: /dev/cfg/c3
>       Manufacturer: QLogic Corp.
>       Model: 375-3356-02
>       Firmware Version: 05.01.00
>       FCode/BIOS Version:  BIOS: 2.10; fcode: 2.4; EFI: 2.4;
>       Serial Number: 0402R00-0927731201
>       Driver Name: qlc
>       Driver Version: 20090519-2.31
>       Type: N-port
>       State: online
>       Supported Speeds: 1Gb 2Gb 4Gb
>       Current Speed: 4Gb
>       Node WWN: 2001001b32b1014c
>       Link Error Statistics:
>               Link Failure Count: 0
>               Loss of Sync Count: 0
>               Loss of Signal Count: 0
>               Primitive Seq Protocol Error Count: 0
>               Invalid Tx Word Count: 0
>               Invalid CRC Count: 0
>
>
> 142084-02 is applied, and at a quick glance I can't see anything related to 
> the above that might be addressed by 142084-03.
>
> Each 6540 presents one 2TB LUN and we are using ZFS to mirror between them. 
> One of the LUNs is also used as the quorum device.
> Since it looks like the quorum data was corrupted, the pool itself might be 
> affected as well, so I ran a scrub; after a couple of hours I've got this far:
>
> # zpool status -v XXXX
>    pool: XXXX
>   state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>       corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>       entire pool from backup.
>     see: http://www.sun.com/msg/ZFS-8000-8A
>   scrub: scrub in progress for 2h29m, 56.94% done, 1h52m to go
> config:
>
>       NAME                                       STATE     READ WRITE CKSUM
>       XXXX                                       DEGRADED     0     0    14
>         mirror                                   DEGRADED     0     0    28
>           c4t600A0B800029AF0000006CD4486B3B05d0  DEGRADED     0     0    28  too many errors
>           c4t600A0B800029B74600004255486B6A4Fd0  DEGRADED     0     0    28  too many errors
>
> errors: Permanent errors have been detected in the following files:
>
>          /XXXX/XXXX/XXXXXXXX/YYYYYY.dbf
>
>
> I can't see any other errors in the system, in the logs, or from FMA. The 
> HBA firmware also seems to be the latest version.
>
> Because of the corruption within the ZFS pool, I think that while the issue 
> first manifested itself as a problem with the quorum device, it probably has 
> nothing to do with SC itself; data corruption is happening somewhere else. 
> The other interesting thing is that so far all the corrupted blocks detected 
> by ZFS were corrupted on both sides of the mirror. Since each side is a 
> separate disk array, the corruption most probably originated on the server 
> itself rather than on the SAN or the disk arrays. The HBA is a dual-ported 
> card and both paths are used (MPxIO). The issue is also unlikely to be 
> caused by ZFS itself, as that shouldn't have affected the SC keys on the 
> quorum device.
>
>
> Any ideas?
> </pre>
>