Has anybody else noticed issues with HyperV R2 clustered against a COMSTAR 
iSCSI target on snv_129?

I recently upgraded from sxce 125 to sxce 129.  Since then, heavy IO 
(specifically, moving VMs around to try to get the benefits of the dedup 
feature) causes the cluster to fail the shared iSCSI disk.  Everything was 
fine on sxce 125; the failures started with snv_129.

My setup is: two HyperV R2 nodes with up-to-date patches, clustered, 3 iSCSI 
targets (on separate subnets), using MPIO on the HyperV R2 boxes against 3 
iSCSI disks (each available on all 3 iSCSI targets).  That's 6 iSCSI 
connections (3 from each HyperV box).  The iSCSI target is SXCE snv_129.

I noticed that there have been changes related to persistent reservations and 
sequence number marking in the COMSTAR sbd code.  Unfortunately, I have no idea 
what those changes are attempting to accomplish.  (What is the best way to 
browse this source code at a source-code-control repository level?  Mercurial?  
The genunix repository only claims to be up to snv_111 [or maybe later, I 
forget, but definitely not up to 129].)  Because of these changes, I think the 
following message from my Windows box is relevant (the full event is below):

  * Initiator could not find a match for the initiator task tag in the received 
PDU. Dump data contains the entire iSCSI header.


The most obvious symptom is that the "iostat -xn 1" display shows 
progressively less and less data, then all IO ceases and eventually the cluster 
offlines the disk.  The Windows box informs me that a task cannot be executed.  
Until I issue an "svcadm restart stmf" command, the cluster will not bring the 
iSCSI disks back online; once stmf has been restarted, the cluster (usually) 
recovers.

I'll work on a dtrace script and subsequent data collection in the coming days. 
 If you have any recommendations for items to capture, please let me know.  
Thanks!

From my Windows box, I see messages such as the following (yes, my computers 
are named after mythical [mostly undead] creatures):

Log Name:      System
Source:        iScsiPrt
Date:          12/29/2009 1:59:23 AM
Event ID:      27
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      BANSHEE.phantom.to
Description:
Initiator could not find a match for the initiator task tag in the received 
PDU. Dump data contains the entire iSCSI header.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="49152">27</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T06:59:23.354105900Z" />
    <EventRecordID>8692</EventRecordID>
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    
<Binary>0000700001000000000000001B0000C0000000000000000000000000000000000000000000000000258000000000001000000000000000000000624CF0040000000000000000624E0000664D000000000000000000000000690071006E002E0031003900380036002D00300033002E0063006F006D002E00730075006E003A00300032003A00730074006F0072006100670065002D003100</Binary>
  </EventData>
</Event>

and

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          12/29/2009 1:59:24 AM
Event ID:      1038
Task Category: Physical Disk Resource
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      BANSHEE.phantom.to
Description:
Ownership of cluster disk 'Undead-Data-2.ntfs' has been unexpectedly lost by 
this node. Run the Validate a Configuration wizard to check your storage 
configuration.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" 
Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1038</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>18</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T06:59:24.385362500Z" />
    <EventRecordID>8697</EventRecordID>
    <Correlation />
    <Execution ProcessID="2700" ThreadID="396" />
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="ResourceName">Undead-Data-2.ntfs</Data>
  </EventData>
</Event>

... and ...

Log Name:      System
Source:        iScsiPrt
Date:          12/29/2009 12:15:19 AM
Event ID:      129
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      BANSHEE.phantom.to
Description:
The description for Event ID 129 from source iScsiPrt cannot be found. Either 
the component that raises this event is not installed on your local computer or 
the installation is corrupted. You can install or repair the component on the 
local computer.

If the event originated on another computer, the display information had to be 
saved with the event.

The following information was included with the event: 

\Device\RaidPort1

the message resource is present but the message is not found in the 
string/message table

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="32772">129</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T05:15:19.944085300Z" />
    <EventRecordID>8247</EventRecordID>
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    
<Binary>0F001800010000000000000081000480040000000000000000000000000000000000000000000000000000000000000000001700810004800000000000000000</Binary>
  </EventData>
</Event>

... and ...

Log Name:      System
Source:        iScsiPrt
Date:          12/29/2009 12:15:19 AM
Event ID:      39
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      BANSHEE.phantom.to
Description:
Initiator sent a task management command to reset the target. The target name 
is given in the dump data.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="49152">39</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T05:15:19.944085300Z" />
    <EventRecordID>8248</EventRecordID>
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    
<Binary>000040000100000000000000270000C0000000000000000000000000000000000000000000000000690071006E002E0031003900380036002D00300033002E0063006F006D002E00730075006E003A00300032003A00730074006F0072006100670065002D003100</Binary>
  </EventData>
</Event>

Log Name:      System
Source:        iScsiPrt
Date:          12/29/2009 12:15:20 AM
Event ID:      39
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      BANSHEE.phantom.to
Description:
Initiator sent a task management command to reset the target. The target name 
is given in the dump data.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="49152">39</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T05:15:20.569089300Z" />
    <EventRecordID>8296</EventRecordID>
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    
<Binary>000040000100000000000000270000C0000000000000000000000000000000000000000000000000690071006E002E0031003900380036002D00300033002E0063006F006D002E00730075006E003A00300032003A00730074006F0072006100670065002D003300</Binary>
  </EventData>
</Event>
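
Event ID 39 says the target name is given in the dump data, so here is a 
minimal sketch for pulling it out of the <Binary> hex.  It assumes the name is 
stored as UTF-16LE text at the end of the dump and starts with "iqn."; rather 
than rely on a fixed header offset (which I have not verified), it simply 
searches for that marker.  The function and constant names are my own, not 
part of any Windows or COMSTAR tool.

```python
from typing import Optional

# <Binary> payload copied from the first Event ID 39 record above.
EVENT_39_DUMP = (
    "000040000100000000000000270000C0000000000000000000000000000000000000000000000000"
    "690071006E002E0031003900380036002D00300033002E0063006F006D002E00730075006E003A00"
    "300032003A00730074006F0072006100670065002D003100"
)

# "iqn." encoded as UTF-16LE (assumption: target names in the dump use this encoding).
IQN_MARKER = "iqn.".encode("utf-16-le")


def target_name_from_dump(hex_dump: str) -> Optional[str]:
    """Return the UTF-16LE target name embedded in an iScsiPrt dump, or None."""
    raw = bytes.fromhex(hex_dump)
    start = raw.find(IQN_MARKER)
    if start < 0:
        return None  # no IQN-style name found in this dump
    tail = raw[start:]
    if len(tail) % 2:
        tail = tail[:-1]  # drop a dangling byte so the UTF-16 decode cannot fail on length
    return tail.decode("utf-16-le").rstrip("\x00")


if __name__ == "__main__":
    print(target_name_from_dump(EVENT_39_DUMP))
```

The same call should work on the second Event ID 39 record; only the hex 
string changes.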