Has anybody else noticed issues with HyperV R2 clustered against a COMSTAR iSCSI target on snv_129?
I recently upgraded from SXCE snv_125 to SXCE snv_129. Since then, heavy IO (specifically, moving VMs around to try to get the benefits of the dedup feature) causes the cluster to fail the shared iSCSI disk. Everything was fine on snv_125, but it now fails on snv_129.

My setup is: two HyperV R2 nodes with up-to-date patches, clustered, 3 iSCSI targets (on separate subnets), using MPIO on the HyperV R2 boxes against 3 iSCSI disks (each available on all 3 iSCSI targets). That's 6 iSCSI connections (3 from each HyperV box). The iSCSI target is SXCE snv_129.

I noticed that there have been changes related to persistent reservations and sequence number marking in the COMSTAR sbd code. Unfortunately, I have no idea what those changes are attempting to accomplish. (What is the best way to browse this source code at the source-code-control repository level? Mercurial? The genunix repository only claims to be up to snv_111 [or maybe later, I forget, but definitely not up to 129].)

Because of these changes, I think the following message from my Windows box is relevant (full message is below):

* Initiator could not find a match for the initiator task tag in the received PDU. Dump data contains the entire iSCSI header.

The most obvious symptom is that the "iostat -xn 1" display shows progressively less and less data, then all IO ceases and eventually the cluster offlines the disk. The Windows box informs me that a task cannot be executed. Until stmf is restarted, the cluster will not bring the disks back online. Once I issue an "svcadm restart stmf" command, the cluster can (usually) online the iSCSI disks again.

I'll work on a dtrace script and subsequent data collection in the coming days. If you have any recommendations for items to capture, please let me know.

Thanks!
From my Windows box, I see messages such as (yes, my computers are named after mythical [mostly undead] creatures):

Log Name: System
Source: iScsiPrt
Date: 12/29/2009 1:59:23 AM
Event ID: 27
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: BANSHEE.phantom.to
Description:
Initiator could not find a match for the initiator task tag in the received PDU. Dump data contains the entire iSCSI header.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="49152">27</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T06:59:23.354105900Z" />
    <EventRecordID>8692</EventRecordID>
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    <Binary>0000700001000000000000001B0000C0000000000000000000000000000000000000000000000000258000000000001000000000000000000000624CF0040000000000000000624E0000664D000000000000000000000000690071006E002E0031003900380036002D00300033002E0063006F006D002E00730075006E003A00300032003A00730074006F0072006100670065002D003100</Binary>
  </EventData>
</Event>

and

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 12/29/2009 1:59:24 AM
Event ID: 1038
Task Category: Physical Disk Resource
Level: Error
Keywords:
User: SYSTEM
Computer: BANSHEE.phantom.to
Description:
Ownership of cluster disk 'Undead-Data-2.ntfs' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1038</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>18</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T06:59:24.385362500Z" />
    <EventRecordID>8697</EventRecordID>
    <Correlation />
    <Execution ProcessID="2700" ThreadID="396" />
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="ResourceName">Undead-Data-2.ntfs</Data>
  </EventData>
</Event>

... and ...

Log Name: System
Source: iScsiPrt
Date: 12/29/2009 12:15:19 AM
Event ID: 129
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: BANSHEE.phantom.to
Description:
The description for Event ID 129 from source iScsiPrt cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.
The following information was included with the event:

\Device\RaidPort1

the message resource is present but the message is not found in the string/message table
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="32772">129</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T05:15:19.944085300Z" />
    <EventRecordID>8247</EventRecordID>
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    <Binary>0F001800010000000000000081000480040000000000000000000000000000000000000000000000000000000000000000001700810004800000000000000000</Binary>
  </EventData>
</Event>

... and ...

Log Name: System
Source: iScsiPrt
Date: 12/29/2009 12:15:19 AM
Event ID: 39
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: BANSHEE.phantom.to
Description:
Initiator sent a task management command to reset the target. The target name is given in the dump data.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="49152">39</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T05:15:19.944085300Z" />
    <EventRecordID>8248</EventRecordID>
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    <Binary>000040000100000000000000270000C0000000000000000000000000000000000000000000000000690071006E002E0031003900380036002D00300033002E0063006F006D002E00730075006E003A00300032003A00730074006F0072006100670065002D003100</Binary>
  </EventData>
</Event>

Log Name: System
Source: iScsiPrt
Date: 12/29/2009 12:15:20 AM
Event ID: 39
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: BANSHEE.phantom.to
Description:
Initiator sent a task management command to reset the target. The target name is given in the dump data.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="49152">39</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-12-29T05:15:20.569089300Z" />
    <EventRecordID>8296</EventRecordID>
    <Channel>System</Channel>
    <Computer>BANSHEE.phantom.to</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    <Binary>000040000100000000000000270000C0000000000000000000000000000000000000000000000000690071006E002E0031003900380036002D00300033002E0063006F006D002E00730075006E003A00300032003A00730074006F0072006100670065002D003300</Binary>
  </EventData>
</Event>

--
This message posted from opensolaris.org
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
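
A note on reading the dump data: the Event ID 39 description says the target name is in the dump data, and the tail of each <Binary> payload does look like UTF-16LE text. Below is a minimal Python sketch for pulling it out. It assumes the name starts with "iqn." and runs to the end of the buffer, and it searches for that marker rather than assuming a fixed header length. The helper name extract_target_name is made up for illustration and is not part of any Windows or COMSTAR tool.

```python
from typing import Optional

# Hex from the <Binary> element of the first Event ID 39 record in this post.
BINARY_HEX = (
    "000040000100000000000000270000C0"
    + "00" * 24  # run of zero bytes that precedes the text in the record
    + "690071006E002E0031003900380036002D00300033002E00"
    + "63006F006D002E00730075006E003A00300032003A00"
    + "730074006F0072006100670065002D003100"
)


def extract_target_name(binary_hex: str, marker: str = "iqn.") -> Optional[str]:
    """Return the UTF-16LE text that starts at `marker`, or None if absent.

    Assumption: the name runs from the marker to the end of the buffer.
    """
    raw = bytes.fromhex(binary_hex)
    start = raw.find(marker.encode("utf-16-le"))
    if start < 0:
        return None
    # Decode to the end of the buffer and drop any trailing NUL padding.
    return raw[start:].decode("utf-16-le", errors="replace").rstrip("\x00")


if __name__ == "__main__":
    print(extract_target_name(BINARY_HEX))
```

For the first Event ID 39 record this prints iqn.1986-03.com.sun:02:storage-1. The second record differs only in its last character and decodes to iqn.1986-03.com.sun:02:storage-3, so the two resets went to two different targets.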