Re: [BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT

2016-06-20 Thread Matthias Prager
Still tracking - small update: so far, with the most current ESXi 6
(build 3825889, 2016-05-12), the fix is not there yet.

Sidenote: the mpt driver seems to have learned to work around the irq
handling issues (I tested kernel 4.6.2) whereas the e1000e ethernet
driver still has trouble initializing.

... patiently waiting for VMware to release 6.0 update 3.

On 24.03.2016 at 12:02, Jason Taylor wrote:
> The last update we have from VMware is that the fix will be in 6.0p3/5.5p8 - 
> both of which are targeted for a mid year release.
> 
> -----Original Message-----
> From: Thomas Gleixner [mailto:t...@linutronix.de] 
> Sent: Thursday, March 24, 2016 2:06 AM
> To: Matthias Prager <li...@matthiasprager.de>
> Cc: Sreekanth Reddy <sreekanth.re...@broadcom.com>; linux-scsi 
> <linux-scsi@vger.kernel.org>; Jason Taylor <jason.tay...@simplivity.com>; 
> LKML <linux-ker...@vger.kernel.org>; x...@kernel.org
> Subject: Re: [BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT
> 
> On Thu, 24 Mar 2016, Matthias Prager wrote:
>> The timeout happens reliably after two warm boots with a 'bad' kernel 
>> after coming from a 'good' kernel, and also after one cold boot with a 
>> 'bad' kernel (meaning cold booting a 'bad' kernel leads directly to 
>> the timeout and warm booting needs a second run to produce the 
>> timeout). I haven't tested bare metal yet - I can do that tomorrow if 
>> necessary.
> 
> That's on VMWare, right? Jason has seen the same issue:
> 
>  http://marc.info/?l=linux-kernel&m=145280623530135&w=2
>  http://marc.info/?l=linux-kernel&m=145523785705624&w=2
> 
>  "Works fine on real hardware and KVM. Filed an issue with VMware and the fix
>   is in the works."
> 
> Jason, what's the state of this?
> 
> Thanks,
> 
>   tglx
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT

2016-03-24 Thread Matthias Prager
On 24.03.2016 at 07:06, Thomas Gleixner wrote:
> On Thu, 24 Mar 2016, Matthias Prager wrote:
>> The timeout happens reliably after two warm boots with a 'bad' kernel
>> after coming from a 'good' kernel, and also after one cold boot with a
>> 'bad' kernel (meaning cold booting a 'bad' kernel leads directly to the
>> timeout and warm booting needs a second run to produce the timeout). I
>> haven't tested bare metal yet - I can do that tomorrow if necessary.
> 
> That's on VMWare, right? Jason has seen the same issue:
Correct: ESXi 5.1, patch level 3070626 (the two controllers are passed
through to the VM).

> 
>  http://marc.info/?l=linux-kernel&m=145280623530135&w=2
>  http://marc.info/?l=linux-kernel&m=145523785705624&w=2
> 
>  "Works fine on real hardware and KVM. Filed an issue with VMware and the fix
>   is in the works."
> 
> Jason, what's the state of this?
> 
> Thanks,
> 
>   tglx
> 



Re: [BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT

2016-03-23 Thread Matthias Prager
inux.intel.com
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> 
> :040000 040000 786bcad9a3fad413e0b744e2cfa20da7ff402db6 22618cac66dee85a7752bb3af81169fff3a242d8 M	arch
> :040000 040000 acee54015803d4cd52d582a9e5e93aa56ad08482 40d2c7a02c0f8677e596c98c936404b2211336a3 M	drivers

---
Matthias



On 22.03.2016 at 22:02, Matthias Prager wrote:
> Hi Sreekanth,
> 
> On 22.03.2016 at 11:52, Sreekanth Reddy wrote:
>> Hi Matthias,
>>
>> Thanks for providing detail explanation.
>>
>> Currently I am trying to reproduce this issue locally. I have used the
>> same HBA card but still I am not able to reproduce this issue on 4.5
>> kernel.
>>
>> I will try for few more times and I will also try to get the diff
>> between 4.1 to 4.5 kernels w.r.t msix vectors support.
> I will try to bisect this tomorrow. And I will try to reproduce this
> outside the VM environment on bare metal.
> 
>>
>> Meanwhile can you please provide me the output of "lspci -vvv" command
>> for LSI SAS2 HBA's and also hexdump of config sysfs parameter for SAS2
>> pci device. I need this data to be collected before "timeout" print is
>> observed, or in the mpt3sas_base.c file and in the
>> mpt3sas_base_attach(). return from mpt3sas_base_attach() function if
>> return value of mpt3sas_base_map_resources() is non zero instead of
>> freeing the resources, as shown below
>>
>> ---
>> r = mpt3sas_base_map_resources(ioc);
>> if (r)
>> - goto out_free_resources;
>> +    return r;
>> -
> I will get you this data tomorrow too.
> 
>>
>> Also are you sure that this issue won't occur on less than 4.1 kernel?
> Yes definitely - I never observed these timeouts with any kernel
> <=4.1.20, and I started seeing them from kernel 4.2.1 onward (I haven't
> actually tested 4.2.0 yet).
> 
>>
>> Thanks,
>> Sreekanth
>>
>> On Mon, Mar 21, 2016 at 10:00 PM, Matthias Prager
>> <li...@matthiasprager.de> wrote:
>>> On 21.03.2016 at 16:52, Matthias Prager wrote:
>>>> Hi Sreekanth,
>>>>
>>>> thanks for digging into this issue. Regarding the 4.5.0 after 4.1.20
>>>> boot statement, I will try to express myself better:
>>>>
>>>> I first started the system with the 4.1.20 kernel. Then I issued an
>>>> 'init 6' warm-reboot and chose to boot the 4.5.0 kernel. This procedure
>>>> often works (i.e. I'm able to boot the newer kernels (>=4.2) this way
>>>> with mpt2sas initializing just fine).
>>> One small addition: warm-rebooting kernel 4.5.0 from kernel 4.5.0
>>> reliably leads to this issue manifesting (mpt2sas driver not initializing).
>>>
>>>>
>>>> I will try the msix_disable parameter and report back. Maybe kernel
>>>> version 4.2 broke interrupt handling somehow?
>>> With the msix_disable parameter set kernels 4.5.0, 4.4.5, 4.3.3 and
>>> 4.2.1 boot without the mpt2sas init issue. I also tried the pci=nomsi
>>> parameter instead, which also works fine.
>>>
>>> The igb ethernet-driver issue I mentioned as possibly related has not
>>> changed with pci=nomsi. So I'm assuming it is not the same issue, but
>>> possibly related, since it also occurs on kernels >=4.2.
>>>
>>>>
>>>> ---
>>>> Matthias
>>>>
>>>>
>>>> On 21.03.2016 at 14:59, Sreekanth Reddy wrote:
>>>>> Hi Matthias,
>>>>>
>>>>> Thanks for providing the logs. In these logs, I am not observing any
>>>>> such a prints which should be suspected for this issue.
>>>>>
>>>>> Can you please try once by setting mpt3sas driver's "msix_disable"
>>>>> module parameter to one.
>>>>>
>>>>> Also, can you please elaborate below statement for me, I am not able
>>>>> understand this statement
>>>>>  "I managed to boot the same 4.5.0 kernel successfully after warm
>>>>> rebooting from 4.1.20"
>>>>>
>>>>> Thanks,
>>>>> Sreekanth
>>>>>
>>>>> On Mon, Mar 21, 2016 at 2:48 PM, Matthias Prager
>>>>> <li...@matthiasprager.de> wrote:
>>>>>> Hi Sreekanth,
>>>>>>
>>>>>> thank you for replying so quic

Re: [BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT

2016-03-22 Thread Matthias Prager
Hi Sreekanth,

On 22.03.2016 at 11:52, Sreekanth Reddy wrote:
> Hi Matthias,
> 
> Thanks for providing the detailed explanation.
> 
> Currently I am trying to reproduce this issue locally. I have used the
> same HBA card but I am still not able to reproduce it on a 4.5
> kernel.
> 
> I will try a few more times, and I will also try to get the diff
> between the 4.1 and 4.5 kernels w.r.t. MSI-X vector support.
I will try to bisect this tomorrow. And I will try to reproduce this
outside the VM environment on bare metal.

> 
> Meanwhile, can you please provide the output of "lspci -vvv" for the
> LSI SAS2 HBAs, and also a hexdump of the config sysfs attribute for the
> SAS2 PCI device? I need this data collected before the "timeout" print
> is observed. Alternatively, in mpt3sas_base_attach() in mpt3sas_base.c,
> return directly if mpt3sas_base_map_resources() returns non-zero instead
> of freeing the resources, as shown below:
> 
> ---
>  r = mpt3sas_base_map_resources(ioc);
>  if (r)
> -	goto out_free_resources;
> +	return r;
> ---
I will get you this data tomorrow too.
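
For readers who want to collect the same data: the hexdump requested above can be taken from the binary "config" attribute that sysfs exposes for each PCI device. The following is only a minimal sketch, assuming Python is available on the affected VM; the function names and the device address in the comment are hypothetical and do not come from this thread.

```python
def hexdump_config(data, width=16):
    """Format raw PCI config-space bytes as 'offset: xx xx ...' lines."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:04x}: {hex_bytes}")
    return "\n".join(lines)


def read_pci_config(bdf):
    """Read the config space of one PCI device via sysfs.

    'bdf' is the domain:bus:device.function address as printed by
    'lspci -D'. Without root, only the first 64 bytes may be returned.
    """
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as fh:
        return fh.read()


# Example use (the address is hypothetical, look yours up with lspci):
#   print(hexdump_config(read_pci_config("0000:03:00.0")))
```

Running this before the driver reports the timeout, and again afterwards, would give two dumps that can be diffed.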

> 
> Also are you sure that this issue won't occur on less than 4.1 kernel?
Yes definitely - I never observed these timeouts with any kernel
<=4.1.20, and I started seeing them from kernel 4.2.1 onward (I haven't
actually tested 4.2.0 yet).

> 
> Thanks,
> Sreekanth
> 
> On Mon, Mar 21, 2016 at 10:00 PM, Matthias Prager
> <li...@matthiasprager.de> wrote:
>> On 21.03.2016 at 16:52, Matthias Prager wrote:
>>> Hi Sreekanth,
>>>
>>> thanks for digging into this issue. Regarding the 4.5.0 after 4.1.20
>>> boot statement, I will try to express myself better:
>>>
>>> I first started the system with the 4.1.20 kernel. Then I issued an
>>> 'init 6' warm-reboot and chose to boot the 4.5.0 kernel. This procedure
>>> often works (i.e. I'm able to boot the newer kernels (>=4.2) this way
>>> with mpt2sas initializing just fine).
>> One small addition: warm-rebooting kernel 4.5.0 from kernel 4.5.0
>> reliably leads to this issue manifesting (mpt2sas driver not initializing).
>>
>>>
>>> I will try the msix_disable parameter and report back. Maybe kernel
>>> version 4.2 broke interrupt handling somehow?
>> With the msix_disable parameter set kernels 4.5.0, 4.4.5, 4.3.3 and
>> 4.2.1 boot without the mpt2sas init issue. I also tried the pci=nomsi
>> parameter instead, which also works fine.
>>
>> The igb ethernet-driver issue I mentioned as possibly related has not
>> changed with pci=nomsi. So I'm assuming it is not the same issue, but
>> possibly related, since it also occurs on kernels >=4.2.
>>
>>>
>>> ---
>>> Matthias
>>>
>>>
>>> On 21.03.2016 at 14:59, Sreekanth Reddy wrote:
>>>> Hi Matthias,
>>>>
>>>> Thanks for providing the logs. In these logs, I am not observing any
>>>> such a prints which should be suspected for this issue.
>>>>
>>>> Can you please try once by setting mpt3sas driver's "msix_disable"
>>>> module parameter to one.
>>>>
>>>> Also, can you please elaborate below statement for me, I am not able
>>>> understand this statement
>>>>  "I managed to boot the same 4.5.0 kernel successfully after warm
>>>> rebooting from 4.1.20"
>>>>
>>>> Thanks,
>>>> Sreekanth
>>>>
>>>> On Mon, Mar 21, 2016 at 2:48 PM, Matthias Prager
>>>> <li...@matthiasprager.de> wrote:
>>>>> Hi Sreekanth,
>>>>>
>>>>> thank you for replying so quickly. Here are the logs you requested
>>>>> (kernel 4.5.0):
>>>>>
>>>>>> [2.083177] mpt3sas version 09.102.00.00 loaded
>>>>>> [2.083757] mpt2sas_cm0: mpt3sas_base_attach
>>>>>> [2.083956] mpt2sas_cm0: mpt3sas_base_map_resources
>>>>>> [2.084708] mpt2sas_cm0: 32 BIT PCI BUS DMA ADDRESSING SUPPORTED, 
>>>>>> total mem (3074748 kB)
>>>>>> [2.084964] mpt2sas_cm0: _base_get_ioc_facts
>>>>>> [2.085154] mpt2sas_cm0: _base_wait_for_iocstate
>>>>>> [2.140893]offset:data
>>>>>> [2.141082][0x00]:03100200
>>>>>> [2.141257

Re: [BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT

2016-03-21 Thread Matthias Prager
On 21.03.2016 at 16:52, Matthias Prager wrote:
> Hi Sreekanth,
> 
> thanks for digging into this issue. Regarding the 4.5.0 after 4.1.20
> boot statement, I will try to express myself better:
> 
> I first started the system with the 4.1.20 kernel. Then I issued an
> 'init 6' warm-reboot and chose to boot the 4.5.0 kernel. This procedure
> often works (i.e. I'm able to boot the newer kernels (>=4.2) this way
> with mpt2sas initializing just fine).
One small addition: warm-rebooting kernel 4.5.0 from kernel 4.5.0
reliably leads to this issue manifesting (mpt2sas driver not initializing).

> 
> I will try the msix_disable parameter and report back. Maybe kernel
> version 4.2 broke interrupt handling somehow?
With the msix_disable parameter set kernels 4.5.0, 4.4.5, 4.3.3 and
4.2.1 boot without the mpt2sas init issue. I also tried the pci=nomsi
parameter instead, which also works fine.

The igb ethernet-driver issue I mentioned as possibly related has not
changed with pci=nomsi. So I'm assuming it is not the same issue, but
possibly related, since it also occurs on kernels >=4.2.
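
When comparing boots, it helps to record which of the two workarounds was actually in effect. The following is a rough sketch, assuming the options were given on the kernel command line (pci=nomsi, or the module parameter in the usual module.param=value form). The function name is hypothetical, and the driver name is an argument because the module is called mpt2sas on some of the kernels tested here and mpt3sas on others.

```python
def active_msi_workarounds(cmdline, driver="mpt3sas"):
    """Return the MSI/MSI-X related workaround tokens found in 'cmdline'."""
    found = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= takes a comma-separated option list, e.g. pci=noaer,nomsi
            if "nomsi" in token[len("pci="):].split(","):
                found.append(token)
        elif token == f"{driver}.msix_disable=1":
            found.append(token)
    return found


# Example use on a running system:
#   with open("/proc/cmdline") as fh:
#       print(active_msi_workarounds(fh.read()))
```

An empty list means neither workaround was passed on the command line; a parameter set through a modprobe configuration file would not show up here.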

> 
> ---
> Matthias
> 
> 
> On 21.03.2016 at 14:59, Sreekanth Reddy wrote:
>> Hi Matthias,
>>
>> Thanks for providing the logs. In these logs, I am not observing any
>> such a prints which should be suspected for this issue.
>>
>> Can you please try once by setting mpt3sas driver's "msix_disable"
>> module parameter to one.
>>
>> Also, can you please elaborate below statement for me, I am not able
>> understand this statement
>>  "I managed to boot the same 4.5.0 kernel successfully after warm
>> rebooting from 4.1.20"
>>
>> Thanks,
>> Sreekanth
>>
>> On Mon, Mar 21, 2016 at 2:48 PM, Matthias Prager
>> <li...@matthiasprager.de> wrote:
>>> Hi Sreekanth,
>>>
>>> thank you for replying so quickly. Here are the logs you requested
>>> (kernel 4.5.0):
>>>
>>>> [2.083177] mpt3sas version 09.102.00.00 loaded
>>>> [2.083757] mpt2sas_cm0: mpt3sas_base_attach
>>>> [2.083956] mpt2sas_cm0: mpt3sas_base_map_resources
>>>> [2.084708] mpt2sas_cm0: 32 BIT PCI BUS DMA ADDRESSING SUPPORTED, total 
>>>> mem (3074748 kB)
>>>> [2.084964] mpt2sas_cm0: _base_get_ioc_facts
>>>> [2.085154] mpt2sas_cm0: _base_wait_for_iocstate
>>>> [2.140893]offset:data
>>>> [2.141082][0x00]:03100200
>>>> [2.141257][0x04]:2300
>>>> [2.141432][0x08]:
>>>> [2.141606][0x0c]:
>>>> [2.141780][0x10]:
>>>> [2.141955][0x14]:00010480
>>>> [2.142129][0x18]:22130d68
>>>> [2.142303][0x1c]:0001285c
>>>> [2.142477][0x20]:14000400
>>>> [2.142651][0x24]:0020
>>>> [2.142825][0x28]:010f
>>>> [2.143000][0x2c]:000c000b
>>>> [2.143174][0x30]:003c0003
>>>> [2.143349][0x34]:0020ffe0
>>>> [2.143544][0x38]:00800122
>>>> [2.143719][0x3c]:0009
>>>> [2.143895] mpt2sas_cm0: hba queue depth(3432), max chains per io(128)
>>>> [2.144106] mpt2sas_cm0: request frame size(128), reply frame size(128)
>>>> [2.144397] mpt2sas_cm0: msix is supported, vector_count(1)
>>>> [2.144600] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 4, 
>>>> max_msix_vectors: -1
>>>> [2.145161] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 58
>>>> [2.145361] mpt2sas_cm0: iomem(0xfd4fc000), 
>>>> mapped(0xc90d), size(16384)
>>>> [2.145591] mpt2sas_cm0: ioport(0x4000), size(256)
>>>> [2.146206] mpt2sas_cm0: _base_get_ioc_facts
>>>> [2.146397] mpt2sas_cm0: _base_wait_for_iocstate
>>>> [2.202087]offset:data
>>>> [2.202281][0x00]:03100200
>>>> [2.202456][0x04]:2300
>>>> [2.202631][0x08]:
>>>> [2.202805][0x0c]:
>>>> [2.202980][0x10]:
>>>> [2.203154][0x14]:00010480
>>>> [2.203328][0x18]:22130d68
>>>> [2.203521][0x1c]:0001285c
>>>> [2.203695][0x20]:14000400
>>>> [2.203870][0x24]:0020
>>>> [2.204044][0x28]:010f
>>>> [2.204219]  

Re: [BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT

2016-03-21 Thread Matthias Prager
Hi Sreekanth,

thanks for digging into this issue. Regarding the 4.5.0 after 4.1.20
boot statement, I will try to express myself better:

I first started the system with the 4.1.20 kernel. Then I issued an
'init 6' warm-reboot and chose to boot the 4.5.0 kernel. This procedure
often works (i.e. I'm able to boot the newer kernels (>=4.2) this way
with mpt2sas initializing just fine).

I will try the msix_disable parameter and report back. Maybe kernel
version 4.2 broke interrupt handling somehow?

---
Matthias


On 21.03.2016 at 14:59, Sreekanth Reddy wrote:
> Hi Matthias,
> 
> Thanks for providing the logs. I don't see any prints in them that
> point to the cause of this issue.
> 
> Can you please try once with the mpt3sas driver's "msix_disable"
> module parameter set to one?
> 
> Also, can you please elaborate on the statement below? I am not able
> to understand it:
>  "I managed to boot the same 4.5.0 kernel successfully after warm
> rebooting from 4.1.20"
> 
> Thanks,
> Sreekanth
> 
> On Mon, Mar 21, 2016 at 2:48 PM, Matthias Prager
> <li...@matthiasprager.de> wrote:
>> Hi Sreekanth,
>>
>> thank you for replying so quickly. Here are the logs you requested
>> (kernel 4.5.0):
>>
>>> [2.083177] mpt3sas version 09.102.00.00 loaded
>>> [2.083757] mpt2sas_cm0: mpt3sas_base_attach
>>> [2.083956] mpt2sas_cm0: mpt3sas_base_map_resources
>>> [2.084708] mpt2sas_cm0: 32 BIT PCI BUS DMA ADDRESSING SUPPORTED, total 
>>> mem (3074748 kB)
>>> [2.084964] mpt2sas_cm0: _base_get_ioc_facts
>>> [2.085154] mpt2sas_cm0: _base_wait_for_iocstate
>>> [2.140893]offset:data
>>> [2.141082][0x00]:03100200
>>> [2.141257][0x04]:2300
>>> [2.141432][0x08]:
>>> [2.141606][0x0c]:
>>> [2.141780][0x10]:
>>> [2.141955][0x14]:00010480
>>> [2.142129][0x18]:22130d68
>>> [2.142303][0x1c]:0001285c
>>> [2.142477][0x20]:14000400
>>> [2.142651][0x24]:0020
>>> [2.142825][0x28]:010f
>>> [2.143000][0x2c]:000c000b
>>> [2.143174][0x30]:003c0003
>>> [2.143349][0x34]:0020ffe0
>>> [2.143544][0x38]:00800122
>>> [2.143719][0x3c]:0009
>>> [2.143895] mpt2sas_cm0: hba queue depth(3432), max chains per io(128)
>>> [2.144106] mpt2sas_cm0: request frame size(128), reply frame size(128)
>>> [2.144397] mpt2sas_cm0: msix is supported, vector_count(1)
>>> [2.144600] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 4, 
>>> max_msix_vectors: -1
>>> [2.145161] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 58
>>> [2.145361] mpt2sas_cm0: iomem(0xfd4fc000), 
>>> mapped(0xc90d), size(16384)
>>> [2.145591] mpt2sas_cm0: ioport(0x4000), size(256)
>>> [2.146206] mpt2sas_cm0: _base_get_ioc_facts
>>> [2.146397] mpt2sas_cm0: _base_wait_for_iocstate
>>> [2.202087]offset:data
>>> [2.202281][0x00]:03100200
>>> [2.202456][0x04]:2300
>>> [2.202631][0x08]:
>>> [2.202805][0x0c]:
>>> [2.202980][0x10]:
>>> [2.203154][0x14]:00010480
>>> [2.203328][0x18]:22130d68
>>> [2.203521][0x1c]:0001285c
>>> [2.203695][0x20]:14000400
>>> [2.203870][0x24]:0020
>>> [2.204044][0x28]:010f
>>> [2.204219][0x2c]:000c000b
>>> [2.204394][0x30]:003c0003
>>> [2.204571][0x34]:0020ffe0
>>> [2.204746][0x38]:00800122
>>> [2.204926][0x3c]:0009
>>> [2.205102] mpt2sas_cm0: hba queue depth(3432), max chains per io(128)
>>> [2.205314] mpt2sas_cm0: request frame size(128), reply frame size(128)
>>> [2.205528] mpt2sas_cm0: _base_make_ioc_ready
>>> [2.205719] mpt2sas_cm0: _base_get_port_facts
>>> [2.234205]offset:data
>>> [2.234397][0x00]:0507
>>> [2.234573][0x04]:
>>> [2.234748][0x08]:
>>> [2.234928][0x0c]:
>>> [2.235103][0x10]:
>>> [2.235278][0x14]:3000
>>> [2.235453][0x18]:0078
>>

Re: [BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT

2016-03-21 Thread Matthias Prager
s_device_init_add: handle(0x000b), 
> sas_addr(0x443322110300)
> [2.765147] mpt2sas_cm1: _scsih_sas_device_init_add: enclosure logical 
> id(0x500605b0026f79b0), slot( 0)
> [2.765891] mpt2sas_cm1: _scsih_sas_device_init_add: handle(0x000d), 
> sas_addr(0x443322110400)
> [2.767250] mpt2sas_cm1: _scsih_sas_device_init_add: enclosure logical 
> id(0x500605b0026f79b0), slot( 7)
> [2.768009] mpt2sas_cm1: _scsih_sas_device_init_add: handle(0x0009), 
> sas_addr(0x443322110500)
> [2.768394] mpt2sas_cm1: _scsih_sas_device_init_add: enclosure logical 
> id(0x500605b0026f79b0), slot( 6)
> [2.769136] mpt2sas_cm1: _scsih_sas_device_init_add: handle(0x000a), 
> sas_addr(0x443322110600)
> [2.769510] mpt2sas_cm1: _scsih_sas_device_init_add: enclosure logical 
> id(0x500605b0026f79b0), slot( 5)
> [2.769888] mpt2sas_cm1: _scsih_determine_boot_device: 
> current_boot_device(0x443322110600)
> [2.770641] mpt2sas_cm1: _scsih_sas_device_init_add: handle(0x000c), 
> sas_addr(0x443322110700)
> [2.771025] mpt2sas_cm1: _scsih_sas_device_init_add: enclosure logical 
> id(0x500605b0026f79b0), slot( 4)
> [2.771405] mpt2sas_cm1: discovery event: (stop)
> 
> [2.771788] mpt2sas_cm1: discovery event: (start)
> 
> [2.772172] mpt2sas_cm1: discovery event: (stop)
> 
> [2.772554] mpt2sas_cm1: discovery event: (start)
> 
> [2.772942] mpt2sas_cm1: discovery event: (stop)
> 
> [2.773326] mpt2sas_cm1: discovery event: (start)
> 
> [2.773709] mpt2sas_cm1: discovery event: (stop)
> 
> [2.774090] mpt2sas_cm1: discovery event: (start)
> 
> [2.774474] mpt2sas_cm1: discovery event: (stop)
> 
> [2.774855] mpt2sas_cm1: discovery event: (start)
> 
> [2.775237] mpt2sas_cm1: discovery event: (stop)
> 
> [2.775619] mpt2sas_cm1: port enable: complete from worker thread
> [2.780192] mpt2sas_cm1: port enable: SUCCESS
> [2.782404] scsi 1:0:0:0: Direct-Access ATA  Hitachi HDP72505 A5BA 
> PQ: 0 ANSI: 6
> [2.782651] scsi 1:0:0:0: SATA: handle(0x000a), 
> sas_addr(0x443322110600), phy(6), device_name(0x5000cca34dc05584)
> [2.783039] scsi 1:0:0:0: SATA: enclosure_logical_id(0x500605b0026f79b0), 
> slot(5)
> [2.783390] scsi 1:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), 
> fua(y), sw_preserve(y)
> [2.803032] scsi 1:0:1:0: Direct-Access ATA  ST8000AS0002-1NA AR15 
> PQ: 0 ANSI: 6
> [2.803272] scsi 1:0:1:0: SATA: handle(0x000e), 
> sas_addr(0x443322110200), phy(2), device_name(0x5000c500908f1d05)
> [2.803660] scsi 1:0:1:0: SATA: enclosure_logical_id(0x500605b0026f79b0), 
> slot(1)
> [2.803981] scsi 1:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), 
> fua(y), sw_preserve(y)
> [2.821712] scsi 1:0:2:0: Direct-Access ATA  ST3000DM001-1CH1 CC43 
> PQ: 0 ANSI: 6
> [2.821963] scsi 1:0:2:0: SATA: handle(0x000b), 
> sas_addr(0x443322110300), phy(3), device_name(0x5000c5004dfd2a8a)
> [2.822352] scsi 1:0:2:0: SATA: enclosure_logical_id(0x500605b0026f79b0), 
> slot(0)
> [2.822694] scsi 1:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), 
> fua(y), sw_preserve(y)
> [2.844233] scsi 1:0:3:0: Direct-Access ATA  ST3500320NS  SN16 
> PQ: 0 ANSI: 6
> [2.844483] scsi 1:0:3:0: SATA: handle(0x000d), 
> sas_addr(0x443322110400), phy(4), device_name(0x5000c50009dca65b)
> [2.844870] scsi 1:0:3:0: SATA: enclosure_logical_id(0x500605b0026f79b0), 
> slot(7)
> [2.845216] scsi 1:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), 
> fua(y), sw_preserve(y)
> [2.868963] scsi 1:0:4:0: Direct-Access ATA  SAMSUNG HD401LJ  0-15 
> PQ: 0 ANSI: 6
> [2.870994] scsi 1:0:4:0: SATA: handle(0x0009), 
> sas_addr(0x443322110500), phy(5), device_name(0x695d550daf678093)
> [2.871382] scsi 1:0:4:0: SATA: enclosure_logical_id(0x500605b0026f79b0), 
> slot(6)
> [2.871728] scsi 1:0:4:0: atapi(n), ncq(y), asyn_notify(n), smart(y), 
> fua(y), sw_preserve(y)
> [2.894004] scsi 1:0:5:0: Direct-Access ATA  ST3750330NS  SN06 
> PQ: 0 ANSI: 6
> [2.894257] scsi 1:0:5:0: SATA: handle(0x000c), 
> sas_addr(0x443322110700), phy(7), device_name(0x5000c5000266bb02)
> [2.894646] scsi 1:0:5:0: SATA: enclosure_logical_id(0x500605b0026f79b0), 
> slot(4)
> [2.894972] scsi 1:0:5:0: atapi(n), ncq(y), asyn_notify(n), smart(y), 
> fua(y), sw_preserve(y) 

I don't see a clear pattern yet, but I suspect warm rebooting >=4.2
after having run 4.1.x leads to successful driver inits. But I also
remember having had cold boots with kernels >=4.2 that worked fine
(obviously not all of the time). <=4.1.20 kernels on the other hand boot
fine every single time.

Interesti

[BUG] mpt2sas: driver init fails on kernel >=4.2 for 9211-8i IT

2016-03-20 Thread Matthias Prager
Hello,

I don't know what's the correct procedure, whether I should file a bug or first 
report this issue on the kernel mailing-list. So please feel free to tell me to 
open a ticket in the bugtracker (bugzilla.kernel.org?).

But first let me present the issue I encounter:

Kernels >= 4.2 (4.2.1 was the first one I tried, but also 4.3, 4.4 and 4.5) fail 
to load the mpt2sas driver on most (but not all) boots. Kernels <= 4.1.x work 
fine every single time (4.1.19 was the latest I tried, also 3.18.29).

Here is the dmesg output for a failed driver init with Kernel 4.5.0:
> [2.068313] mpt3sas version 09.102.00.00 loaded
> [2.069412] mpt2sas_cm0: 32 BIT PCI BUS DMA ADDRESSING SUPPORTED, total 
> mem (3074748 kB)
> [2.125260] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 4, 
> max_msix_vectors: -1
> [2.125996] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 58
> [2.126199] mpt2sas_cm0: iomem(0xfd4fc000), 
> mapped(0xc90d), size(16384)
> [2.126430] mpt2sas_cm0: ioport(0x4000), size(256)
> [2.216371] mpt2sas_cm0: Allocated physical memory: size(4964 kB)
> [2.216600] mpt2sas_cm0: Current Controller Queue Depth(3307),Max 
> Controller Queue Depth(3432)
> [2.217004] mpt2sas_cm0: Scatter Gather Elements per IO(128)
> [5.086959] floppy0: no floppy controllers found
> [   32.256720] mpt2sas_cm0: _base_event_notification: timeout
> [   32.256940] mf:
> 
> [   32.257106] 0700
> [   32.257302] 
> [   32.257337] 
> [   32.257533] 
> [   32.257568] 
> [   32.257764] 0f2f7fff
> [   32.257800] ff7c
> [   32.257995] 
> [   32.258031]
> 
> [   32.258352] 
> [   32.258387] 
> [   32.258582] 
> 
> [   32.258950] mpt2sas_cm0: sending message unit reset !!
> [   32.260688] mpt2sas_cm0: message unit reset: SUCCESS
> [   32.325956] mpt2sas_cm0: failure at 
> drivers/scsi/mpt3sas/mpt3sas_scsih.c:8592/_scsih_probe()!
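
Because the failure does not show up on every boot, it can be useful to check saved dmesg output for the failure signature automatically, for example while bisecting. The following is only a sketch, assuming each kernel message sits on one line (as in real dmesg output, unlike the wrapped quote above). The two patterns are taken from the log excerpt; the helper name is hypothetical.

```python
import re

# Message formats as they appear in the dmesg excerpt above.
TIMEOUT_RE = re.compile(r"(mpt2sas_cm\d+): _base_event_notification: timeout")
FAILURE_RE = re.compile(r"(mpt2sas_cm\d+): failure at (\S+)!")


def find_init_failures(dmesg_text):
    """Map each controller to the init-failure evidence found in the log."""
    result = {}
    for line in dmesg_text.splitlines():
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            entry = result.setdefault(
                timeout.group(1), {"timeout": False, "failure_at": None})
            entry["timeout"] = True
        failure = FAILURE_RE.search(line)
        if failure:
            entry = result.setdefault(
                failure.group(1), {"timeout": False, "failure_at": None})
            entry["failure_at"] = failure.group(2)
    return result
```

An empty result means none of the known failure lines were found; that alone does not prove the driver initialized successfully.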

For comparison here is a dmesg output for a successful boot on Kernel 4.1.15:
> [2.035568] mpt2sas version 20.100.00.00 loaded
> [2.037243] mpt2sas0: 32 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem 
> (2046352 kB)
> [2.755374] mpt2sas0: MSI-X vectors supported: 1, no of cores: 3, 
> max_msix_vectors: 8
> [2.756377] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 58
> [2.756708] mpt2sas0: iomem(0xfd5fc000), 
> mapped(0xc90d), size(16384)
> [2.757083] mpt2sas0: ioport(0x4000), size(256)
> [3.842944] mpt2sas0: Allocated physical memory: size(4964 kB)
> [3.843303] mpt2sas0: Current Controller Queue Depth(3307), Max Controller 
> Queue Depth(3432)
> [3.843717] mpt2sas0: Scatter Gather Elements per IO(128)
> [4.415980] mpt2sas0: LSISAS2008: FWVersion(20.00.04.00), 
> ChipRevision(0x03), BiosVersion(07.39.00.00)
> [4.416618] mpt2sas0: Protocol=(Initiator,Target), 
> Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
> [4.417846] scsi host0: Fusion MPT SAS Host
> [4.436865] mpt2sas0: sending port enable !!
> [4.440460] mpt2sas0: host_add: handle(0x0001), 
> sas_addr(0x), phys(8)
> 
> ...
> [4.444045] scsi 0:0:0:0: Direct-Access ATA  Hitachi HDS5C302 A800 
> PQ: 0 ANSI: 6
> [4.444338] scsi 0:0:0:0: SATA: handle(0x0009), 
> sas_addr(0x), phy(1), device_name(0x)
> [4.444823] scsi 0:0:0:0: SATA: enclosure_logical_id(0x), 
> slot(2)
> [4.445197] scsi 0:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), 
> fua(y), sw_preserve(y)
> ... (n times for n drives)
> 
> [4.515443] mpt2sas0: port enable: SUCCESS


The controllers are all Avago/LSI 9211-8i running the latest version of IT 
firmware (v20.00.04.00). These are passed through to a gentoo-linux VM on an 
ESXi 5.1 host (latest patchlevel). I have two of these systems (one production 
one testing) with nearly identical hardware (Supermicro X8DTi-F boards with 2x 
Intel Westmere CPUs). I also have an older set of systems running the same 
software config (Intel S3200SHLC boards with single Core2Quad) which exhibit 
the same buggy behavior.

The 'uname -a' line on the current testing system looks like this:
> Linux pserver2 4.1.19-gentoo #4 SMP Thu Mar 17 16:02:48 CET 2016 x86_64 
> Intel(R) Xeon(R) CPU E5506 @ 2.13GHz GenuineIntel GNU/Linux

There seem to be other people with similar issues:


Any thoughts?

---
Matthias


Re: JMicron JMB363 PCI SATA/IDE Card Support

2013-06-23 Thread Matthias Prager
I did some more digging and came up with a partial
workaround:
After adding the line:
   { PCI_VDEVICE(JMICRON, 0x236f), board_ahci_ign_iferr },
(at line 301 of drivers/ata/ahci.c)
the SATA ports of my two cards get detected and lspci -k shows
they are using the ahci driver.

My guess is the 'RAID bus controller [0104]' mode/class is keeping
my cards from being detected (they probably would be if they were
in PCI_CLASS_STORAGE_SATA_AHCI mode). Which would mean line 297
is just plain buggy.

This still leaves the problem of the missing IDE Ports (one master+slave
port per card are still not detected). I'm trying to
understand how the pata_jmicron driver is supposed to work
but haven't wrapped my head around it yet.

- Matthias

On 19.06.2013 15:12, Matthias Prager wrote:
> Hello everyone,
> 
> I'm having a hard time getting my JMicron JMB363 PCI SATA/IDE Card
> to work under linux.
> The 'lspci -nn' output reads as follows:
> RAID bus controller [0104]: JMicron Technology Corp. JMB363 SATA/IDE
> Controller [197b:2363] (rev 03)
> 
> I tried my own kernel (3.9.6) under gentoo with CONFIG_PATA_JMICRON and
> CONFIG_SATA_AHCI enabled. And I tried using
> the latest SystemrescueCD (with kernel 3.4.47) which has pretty much
> everything compiled into it.
> 
> FreeBSD is recognizing the devices just fine. How do I get them
> to run on linux?
> 
> - Matthias
> 
> P.S. a side note: this is in a virtualized environment (VMware ESXi)
> with pass-through.
> 



Re: JMicron JMB363 PCI SATA/IDE Card Support

2013-06-23 Thread Matthias Prager
It looks like RAID mode is the default, and
quirk_jmicron_ata() in drivers/pci/quirks.c is supposed
to deal with it by changing the PCI device configuration.
Either this does not happen or it does not have the desired
result (maybe because this is a pass-through
environment?).

- Matthias

On 23.06.2013 16:28, Matthias Prager wrote:
> I did some more digging and came up with a partial
> workaround:
> After adding the line:
>    { PCI_VDEVICE(JMICRON, 0x236f), board_ahci_ign_iferr },
> (at line 301 of drivers/ata/ahci.c)
> the SATA ports of my two cards get detected and lspci -k shows
> they are using the ahci driver.
> 
> My guess is the 'RAID bus controller [0104]' mode/class is keeping
> my cards from being detected (they probably would be if they were
> in PCI_CLASS_STORAGE_SATA_AHCI mode). Which would mean line 297
> is just plain buggy.
> 
> This still leaves the problem of the missing IDE Ports (one master+slave
> port per card are still not detected). I'm trying to
> understand how the pata_jmicron driver is supposed to work
> but haven't wrapped my head around it yet.
> 
> - Matthias
> 
> On 19.06.2013 15:12, Matthias Prager wrote:
>> Hello everyone,
>>
>> I'm having a hard time getting my JMicron JMB363 PCI SATA/IDE Card
>> to work under linux.
>> The 'lspci -nn' output reads as follows:
>> RAID bus controller [0104]: JMicron Technology Corp. JMB363 SATA/IDE
>> Controller [197b:2363] (rev 03)
>>
>> I tried my own kernel (3.9.6) under gentoo with CONFIG_PATA_JMICRON and
>> CONFIG_SATA_AHCI enabled. And I tried using
>> the latest SystemrescueCD (with kernel 3.4.47) which has pretty much
>> everything compiled into it.
>>
>> FreeBSD is recognizing the devices just fine. How do I get them
>> to run on linux?
>>
>> - Matthias
>>
>> P.S. a side note: this is in a virtualized environment (VMware ESXi)
>> with pass-through.




JMicron JMB363 PCI SATA/IDE Card Support

2013-06-19 Thread Matthias Prager
Hello everyone,

I'm having a hard time getting my JMicron JMB363 PCI SATA/IDE Card
to work under linux.
The 'lspci -nn' output reads as follows:
 RAID bus controller [0104]: JMicron Technology Corp. JMB363 SATA/IDE
 Controller [197b:2363] (rev 03)

I tried my own kernel (3.9.6) under gentoo with CONFIG_PATA_JMICRON and
CONFIG_SATA_AHCI enabled. And I tried using
the latest SystemrescueCD (with kernel 3.4.47) which has pretty much
everything compiled into it.

FreeBSD is recognizing the devices just fine. How do I get them
to run on linux?

- Matthias

P.S. a side note: this is in a virtualized environment (VMware ESXi)
with pass-through.


Re: mpt2sas + raid10 goes boom

2013-04-09 Thread Matthias Prager
Thanks for your insights Baruch.
The crc count did not increase any further - so this was probably just
a small oddity (it was zero before, when the write-same issue had already
happened). The real issue however does persist. I found a way to
reliably trigger the log messages: running a program called checksum over
a photo share (which does a lot of reads and one write per file). With
that in place I switched to the 3.4.38 kernel, with which I'm unable to
trigger the problem. I will leave the system at that for now, and try to
reproduce it on my testing machine to see whether
  c8dc9c6 md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
works for me.

If c8dc9c6 does the trick, it would still be interesting to know why and
how this triggered i/o errors, the strange log message from lsi and
'Resets Between Cmd Acceptance and Completion' as one of the drives
says. Would that mean the driver/firmware from lsi is issuing or passing
on commands to the drive which it does not understand or can't process?

---
Matthias


Re: mpt2sas + raid10 goes boom

2013-04-09 Thread Matthias Prager
Hello everyone,

an update: I was able to reproduce the problem on my testing
machine (at least sort of) and confirmed that
  c8dc9c6 md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
fixes things.
Also applied c8dc9c6 to the main system's 3.8.6 kernel. Working without
any issues.

One interesting detail: on the testing system's 9240-8i lsi controller
(megaraid_sas instead of mpt2sas driver) the log messages were
different and seemingly not coming from controller errors:

Apr  9 12:59:17 kernel: bio too big device sdb (4096 > 256)
Apr  9 12:59:17 kernel: md12: WRITE SAME failed. Manually zeroing.

Whereas the lsi 9211-8i gave these messages:

[ 2772.726292] mpt2sas0: log_info(0x31120320): originator(PL), code(0x12), sub_code(0x0320)
[ 2772.726296] mpt2sas0: log_info(0x31120320): originator(PL), code(0x12), sub_code(0x0320)
[ 2772.940873] mpt2sas0: log_info(0x31120320): originator(PL), code(0x12), sub_code(0x0320)
[ 2773.205568] mpt2sas0: log_info(0x31120320): originator(PL), code(0x12), sub_code(0x0320)
[ 2773.953718] mpt2sas0: log_info(0x31120320): originator(PL), code(0x12), sub_code(0x0320)
[ 2774.203121] mpt2sas0: log_info(0x31120320): originator(PL), code(0x12), sub_code(0x0320)
[ 2774.452462] mpt2sas0: log_info(0x31120320): originator(PL), code(0x12), sub_code(0x0320)
[ 2774.452476] sd 0:0:4:0: [sde] Unhandled error code
[ 2774.452479] sd 0:0:4:0: [sde]  
[ 2774.452480] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[ 2774.452482] sd 0:0:4:0: [sde] CDB: 
[ 2774.452483] Write(10): 2a 00 a9 84 24 08 00 00 08 00
[ 2774.452491] end_request: I/O error, dev sde, sector 2844009480
[ 2774.452636] md3: WRITE SAME failed. Manually zeroing.

The former one looks a lot more friendly to me.
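
For anyone reading along, here is a minimal sketch of how the mpt2sas
log_info word seems to split into the fields the driver prints. The bit
layout (subcode 16 bits, code 8 bits, originator 4 bits, bus type 4 bits)
and the originator names are my reading of the driver source, so treat them
as an assumption; decode_log_info is just a name I made up.

```python
# Assumed layout of the 32-bit log_info word (low to high bits):
# sub_code:16, code:8, originator:4, bus_type:4.
ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}


def decode_log_info(value):
    """Split a 32-bit mpt2sas log_info word into its bitfields."""
    originator = (value >> 24) & 0xF
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get(originator, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }
```

Feeding it 0x31120320 gives originator PL, code 0x12 and sub_code 0x0320,
which matches what the driver logged. It does not say what code 0x12 means;
that still needs an answer from the lsi side.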

@lsi guys: this reveals a bug in the mpt2sas driver and/or
   the 9211co series firmware, correct?

Thanks everyone for your help so far,
Matthias


Re: megaraid_sas: problem with specific hardware only with kernel 3.2.5 and above

2012-12-19 Thread Matthias Prager
Hello everyone, hello Michal,

I can confirm and reproduce this issue on a different set of hardware:

Intel S1200BTLR Mainboard
LSI MegaRAID 9266-4i Raid-Controller

I haven't tried using a kernel older than 3.2.5 so far, but with kernels
3.2.24 and 3.5.0 I get the following dmesg output:

[2.621164] megasas: 00.00.06.15-rc1 Mon. Mar. 19 17:00:00 PDT 2012
[2.621179] megasas: 0x1000:0x005b:0x1000:0x9269: bus 1:slot 0:func 0
[2.621599] megasas: Waiting for FW to come to ready state
[2.621601] megasas: FW in FAULT state!!

As a result I can't access any drives connected to the controller.

Tested Controller firmwares: 23.7.0-0031, 23.7.0-0035 and 23.9.0-0015
Tested Mainboard BIOSes: R0037 (BMC 1.14, FRUSDR 1.14)

I contacted Intel and LSI support a while ago. Intel shot me down by
telling me this wasn't their problem. LSI at least tried to resolve the
issue but thus far failed to do so.

Unfortunately I don't have access to the system on a day-to-day basis,
but I will test any fixes as soon as I get the chance.

---
Matthias Prager


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-08-16 Thread Matthias Prager
Am 16.08.2012 20:26, schrieb Robert Trace:
 On 07/25/2012 03:55 PM, James Bottomley wrote:

 Well, reading it, so do I.  Unfortunately, we get to deal with the world
 as it is rather than as we would wish it to be.  We likely have this
 problem with a lot of USB SATLs as well ...
 
 Has this patch made it into the main git trees yet?

Not yet, but it is in James' scsi misc tree and last I heard was
scheduled for inclusion in the 3.6 kernel.

Anyways here is his commit:
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=commit;h=98dc81b0d6c483a3eb256764ae10f156ccefdbbb

 
 I haven't seen anything about it in nearly a month, but I've been using
 the James' patch since he posted it and the sleep/wakeup behavior seems
 improved/correct.

I have been running smoothly with the patch too - problem solved I'd say :-)

 
 -- Robert
 



Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-25 Thread Matthias Prager
Hello James,

Am 25.07.2012 21:55, schrieb James Bottomley:
 It looks like a hack like this might be needed.

 James


SNIP

I don't yet understand all the code but I'm following your discussion
with Tejun: I've set up a minimal vm running gentoo with a mpt2sas
driven controller in passthrough mode. I've applied your proposed patch
against the vanilla 3.5.0 kernel (which includes Tejun's commit), and
I'm happy to report the problem does seem to get fixed by it.
Well at least sending the sata drive into standby using 'hdparm -y' now
works (according to 'hdparm -C') without these nasty i/o errors on later
i/o. That is to say the drive wakes up again (e.g. from a 'fdisk -l
/dev/sda' command) and returns data.
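
In case someone wants to script the same check, a rough sketch of reading
the power state back from 'hdparm -C'. It assumes the usual output line
"drive state is:  standby"; the function names are mine and running hdparm
needs root.

```python
import subprocess


def parse_drive_state(hdparm_output):
    """Return the state printed by 'hdparm -C', e.g. 'standby', or None."""
    for line in hdparm_output.splitlines():
        if "drive state is:" in line:
            return line.split(":", 1)[1].strip()
    return None


def drive_state(device):
    """Run 'hdparm -C DEVICE' and return the parsed power state."""
    result = subprocess.run(
        ["hdparm", "-C", device],
        capture_output=True, text=True, check=True,
    )
    return parse_drive_state(result.stdout)
```

Calling drive_state('/dev/sda') after 'hdparm -y /dev/sda' and again after
some i/o shows whether the drive went to sleep and woke up again.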

--
Matthias


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-24 Thread Matthias Prager
Hello everyone,

I retested with a new firmware (P14 - released today), since it contains
a bunch of sata and SATL fixes (according to the changelog).
Unfortunately the observed behavior is unchanged (tested on a 3.4.5 kernel).

Just wanted to let everyone know.

Cheers
Matthias


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-22 Thread Matthias Prager
Hello Tejun,

Am 22.07.2012 19:31, schrieb Tejun Heo:
 I haven't consulted SAT but it seems like a bug in SAS driver or
 firmware.  If it's a driver bug, we better fix it there.  If a
 firmware bug, working around those is one of major roles of drivers,
 so I think setting allow_restart is fine.

as it turns out my workaround (setting allow_restart=1) isn't all that
useful after all. There are no more i/o errors because the drive just
never goes to standby mode anymore (at least 'hdparm -y /dev/sda' does
not seem to have any effect anymore). I don't really understand why - do
sas drives ever get to standby mode? (they have allow_restart=1 set by
default) And is this desired or expected behavior for sata disk on sas
controllers?

For the moment the only way for me to have my sata drives sleeping
without i/o errors is to revert your original commit
(85ef06d1d252f6a2e73b678591ab71caad4667bb - tested with kernels 3.1.10,
3.4.4, 3.4.5, 3.4.6 and 3.5.0)

--
Matthias

P.S. I hope I'm not getting on everybody's nerves here (especially yours
Tejun)


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-21 Thread Matthias Prager
Am 17.07.2012 22:01, schrieb Tejun Heo:
 On Tue, Jul 17, 2012 at 09:39:41PM +0200, Matthias Prager wrote:
 I could not however reproduce the issue on any other device than a LSI
 SAS controller (using SATA disks) - on a regular ICH10 using AHCI and a
 SATA drive I don't see these i/o errors. But since I'm experiencing
 these issues on two different systems (both with lsi controllers while
 running vmware-guests on them) and Robert sees them on his
 (non-virtualized) system with the same lsi controller (9211-8i), I'm
 inclined to make the following assumptions:
 Either it is an issue which is limited to this controller and possibly
 sata disks hanging off it or it is a more general issue with sas
 controllers and sata disks (again it could well affect sas disks too).
 Lacking other controllers or sas disks I can't be sure.
 
 So, nothing in the libata stack generates NOT_READY - initializing
 command required.  I suppose it's LSI firmware / driver translating
 TUR to CHECK_POWER_MODE and generating NOT_READY.  I don't know what
 SAT says about this but this can't be correct.  An ATA device in
 standby mode is ready to process any commands.  It should be able to
 come back to full operation on demand as necessary and that's why it
 can be transparently enabled from device side.  Eric?
 

While reading the linux-scsi mailing list I stumbled upon

'[Bug 16070] Fail to issue Start/Stop Unit'
http://marc.info/?l=linux-scsi&m=134278835822649&w=2
(bugtracker: https://bugzilla.kernel.org/show_bug.cgi?id=16070)

which led me to try enabling the 'allow_restart' flag for my disks.
With this workaround a vanilla kernel 3.4.5 does not exhibit the i/o
errors on sleeping sata disks hanging off sas controllers.


I'm currently running one of my systems with a

'echo 1 | tee /sys/block/sd?/device/scsi_disk/*/allow_restart >/dev/null'

line added to the init scripts. This way I can use the untouched kernel
sources and still get around the i/o errors. But I reckon this is no
solution.
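
The same workaround as a small script, in case a one-liner in the init
scripts is not convenient. This is only a sketch: it assumes the sysfs
layout from the path above, needs root, and enable_allow_restart is a name
I picked.

```python
import glob

# Same sysfs pattern as the tee one-liner; 'sd?' only matches sda..sdz.
PATTERN = "/sys/block/sd?/device/scsi_disk/*/allow_restart"


def enable_allow_restart(pattern=PATTERN):
    """Write '1' to every matching allow_restart attribute.

    Returns the list of paths that were written, sorted.
    """
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as attr:
            attr.write("1\n")
        written.append(path)
    return written
```

Like the shell version, the setting is lost on reboot and for disks that
show up later, so it has to run again after hotplug.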


I'm no expert on scsi/sas/ata internals, so please take the following
thoughts with a grain of salt:

As far as I can see (and Tejun confirmed that - I think) Tejun's commit
85ef06d1d252f6a2e73b678591ab71caad4667bb somehow exposes a bug, which
lies deeper in the sas/ata code. The 'sas_slave_configure()' function in
'drivers/scsi/libsas/sas_scsi_host.c' sets the 'allow_restart' flag for
sas disks hanging off sas controllers. But if it encounters a sata disk
it calls 'ata_sas_slave_configure()' in 'drivers/ata/libata-scsi.c'
instead and returns without enabling the 'allow_restart' flag. A simple
fix would be to set allow_restart=1 after having called
'ata_sas_slave_configure()' but before returning (in
'sas_slave_configure()').

Now I'm not sure this isn't taping over another bug. Which leads me to
my question: What is the correct behavior?

#1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o
by setting allow_restart=1 for sata disks on sas controllers

or

#2 Teaching the sas drivers they do not need spin-up commands and can
simply start issuing i/o to sata disks

--
Matthias


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-17 Thread Matthias Prager
Hello Tejun,

Am 17.07.2012 20:09, schrieb Tejun Heo:
 Hello,
 
 On Wed, Jul 11, 2012 at 03:48:00PM +0200, Matthias Prager wrote:
 I'm trying to understand why this commit leads to the issue of i/o
 failing on spun down drives, in hopes of being able to fix it. Meanwhile
 maybe Tejun Heo (author of the commit) or Jens Axboe (the committer) are
 able to shed some light on this (I've included them in the CC list).
 
 Nothing rings a bell for me.  How does it fail?  The only thing it
 change is when and which media check commands are issued.

I will try to describe the issue as best as I can (please feel free to
point me to more helpful debugging steps or guides):
Whenever I put a drive to sleep (either via 'hdparm -y ...' or by
letting it run into standby timeout) and issue i/o's afterwards (like
with the help of 'fdisk -l') I get back i/o errors (along the lines of
'end_request: I/O error, ...' - see previous posts in this thread) and
the drive remains in standby (instead of waking up).

Robert (who also saw these errors) bisected the issue down to your
patch. And without it kernels 3.1.10 + 3.4.4 run smoothly for him and me.

I could not however reproduce the issue on any other device than a LSI
SAS controller (using SATA disks) - on a regular ICH10 using AHCI and a
SATA drive I don't see these i/o errors. But since I'm experiencing
these issues on two different systems (both with lsi controllers while
running vmware-guests on them) and Robert sees them on his
(non-virtualized) system with the same lsi controller (9211-8i), I'm
inclined to make the following assumptions:
Either it is an issue which is limited to this controller and possibly
sata disks hanging off it or it is a more general issue with sas
controllers and sata disks (again it could well affect sas disks too).
Lacking other controllers or sas disks I can't be sure.

Thank you for taking the time to look into this - it's much appreciated
Matthias


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-11 Thread Matthias Prager
Am 11.07.2012 01:27, schrieb Robert Trace:
 On 07/09/2012 09:51 PM, Robert Trace wrote:

 Huh..  I just retested this and I'm seeing really random behavior.
 
 Ok, with a refined test I've been able to reliably reproduce this and I
 bisected it back to commit 85ef06d1d252f6a2e73b678591ab71caad4667bb in
 Linus' tree (introduced between 3.0 and 3.1):
 
 commit 85ef06d1d252f6a2e73b678591ab71caad4667bb
 Author: Tejun Heo t...@kernel.org
 Date:   Fri Jul 1 16:17:47 2011 +0200
 
 block: flush MEDIA_CHANGE from drivers on close(2)
 
 Prior to the above commit, sleeping disks will spin up as a result of
 I/O sent to them.  With the above commit, they don't spin up and
 immediately return an I/O failure.
This is good news, thank you. I can confirm your findings - omitting
commit 85ef06d1d252f6a2e73b678591ab71caad4667bb solves my initial issue
here (with 3.1.10).

 
 That's all the further I've gotten so far.  I'll be happy to test any
 patches or suggestions.
 
 -- Rob
 




Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-11 Thread Matthias Prager
I just tested kernel version 3.4.4 without commit
85ef06d1d252f6a2e73b678591ab71caad4667bb and it also works fine (beware
of commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 as it conflicts with
reverting 85ef06d1d252f6a2e73b678591ab71caad4667bb).

I'm trying to understand why this commit leads to the issue of i/o
failing on spun down drives, in hopes of being able to fix it. Meanwhile
maybe Tejun Heo (author of the commit) or Jens Axboe (the committer) are
able to shed some light on this (I've included them in the CC list).

Matthias


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Matthias Prager
Hello linux-scsi and linux-raid,

I did some further research regarding my problem.
It appears to me the fault does not lie with the mpt2sas driver (not
that I can definitely exclude it), but with the md implementation.

I reproduced what I think is the same issue on a different machine (also
running VMware ESXi 5 and an LSI 9211-8i in IR mode) with a different
set of hard-drives of the same model. Using systemrescuecd
(2.8.1-beta003) and booting the 64bit 3.4.4 kernel, I issued the
following commands:

1) 'hdparm -y /dev/sda' (to put the hard-drive to sleep)
2) 'mdadm --create /dev/md1 --metadata 1.2 --level=mirror
--raid-devices=2 --name=test1 /dev/sda missing'
3) 'fdisk -l /dev/md127' (for some reason /proc/mdstat indicates the md
is being created as md127)

2) gave me this feedback:
--
mdadm: super1.x cannot open /dev/sda: Device or resource busy
mdadm: /dev/sda is not suitable for this array.
mdadm: create aborted
---
Even though it says creating aborted it still created md127.

And 3) led to these lines in dmesg:
---
[  604.838640] sd 2:0:0:0: [sda] Device not ready
[  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
20 00
[  604.838680] end_request: I/O error, dev sda, sector 2048
[  604.838688] Buffer I/O error on device md127, logical block 0
[  604.838695] Buffer I/O error on device md127, logical block 1
[  604.838699] Buffer I/O error on device md127, logical block 2
[  604.838702] Buffer I/O error on device md127, logical block 3
[  604.838783] sd 2:0:0:0: [sda] Device not ready
[  604.838785] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838789] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838793] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838797] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
08 00
[  604.838805] end_request: I/O error, dev sda, sector 2048
[  604.838808] Buffer I/O error on device md127, logical block 0
[  604.838983] sd 2:0:0:0: [sda] Device not ready
[  604.838986] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838989] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838993] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838998] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
08 00
[  604.839006] end_request: I/O error, dev sda, sector 146514
[  604.839009] Buffer I/O error on device md127, logical block 183143355
[  604.839087] sd 2:0:0:0: [sda] Device not ready
[  604.839090] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.839093] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.839097] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.839102] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
08 00
[  604.839110] end_request: I/O error, dev sda, sector 146514
[  604.839113] Buffer I/O error on device md127, logical block 183143355
[  604.839271] sd 2:0:0:0: [sda] Device not ready
[  604.839274] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.839278] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.839282] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.839286] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
20 00
[  604.839321] end_request: I/O error, dev sda, sector 2048
[  604.839324] Buffer I/O error on device md127, logical block 0
[  604.839330] Buffer I/O error on device md127, logical block 1
[  604.840494] sd 2:0:0:0: [sda] Device not ready
[  604.840497] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.840504] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.840512] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.840516] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
08 00
[  604.840526] end_request: I/O error, dev sda, sector 2048
--

This excludes hardware-errors (different physical machine and devices)
as cause and also ext4 which the other system was using as filesystem.
Maybe Neil Brown (who scripts/get_maintainer.pl identified as the
maintainer of the md-code) can make sense of this. It may well
be this is the same problem but a different error-path - I don't know.
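
To make the CDB lines in the dmesg output easier to read, here is a small
sketch that pulls the start LBA and transfer length out of a READ(10) or
WRITE(10) CDB. The byte positions are the standard 10-byte layout (opcode,
flags, 4-byte big-endian LBA, group, 2-byte length, control); decode_rw10
is just a helper name I made up.

```python
def decode_rw10(cdb_hex):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    return {
        "op": "READ(10)" if cdb[0] == 0x28 else "WRITE(10)",
        # Bytes 2..5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: number of blocks to transfer, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }
```

For '28 00 00 00 08 00 00 00 20 00' this gives a READ(10) of 32 blocks at
LBA 2048, which lines up with the 'sector 2048' in the end_request line
(assuming 512-byte logical blocks).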

I will try to make the scenario more generic, but I don't have a
non-virtual machine to spare atm. Also please do let me know if I'm
posting this to the wrong lists (linux-scsi and linux-raid) or if there
is anything which might not be helpful with the way I'm reporting this.

Regards,
Matthias Prager

Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Matthias Prager
Am 10.07.2012 00:08, schrieb NeilBrown:
 On Mon, 09 Jul 2012 16:40:15 +0200 Matthias Prager li...@matthiasprager.de
 wrote:
 
 Even though it says creating aborted it still created md127.
 
 One of my pet peeves is when people interpret the observations wrongly and
 then report their interpretation instead of their observation.  However
 sometimes it is very hard to separate the two.  Your comment above looks
 perfectly reasonable and looks like a clean observation and not an
 interpretation.  Yet it is an interpretation :-)
 
 The observation would be
Even though it says creating abort, md127 was still created.
 
 You see, it wasn't this mdadm that created md127 - it certainly shouldn't
 have as you asked it to create md1.
Sorry - I jumped to conclusions without knowing what was actually going on.

 
 I don't know the exact sequence of events, but something - possibly relating
 to the error messages reported below - caused udev to notice /dev/sda.
 udev then ran mdadm -I /dev/sda and as it had some metadata on it, it
 created an array with it.  As the name information in that metadata was
 probably test1 or similar, rather than 1, mdadm didn't know what number
 was wanted for the array, so it chose a free high number - 127.
 
 This metadata must have been left over from an earlier experiment.
That is correct (as I am just realizing now). There is metadata of a
raid1 array left on the disk even though it was used (for a short time)
with zfs on freebsd before doing these experiments.

 
 So it might have been something like.
 
 - you run mdadm (call this mdadm-1).
 - mdadm tries to open sda
 - driver notices that device is asleep, and wakes it up
 - the waking up of the device causes a CHANGE uevent to udev
 - this cause udev to run a new mdadm - mdadm-2
 - mdadm-2 reads the metadata, sees old metadata, assembled sda in a new md127
 - mdadm-1 gets scheduled again, tries to get O_EXCL access to sda and fails, 
   because sda is now part of md127
 
 Clearly undesirable behaviour.  I'm not sure which bit is wrong.
As it turns out mdadm is doing everything right. md127 is actually
already present (though inactive) at boot-time. So mdadm is absolutely
correct in saying sda is busy and refusing to do anything further.

 
 NeilBrown
 

The real problem seems to be located in some layer below md, which is
not waking up the disk for any i/o (at all - not even for fdisk -l).

Matthias


Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Matthias Prager
Am 09.07.2012 21:37, schrieb Robert Trace:
 I did some further research regarding my problem.
 It appears to me the fault does not lie with the mpt2sas driver (not
 that I can definitely exclude it), but with the md implementation.
 
 I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA
 disks), but I've come to a slightly different conclusion.
 
 I noticed that when my SATA disks are on a SATA controller and they spin
 down (or are spun down via hdparm -y), then they respond to TUR (TEST
 UNIT READY) commands with an OK.  Any I/O sent to these disks simply
 wait while the disks spin up and then complete as usual.
 
 However, my SATA disks on the SAS controller respond to TUR with the
 sense error Not Ready/Initializing command required.  Any I/O sent to
 these disks immediately fails.  You saw this in your logging:
 
 [  604.838640] sd 2:0:0:0: [sda] Device not ready
 [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
 driverbyte=DRIVER_SENSE
 [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
 [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
 initializing command required
 [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
 20 00
 [  604.838680] end_request: I/O error, dev sda, sector 2048
 [  604.838688] Buffer I/O error on device md127, logical block 0
 [  604.838695] Buffer I/O error on device md127, logical block 1
 [  604.838699] Buffer I/O error on device md127, logical block 2
 [  604.838702] Buffer I/O error on device md127, logical block 3
 
 Sending an explicit START UNIT command to these sleeping disks will wake
 them up and then they behave normally.  (BTW, you can issue TURs and
 START UNITs via the sg_turs and sg_start commands).
Thanks for these pointers.

 
 I've reproduced this behavior on the raw disks themselves, no MD layer
 involved (although the freak-out by my MD layer is what alerted me to
 this issue too... Having your entire array punted the first time you
 access it is a little scary :-).  I'm also on raw hardware and I've seen
 this behavior on kernels 3.0.33 through 3.4.4.
This is interesting - are you sure about 3.0.33? I'm running this kernel
atm for it gives me no trouble (as opposed to >=3.1.10). The SATA disks
are spun up when I access data on them.

 
 So, SATA disks respond differently depending on the controller they're
 on.  I don't know if this is a SCSI thing, a SAS thing or a
 firmware/driver thing for the 9211.
 
 Now, whether or not the MD layer should be assembling arrays from
 failed disks is, I think, a separate issue.
I realize now in my cases the MD layer behaved correctly.

 
 -- Rob
 




Re: 'Device not ready' issue on mpt2sas since 3.1.10

2012-07-09 Thread Matthias Prager
Am 10.07.2012 00:24, schrieb Robert Trace:
 
 Also, TURs don't appear to actually wake the disk up (should they?).
 The only thing I've found that'll wake the disk up is an explicit START
 UNIT command.

I haven't checked the scsi logging side, but about the only commands
that wake up the disks are 'smartctl -a /dev/sda' and 'sg_start'
(smartctl may be issuing a START UNIT command on its own).

Matthias