On 2014/8/12 11:18, Jiang Liu wrote:
> On 2014/8/12 9:37, Yijing Wang wrote:
>> On 2014/8/11 22:59, Linda Knippers wrote:
>>> On 8/11/2014 12:43 AM, Alex Williamson wrote:
>>>> On Mon, 2014-08-11 at 10:54 +0800, Yijing Wang wrote:
>>>>> We found some strange devices in HP C7000 and Huawei Server. These devices
>>>>> can not be enumerated by OS, but they still did DMA read/write without OS 
>>>>> management. Because iommu will not create the DMA mapping for these devices,
>>>>> the DMA read/write will be blocked by iommu hardware.
>>>>>
>>>>> Eg.
>>>>>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>>>>>              +-01.0-[11]--
>>>>>              +-01.1-[02]--
>>>>>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC (be3)
>>>>>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC (be3)
>>>>>              |            +-00.2  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
>>>>>              |            \-00.3  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
>>>>>              +-02.1-[12]--
>>>>> Kernel only found four devices in bus 0x04, but we found following DMA 
>>>>> errors in dmesg.
>>>>>
>>>>> [ 1438.477262] DRHD: handling fault status reg 402
>>>>> [ 1438.498278] DMAR:[DMA Write] Request device [04:00.4] fault addr bdf70000
>>>>> [ 1438.498280] DMAR:[fault reason 02] Present bit in context entry is clear
>>>>> [ 1438.566458] DMAR:[DMA Write] Request device [04:00.5] fault addr bdf70000
>>>>> [ 1438.566460] DMAR:[fault reason 02] Present bit in context entry is clear
>>>>> [ 1438.635211] DMAR:[DMA Write] Request device [04:00.6] fault addr bdf70000
>>>>> [ 1438.635213] DMAR:[fault reason 02] Present bit in context entry is clear
>>>>> [ 1438.703849] DMAR:[DMA Write] Request device [04:00.7] fault addr bdf70000
>>>>> [ 1438.703851] DMAR:[fault reason 02] Present bit in context entry is clear
>>>>>
>>>>> Signed-off-by: Yijing Wang <wangyij...@huawei.com>
>>>>> ---
>>>>>  arch/x86/include/asm/iommu.h |    2 ++
>>>>>  arch/x86/kernel/pci-dma.c    |    8 ++++++++
>>>>>  drivers/iommu/intel-iommu.c  |   41 +++++++++++++++++++++++++++++++++++++++++
>>>>>  3 files changed, 51 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
>>>>> index 345c99c..5e3a2d8 100644
>>>>> --- a/arch/x86/include/asm/iommu.h
>>>>> +++ b/arch/x86/include/asm/iommu.h
>>>>> @@ -5,6 +5,8 @@ extern struct dma_map_ops nommu_dma_ops;
>>>>>  extern int force_iommu, no_iommu;
>>>>>  extern int iommu_detected;
>>>>>  extern int iommu_pass_through;
>>>>> +extern int iommu_pt_force_bus;
>>>>> +extern int iommu_pt_force_domain;
>>>>>  
>>>>>  /* 10 seconds */
>>>>>  #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
>>>>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>>>>> index a25e202..bf21d97 100644
>>>>> --- a/arch/x86/kernel/pci-dma.c
>>>>> +++ b/arch/x86/kernel/pci-dma.c
>>>>> @@ -44,6 +44,8 @@ int iommu_detected __read_mostly = 0;
>>>>>   * guests and not for driver dma translation.
>>>>>   */
>>>>>  int iommu_pass_through __read_mostly;
>>>>> +int iommu_pt_force_bus = -1;
>>>>> +int iommu_pt_force_domain = -1;
>>>>>  
>>>>>  extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
>>>>>  
>>>>> @@ -146,6 +148,7 @@ void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
>>>>>   */
>>>>>  static __init int iommu_setup(char *p)
>>>>>  {
>>>>> + char *end;
>>>>>   iommu_merge = 1;
>>>>>  
>>>>>   if (!p)
>>>>> @@ -192,6 +195,11 @@ static __init int iommu_setup(char *p)
>>>>>  #endif
>>>>>           if (!strncmp(p, "pt", 2))
>>>>>                   iommu_pass_through = 1;
>>>>> +         if (!strncmp(p, "pt_force=", 9)) {
>>>>> +                 iommu_pass_through = 1;
>>>>> +                 iommu_pt_force_domain = simple_strtol(p+9, &end, 0);
>>>>> +                 iommu_pt_force_bus = simple_strtol(end+1, NULL, 0);
>>>>
>>>> Documentation/kernel-parameters.txt?
>>>>
>>>>> +         }
>>>>>  
>>>>>           gart_parse_options(p);
>>>>>  
>>>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>>>> index d1f5caa..49757f1 100644
>>>>> --- a/drivers/iommu/intel-iommu.c
>>>>> +++ b/drivers/iommu/intel-iommu.c
>>>>> @@ -2705,6 +2705,47 @@ static int __init iommu_prepare_static_identity_mapping(int hw)
>>>>>                           return ret;
>>>>>           }
>>>>>  
>>>>> + /* We found some strange devices in HP c7000 and other platforms that
>>>>> +  * can not be enumerated by OS, but they did DMA read/write without
>>>>> +  * driver management, so we should create the pt mapping for these
>>>>> +  * devices to avoid DMA errors. Add iommu=pt_force=segment:busnum to
>>>>> +  * force to do pt context mapping in the bus number.
>>>>> +  */
>>>>
>>>> So best case with this patch is that the user needs to discover that
>>>> this option exists, figure out the undocumented parameters, be running
>>>> on VT-d, permanently add a kernel commandline option, and never have any
>>>> intention of assigning the device to userspace or a VM...
>>>>
>>>> Can't we handle this with the DMA alias quirks that are now in 3.17?  Or
>>>> can the vendor fix this with a firmware update?  This device behavior is
>>>> really quite broken for this kind of server class product.  
>>>
>>> Yeah, something doesn't sound right here.
>>>
>>> I would like to hear more about this configuration, off list if you prefer.
>>> What servers?  What firmware revisions?
>>
>> Hi Linda, we found this issue on an HP C7000 server. I attached the dmesg and
>> lspci info. The machine belongs to the product department, so I don't know
>> the firmware revision.
>>
>> Thanks!
>> Yijing.
> Hi Yijing,
>       I still suspect something is wrong with ARI support
> instead of Phantom Function.
>       According to lspci output:
> 1) Root port 00:02.0 has ARIFwd enabled in DevCtl2
> 2) Function 04:00.[0-3] all have Alternative Routing-ID Interpretation
>    capability.
> So could you please try to clear ARIFwd bit in devctl2 when enumerating
> root port 00:02.0?
> 
> BTW, do function 04:00.[0-3] encounter any other issues except the
> IOMMU warnings?

Hi Gerry, I cleared the ARIFwd bit and rescanned the PCI devices
(echo 1 > /sys/bus/pci/rescan), but nothing changed.
Because 04:00.0/1/2/3 are ARI devices, the root port is forced to set the
ARIFwd bit again. It is difficult for me to change and rebuild the kernel
on this machine right now.
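
For reference, here is a minimal sketch of what "clearing ARIFwd" means at the
register level. It assumes the ARI Forwarding Enable bit is bit 5 (0x0020) of
the Device Control 2 register, as PCI_EXP_DEVCTL2_ARI is defined in the
kernel's pci_regs.h. The helper names are made up for illustration, and the
code only manipulates a register value in user space; it does not touch
hardware.

```c
#include <stdint.h>

/* Assumed to match PCI_EXP_DEVCTL2_ARI in include/uapi/linux/pci_regs.h. */
#define DEVCTL2_ARI_FWD 0x0020u

/* Hypothetical helper: return the DevCtl2 value with ARIFwd cleared. */
static uint16_t devctl2_clear_ari_fwd(uint16_t devctl2)
{
	return (uint16_t)(devctl2 & ~DEVCTL2_ARI_FWD);
}

/* Hypothetical helper: report whether ARIFwd is currently set. */
static int devctl2_ari_fwd_enabled(uint16_t devctl2)
{
	return (devctl2 & DEVCTL2_ARI_FWD) != 0;
}
```

All other bits of the register value are left as they were; only bit 5 changes.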

Also, 04:00.0-3 are 10GbE network devices. I guess no one is using them now,
so no other errors have been found yet.

Gerry, what ARI problem do you suspect?
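
To make the ARI question concrete, here is a rough sketch of how the devfn
byte of a requester ID is read with and without ARI. The faults in dmesg name
04:00.4 through 04:00.7, i.e. devfn 4 to 7 on bus 0x04. The function names are
hypothetical; the bit layout (5-bit device plus 3-bit function, or a single
8-bit function number under ARI with the device number implied to be 0) is the
standard PCIe one.

```c
#include <stdint.h>

/* Conventional (non-ARI) view: upper 5 bits are the device number. */
static unsigned int devfn_slot(uint8_t devfn)
{
	return (devfn >> 3) & 0x1f;
}

/* Conventional (non-ARI) view: lower 3 bits are the function number. */
static unsigned int devfn_func(uint8_t devfn)
{
	return devfn & 0x07;
}

/* ARI view: device number is implicitly 0, the whole byte is the function. */
static unsigned int devfn_ari_func(uint8_t devfn)
{
	return devfn;
}

/* Build the 16-bit requester ID that the IOMMU sees in a DMA request. */
static uint16_t requester_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}
```

For devfn 4 to 7 both views give device 0, function 4 to 7, so the two
interpretations only start to differ from devfn 8 upwards.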

> 
> Thanks!
> 
> 
>>
>>
>>>>
>>>>> + if (iommu_pt_force_domain >= 0 && iommu_pt_force_bus >= 0) {
>>>>> +         int found = 0;
>>>>> +
>>>>> +         iommu = NULL;
>>>>> +         for_each_active_iommu(iommu, drhd) {
>>>>> +                 if (iommu_pt_force_domain != drhd->segment)
>>>>> +                         continue;
>>>>> +
>>>>> +                 for_each_active_dev_scope(drhd->devices, drhd->devices_cnt, i, dev) {
>>>>> +                         if (!dev_is_pci(dev))
>>>>> +                                 continue;
>>>>> +
>>>>> +                         pdev = to_pci_dev(dev);
>>>>> +                         if (pdev->bus->number == iommu_pt_force_bus ||
>>>>> +                                         (pdev->subordinate
>>>>> +                                          && pdev->subordinate->number <= iommu_pt_force_bus
>>>>> +                                          && pdev->subordinate->busn_res.end >= iommu_pt_force_bus)) {
>>>>> +                                 found = 1;
>>>>> +                                 break;
>>>>> +                         }
>>>>> +                 }
>>>>> +
>>>>> +                 if (drhd->include_all) {
>>>>> +                         found = 1;
>>>>> +                         break;
>>>>> +                 }
>>>>> +         }
>>>>> +
>>>>> +         if (found && iommu)
>>>>> +                 for (i = 0; i < 256; i++)
>>>>> +                         domain_context_mapping_one(si_domain, iommu, iommu_pt_force_bus,
>>>>> +                                         i,  hw ? CONTEXT_TT_PASS_THROUGH :
>>>>> +                                         CONTEXT_TT_MULTI_LEVEL);
>>>>> + }
>>>>> +
>>>>>   return 0;
>>>>>  }
>>>>>  
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> iommu mailing list
>>>> iommu@lists.linux-foundation.org
>>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>>>


-- 
Thanks!
Yijing

