On 2022-12-15 06:53, Robin Murphy wrote:
> On 2022-12-15 11:40, Luben Tuikov wrote:
>> On 2022-12-15 06:27, Christian König wrote:
>>> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
>>>> On 2022-12-15 04:46, Christian König wrote:
>>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>>>>> On 2022-12-15 03:07, Christian König wrote:
>>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.mur...@arm.com>
>>>>>>>>> wrote:
>>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>>>>>
>>>>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>>>>>> AGP chip,
>>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>>>>>> 0x7FFFFFF, and
>>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 
>>>>>>>>>>> 0x7FFFFFFF,
>>>>>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>>>>>> the last
>>>>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>>>>>> use GFP_DMA32
>>>>>>>>>>> when allocating DMA buffers.
>>>>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE 
>>>>>>>>>> it
>>>>>>>>>> clearly can't have anything to with the actual DMA address size. Not 
>>>>>>>>>> to
>>>>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>>>>>> loss of
>>>>>>>>>> cache coherency.
>>>>>>>>>>
>>>>>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>>>>>> alias
>>>>>>>>>> memory to a second physical address, so I could well believe that
>>>>>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>>>>>> and thus going wrong for highmem pages.
>>>>>>>>>>
>>>>>>>>>> So as I said before, I really think this is not about using
>>>>>>>>>> GFP_DMA32 at
>>>>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>>>>>> I guess in principle yes, if you're careful not to use regular
>>>>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>>>>>> slipping up.
>>>>>>> I theory we should do exactly that in TTM, but we have very few users
>>>>>>> who actually still exercise that functionality.
>>>>>>>
>>>>>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>>>>>> paths in TTM where manipulating the caching state is skipped for
>>>>>>>> highmem pages, but I wouldn't even know where to start looking for
>>>>>>>> whether the right state is propagated to all the places where they
>>>>>>>> might eventually be mapped somewhere.
>>>>>>> The tt object has the caching state for the pages and
>>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>>>>>> userspace/vmalloc mappings.
>>>>>>>
>>>>>> The point of this patch is that dma_addressing_limited() is unsuitable as
>>>>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is 
>>>>>> in this
>>>>>> particular case--then TTM ends up using HIGHUSER, and we get the screen 
>>>>>> corruption.
>>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>>>>> Well I would rather say that dma_addressing_limited() works, but the
>>>>> default value from dma_get_required_mask() is broken.
>>>>>
>>>> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 
>>>> 0x7FFFFFF.
>>>
>>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
>>> addressable memory (27 bits set)? Or is there another F missing?
>>
>> Yeah, I'm missing an F--it is correctly described at the top of the thread 
>> above,
>> i.e. in the commit of v2 patch.
>>
>> 0x7FFF_FFFF, which seems correct, no?
>>
>>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in 
>>>> radeon_device_init().
>>>>
>>>>> 32 bits only work with bounce buffers and we can't use those on graphics
>>>>> hardware.
>>>>>
>>>>>> Is there an objection to this patch, if it fixes the screen corruption?
>>>>> Not from my side, but fixing the underlying issues would be better I 
>>>>> think.
>>>>>
>>>> Have they been identified?
>>>
>>> I'm not 100% sure. I think by using GFP_DMA32 we just work around the
>>> issue somehow.
>>
>> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM
>> code trying to understand what we do when GFP_DMA32 is not set, and the 
>> immediate
>> thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the 
>> device struct.
>> (Then I got down to the caching attributes...)
>>
>> It's be nice if we can find the actual issue--what else would it show us 
>> that needs fixing...?
>>
>> So what do we do with this patch?
>>
>> Shouldn't leave it in a limbo--some OSes ship their kernel
>> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing 
>> limitations") wholly
>> reverted.
> 
> Removing dma_addressing_limited() is still wrong, for the reasons given 
> in that commit. What we need is an *additional* condition that 
> encapsulates "also pass use_dma32 for AGP devices because it avoids some 
> weird coherency issue with 32-bit highmem that isn't worth trying to 
> debug further".

Yes, you had a patch earlier which did exactly that--why not push that patch?

Q: If host memory is 1.5 GiB, i.e. mask of 0x7FFF_FFFF, but the device's mask 
is 0xFFFF_FFFF,
shouldn't we use GFP_DMA32, instead of GFP_HIGHUSER?

Regards,
Luben

Reply via email to