On 2018-02-13 02:18 PM, Felix Kuehling wrote:
> On 2018-02-13 01:15 PM, Christian König wrote:
>> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>>> On 2018-02-13 12:06 PM, Christian König wrote:
>>>> Ah, yeah that is also a point I wanted to to talk about with you.
>>>> The approach of using the same buffer object with multiple amdgpu
>>>> devices doesn't work in general.
>>>> We need separate TTM object for each BO in each device or otherwise we
>>>> break A+A laptops.
>>> I think it broke for VRAM BOs because we enabled P2P on systems that
>>> didn't support it properly. But at least system memory BOs can be shared
>>> quite well between devices and we do it all the time.
>> Sharing VRAM BOs is one issue, but the problem goes deeper than just
>> Starting with Carizzo we can scanout from system memory to avoid the
>> extra copy on A+A laptops. For this to work we need the BO mapped to
>> GART (and I mean a real VMID0 mapping, not just in the GTT domain).
>> And for this to work in turn we need a TTM object per device and not a
>> global one.
> I still don't understand. I think what you're talking about applies only
> to BOs used for scan-out. Every BO is allocated from a specific device
> and can only be GART-mapped on that device. What we do is map the same
> BO in VMs on other devices. It has no effect on GART mappings.
>>> I don't see how you can have separate TTM objects referring to the
>>> same memory.
>> Well that is trivial, we do this all the time with prime and I+A laptops.
> As I understand it, you use DMABuf to export/import buffers on multiple
> devices. I believe all devices share a single amdgpu_bo, which contains
> the ttm_buffer_object. The only way you can have a new TTM buffer object
> per device is by using SG tables and pinning the BOs. But I think we
> want to avoid pinning BOs.
> What we do does not involve pinning of BOs, even when they're shared
> between multiple devices' VMs.
>>>> That is also the reason we had to disable this feature again in the
>>>> hybrid branches.
>>> What you disabled on hybrid branches was P2P, which only affects
>>> large-BAR systems. Sharing of system memory BOs between devices still
>>> works fine.
>> No, it doesn't. It completely breaks any scanout on Carizzo, Stoney
>> and Raven. Additional to that we found that it breaks some aspects of
>> the user space interface.
> Let me check that with my current patch series. The patches I submitted
> here shouldn't include anything that breaks the use cases you describe.
> But I'm quite sure it will support sharing BOs between multiple devices'
Confirmed with the current patch series. I can allocate buffers on GPU 1
and map them into a VM on GPU 2.
BTW, this is the normal mode of operation for system memory BOs on
multi-GPU systems. Logically system memory BOs belong to the CPU node in
the KFD topology. So the Thunk API isn't told which GPU to allocate the
system memory BO from. It just picks the first one. Then we can map
those BOs on any GPUs we want. If we want to use them on GPU2, that's
where they get mapped.
So I just put two GPUs in a system and ran a test on GPU2. The system
memory buffers are allocated from the GPU1 device, but mapped into the
GPU2 VM. The tests work normally.
If this is enabled by any changes that break existing buffer sharing for
A+A or A+I systems, please point it out to me. I'm not aware that this
patch series does anything to that effect.
>> So end result is that we probably need to revert it and find a
>> different solution. I'm already working on this for a couple of weeks
>> now and should have something ready after I'm done with the PASID
amd-gfx mailing list