Hi Tomasz,
On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote:
> Hi Dmitry,
>
> On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
>
> <[email protected]> wrote:
> > Hello,
> >
> > We at OpenSynergy are also working on an abstract paravirtualized video
> > streaming device that operates input and/or output data buffers and can be
> > used as a generic video decoder/encoder/input/output device.
> >
> > We would be glad to share our thoughts and contribute to the discussion.
> > Please see some comments regarding buffer allocation inline.
> >
> > Best regards,
> > Dmitry.
> >
> > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> > > Hi Gerd,
> > >
> > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > > device to use buffers allocated by virtio-gpu device.
> > > > >
> > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > >
> > > First of all, thanks for taking a look at this RFC and for valuable
> > > feedback. Sorry for the late reply.
> > >
> > > For reference, Keiichi is working with me and David Stevens on
> > > accelerated video support for virtual machines and integration with
> > > other virtual devices, like virtio-gpu for rendering or our
> > > currently-downstream virtio-wayland for display (I believe there is
> > > ongoing work to solve this problem in upstream too).
> > >
> > > > Well. I think before even discussing the protocol details we need a
> > > > reasonable plan for buffer handling. I think using virtio-gpu buffers
> > > > should be an optional optimization and not a requirement. Also the
> > > > motivation for that should be clear (Let the host decoder write
> > > > directly
> > > > to virtio-gpu resources, to display video without copying around the
> > > > decoded framebuffers from one device to another).
> > >
> > > Just to make sure we're on the same page, what would the buffers come
> > > from if we don't use this optimization?
> > >
> > > I can imagine a setup like this;
> > >
> > > 1) host device allocates host memory appropriate for usage with host
> > >
> > > video decoder,
> > >
> > > 2) guest driver allocates arbitrary guest pages for storage
> > >
> > > accessible to the guest software,
> > >
> > > 3) guest userspace writes input for the decoder to guest pages,
> > > 4) guest driver passes the list of pages for the input and output
> > >
> > > buffers to the host device
> > >
> > > 5) host device copies data from input guest pages to host buffer
> > > 6) host device runs the decoding
> > > 7) host device copies decoded frame to output guest pages
> > > 8) guest userspace can access decoded frame from those pages; back to 3
> > >
> > > Is that something you have in mind?
> >
> > While GPU side allocations can be useful (especially in case of decoder),
> > it could be more practical to stick to driver side allocations. This is
> > also due to the fact that paravirtualized encoders and cameras are not
> > necessarily require a GPU device.
> >
> > Also, the v4l2 framework already features convenient helpers for CMA and
> > SG
> > allocations. The buffers can be used in the same manner as in virtio-gpu:
> > buffers are first attached to an already allocated buffer/resource
> > descriptor and then are made available for processing by the device using
> > a dedicated command from the driver.
>
> First of all, thanks a lot for your input. This is a relatively new
> area of virtualization and we definitely need to collect various
> possible perspectives in the discussion.
>
> From Chrome OS point of view, there are several aspects for which the
> guest side allocation doesn't really work well:
> 1) host-side hardware has a lot of specific low level allocation
> requirements, like alignments, paddings, address space limitations and
> so on, which is not something that can be (easily) taught to the guest
> OS,
I couldn't agree more. There are some changes by Greg to add support for
querying GPU buffer metadata. Probably those changes could be integrated with
'a framework for cross-device buffer sharing' (something that Greg mentioned
earlier in the thread and that would totally make sense).
> 2) allocation system is designed to be centralized, like Android
> gralloc, because there is almost never a case when a buffer is to be
> used only with 1 specific device. 99% of the cases are pipelines like
> decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
> encoder and so on, which means that allocations need to take into
> account multiple hardware constraints.
> 3) protected content decoding: the memory for decoded video frames
> must not be accessible to the guest at all
This looks like a valid use case. Would it also be possible for instance to
allocate mem from a secure ION heap on the guest and then to provide the sgt
to the device? We don't necessarily need to map that sgt for guest access.
Best regards,
Dmitry.
>
> That said, the common desktop Linux model bases on allocating from the
> producer device (which is why videobuf2 has allocation capability) and
> we definitely need to consider this model, even if we just think about
> Linux V4L2 compliance. That's why I'm suggesting the unified memory
> handling based on guest physical addresses, which would handle both
> guest-allocated and host-allocated memory.
>
> Best regards,
> Tomasz
>
> > > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > > virtio-gpu resource handles. The handles are device-specific. What
> > > > if
> > > > there are multiple virtio-gpu devices present in the guest?
> > > >
> > > > I think we need a framework for cross-device buffer sharing. One
> > > > possible option would be to have some kind of buffer registry, where
> > > > buffers can be registered for cross-device sharing and get a unique
> > > > id (a uuid maybe?). Drivers would typically register buffers on
> > > > dma-buf export.
> > >
> > > This approach could possibly let us handle this transparently to
> > > importers, which would work for guest kernel subsystems that rely on
> > > the ability to handle buffers like native memory (e.g. having a
> > > sgtable or DMA address) for them.
> > >
> > > How about allocating guest physical addresses for memory corresponding
> > > to those buffers? On the virtio-gpu example, that could work like
> > >
> > > this:
> > > - by default a virtio-gpu buffer has only a resource handle,
> > > - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > >
> > > virtio-gpu device export the buffer to a host framework (inside the
> > > VMM) that would allocate guest page addresses for it, which the
> > > command would return in a response to the guest,
> > >
> > > - virtio-gpu driver could then create a regular DMA-buf object for
> > >
> > > such memory, because it's just backed by pages (even though they may
> > > not be accessible to the guest; just like in the case of TrustZone
> > > memory protection on bare metal systems),
> > >
> > > - any consumer would be able to handle such buffer like a regular
> > >
> > > guest memory, passing low-level scatter-gather tables to the host as
> > > buffer descriptors - this would nicely integrate with the basic case
> > > without buffer sharing, as described above.
> > >
> > > Another interesting side effect of the above approach would be the
> > > ease of integration with virtio-iommu. If the virtio master device is
> > > put behind a virtio-iommu, the guest page addresses become the input
> > > to iommu page tables and IOVA addresses go to the host via the virtio
> > > master device protocol, inside the low-level scatter-gather tables.
> > >
> > > What do you think?
> > >
> > > Best regards,
> > > Tomasz
> > >
> > > > Another option would be to pass around both buffer handle and buffer
> > > > owner, i.e. instead of "u32 handle" have something like this:
> > > >
> > > > struct buffer_reference {
> > > >
> > > > enum device_type; /* pci, virtio-mmio, ... */
> > > > union device_address {
> > > >
> > > > struct pci_address pci_addr;
> > > > u64 virtio_mmio_addr;
> > > > [ ... ]
> > > >
> > > > };
> > > > u64 device_buffer_handle; /* device-specific, virtio-gpu could
> > > > use
> > > > resource ids here */>
> > > >
> > > > };
> > > >
> > > > cheers,
> > > >
> > > > Gerd
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
--
Dmitry Morozov
Senior Software Engineer
OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin
Phone: +49 30 60 98 54 0 - 910
Fax: +49 30 60 98 54 0 - 99