Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

Mauro Carvalho Chehab Tue, 17 May 2011 05:49:17 -0700

Em 15-05-2011 18:10, Hans Verkuil escreveu:
> On Saturday, May 14, 2011 13:46:03 Mauro Carvalho Chehab wrote:
>> Em 14-05-2011 13:02, Hans Verkuil escreveu:
>>> On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:
>>
>>>> So, based at all I've seen, I'm pretty much convinced that the normal MMAP
>>>> way of streaming (VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF ioctl's)
>>>> are not the best way to share data with framebuffers.
>>>
>>> I agree with that, but it is a different story between two V4L2 devices. 
>>> There
>>> you obviously want to use the streaming ioctls and still share buffers.
>>
>> I don't think so. the requirement for syncing the framebuffer between the two
>> V4L2 devices is pretty much the same as we have with one V4L2 device and one 
>> GPU.
>>
>> On both cases, the requirement is to pass a framebuffer between two 
>> entities, 
>> and not a video stream.
>>
>> For example, imagine something like:
>>
>>      V4L2 camera =====> V4L2 encoder t MPEG2
>>                   ||
>>                   LL==> GPU


For the sake of clarity on my next comments, I'm naming the "V4L2 camera" buffer
write endpoint as "producer" and the 2 buffer read endpoints as "consumers". 
>>
>> Both GPU and the V4L2 encoder should use the same logic to be sure that they 
>> will
>> use a buffer that were filled already by the camera. Also, the V4L2 camera
>> driver can't re-use such framebuffer before being sure that both consumers 
>> has already stopped using it.
> 
> No. A camera whose output is sent to a resizer and then to a SW/FW/HW encoder
> is a typical example where you want to queue/dequeue buffers.

Why? On a framebuffer-oriented set of ioctl's, some kernel internal calls will
need to take care of the buffer usage, in order to be sure when a buffer can
be rewritten, as userspace has no way to know when a buffer needs to be 
queued/dequeued.

In other words, the framebuffer kernel API will probably be using a kernel 
structure like:

struct v4l2_fb_handler {
        bool has_finished;                              /* Marks when a handler 
finishes to handle the buffer */
        bool is_producer;                               /* Used by the handler 
that writes data into the buffer */

        struct list_head *handlers;                     /* List with all 
handlers */

        void (*qbuf)(struct v4l2_fb_handler *handler);  /* qbuf-like callback, 
called after having a buffer filled */

        v4l2_buffer_ID  buf;                            /* Buffer ID (or 
filehandler?) - In practice, it will probably be a list with the available 
buffers */

        void *priv;                                     /* handler priv data */
}

While stream is on, a kernel logic will run a loop, doing basically the steps 
bellow:

        1) Wait for the producer to rise the has_finished flag;

        2) call qbuf() for all consumers. The qbuf() call shouldn't block; it 
just calls 
           a per-handler logic to start using that buffer;

        3) When each fb handler finishes using its buffer, it will rise 
has_finished flag;

        4) After having all buffer handlers marked as has_finished, cleans the 
has_finished
          flags and re-queue the buffer.

Step (2) is equivalent to VIDIOC_QBUF, and step (4) is equivalent to 
VIDIOC_DQBUF.

PS.: The above is just a simplified view of such handler. We'll probably need 
more steps. For
example, between (1) and (2) it may probably need some logic to check if is 
there an available
empty buffer. If not, create a new one and use it.

What happens with REQBUF/QBUF/DQBUF is that:
        - with those calls, there's just one buffer consumer, and just one 
buffer producer;
        - either the producer or the consumer is on userspace, and the other 
pair is
          at kernelspace;
        - buffers are allocated before the start of a process, via an explicit 
call;
        - buffers need to be mmapped, in order to be visible at userspace.

None of the above applies to a framebuffer-oriented API:
        - more than one buffer consumer is allowed;
        - consumers and producers are on kernelspace (it might be needed to 
have an
an API for handling such buffers also on userspace, although it doesn't sound a 
good
idea to me, IMHO);
        - buffers can be dynamically allocated/de-allocated;
        - buffers don't need to be mmapped to userspace.

> Especially since
> the various parts of the pipeline may stall for a bit so you don't want to 
> lose
> frames. That's not what the overlay API is for, that's what our streaming API
> gives us.
> 
> The use case above isn't even possible without copying. At least, I don't see 
> a
> way, unless the GPU buffer is non-destructive. In that case you can give the
> frame to the GPU, and when the GPU is finished you can give it to the encoder.
> I suspect that might become quite complex though.

Well, if some fb consumers would also be rewriting the buffers, serializing 
them is
needed, as you can't allow another process to access a memory that CPU is 
destroying 
at the same time, as you'll have unpredicted images being produced. The easiest
way is to make qbuf() callback block until the end of buffer rewrite, but I
don't think that this is a good idea.

On such situations, it is probably faster and cleaner to just copy data into a
second buffer, keeping the original one preserved.

> Note that many video receivers cannot stall. You can't tell them to wait until
> the last buffer finished processing. This is different from some/most? 
> sensors.
> 
> So if you try to send the input of a video receiver to some device that 
> requires
> syncing which can cause stalls, then that will not work without losing frames.
> Which especially for video encoding is not desirable.

If you're sharing a buffer, kernel should be sure that the shared buffer won't
be rewritten before every shared-buffer consumer doesn't finish handling it.

So, assuming that the producer is generating frames at a rate of, let's say, 30
fps, the slowest consumer should be faster than 1/30 s, otherwise, it will loose
frames.

Yet, if, under certain circumstances (like, for example, input switch from one
source to another, requiring an mpeg2 encoder to re-encode the new scena),
one of the consumer is needing more than 1/30 s, but, at most of the time it 
runs
bellow the 1/30 s, by using dynamic buffer allocation it is still possible of
using shared buffers without loosing frames, if the machine has enough memory
to handle the worse case.

There's one problem with dynamic buffers, however: audio and video sync becomes 
a more complex task. So, we'll end by needing to add audio timestamps at 
kernelspace, at the alsa driver.

Cheers,
Mauro.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

Reply via email to