Hi Marco,

On Friday 20 June 2008, Marco Argiolas wrote:
> Hi Richard,
>
> > You shouldn't have allocated extra memory in the first place. Instead of
> > modifying the whole frame at once, you should instead replace the call to
> > memcpy() in uvc_video_decode_data() and revert the image there.
> > Computation of boundaries and memory offsets might be a bit more
> > difficult, but you will
> > speed up the process and you'll avoid allocating an extra buffer.
>
> Thank you for this suggestion,
> I haven't thought about this solution before.
>
> > The purpose of GFP_ATOMIC is not to speed up memory allocation, but to
> > make sure that memory allocation will not sleep. This is required here, as
> > uvc_video_decode_data() is called in interrupt context where you're not
> > allowed to sleep.
>
> So, in this way I'm sure memory allocation won't sleep and I can say it is
> the fastest way (and also the only one suggested in this case)  to allocate
> memory. Is it right?

A call to kmalloc with GFP_ATOMIC will probably faster than with GFP_KERNEL 
(although I'm not sure there are not some corner cases where the second could 
be faster than the first). This is however not the reason why you can't use 
GFP_KERNEL here: code running in interrupt context is not allowed to sleep, 
so must use GFP_ATOMIC. This is not a speed issue.

> > As explained above, you should modify the existing memcpy() instead of
> > adding a new one. The statement your mention will then not have to be
> > modified. 
>
> Here it is the new patch of "uvc_video_decode_data()" (I was going crazy
> with the computation of all offset) and as attachement you can find the
> modded "uvc_video.c" file:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> $ diff -u original modded
> --- original    2008-06-20 19:15:28.000000000 +0200
> +++ modded      2008-06-20 19:11:31.000000000 +0200
> @@ -4,6 +4,11 @@
>         struct uvc_video_queue *queue = &video->queue;
>         unsigned int maxlen, nbytes;
>         void *mem;
> +       unsigned int
> +               pixel_size,
> +               to_be_copied,
> +               shift_right,
> +               buf_offset;
>
>         if (len <= 0)
>                 return;
> @@ -12,11 +17,29 @@
>         maxlen = buf->buf.length - buf->buf.bytesused;
>         mem = queue->mem + buf->buf.m.offset + buf->buf.bytesused;
>         nbytes = min((unsigned int)len, maxlen);
> -       memcpy(mem, data, nbytes);
> -       buf->buf.bytesused += nbytes;
>
> -       if (len > maxlen) {
> +       pixel_size = video->streaming->cur_frame->wWidth *
> video->streaming->format->bpp / 8;

pixel_size is misnamed as it stores the size of a whole length of video data.

> +       to_be_copied = 0;
> +       while( nbytes > 0 ) { // Are there other bytes to be copied?

Please respect the kernel coding style (Documentation/CodingStyle). Don't put 
spaces inside parentheses, add a space between 'while' and the opening 
parenthesis, don't put any statement on the same line as the 'else' keyword 
and keep you code 80 characters wide.

> +               shift_right = buf->buf.bytesused % pixel_size;
> +               if ( nbytes > pixel_size - shift_right)
> +                       to_be_copied = pixel_size - shift_right ;
> +               else    to_be_copied = nbytes;
> +
> +               if (shift_right > 0 )
> +                       buf_offset = to_be_copied - shift_right ;
> +               else    buf_offset = pixel_size;
> +               memcpy( queue->mem + buf->buf.m.offset + pixel_size *
> video->streaming->cur_frame->wHeight - buf->buf.bytesused - buf_offset,
> +                       data,
> +                       to_be_copied ) ;

If I'm not mistaken this will only work when the data packets contain entire 
lines of image data. This is not guaranteed. To reverse the image you 
shouldn't use memcpy but instead roll your own memcpy version that will 
decrement the destination pointer instead of incrementing it. It shouldn't be 
more complex than that. Just fill the video buffer backwards starting at the 
end instead of forward starting at the beginning.

> +
> +               data += to_be_copied;
> +               buf->buf.bytesused += to_be_copied ;

In the end all of nbytes will have been copied, so you can increment 
buf->buf.bytesused in one go outside the loop.

> +               nbytes -= to_be_copied;
> +       }
>
> +       if (len >= maxlen) {

Don't modify this (see my previous e-mail).

>                 uvc_trace(UVC_TRACE_FRAME, "Frame complete (overflow).\n");
>                 buf->state = UVC_BUF_STATE_DONE;
>         }
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Unluckly also if now I don't allocate any memory, with bigger resolutions
> the final video seems to be a little bit slower than with the allocation
> (previous solution).

Can you benchmark that ? I haven't 

> Please if you can/need try both my solutions and let me know your
> results!!! Which one is the fastest?
> Please let also me know if you think I made some mistakes that slow down
> this last version of uvcvideo without memory allocation.
>
> Thank you a lot for your help!!
>
> Marco Argiolas


_______________________________________________
Linux-uvc-devel mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/linux-uvc-devel

Reply via email to