Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling

2016-09-17 Thread Steve Longerbeam



On 09/16/2016 07:16 AM, Philipp Zabel wrote:

Hi Steve,

thanks for the update.

Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:

I added comment headers for all the image conversion prototypes.
It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
include/video/imx-image-convert.h, but let me know if we should put
this somewhere else and/or under Documentation/ somewhere.

I think that is the right place already. imx-image-convert.h could be
renamed to imx-ipu-image-convert.h, to make clear that this is about the
IPU image converter.


Ok, I'll send another update with the name change in the next
version (v7).



+#define MIN_W 128
+#define MIN_H 128

Where does this minimum come from?

Nowhere really :) This is just some sane minimums, to pass
to clamp_align() when aligning input/output width/height in
ipu_image_convert_adjust().

Let's use hardware minimum in the low level code. Sane defaults are for
the V4L2 API. Would that be 8x2 pixels per input tile?


I searched the imx6 reference manual, I can't find any mention
of width/height minimums for the IC. So I suppose 8x2 would be fine,
or maybe 16x8, to account for planar and IRT conversions.




+   if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+   /* this is a rotation operation, just ignore */
+   spin_unlock_irqrestore(>irqlock, flags);
+   return IRQ_HANDLED;
+   }

Why enable the out_chan EOF irq at all when using the IRT mode?

Because (see above), all the IPU resources that might be needed
for any conversion context that is queued to a image conversion
channel (IC task) are acquired when the first context is queued,
including rotation resources. So by acquiring the non-rotation EOF
irq, it will get fielded even for rotation conversions, so we have to
handle it.

There is nothing wrong with acquiring the irq. It could still be
disabled while it is not needed.


It would be difficult to disable the irq. Remember the irq handlers must
field all EOF interrupts in an ipu_image_convert_chan (IC task). If one
context in that channel disables the irq, it will break other runnings
contexts in that channel that are using it.




+/* Adjusts input/output images to IPU restrictions */
+int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
+enum ipu_rotate_mode rot_mode)
+{
+   const struct ipu_ic_pixfmt *infmt, *outfmt;
+   unsigned int num_in_rows, num_in_cols;
+   unsigned int num_out_rows, num_out_cols;
+   u32 w_align, h_align;
+
+   infmt = ipu_ic_get_format(in->pix.pixelformat);
+   outfmt = ipu_ic_get_format(out->pix.pixelformat);
+
+   /* set some defaults if needed */

Is this our task at all?

ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
which should never return EINVAL but should return a supported format
when the passed format is not supported. So I added this here to return
some default pixel formats and width/heights if needed.

I'd prefer to move this into the mem2mem driver try_format, then.


We could move the 0 width/height checks to v4l2, but the pixel
format defaults should probably remain in ipu-image-convert, since
it knows what formats it supports converting to/from.

Steve



Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling

2016-09-17 Thread Steve Longerbeam



On 09/16/2016 07:16 AM, Philipp Zabel wrote:

Hi Steve,

thanks for the update.

Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:

I added comment headers for all the image conversion prototypes.
It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
include/video/imx-image-convert.h, but let me know if we should put
this somewhere else and/or under Documentation/ somewhere.

I think that is the right place already. imx-image-convert.h could be
renamed to imx-ipu-image-convert.h, to make clear that this is about the
IPU image converter.


Ok, I'll send another update with the name change in the next
version (v7).



+#define MIN_W 128
+#define MIN_H 128

Where does this minimum come from?

Nowhere really :) This is just some sane minimums, to pass
to clamp_align() when aligning input/output width/height in
ipu_image_convert_adjust().

Let's use hardware minimum in the low level code. Sane defaults are for
the V4L2 API. Would that be 8x2 pixels per input tile?


I searched the imx6 reference manual, I can't find any mention
of width/height minimums for the IC. So I suppose 8x2 would be fine,
or maybe 16x8, to account for planar and IRT conversions.




+   if (ipu_rot_mode_is_irt(ctx->rot_mode)) {
+   /* this is a rotation operation, just ignore */
+   spin_unlock_irqrestore(>irqlock, flags);
+   return IRQ_HANDLED;
+   }

Why enable the out_chan EOF irq at all when using the IRT mode?

Because (see above), all the IPU resources that might be needed
for any conversion context that is queued to a image conversion
channel (IC task) are acquired when the first context is queued,
including rotation resources. So by acquiring the non-rotation EOF
irq, it will get fielded even for rotation conversions, so we have to
handle it.

There is nothing wrong with acquiring the irq. It could still be
disabled while it is not needed.


It would be difficult to disable the irq. Remember the irq handlers must
field all EOF interrupts in an ipu_image_convert_chan (IC task). If one
context in that channel disables the irq, it will break other runnings
contexts in that channel that are using it.




+/* Adjusts input/output images to IPU restrictions */
+int ipu_image_convert_adjust(struct ipu_image *in, struct ipu_image *out,
+enum ipu_rotate_mode rot_mode)
+{
+   const struct ipu_ic_pixfmt *infmt, *outfmt;
+   unsigned int num_in_rows, num_in_cols;
+   unsigned int num_out_rows, num_out_cols;
+   u32 w_align, h_align;
+
+   infmt = ipu_ic_get_format(in->pix.pixelformat);
+   outfmt = ipu_ic_get_format(out->pix.pixelformat);
+
+   /* set some defaults if needed */

Is this our task at all?

ipu_image_convert_adjust() is meant to be called by v4l2 try_format(),
which should never return EINVAL but should return a supported format
when the passed format is not supported. So I added this here to return
some default pixel formats and width/heights if needed.

I'd prefer to move this into the mem2mem driver try_format, then.


We could move the 0 width/height checks to v4l2, but the pixel
format defaults should probably remain in ipu-image-convert, since
it knows what formats it supports converting to/from.

Steve



Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling

2016-09-16 Thread Philipp Zabel
Hi Steve,

thanks for the update.

Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:
> Hi Philipp,
> 
> 
> On 09/06/2016 02:26 AM, Philipp Zabel wrote:
> > Hi Steve,
> >
> > Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> >> This patch implements complete image conversion support to ipu-ic,
> >> with tiling to support scaling to and from images up to 4096x4096.
> >> Image rotation is also supported.
> >>
> >> The internal API is subsystem agnostic (no V4L2 dependency except
> >> for the use of V4L2 fourcc pixel formats).
> >>
> >> Callers prepare for image conversion by calling
> >> ipu_image_convert_prepare(), which initializes the parameters of
> >> the conversion.
> > ... and possibly allocates intermediate buffers for rotation support.
> > This should be documented somewhere, with a node that v4l2 users should
> > be doing this during REQBUFS.
> 
> I added comment headers for all the image conversion prototypes.
> It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
> include/video/imx-image-convert.h, but let me know if we should put
> this somewhere else and/or under Documentation/ somewhere.

I think that is the right place already. imx-image-convert.h could be
renamed to imx-ipu-image-convert.h, to make clear that this is about the
IPU image converter.

> >>   The caller passes in the ipu_ic task to use for
> >> the conversion, the input and output image formats, a rotation mode,
> >> and a completion callback and completion context pointer:
> >>
> >> struct image_converter_ctx *
> >> ipu_image_convert_prepare(struct ipu_ic *ic,
> >>struct ipu_image *in, struct ipu_image *out,
> >>enum ipu_rotate_mode rot_mode,
> >>image_converter_cb_t complete,
> >>void *complete_context);
> > As I commented on the other patch, I think the image_convert functions
> > should use a separate handle for the image conversion queues that sit on
> > top of the ipu_ic task handles.
> 
> Here is a new prototype I came up with:
> 
> struct ipu_image_convert_ctx *
> ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
>struct ipu_image *in, struct ipu_image *out,
>enum ipu_rotate_mode rot_mode,
>ipu_image_convert_cb_t complete,
>void *complete_context);
> 
> In other words, the ipu_ic handle is replaced by the IPU handle and IC task
> that are requested for carrying out the conversion.

Looks good to me for now.

> The image converter will acquire the ipu_ic handle internally, whenever 
> there
> are queued contexts to that IC task (which I am calling a 'struct 
> ipu_image_convert_chan').
> This way the IC handle can be shared by all contexts using that IC task. 
> After all
> contexts have been freed from the (struct 
> ipu_image_convert_chan)->ctx_list queue,
> the ipu_ic handle is freed.
> 
> The ipu_ic handle is acquired in get_ipu_resources() and freed in 
> release_ipu_resources(),
> along with all the other IPU resources that *could possibly be needed* 
> in that
> ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
> irqs).

Ok.

[...]
> >> +#define MIN_W 128
> >> +#define MIN_H 128
> > Where does this minimum come from?
> 
> Nowhere really :) This is just some sane minimums, to pass
> to clamp_align() when aligning input/output width/height in
> ipu_image_convert_adjust().

Let's use hardware minimum in the low level code. Sane defaults are for
the V4L2 API. Would that be 8x2 pixels per input tile?

> >> +struct ic_task_channels {
> >> +  int in;
> >> +  int out;
> >> +  int rot_in;
> >> +  int rot_out;
> >> +  int vdi_in_p;
> >> +  int vdi_in;
> >> +  int vdi_in_n;
> > The vdi channels are unused.
> 
> Well, I'd prefer to keep the VDI channels. It's quite possible we
> can add motion compensated deinterlacing support using the
> PRP_VF task to the image converter in the future.

Indeed.

> >> +struct image_converter_ctx {
> >> +  struct image_converter *cvt;
> >> +
> >> +  image_converter_cb_t complete;
> >> +  void *complete_context;
> >> +
> >> +  /* Source/destination image data and rotation mode */
> >> +  struct ipu_ic_image in;
> >> +  struct ipu_ic_image out;
> >> +  enum ipu_rotate_mode rot_mode;
> >> +
> >> +  /* intermediate buffer for rotation */
> >> +  struct ipu_ic_dma_buf rot_intermediate[2];
> > No need to change it now, but I assume these could be per IC task
> > instead of per context.
> 
> Actually no. The rotation intermediate buffers have the dimension
> of a single tile, so they must remain in the context struct.

I see. The per task intermediate buffer would have to be the maximum
size, so this would only ever make sense when rotating multiple large
RGB streams simultaneously. I think we can reasonably ignore this use
case.

[...]
> >> +  .fourcc = V4L2_PIX_FMT_RGB565,
> >> +  .bpp= 16,
> > bpp is only 

Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling

2016-09-16 Thread Philipp Zabel
Hi Steve,

thanks for the update.

Am Mittwoch, den 14.09.2016, 18:45 -0700 schrieb Steve Longerbeam:
> Hi Philipp,
> 
> 
> On 09/06/2016 02:26 AM, Philipp Zabel wrote:
> > Hi Steve,
> >
> > Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> >> This patch implements complete image conversion support to ipu-ic,
> >> with tiling to support scaling to and from images up to 4096x4096.
> >> Image rotation is also supported.
> >>
> >> The internal API is subsystem agnostic (no V4L2 dependency except
> >> for the use of V4L2 fourcc pixel formats).
> >>
> >> Callers prepare for image conversion by calling
> >> ipu_image_convert_prepare(), which initializes the parameters of
> >> the conversion.
> > ... and possibly allocates intermediate buffers for rotation support.
> > This should be documented somewhere, with a node that v4l2 users should
> > be doing this during REQBUFS.
> 
> I added comment headers for all the image conversion prototypes.
> It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
> include/video/imx-image-convert.h, but let me know if we should put
> this somewhere else and/or under Documentation/ somewhere.

I think that is the right place already. imx-image-convert.h could be
renamed to imx-ipu-image-convert.h, to make clear that this is about the
IPU image converter.

> >>   The caller passes in the ipu_ic task to use for
> >> the conversion, the input and output image formats, a rotation mode,
> >> and a completion callback and completion context pointer:
> >>
> >> struct image_converter_ctx *
> >> ipu_image_convert_prepare(struct ipu_ic *ic,
> >>struct ipu_image *in, struct ipu_image *out,
> >>enum ipu_rotate_mode rot_mode,
> >>image_converter_cb_t complete,
> >>void *complete_context);
> > As I commented on the other patch, I think the image_convert functions
> > should use a separate handle for the image conversion queues that sit on
> > top of the ipu_ic task handles.
> 
> Here is a new prototype I came up with:
> 
> struct ipu_image_convert_ctx *
> ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
>struct ipu_image *in, struct ipu_image *out,
>enum ipu_rotate_mode rot_mode,
>ipu_image_convert_cb_t complete,
>void *complete_context);
> 
> In other words, the ipu_ic handle is replaced by the IPU handle and IC task
> that are requested for carrying out the conversion.

Looks good to me for now.

> The image converter will acquire the ipu_ic handle internally, whenever 
> there
> are queued contexts to that IC task (which I am calling a 'struct 
> ipu_image_convert_chan').
> This way the IC handle can be shared by all contexts using that IC task. 
> After all
> contexts have been freed from the (struct 
> ipu_image_convert_chan)->ctx_list queue,
> the ipu_ic handle is freed.
> 
> The ipu_ic handle is acquired in get_ipu_resources() and freed in 
> release_ipu_resources(),
> along with all the other IPU resources that *could possibly be needed* 
> in that
> ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
> irqs).

Ok.

[...]
> >> +#define MIN_W 128
> >> +#define MIN_H 128
> > Where does this minimum come from?
> 
> Nowhere really :) This is just some sane minimums, to pass
> to clamp_align() when aligning input/output width/height in
> ipu_image_convert_adjust().

Let's use hardware minimum in the low level code. Sane defaults are for
the V4L2 API. Would that be 8x2 pixels per input tile?

> >> +struct ic_task_channels {
> >> +  int in;
> >> +  int out;
> >> +  int rot_in;
> >> +  int rot_out;
> >> +  int vdi_in_p;
> >> +  int vdi_in;
> >> +  int vdi_in_n;
> > The vdi channels are unused.
> 
> Well, I'd prefer to keep the VDI channels. It's quite possible we
> can add motion compensated deinterlacing support using the
> PRP_VF task to the image converter in the future.

Indeed.

> >> +struct image_converter_ctx {
> >> +  struct image_converter *cvt;
> >> +
> >> +  image_converter_cb_t complete;
> >> +  void *complete_context;
> >> +
> >> +  /* Source/destination image data and rotation mode */
> >> +  struct ipu_ic_image in;
> >> +  struct ipu_ic_image out;
> >> +  enum ipu_rotate_mode rot_mode;
> >> +
> >> +  /* intermediate buffer for rotation */
> >> +  struct ipu_ic_dma_buf rot_intermediate[2];
> > No need to change it now, but I assume these could be per IC task
> > instead of per context.
> 
> Actually no. The rotation intermediate buffers have the dimension
> of a single tile, so they must remain in the context struct.

I see. The per task intermediate buffer would have to be the maximum
size, so this would only ever make sense when rotating multiple large
RGB streams simultaneously. I think we can reasonably ignore this use
case.

[...]
> >> +  .fourcc = V4L2_PIX_FMT_RGB565,
> >> +  .bpp= 16,
> > bpp is only 

Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling

2016-09-14 Thread Steve Longerbeam

Hi Philipp,


On 09/06/2016 02:26 AM, Philipp Zabel wrote:

Hi Steve,

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:

This patch implements complete image conversion support to ipu-ic,
with tiling to support scaling to and from images up to 4096x4096.
Image rotation is also supported.

The internal API is subsystem agnostic (no V4L2 dependency except
for the use of V4L2 fourcc pixel formats).

Callers prepare for image conversion by calling
ipu_image_convert_prepare(), which initializes the parameters of
the conversion.

... and possibly allocates intermediate buffers for rotation support.
This should be documented somewhere, with a node that v4l2 users should
be doing this during REQBUFS.


I added comment headers for all the image conversion prototypes.
It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
include/video/imx-image-convert.h, but let me know if we should put
this somewhere else and/or under Documentation/ somewhere.



  The caller passes in the ipu_ic task to use for
the conversion, the input and output image formats, a rotation mode,
and a completion callback and completion context pointer:

struct image_converter_ctx *
ipu_image_convert_prepare(struct ipu_ic *ic,
   struct ipu_image *in, struct ipu_image *out,
   enum ipu_rotate_mode rot_mode,
   image_converter_cb_t complete,
   void *complete_context);

As I commented on the other patch, I think the image_convert functions
should use a separate handle for the image conversion queues that sit on
top of the ipu_ic task handles.


Here is a new prototype I came up with:

struct ipu_image_convert_ctx *
ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
  struct ipu_image *in, struct ipu_image *out,
  enum ipu_rotate_mode rot_mode,
  ipu_image_convert_cb_t complete,
  void *complete_context);

In other words, the ipu_ic handle is replaced by the IPU handle and IC task
that are requested for carrying out the conversion.

The image converter will acquire the ipu_ic handle internally, whenever 
there
are queued contexts to that IC task (which I am calling a 'struct 
ipu_image_convert_chan').
This way the IC handle can be shared by all contexts using that IC task. 
After all
contexts have been freed from the (struct 
ipu_image_convert_chan)->ctx_list queue,

the ipu_ic handle is freed.

The ipu_ic handle is acquired in get_ipu_resources() and freed in 
release_ipu_resources(),
along with all the other IPU resources that *could possibly be needed* 
in that
ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
irqs).


I should have done this from the start, instead of allowing multiple 
handles the the IC tasks.

Thanks for pointing this out.




+
+#define MIN_W 128
+#define MIN_H 128

Where does this minimum come from?


Nowhere really :) This is just some sane minimums, to pass
to clamp_align() when aligning input/output width/height in
ipu_image_convert_adjust().


+struct ic_task_channels {
+   int in;
+   int out;
+   int rot_in;
+   int rot_out;
+   int vdi_in_p;
+   int vdi_in;
+   int vdi_in_n;

The vdi channels are unused.


Well, I'd prefer to keep the VDI channels. It's quite possible we
can add motion compensated deinterlacing support using the
PRP_VF task to the image converter in the future.


+struct image_converter_ctx {
+   struct image_converter *cvt;
+
+   image_converter_cb_t complete;
+   void *complete_context;
+
+   /* Source/destination image data and rotation mode */
+   struct ipu_ic_image in;
+   struct ipu_ic_image out;
+   enum ipu_rotate_mode rot_mode;
+
+   /* intermediate buffer for rotation */
+   struct ipu_ic_dma_buf rot_intermediate[2];

No need to change it now, but I assume these could be per IC task
instead of per context.


Actually no. The rotation intermediate buffers have the dimension
of a single tile, so they must remain in the context struct.


+static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
+   {
+   .name   = "RGB565",

Please drop the names, keeping a list of user readable format names is
the v4l2 core's business, not ours.


done.


+   .fourcc = V4L2_PIX_FMT_RGB565,
+   .bpp= 16,

bpp is only ever used in bytes, not bits (always divided by 8).
Why not make this bytes_per_pixel or pixel_stride = 2.


Actually bpp is used to calculate *total* tile sizes and *total* bytes
per line. For the planar 4:2:0 formats that means it must be specified
in bits.



+   }, {
+   .name   = "4:2:0 planar, YUV",
+   .fourcc = V4L2_PIX_FMT_YUV420,
+   .bpp= 12,
+   .y_depth = 8,

y_depth is only ever used in bytes, not bits (always divided by 8).
Why not make this bool planar instead.


sure why not, although I think 

Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling

2016-09-14 Thread Steve Longerbeam

Hi Philipp,


On 09/06/2016 02:26 AM, Philipp Zabel wrote:

Hi Steve,

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:

This patch implements complete image conversion support to ipu-ic,
with tiling to support scaling to and from images up to 4096x4096.
Image rotation is also supported.

The internal API is subsystem agnostic (no V4L2 dependency except
for the use of V4L2 fourcc pixel formats).

Callers prepare for image conversion by calling
ipu_image_convert_prepare(), which initializes the parameters of
the conversion.

... and possibly allocates intermediate buffers for rotation support.
This should be documented somewhere, with a node that v4l2 users should
be doing this during REQBUFS.


I added comment headers for all the image conversion prototypes.
It caused bloat in imx-ipu-v3.h, so I moved it to a new header:
include/video/imx-image-convert.h, but let me know if we should put
this somewhere else and/or under Documentation/ somewhere.



  The caller passes in the ipu_ic task to use for
the conversion, the input and output image formats, a rotation mode,
and a completion callback and completion context pointer:

struct image_converter_ctx *
ipu_image_convert_prepare(struct ipu_ic *ic,
   struct ipu_image *in, struct ipu_image *out,
   enum ipu_rotate_mode rot_mode,
   image_converter_cb_t complete,
   void *complete_context);

As I commented on the other patch, I think the image_convert functions
should use a separate handle for the image conversion queues that sit on
top of the ipu_ic task handles.


Here is a new prototype I came up with:

struct ipu_image_convert_ctx *
ipu_image_convert_prepare(struct ipu_soc *ipu, enum ipu_ic_task ic_task,
  struct ipu_image *in, struct ipu_image *out,
  enum ipu_rotate_mode rot_mode,
  ipu_image_convert_cb_t complete,
  void *complete_context);

In other words, the ipu_ic handle is replaced by the IPU handle and IC task
that are requested for carrying out the conversion.

The image converter will acquire the ipu_ic handle internally, whenever 
there
are queued contexts to that IC task (which I am calling a 'struct 
ipu_image_convert_chan').
This way the IC handle can be shared by all contexts using that IC task. 
After all
contexts have been freed from the (struct 
ipu_image_convert_chan)->ctx_list queue,

the ipu_ic handle is freed.

The ipu_ic handle is acquired in get_ipu_resources() and freed in 
release_ipu_resources(),
along with all the other IPU resources that *could possibly be needed* 
in that
ipu_image_convert_chan by future contexts (*all* idmac channels, *all* 
irqs).


I should have done this from the start, instead of allowing multiple 
handles the the IC tasks.

Thanks for pointing this out.




+
+#define MIN_W 128
+#define MIN_H 128

Where does this minimum come from?


Nowhere really :) This is just some sane minimums, to pass
to clamp_align() when aligning input/output width/height in
ipu_image_convert_adjust().


+struct ic_task_channels {
+   int in;
+   int out;
+   int rot_in;
+   int rot_out;
+   int vdi_in_p;
+   int vdi_in;
+   int vdi_in_n;

The vdi channels are unused.


Well, I'd prefer to keep the VDI channels. It's quite possible we
can add motion compensated deinterlacing support using the
PRP_VF task to the image converter in the future.


+struct image_converter_ctx {
+   struct image_converter *cvt;
+
+   image_converter_cb_t complete;
+   void *complete_context;
+
+   /* Source/destination image data and rotation mode */
+   struct ipu_ic_image in;
+   struct ipu_ic_image out;
+   enum ipu_rotate_mode rot_mode;
+
+   /* intermediate buffer for rotation */
+   struct ipu_ic_dma_buf rot_intermediate[2];

No need to change it now, but I assume these could be per IC task
instead of per context.


Actually no. The rotation intermediate buffers have the dimension
of a single tile, so they must remain in the context struct.


+static const struct ipu_ic_pixfmt ipu_ic_formats[] = {
+   {
+   .name   = "RGB565",

Please drop the names, keeping a list of user readable format names is
the v4l2 core's business, not ours.


done.


+   .fourcc = V4L2_PIX_FMT_RGB565,
+   .bpp= 16,

bpp is only ever used in bytes, not bits (always divided by 8).
Why not make this bytes_per_pixel or pixel_stride = 2.


Actually bpp is used to calculate *total* tile sizes and *total* bytes
per line. For the planar 4:2:0 formats that means it must be specified
in bits.



+   }, {
+   .name   = "4:2:0 planar, YUV",
+   .fourcc = V4L2_PIX_FMT_YUV420,
+   .bpp= 12,
+   .y_depth = 8,

y_depth is only ever used in bytes, not bits (always divided by 8).
Why not make this bool planar instead.


sure why not, although I think 

Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling

2016-09-06 Thread Philipp Zabel
Hi Steve,

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> This patch implements complete image conversion support to ipu-ic,
> with tiling to support scaling to and from images up to 4096x4096.
> Image rotation is also supported.
> 
> The internal API is subsystem agnostic (no V4L2 dependency except
> for the use of V4L2 fourcc pixel formats).
> 
> Callers prepare for image conversion by calling
> ipu_image_convert_prepare(), which initializes the parameters of
> the conversion.

... and possibly allocates intermediate buffers for rotation support.
This should be documented somewhere, with a node that v4l2 users should
be doing this during REQBUFS.

>  The caller passes in the ipu_ic task to use for
> the conversion, the input and output image formats, a rotation mode,
> and a completion callback and completion context pointer:
> 
> struct image_converter_ctx *
> ipu_image_convert_prepare(struct ipu_ic *ic,
>   struct ipu_image *in, struct ipu_image *out,
>   enum ipu_rotate_mode rot_mode,
>   image_converter_cb_t complete,
>   void *complete_context);

As I commented on the other patch, I think the image_convert functions
should use a separate handle for the image conversion queues that sit on
top of the ipu_ic task handles.

> The caller is given a new conversion context that must be passed to
> the further APIs:
> 
> struct image_converter_run *
> ipu_image_convert_run(struct image_converter_ctx *ctx,
>   dma_addr_t in_phys, dma_addr_t out_phys);
> 
> This queues a new image conversion request to a run queue, and
> starts the conversion immediately if the run queue is empty. Only
> the physaddr's of the input and output image buffers are needed,
> since the conversion context was created previously with
> ipu_image_convert_prepare(). Returns a new run object pointer. When
> the conversion completes, the run pointer is returned to the
> completion callback.
>
> void image_convert_abort(struct image_converter_ctx *ctx);
> 
> This will abort any active or pending conversions for this context.
> Any currently active or pending runs belonging to this context are
> returned via the completion callback with an error status.
>
> void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> 
> Unprepares the conversion context. Any active or pending runs will
> be aborted by calling image_convert_abort().
> 
> Signed-off-by: Steve Longerbeam 
> 
> ---
> 
> v4:
> - do away with struct ipu_ic_tile_off, and move tile offsets into
>   struct ipu_ic_tile. This paves the way for possibly allowing for
>   each tile to have different dimensions in the future.

Thanks, this looks a lot better to me.

> v3: no changes
> v2: no changes
> ---
>  drivers/gpu/ipu-v3/ipu-ic.c | 1694 
> ++-
>  include/video/imx-ipu-v3.h  |   57 +-
>  2 files changed, 1739 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
> index 1a37afc..01b1b56 100644
> --- a/drivers/gpu/ipu-v3/ipu-ic.c
> +++ b/drivers/gpu/ipu-v3/ipu-ic.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include "ipu-prv.h"
>  
>  /* IC Register Offsets */
> @@ -82,6 +84,40 @@
>  #define IC_IDMAC_3_PP_WIDTH_MASK(0x3ff << 20)
>  #define IC_IDMAC_3_PP_WIDTH_OFFSET  20
>  
> +/*
> + * The IC Resizer has a restriction that the output frame from the
> + * resizer must be 1024 or less in both width (pixels) and height
> + * (lines).
> + *
> + * The image conversion support attempts to split up a conversion when
> + * the desired output (converted) frame resolution exceeds the IC resizer
> + * limit of 1024 in either dimension.
> + *
> + * If either dimension of the output frame exceeds the limit, the
> + * dimension is split into 1, 2, or 4 equal stripes, for a maximum
> + * of 4*4 or 16 tiles. A conversion is then carried out for each
> + * tile (but taking care to pass the full frame stride length to
> + * the DMA channel's parameter memory!). IDMA double-buffering is used
> + * to convert each tile back-to-back when possible (see note below
> + * when double_buffering boolean is set).
> + *
> + * Note that the input frame must be split up into the same number
> + * of tiles as the output frame.
> + */
> +#define MAX_STRIPES_W4
> +#define MAX_STRIPES_H4
> +#define MAX_TILES (MAX_STRIPES_W * MAX_STRIPES_H)
> +
> +#define MIN_W 128
> +#define MIN_H 128

Where does this minimum come from?

> +#define MAX_W 4096
> +#define MAX_H 4096
> +
> +enum image_convert_type {
> + IMAGE_CONVERT_IN = 0,
> + IMAGE_CONVERT_OUT,
> +};
> +
>  struct ic_task_regoffs {
>   u32 rsc;
>   u32 tpmem_csc[2];
> @@ -96,6 +132,16 @@ struct ic_task_bitfields {
>   u32 ic_cmb_galpha_bit;
>  };
>  
> +struct ic_task_channels {
> + int in;
> + 

Re: [PATCH v4 3/4] gpu: ipu-ic: Add complete image conversion support with tiling

2016-09-06 Thread Philipp Zabel
Hi Steve,

Am Mittwoch, den 17.08.2016, 17:50 -0700 schrieb Steve Longerbeam:
> This patch implements complete image conversion support to ipu-ic,
> with tiling to support scaling to and from images up to 4096x4096.
> Image rotation is also supported.
> 
> The internal API is subsystem agnostic (no V4L2 dependency except
> for the use of V4L2 fourcc pixel formats).
> 
> Callers prepare for image conversion by calling
> ipu_image_convert_prepare(), which initializes the parameters of
> the conversion.

... and possibly allocates intermediate buffers for rotation support.
This should be documented somewhere, with a node that v4l2 users should
be doing this during REQBUFS.

>  The caller passes in the ipu_ic task to use for
> the conversion, the input and output image formats, a rotation mode,
> and a completion callback and completion context pointer:
> 
> struct image_converter_ctx *
> ipu_image_convert_prepare(struct ipu_ic *ic,
>   struct ipu_image *in, struct ipu_image *out,
>   enum ipu_rotate_mode rot_mode,
>   image_converter_cb_t complete,
>   void *complete_context);

As I commented on the other patch, I think the image_convert functions
should use a separate handle for the image conversion queues that sit on
top of the ipu_ic task handles.

> The caller is given a new conversion context that must be passed to
> the further APIs:
> 
> struct image_converter_run *
> ipu_image_convert_run(struct image_converter_ctx *ctx,
>   dma_addr_t in_phys, dma_addr_t out_phys);
> 
> This queues a new image conversion request to a run queue, and
> starts the conversion immediately if the run queue is empty. Only
> the physaddr's of the input and output image buffers are needed,
> since the conversion context was created previously with
> ipu_image_convert_prepare(). Returns a new run object pointer. When
> the conversion completes, the run pointer is returned to the
> completion callback.
>
> void image_convert_abort(struct image_converter_ctx *ctx);
> 
> This will abort any active or pending conversions for this context.
> Any currently active or pending runs belonging to this context are
> returned via the completion callback with an error status.
>
> void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> 
> Unprepares the conversion context. Any active or pending runs will
> be aborted by calling image_convert_abort().
> 
> Signed-off-by: Steve Longerbeam 
> 
> ---
> 
> v4:
> - do away with struct ipu_ic_tile_off, and move tile offsets into
>   struct ipu_ic_tile. This paves the way for possibly allowing for
>   each tile to have different dimensions in the future.

Thanks, this looks a lot better to me.

> v3: no changes
> v2: no changes
> ---
>  drivers/gpu/ipu-v3/ipu-ic.c | 1694 
> ++-
>  include/video/imx-ipu-v3.h  |   57 +-
>  2 files changed, 1739 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
> index 1a37afc..01b1b56 100644
> --- a/drivers/gpu/ipu-v3/ipu-ic.c
> +++ b/drivers/gpu/ipu-v3/ipu-ic.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include "ipu-prv.h"
>  
>  /* IC Register Offsets */
> @@ -82,6 +84,40 @@
>  #define IC_IDMAC_3_PP_WIDTH_MASK(0x3ff << 20)
>  #define IC_IDMAC_3_PP_WIDTH_OFFSET  20
>  
> +/*
> + * The IC Resizer has a restriction that the output frame from the
> + * resizer must be 1024 or less in both width (pixels) and height
> + * (lines).
> + *
> + * The image conversion support attempts to split up a conversion when
> + * the desired output (converted) frame resolution exceeds the IC resizer
> + * limit of 1024 in either dimension.
> + *
> + * If either dimension of the output frame exceeds the limit, the
> + * dimension is split into 1, 2, or 4 equal stripes, for a maximum
> + * of 4*4 or 16 tiles. A conversion is then carried out for each
> + * tile (but taking care to pass the full frame stride length to
> + * the DMA channel's parameter memory!). IDMA double-buffering is used
> + * to convert each tile back-to-back when possible (see note below
> + * when double_buffering boolean is set).
> + *
> + * Note that the input frame must be split up into the same number
> + * of tiles as the output frame.
> + */
> +#define MAX_STRIPES_W4
> +#define MAX_STRIPES_H4
> +#define MAX_TILES (MAX_STRIPES_W * MAX_STRIPES_H)
> +
> +#define MIN_W 128
> +#define MIN_H 128

Where does this minimum come from?

> +#define MAX_W 4096
> +#define MAX_H 4096
> +
> +enum image_convert_type {
> + IMAGE_CONVERT_IN = 0,
> + IMAGE_CONVERT_OUT,
> +};
> +
>  struct ic_task_regoffs {
>   u32 rsc;
>   u32 tpmem_csc[2];
> @@ -96,6 +132,16 @@ struct ic_task_bitfields {
>   u32 ic_cmb_galpha_bit;
>  };
>  
> +struct ic_task_channels {
> + int in;
> + int out;
> + int rot_in;
>