Tegra DRM device tree bindings

2012-07-08 Thread Thierry Reding
On Fri, Jul 06, 2012 at 01:59:13PM -0600, Stephen Warren wrote:
> On 07/05/2012 06:15 AM, Thierry Reding wrote:
> > Here's a new proposal that should address all issues collected
> > during the discussion of the previous one:
> > 
> > From tegra20.dtsi:
> ...
> 
> At a quick glance, that all seems pretty reasonable now.
> 
> > One problem I've come across when trying to get some rudimentary
> > code working with this is that there's no longer a device which the
> > DRM driver can bind to, because the top-level device (host1x) now
> > has a separate driver.
> 
> Can't you just have the host1x driver trigger the instantiation of the
> DRM driver? In other words, the host1x node is both the plain host1x
> driver and the DRM driver. So, after initializing anything for host1x
> itself, just end host1x's probe() with something like a call to
> drm_platform_init(), passing in host1x's own pdev.
> 
> After all, it seems like the whole point of representing host1x in the
> DT is to expose the graphics HW, so you'd need the DRM device in that
> case.

The reason that I've been hesitating is that host1x isn't related only
to graphics HW but is also required to make e.g. video input work. So I
can imagine a setup where for instance you need only video input and
send captured data via the network. If host1x registered the DRM driver
it would imply that setups that don't require any output would still
have the whole DRM initialized.

I did post a proposal to solve this problem, which doesn't seem Tegra
specific, to dri-devel last week. But if nobody else thinks it's a
problem to use host1x then I'll just use that as the easy way out. If
this happens to cause problems at some point in the future we can always
address it then. If host1x is tightly coupled to DRM then I think we can
also keep the driver with the rest of the DRM code.

Thierry


Tegra DRM device tree bindings

2012-07-06 Thread Stephen Warren
On 07/05/2012 06:15 AM, Thierry Reding wrote:
> Here's a new proposal that should address all issues collected
> during the discussion of the previous one:
> 
> From tegra20.dtsi:
...

At a quick glance, that all seems pretty reasonable now.

> One problem I've come across when trying to get some rudimentary
> code working with this is that there's no longer a device which the
> DRM driver can bind to, because the top-level device (host1x) now
> has a separate driver.

Can't you just have the host1x driver trigger the instantiation of the
DRM driver? In other words, the host1x node is both the plain host1x
driver and the DRM driver. So, after initializing anything for host1x
itself, just end host1x's probe() with something like a call to
drm_platform_init(), passing in host1x's own pdev.

After all, it seems like the whole point of representing host1x in the
DT is to expose the graphics HW, so you'd need the DRM device in that
case.
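
For illustration, a minimal sketch of this approach -- not code from the
thread. It assumes a struct drm_driver called tegra_drm_driver and a
host1x bring-up helper; both names are placeholders:

#include <linux/platform_device.h>
#include <drm/drmP.h>

/* placeholders for things the real driver would provide */
extern struct drm_driver tegra_drm_driver;
extern int host1x_hw_init(struct platform_device *pdev);

static int host1x_probe(struct platform_device *pdev)
{
        int err;

        /* bring up host1x itself first (clocks, syncpoints, ...) */
        err = host1x_hw_init(pdev);
        if (err < 0)
                return err;

        /* then register the DRM device on top of host1x's own pdev */
        return drm_platform_init(&tegra_drm_driver, pdev);
}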


Re: Tegra DRM device tree bindings

2012-07-05 Thread Thierry Reding
Here's a new proposal that should address all issues collected during
the discussion of the previous one:

From tegra20.dtsi:

host1x {
        compatible = "nvidia,tegra20-host1x", "simple-bus";
        reg = <0x50000000 0x00024000>;
        interrupts = <0 65 0x04   /* mpcore syncpt */
                      0 67 0x04>; /* mpcore general */

        #address-cells = <1>;
        #size-cells = <1>;

        ranges = <0x54000000 0x54000000 0x04000000>;

        /* video-encoding/decoding */
        mpe {
                reg = <0x54040000 0x00040000>;
                interrupts = <0 68 0x04>;
        };

        /* video input */
        vi {
                reg = <0x54080000 0x00040000>;
                interrupts = <0 69 0x04>;
        };

        /* EPP */
        epp {
                reg = <0x540c0000 0x00040000>;
                interrupts = <0 70 0x04>;
        };

        /* ISP */
        isp {
                reg = <0x54100000 0x00040000>;
                interrupts = <0 71 0x04>;
        };

        /* 2D engine */
        gr2d {
                reg = <0x54140000 0x00040000>;
                interrupts = <0 72 0x04>;
        };

        /* 3D engine */
        gr3d {
                reg = <0x54180000 0x00040000>;
        };

        /* display controllers */
        dc@54200000 {
                compatible = "nvidia,tegra20-dc";
                reg = <0x54200000 0x00040000>;
                interrupts = <0 73 0x04>;

                rgb {
                        status = "disabled";
                };
        };

        dc@54240000 {
                compatible = "nvidia,tegra20-dc";
                reg = <0x54240000 0x00040000>;
                interrupts = <0 74 0x04>;

                rgb {
                        status = "disabled";
                };
        };

        /* outputs */
        hdmi {
                compatible = "nvidia,tegra20-hdmi";
                reg = <0x54280000 0x00040000>;
                interrupts = <0 75 0x04>;
                status = "disabled";
        };

        tvo {
                compatible = "nvidia,tegra20-tvo";
                reg = <0x542c0000 0x00040000>;
                interrupts = <0 76 0x04>;
                status = "disabled";
        };

        dsi {
                compatible = "nvidia,tegra20-dsi";
                reg = <0x54300000 0x00040000>;
                status = "disabled";
        };
};

From tegra20-medcom.dts:

host1x {
        dc@54200000 {
                rgb {
                        nvidia,edid = /incbin/("tegra20-medcom.edid");
                        status = "okay";
                };
        };
};

From tegra20-plutux.dts:

host1x {
        hdmi {
                vdd-supply = <&ldo7_reg>;
                pll-supply = <&ldo8_reg>;

                nvidia,hpd-gpio = <&gpio 111 0>; /* PN7 */
                nvidia,ddc-i2c-bus = <&ddc>;

                status = "okay";
        };
};

One problem I've come across when trying to get some rudimentary code
working with this is that there's no longer a device which the DRM
driver can bind to, because the top-level device (host1x) now has a
separate driver.

I've briefly discussed this with Dave Airlie on IRC and there are
basically two solutions:

  a) Use a fake platform device, instantiated from the device tree or at
 module initialization time, so that the drm_platform_init()
 function can be used.

  b) Implement a custom drm_bus and reimplement most of the
 functionality provided by drm_platform_init().

Alternative a) is what Sascha has been doing for i.MX and is the most
straightforward. One variant of this would be to add another virtual
device node below host1x that groups gr2d, gr3d, both dcs and the three
output nodes. The downside of this is that it doesn't describe the
hardware of course, and therefore really shouldn't go into the device
tree. Instantiating at module initialization time has the disadvantage
that the parent-child relationship is lost.
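
A rough sketch of how such a fake device could instead be created by the
host1x driver itself once host1x has probed, which would also keep the
parent-child relationship intact; the "tegra-drm" name and the helper
below are made up for illustration:

#include <linux/err.h>
#include <linux/platform_device.h>

static struct platform_device *tegra_drm_dev;

/* called by the host1x driver after host1x itself has been set up */
static int host1x_add_drm_device(struct platform_device *host1x)
{
        /* dummy device that only exists so drm_platform_init() has a
         * pdev to bind to; parenting it to host1x keeps the hierarchy */
        tegra_drm_dev = platform_device_register_data(&host1x->dev,
                                                      "tegra-drm", -1,
                                                      NULL, 0);
        if (IS_ERR(tegra_drm_dev))
                return PTR_ERR(tegra_drm_dev);

        return 0;
}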

Using b) isn't optimal either because DRM requires at least a struct
device which can be associated with the drm_device. So the question is
which struct device that should be?

Thierry




RE: Tegra DRM device tree bindings

2012-07-05 Thread Mark Zhang
 From: linux-tegra-ow...@vger.kernel.org [mailto:linux-tegra-
 ow...@vger.kernel.org] On Behalf Of Thierry Reding
 Sent: Thursday, July 05, 2012 8:15 PM
 To: linux-te...@vger.kernel.org
 Cc: devicetree-disc...@lists.ozlabs.org; dri-devel@lists.freedesktop.org
 Subject: Re: Tegra DRM device tree bindings
 
 * PGP Signed by an unknown key
 
 Here's a new proposal that should address all issues collected during the
 discussion of the previous one:
 
 From tegra20.dtsi:
 
 host1x {
         compatible = "nvidia,tegra20-host1x", "simple-bus";
         reg = <0x50000000 0x00024000>;
         interrupts = <0 65 0x04   /* mpcore syncpt */
                       0 67 0x04>; /* mpcore general */

         #address-cells = <1>;
         #size-cells = <1>;

         ranges = <0x54000000 0x54000000 0x04000000>;

         /* video-encoding/decoding */
         mpe {
                 reg = <0x54040000 0x00040000>;
                 interrupts = <0 68 0x04>;
         };

         /* video input */
         vi {
                 reg = <0x54080000 0x00040000>;
                 interrupts = <0 69 0x04>;
         };

         /* EPP */
         epp {
                 reg = <0x540c0000 0x00040000>;
                 interrupts = <0 70 0x04>;
         };

         /* ISP */
         isp {
                 reg = <0x54100000 0x00040000>;
                 interrupts = <0 71 0x04>;
         };

         /* 2D engine */
         gr2d {
                 reg = <0x54140000 0x00040000>;
                 interrupts = <0 72 0x04>;
         };

         /* 3D engine */
         gr3d {
                 reg = <0x54180000 0x00040000>;
         };

         /* display controllers */
         dc@54200000 {
                 compatible = "nvidia,tegra20-dc";
                 reg = <0x54200000 0x00040000>;
                 interrupts = <0 73 0x04>;

                 rgb {
                         status = "disabled";
                 };
         };

         dc@54240000 {
                 compatible = "nvidia,tegra20-dc";
                 reg = <0x54240000 0x00040000>;
                 interrupts = <0 74 0x04>;

                 rgb {
                         status = "disabled";
                 };
         };

         /* outputs */
         hdmi {
                 compatible = "nvidia,tegra20-hdmi";
                 reg = <0x54280000 0x00040000>;
                 interrupts = <0 75 0x04>;
                 status = "disabled";
         };

         tvo {
                 compatible = "nvidia,tegra20-tvo";
                 reg = <0x542c0000 0x00040000>;
                 interrupts = <0 76 0x04>;
                 status = "disabled";
         };

         dsi {
                 compatible = "nvidia,tegra20-dsi";
                 reg = <0x54300000 0x00040000>;
                 status = "disabled";
         };
 };

 From tegra20-medcom.dts:

 host1x {
         dc@54200000 {
                 rgb {
                         nvidia,edid = /incbin/("tegra20-medcom.edid");
                         status = "okay";
                 };
         };
 };

 From tegra20-plutux.dts:

 host1x {
         hdmi {
                 vdd-supply = <&ldo7_reg>;
                 pll-supply = <&ldo8_reg>;

                 nvidia,hpd-gpio = <&gpio 111 0>; /* PN7 */
                 nvidia,ddc-i2c-bus = <&ddc>;

                 status = "okay";
         };
 };
 
 One problem I've come across when trying to get some rudimentary code
 working with this is that there's no longer a device which the DRM
 driver can bind to, because the top-level device (host1x) now has a
 separate driver.
 
 I've briefly discussed this with Dave Airlie on IRC and there are
 basically two solutions:
 
   a) Use a fake platform device, instantiated from the device tree or at
  module initialization time, so that the drm_platform_init()
  function can be used.
 
If we change the dts to this version, that means we should register all
drivers (dc/rgb/dsi/...) during host1x driver initialization. In that
case, I think we can register the DRM driver as well.
It's not a good idea to add a DRM node to the dts because, just as you
said, it's not a real device at all.

   b) Implement a custom drm_bus and reimplement most of the
  functionality provided by drm_platform_init().
 
 Alternative a) is what Sascha has been doing for i.MX and is the most
 straightforward. One variant of this would be to add another virtual
 device node below host1x that groups gr2d, gr3d, both dcs and the three
 output nodes. The downside of this is that it doesn't

Tegra DRM device tree bindings

2012-07-01 Thread Lucas Stach
On Saturday, 30.06.2012, 20:01 +0200, Thierry Reding wrote:
> On Fri, Jun 29, 2012 at 04:20:31PM +0300, Terje Bergström wrote:
> > On 28.06.2012 20:19, Lucas Stach wrote:
> > > TTM though solves more advanced matters, like buffer synchronisation
> > > between 3D and 2D block of hardware or syncing buffer access between GPU
> > > and CPU.
> > > One of the most interesting things of TTM is the ability to purge the
> > > GPU DMA buffers to scattered sysmem or even swap them out, if they are
> > > not currently used by the GPU. It then makes sure to move them in the
> > > contig space again when the GPU really needs them and fix up the GPU
> > > command stream with the new buffer address.
> > 
> > We preferably should choose dma_buf as a common interface towards
> > buffers. That way whatever we choose as the memory manager, all dma_buf
> > aware drivers will be able to use buffers allocated by other drivers.
> > 
> > We probably need to accommodate multiple memory managers to take care of
> > legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we
> > have the possibility to do zero-copy video without forcing everybody to
> > use the same memory manager.
> 
> I agree. Supporting DMA BUF also doesn't seem very difficult.
> 
> > As I understand, TTM is good for platforms where we have a separate
> > frame buffer memory, as is the case with most of the graphics cards. In
> > Tegra, graphics and CPU occupy the same memory, so I'm not sure if we
> > require the level of functionality that TTM provides. I guess the level
> > of functionality and the complexity that it brings is one reason why TTM
> > hasn't really caught on in the ARM world.
> > 
> > The synchronization primitives attached to TTM are slightly confusing.
> > At the bottom level, it's operations which need to be synchronized
> > between each other. That's the API level that we should export from
> > kernel to user space. It's then up to libdrm level (or whatever is doing
> > the rendering in user space) to decide which operations it wants to have
> > completed before a buffer can be reused/read/passed on to the next stage.
> > 
> > Anyway, if we hide the memory manager behind dma_buf, we're free to muck
> > around with multiple of them and see what works best.
> 
> Exactly. Other subthreads echo this as well. Using CMA seems the easiest
> and most flexible for now but still covers everything we need. If it
> turns out that it isn't suited for more advanced stuff once we start
> supporting 3D then we can still opt for something like TTM.
> 
As working code is the primary goal, I would say go for it. I still
think TTM is the way to go, even for simple things like a DRM
framebuffer driver, but as CMA and TTM won't collide in their goals it
should be easy to put TTM in there after we have something going with
CMA.

Lucas



Tegra DRM device tree bindings

2012-07-01 Thread Lucas Stach
On Friday, 29.06.2012, 16:20 +0300, Terje Bergström wrote:
> On 28.06.2012 20:19, Lucas Stach wrote:
> > TTM though solves more advanced matters, like buffer synchronisation
> > between 3D and 2D block of hardware or syncing buffer access between GPU
> > and CPU.
> > One of the most interesting things of TTM is the ability to purge the
> > GPU DMA buffers to scattered sysmem or even swap them out, if they are
> > not currently used by the GPU. It then makes sure to move them in the
> > contig space again when the GPU really needs them and fix up the GPU
> > command stream with the new buffer address.
> 
> We preferably should choose dma_buf as a common interface towards
> buffers. That way whatever we choose as the memory manager, all dma_buf
> aware drivers will be able to use buffers allocated by other drivers.
> 
> We probably need to accommodate multiple memory managers to take care of
> legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we
> have the possibility to do zero-copy video without forcing everybody to
> use the same memory manager.
> 
> As I understand, TTM is good for platforms where we have a separate
> frame buffer memory, as is the case with most of the graphics cards. In
> Tegra, graphics and CPU occupy the same memory, so I'm not sure if we
> require the level of functionality that TTM provides. I guess the level
> of functionality and the complexity that it brings is one reason why TTM
> hasn't really caught on in the ARM world.
> 
I understand that TTM looks like a big complex beast at first sight, but
trying to understand how it works avoids reinventing the wheel over and
over again. We still have to solve problems like cache invalidation,
synchronization and swap-out of dma buffers, which is a lot easier if we
go with a common framework. 

> The synchronization primitives attached to TTM are slightly confusing.
> At the bottom level, it's operations which need to be synchronized
> between each other. That's the API level that we should export from
> kernel to user space. It's then up to libdrm level (or whatever is doing
> the rendering in user space) to decide which operations it wants to have
> completed before a buffer can be reused/read/passed on to the next stage.
> 
That's exactly the level we are providing to userspace from other
drivers using TTM like radeon or nouveau.

> Anyway, if we hide the memory manager behind dma_buf, we're free to muck
> around with multiple of them and see what works best.
> 
dma_buf at the current level is only a way to share buffers and does not
provide enough information about the buffer to be useful as an
abstraction level on top of multiple memory managers. But I agree that
we should try to get dma_buf integration right from the start, as the
zero-copy sharing is a very useful thing to have.

Lucas





Tegra DRM device tree bindings

2012-06-30 Thread Thierry Reding
On Thu, Jun 28, 2012 at 10:46:58AM -0600, Stephen Warren wrote:
> On 06/28/2012 12:18 AM, Hiroshi Doyu wrote:
> > On Wed, 27 Jun 2012 16:44:14 +0200
> > Thierry Reding  wrote:
> > 
> >>> I think that "coherent_pool" can be used only when the amount of
> >>> contiguous memory is short in your system. Otherwise even unnecessary.
> >>>
> >>> Could you explain a bit more why you want carveout size on per-board 
> >>> basis?
> >>
> >> In the ideal case I would want to not have a carveout size at all.
> >> However there may be situations where you need to make sure some driver
> >> can allocate a given amount of memory. Having to specify this using a
> >> kernel command-line parameter is cumbersome because it may require
> >> changes to the bootloader or whatever. So if you know that a particular
> >> board always needs 128 MiB of carveout, then it makes sense to specify
> >> it on a per-board basis.
> > 
> > Hm...I could understand somewhat;) but DT can also specify "bootargs"
> > in dts file, which can support per-board-wide spec too, like the above
> > sum of carveout needed from all drivers. I just want to avoid
> > introducing a new parameter additionally if we can make use of the
> > existing mechanism.
> 
> The bootargs in the DT file is usually provided by (over-written by) the
> bootloader. If we start requiring lots of random kernel command-line
> arguments, that makes it more effort for the user of the bootloader
> (e.g. distribution bootloader scripts, etc.) to create the kernel
> command-line. I'd prefer to avoid that as much as possible. That said,
> using a standardized command-line option that is (or will be) used by
> all (ARM?) SoCs for the same purpose is reasonable, because there's
> commonality there.

I propose we stick with the default CMA settings for now and see where
we end up. No need to jump the gun.

Thierry



Tegra DRM device tree bindings

2012-06-30 Thread Thierry Reding
On Fri, Jun 29, 2012 at 04:20:31PM +0300, Terje Bergström wrote:
> On 28.06.2012 20:19, Lucas Stach wrote:
> > TTM though solves more advanced matters, like buffer synchronisation
> > between 3D and 2D block of hardware or syncing buffer access between GPU
> > and CPU.
> > One of the most interesting things of TTM is the ability to purge the
> > GPU DMA buffers to scattered sysmem or even swap them out, if they are
> > not currently used by the GPU. It then makes sure to move them in the
> > contig space again when the GPU really needs them and fix up the GPU
> > command stream with the new buffer address.
> 
> We preferably should choose dma_buf as a common interface towards
> buffers. That way whatever we choose as the memory manager, all dma_buf
> aware drivers will be able to use buffers allocated by other drivers.
> 
> We probably need to accommodate multiple memory managers to take care of
> legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we
> have the possibility to do zero-copy video without forcing everybody to
> use the same memory manager.

I agree. Supporting DMA BUF also doesn't seem very difficult.

> As I understand, TTM is good for platforms where we have a separate
> frame buffer memory, as is the case with most of the graphics cards. In
> Tegra, graphics and CPU occupy the same memory, so I'm not sure if we
> require the level of functionality that TTM provides. I guess the level
> of functionality and the complexity that it brings is one reason why TTM
> hasn't really caught on in the ARM world.
> 
> The synchronization primitives attached to TTM are slightly confusing.
> At the bottom level, it's operations which need to be synchronized
> between each other. That's the API level that we should export from
> kernel to user space. It's then up to libdrm level (or whatever is doing
> the rendering in user space) to decide which operations it wants to have
> completed before a buffer can be reused/read/passed on to the next stage.
> 
> Anyway, if we hide the memory manager behind dma_buf, we're free to muck
> around with multiple of them and see what works best.

Exactly. Other subthreads echo this as well. Using CMA seems the easiest
and most flexible for now but still covers everything we need. If it
turns out that it isn't suited for more advanced stuff once we start
supporting 3D then we can still opt for something like TTM.

Thierry



Tegra DRM device tree bindings

2012-06-30 Thread Thierry Reding
On Wed, Jun 27, 2012 at 10:13:56AM +0200, Lucas Stach wrote:
> Hi all,
> 
> I'm not sure what your exact plans are for the direction in which the
> DRM driver should head, as I'm still a bit out of the loop as many of
> those matters were only discussed internally at NVIDIA or with some NDA
> developers. But I'll still try to get into the discussion.
> 
> > On Wednesday, 27.06.2012, 07:14 +0200, Thierry Reding wrote:
> > On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > > Thierry Reding  wrote:
> > > >> ...
> > >  I'm not sure I understand how information about the carveout would be
> > >  obtained from the IOMMU API, though.
> > > >>>
> > > >>> I think that can be similar with current gart implementation. Define 
> > > >>> carveout as:
> > > >>>
> > > >>> carveout {
> > > >>> compatible = "nvidia,tegra20-carveout";
> > > >>> size = <0x1000>;
> > > >>> };
> > > >>>
> > > >>> Then create a file such like "tegra-carveout.c" to get these 
> > > >>> definitions and
> > > >> register itself as platform device's iommu instance.
> > > >>
> > > >> The carveout isn't a HW object, so it doesn't seem appropriate to 
> > > >> define a DT
> > > >> node to represent it.
> > > > 
> > > > Yes. But I think it's better to export the size of carveout as a 
> > > > configurable item.
> > > > So we need to define this somewhere. How about define carveout as a 
> > > > property of gart?
> > > 
> > > There already exists a way of preventing Linux from using certain chunks
> > > of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> > > it looks like /memreserve/ entries can have labels, which implies that a
> > > property in the GART node could refer to the /memreserve/ entry by
> > > phandle in order to know what memory regions to use.
> > 
> > Wasn't the whole point of using a carveout supposed to be a replacement
> > for the GART? As such I'd think the carveout should rather be a property
> > of the host1x device.
> > 
> In my understanding carveout is neither a hardware nor software
> component. It's just a somewhat special pool of memory. As I pointed out
> in one of the older mails, carveout can not completely replace GART.
> While normal allocations for graphics use should be done contiguous,
> GART allows us to link normal scattered sysram buffers into GPU address
> space, which is a nice thing to have.

Do you agree that this will likely not be a problem with a more or less
stupid framebuffer DRM driver? I recall you mentioned that in a 3D context
where the scattered buffers are for example geometry provided by OpenGL,
right?

I think we need to put some thoughts into that once we start to
implement more advanced features. At that point we may also want to
think about how to integrate that with TTM.

> IMHO if carveout is to be used exclusively by the GPU (i.e. the DRM
> driver) it should be a property of the host1x device.
> 
> > AIUI what we want to do is have a large contiguous region of memory that
> > a central component (host1x) manages as a pool from which clients (DRM,
> > V4L, ...) can allocate buffers as needed. Since all of this memory will
> > be contiguous anyway there isn't much use for the GART anymore.
> > 
> I think this is the wrong way to go. Having a special memory pool
> managed by some driver adds one more allocator to the kernel, which is
> clearly not desirable. If we want a special mem region for GPU use, we
> should not share this memory pool with other components.
> 
> But if we want a mem region for contig allocations used by many
> components, which seems to be consensus here, CMA is the way to go. In
> this case I think we don't want to bother with the carveout property at
> all at the DRM driver level. Such a shared mem region managed by CMA
> should be defined at a higher level of the device tree.

Okay, this pretty much matches what we've agreed on in another
subthread. CMA looks like the best option for now as it should
cover everything we need at present.
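
As a hedged illustration (not from the thread) of why CMA keeps the
driver side simple: with CMA backing the coherent allocator, a driver
needs no carveout allocator of its own; an ordinary dma_alloc_coherent()
call returns a large contiguous buffer and CMA migrates pages out of the
shared region on demand. The function name below is made up:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* allocate a contiguous scanout buffer straight from the DMA API */
static void *tegra_alloc_scanout(struct device *dev, size_t size,
                                 dma_addr_t *dma_handle)
{
        return dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
}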

Thierry





Tegra DRM device tree bindings

2012-06-29 Thread Terje Bergström
On 28.06.2012 20:19, Lucas Stach wrote:
> TTM though solves more advanced matters, like buffer synchronisation
> between 3D and 2D block of hardware or syncing buffer access between GPU
> and CPU.
> One of the most interesting things of TTM is the ability to purge the
> GPU DMA buffers to scattered sysmem or even swap them out, if they are
> not currently used by the GPU. It then makes sure to move them in the
> contig space again when the GPU really needs them and fix up the GPU
> command stream with the new buffer address.

We preferably should choose dma_buf as a common interface towards
buffers. That way whatever we choose as the memory manager, all dma_buf
aware drivers will be able to use buffers allocated by other drivers.
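
As a sketch of what being "dma_buf aware" means on the importing side,
using the importer calls available in the kernel at this point; the
function and variable names are made up and error handling is trimmed:

#include <linux/err.h>
#include <linux/dma-mapping.h>
#include <linux/dma-buf.h>

/* import a buffer another driver exported as a dma_buf fd and map it
 * for DMA by this device */
static struct sg_table *import_buffer(struct device *dev, int fd)
{
        struct dma_buf *buf = dma_buf_get(fd);
        struct dma_buf_attachment *attach;

        if (IS_ERR(buf))
                return ERR_CAST(buf);

        attach = dma_buf_attach(buf, dev);
        if (IS_ERR(attach))
                return ERR_CAST(attach);

        return dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
}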

We probably need to accommodate multiple memory managers to take care of
legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we
have the possibility to do zero-copy video without forcing everybody to
use the same memory manager.

As I understand, TTM is good for platforms where we have a separate
frame buffer memory, as is the case with most of the graphics cards. In
Tegra, graphics and CPU occupy the same memory, so I'm not sure if we
require the level of functionality that TTM provides. I guess the level
of functionality and the complexity that it brings is one reason why TTM
hasn't really caught on in the ARM world.

The synchronization primitives attached to TTM are slightly confusing.
At the bottom level, it's operations which need to be synchronized
between each other. That's the API level that we should export from
kernel to user space. It's then up to libdrm level (or whatever is doing
the rendering in user space) to decide which operations it wants to have
completed before a buffer can be reused/read/passed on to the next stage.

Anyway, if we hide the memory manager behind dma_buf, we're free to muck
around with multiple of them and see what works best.

Terje


Tegra DRM device tree bindings

2012-06-29 Thread Mark Zhang
> > On Thursday, 28.06.2012, 10:51 -0600, Stephen Warren wrote:
> > > On 06/28/2012 05:12 AM, Thierry Reding wrote:
> > > > On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
> > > >> On Wednesday, 27.06.2012, 16:44 +0200, Thierry Reding wrote:
> > > ...
> > > >>> In the ideal case I would want to not have a carveout size at all.
> > > >>> However there may be situations where you need to make sure some
> > > >>> driver can allocate a given amount of memory. Having to specify
> > > >>> this using a kernel command-line parameter is cumbersome because
> > > >>> it may require changes to the bootloader or whatever. So if you
> > > >>> know that a particular board always needs
> > > >>> 128 MiB of carveout, then it makes sense to specify it on a
> > > >>> per-board basis.
> > > >>
> > > >> If we go with CMA, this is a non-issue, as CMA allows to use the
> > > >> contig area for normal allocations and only purges them if it
> > > >> really needs the space for contig allocs.
> > > >
> > > > CMA certainly sounds like the most simple approach. While it may
> > > > not be suited for 3D graphics or multimedia processing later on, I
> > > > think we could use it at a starting point to get basic framebuffer
> > > > and X support up and running. We can always move to something more
> > > > advanced like TTM later.
> > >
> > > I thought the whole purpose of CMA was to act as the infra-structure
> > > to provide buffers to 3D, camera, etc. in particular allowing
> > > sharing of buffers between them. In other words, isn't CMA the memory
> > > manager?
> > > If there's some deficiency with CMA for 3D graphics, it seems like
> > > that should be raised with those designing CMA. Or, am I way off
> > > base with my expectations of CMA?
> > >
> > CMA is just a way of providing large contiguous address space blocks
> > in a dynamic fashion. The problem CMA solves is: we have a system with
> > relatively low amounts of sysmem (like 512MB), now to ensure we can
> > always get large contiguous buffers for use by GPU or VIDEO blocks, we
> > need to set aside a relatively large contiguous pool (like 128MB). So
> > we are stealing 128MB of memory from the system while we may or may not
> use it, which is bad.
> > Now CMA allows to say: I may need 128MB of contig space, but the
> > system is free to use it as normal memory as long as I don't really
> > need it. If the space is really needed, CMA purges pages from the area
> > and may even swap them out. So yes CMA is a memory allocator for contig
> memory.
> >
> > TTM though solves more advanced matters, like buffer synchronisation
> > between 3D and 2D block of hardware or syncing buffer access between GPU
> > and CPU.
> > One of the most interesting things of TTM is the ability to purge the
> > GPU DMA buffers to scattered sysmem or even swap them out, if they are
> > not currently used by the GPU. It then makes sure to move them in the
> > contig space again when the GPU really needs them and fix up the GPU
> > command stream with the new buffer address.
> >
> > IMHO the best solution would be to use CMA as a flexible replacement
> > of the static carveout area and put TTM on top of this to solve the
> > needs of graphics drivers. We certainly don't want to reinvent the
> > wheel inside CMA. We have solutions for all those things in the kernel
> > right now, we just have to glue them together in a sane way.
> >
> 
> That is a great explanation. So could you explain what the relation is
> between the IOMMU API and TTM (or GEM)?
> Terje said the DMABUF API sits on top of the IOMMU API, so for normal
> device drivers (such as DRM) it would be OK to forget about the IOMMU
> APIs and just use the DMABUF API. If so, I want to know whether TTM/GEM
> and the IOMMU are related, or whether TTM/GEM uses the DMABUF APIs,
> which call the IOMMU API to do memory allocation/mapping.
>  

Sorry for my stupid question. The DMA mapping API sits on top of the
IOMMU API, not DMABUF; DMABUF is used to share buffers. So please
ignore what I said.
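
To make that layering concrete, a small made-up sketch: the DMA mapping
API (backed by an IOMMU where the device has one) is what actually maps
memory for a device, while dma_buf only hands an already-allocated
buffer from one driver to another:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* DMA mapping API: goes through the IOMMU, if the device has one, to
 * create the device-visible mapping; dma_buf is not involved here */
static int map_pages_for_device(struct device *dev, struct sg_table *sgt)
{
        if (!dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE))
                return -ENOMEM;

        return 0;
}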

> > Thanks,
> > Lucas
> >


Tegra DRM device tree bindings

2012-06-29 Thread Mark Zhang
> On Thursday, 28.06.2012, 10:51 -0600, Stephen Warren wrote:
> > On 06/28/2012 05:12 AM, Thierry Reding wrote:
> > > On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
> > >> On Wednesday, 27.06.2012, 16:44 +0200, Thierry Reding wrote:
> > ...
> > >>> In the ideal case I would want to not have a carveout size at all.
> > >>> However there may be situations where you need to make sure some
> > >>> driver can allocate a given amount of memory. Having to specify
> > >>> this using a kernel command-line parameter is cumbersome because
> > >>> it may require changes to the bootloader or whatever. So if you
> > >>> know that a particular board always needs
> > >>> 128 MiB of carveout, then it makes sense to specify it on a
> > >>> per-board basis.
> > >>
> > >> If we go with CMA, this is a non-issue, as CMA allows to use the
> > >> contig area for normal allocations and only purges them if it
> > >> really needs the space for contig allocs.
> > >
> > > CMA certainly sounds like the most simple approach. While it may not
> > > be suited for 3D graphics or multimedia processing later on, I think
> > > we could use it at a starting point to get basic framebuffer and X
> > > support up and running. We can always move to something more
> > > advanced like TTM later.
> >
> > I thought the whole purpose of CMA was to act as the infra-structure
> > to provide buffers to 3D, camera, etc. in particular allowing sharing
> > of buffers between them. In other words, isn't CMA the memory manager?
> > If there's some deficiency with CMA for 3D graphics, it seems like
> > that should be raised with those designing CMA. Or, am I way off base
> > with my expectations of CMA?
> >
> CMA is just a way of providing large contiguous address space blocks in a
> dynamic fashion. The problem CMA solves is: we have a system with relatively low
> amounts of sysmem (like 512MB), now to ensure we can always get large
> contiguous buffers for use by GPU or VIDEO blocks, we need to set aside a
> relatively large contiguous pool (like 128MB). So we are stealing 128MB of
> memory from the system while we may or may not use it, which is bad.
> Now CMA allows to say: I may need 128MB of contig space, but the system is
> free to use it as normal memory as long as I don't really need it. If the
> space is really needed, CMA purges pages from the area and may even swap
> them out. So yes CMA is a memory allocator for contig memory.
> 
> TTM though solves more advanced matters, like buffer synchronisation between
> 3D and 2D block of hardware or syncing buffer access between GPU and CPU.
> One of the most interesting things of TTM is the ability to purge the GPU DMA
> buffers to scattered sysmem or even swap them out, if they are not currently
> used by the GPU. It then makes sure to move them in the contig space again
> when the GPU really needs them and fix up the GPU command stream with the
> new buffer address.
> 
> IMHO the best solution would be to use CMA as a flexible replacement of the
> static carveout area and put TTM on top of this to solve the needs of graphics
> drivers. We certainly don't want to reinvent the wheel inside CMA. We have
> solutions for all those things in the kernel right now, we just have to
> glue them together in a sane way.
> 

That is a great explanation. So could you explain what the relation is
between the IOMMU API and TTM (or GEM)?
Terje said the DMABUF API sits on top of the IOMMU API, so for normal
device drivers (such as DRM) it would be OK to forget about the IOMMU
APIs and just use the DMABUF API. If so, I want to know whether TTM/GEM
and the IOMMU are related, or whether TTM/GEM uses the DMABUF APIs,
which call the IOMMU API to do memory allocation/mapping.


> Thanks,
> Lucas
> 


RE: Tegra DRM device tree bindings

2012-06-29 Thread Mark Zhang
  Am Donnerstag, den 28.06.2012, 10:51 -0600 schrieb Stephen Warren:
   On 06/28/2012 05:12 AM, Thierry Reding wrote:
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
Am Mittwoch, den 27.06.2012, 16:44 +0200 schrieb Thierry Reding:
   ...
In the ideal case I would want to not have a carveout size at all.
However there may be situations where you need to make sure some
driver can allocate a given amount of memory. Having to specify
this using a kernel command-line parameter is cumbersome because
it may require changes to the bootloader or whatever. So if you
know that a particular board always needs
128 MiB of carveout, then it makes sense to specify it on a
per-board basis.
   
If we go with CMA, this is a non-issue, as CMA allows to use the
contig area for normal allocations and only purges them if it
really needs the space for contig allocs.
   
CMA certainly sounds like the most simple approach. While it may
not be suited for 3D graphics or multimedia processing later on, I
think we could use it at a starting point to get basic framebuffer
and X support up and running. We can always move to something more
advanced like TTM later.
  
   I thought the whole purpose of CMA was to act as the infra-structure
   to provide buffers to 3D, camera, etc. in particular allowing
   sharing of buffers between them. In other words, isn't CMA the memory
 manager?
   If there's some deficiency with CMA for 3D graphics, it seems like
   that should be raised with those designing CMA. Or, am I way off
   base with my expectations of CMA?
  
  CMA is just a way of providing large contiguous address space blocks
  in a dynamic fashion. The problem CMA solves is: we have a system with
  relatively low amounts of sysmem (like 512MB), now to ensure we can
  always get large contiguous buffers for use by GPU or VIDEO blocks, we
  need to set aside a relatively large contiguous pool (like 128MB). So
  we are stealing 128MB of memory from the system while we may or may not
 use it, which is bad.
  Now CMA allows to say: I may need 128MB of contig space, but the
  system is free to use it as normal memory as long as I don't really
  need it. If the space is really needed, CMA purges pages from the area
  and may even swap them out. So yes CMA is a memory allocator for contig
 memory.
 
  TTM though solves more advanced matters, like buffer synchronisation
  between 3D and 2D block of hardware or syncing buffer access between GPU
 and CPU.
  One of the most interesting things of TTM is the ability to purge the
  GPU DMA buffers to scattered sysmem or even swap them out, if they are
  not currently used by the GPU. It then makes sure to move them in the
  contig space again when the GPU really needs them and fix up the GPU
  command stream with the new buffer address.
 
  IMHO the best solution would be to use CMA as a flexible replacement
  of the static carveout area and put TTM on top of this to solve the
  needs of graphics drivers. We certainly don't want to reinvent the
  wheel inside CMA. We have solutions for all those things in the kernel
  right now, we just have to glue them together in a sane way.
 
 
 That is a great explanation. So could you explain the relation between the
 IOMMU API and TTM (or GEM)?
 Terje said the DMABUF API sits on top of the IOMMU API, so normal device
 drivers (such as DRM) can forget about the IOMMU API and just use DMABUF. If
 so, I want to know whether TTM/GEM and the IOMMU are related, or whether
 TTM/GEM uses the DMABUF API, which in turn calls the IOMMU API to do memory
 allocation/mapping?
  

Sorry for my stupid question. The DMA mapping API sits on top of the IOMMU
API, not DMABUF. DMABUF is used to share buffers.
So please ignore what I said.

  Thanks,
  Lucas
 


Re: Tegra DRM device tree bindings

2012-06-29 Thread Stephen Warren
On 06/28/2012 11:19 AM, Lucas Stach wrote:
...
 CMA is just a way of providing large contiguous address space blocks in
 a dynamic fashion. ...
 
 TTM though solves more advanced matters, like buffer synchronisation
 between 3D and 2D block of hardware ...
 
 IMHO the best solution would be to use CMA as a flexible replacement of
 the static carveout area and put TTM on top of this ...

Ah right, thanks for the explanation. That makes sense to me now.


Re: Tegra DRM device tree bindings

2012-06-29 Thread Terje Bergström
On 28.06.2012 20:19, Lucas Stach wrote:
 TTM though solves more advanced matters, like buffer synchronisation
 between 3D and 2D block of hardware or syncing buffer access between GPU
 and CPU.
 One of the most interesting things of TTM is the ability to purge the
 GPU DMA buffers to scattered sysmem or even swap them out, if they are
 not currently used by the GPU. It then makes sure to move them in the
 contig space again when the GPU really needs them and fix up the GPU
 command stream with the new buffer address.

We preferably should choose dma_buf as a common interface towards
buffers. That way whatever we choose as the memory manager, all dma_buf
aware drivers will be able to use buffers allocated by other drivers.

We probably need to accommodate multiple memory managers to take care of
legacy and new drivers. If V4L2 and DRM projects all move to dma_buf, we
have the possibility to do zero-copy video without forcing everybody to
use the same memory manager.

As I understand, TTM is good for platforms where we have a separate
frame buffer memory, as is the case with most of the graphics cards. In
Tegra, graphics and CPU occupy the same memory, so I'm not sure if we
require the level of functionality that TTM provides. I guess the level
of functionality and the complexity that it brings is one reason why TTM
hasn't really caught on in the ARM world.

The synchronization primitives attached to TTM are slightly confusing.
At the bottom level, it's operations which need to be synchronized
between each other. That's the API level that we should export from
kernel to user space. It's then up to libdrm level (or whatever is doing
the rendering in user space) to decide which operations it wants to have
completed before a buffer can be reused/read/passed on to the next stage.

Anyway, if we hide the memory manager behind dma_buf, we're free to muck
around with multiple of them and see what works best.

Terje
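
For illustration, a rough sketch of what the importing side of such a dma_buf
interface looks like; only the dma_buf_*() calls are the real API, the
surrounding helper, its name and its error handling are made up, and a real
driver would of course also need the corresponding export path:

#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/*
 * Sketch of the importing side only: the exporter (whatever memory manager
 * it uses internally) has handed a dma_buf file descriptor to user space,
 * which passes it to this driver.
 */
static struct sg_table *import_shared_buffer(struct device *dev, int fd,
                                             struct dma_buf **buf_out,
                                             struct dma_buf_attachment **att_out)
{
        struct dma_buf *buf;
        struct dma_buf_attachment *att;
        struct sg_table *sgt;

        buf = dma_buf_get(fd);
        if (IS_ERR(buf))
                return ERR_CAST(buf);

        att = dma_buf_attach(buf, dev);
        if (IS_ERR(att)) {
                dma_buf_put(buf);
                return ERR_CAST(att);
        }

        /* The returned sg_table describes the buffer for DMA; how the pages
         * were allocated (CMA, TTM, GEM, ...) is the exporter's business. */
        sgt = dma_buf_map_attachment(att, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_detach(buf, att);
                dma_buf_put(buf);
                return sgt;
        }

        *buf_out = buf;
        *att_out = att;
        return sgt;
}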


Tegra DRM device tree bindings

2012-06-28 Thread Thierry Reding
On Thu, Jun 28, 2012 at 11:33:56AM -0600, Stephen Warren wrote:
> On 06/28/2012 11:19 AM, Lucas Stach wrote:
> ...
> > CMA is just a way of providing large contiguous address space blocks in
> > a dynamic fashion. ...
> > 
> > TTM though solves more advanced matters, like buffer synchronisation
> > between 3D and 2D block of hardware ...
> > 
> > IMHO the best solution would be to use CMA as a flexible replacement of
> > the static carveout area and put TTM on top of this ...
> 
> Ah right, thanks for the explanation. That makes sense to me now.

Okay. I think that resolves the last open issue. I'll try to get some
more work done on the DT and corresponding code soonish.

For those who don't know yet I've requested the creation of a project
on freedesktop.org for Tegra graphics drivers[0]. I plan to have the DRM
code hosted there once the project has been approved. Furthermore if we
ever get to write a corresponding X driver it can be hosted there as
well. We should also use the wiki for coordination once things get
started.

Thierry

[0]: https://bugs.freedesktop.org/show_bug.cgi?id=51505



Tegra DRM device tree bindings

2012-06-28 Thread Lucas Stach
On Thursday, 28.06.2012, 10:51 -0600, Stephen Warren wrote:
> On 06/28/2012 05:12 AM, Thierry Reding wrote:
> > On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
> >> On Wednesday, 27.06.2012, 16:44 +0200, Thierry Reding wrote:
> ...
> >>> In the ideal case I would want to not have a carveout size at
> >>> all. However there may be situations where you need to make
> >>> sure some driver can allocate a given amount of memory. Having
> >>> to specify this using a kernel command-line parameter is
> >>> cumbersome because it may require changes to the bootloader or
> >>> whatever. So if you know that a particular board always needs
> >>> 128 MiB of carveout, then it makes sense to specify it on a
> >>> per-board basis.
> >> 
> >> If we go with CMA, this is a non-issue, as CMA allows to use the
> >> contig area for normal allocations and only purges them if it
> >> really needs the space for contig allocs.
> > 
> > CMA certainly sounds like the most simple approach. While it may
> > not be suited for 3D graphics or multimedia processing later on, I
> > think we could use it at a starting point to get basic framebuffer
> > and X support up and running. We can always move to something more
> > advanced like TTM later.
> 
> I thought the whole purpose of CMA was to act as the infra-structure
> to provide buffers to 3D, camera, etc. in particular allowing sharing
> of buffers between them. In other words, isn't CMA the memory manager?
> If there's some deficiency with CMA for 3D graphics, it seems like
> that should be raised with those designing CMA. Or, am I way off base
> with my expectations of CMA?
> 
CMA is just a way of providing large contiguous address space blocks in
a dynamic fashion. The problem CMA solves is: we have a system with
relatively low amounts of sysmem (like 512MB), now to ensure we can
always get large contiguous buffers for use by GPU or VIDEO blocks, we
need to set aside a relatively large contiguous pool (like 128MB). So we
are stealing 128MB of memory from the system while we may or may not use
it, which is bad.
Now CMA allows to say: I may need 128MB of contig space, but the system
is free to use it as normal memory as long as I don't really need it. If
the space is really needed, CMA purges pages from the area and may even
swap them out. So yes CMA is a memory allocator for contig memory.

TTM though solves more advanced matters, like buffer synchronisation
between 3D and 2D block of hardware or syncing buffer access between GPU
and CPU.
One of the most interesting things of TTM is the ability to purge the
GPU DMA buffers to scattered sysmem or even swap them out, if they are
not currently used by the GPU. It then makes sure to move them in the
contig space again when the GPU really needs them and fix up the GPU
command stream with the new buffer address.

IMHO the best solution would be to use CMA as a flexible replacement of
the static carveout area and put TTM on top of this to solve the needs
of graphics drivers. We certainly don't want to reinvent the wheel
inside CMA. We have solutions for all those things in the kernel right
now, we just have to glue them together in a sane way.

Thanks,
Lucas
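
For illustration, a minimal sketch of the driver's view under that model,
assuming CONFIG_CMA is enabled and the ARM dma_ops are backed by the
contiguous area; the helper names and the scanout-buffer use case are
invented, only dma_alloc_coherent()/dma_free_coherent() are the real API:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/*
 * Hypothetical helpers: allocate and free a large, physically contiguous
 * scanout buffer through the regular DMA API.  With CMA compiled in, the
 * contiguous pages come out of the shared CMA area only while the buffer
 * exists; without CMA the very same code falls back to the normal
 * allocators, so the driver does not care either way.
 */
static void *alloc_scanout_buffer(struct device *dev, size_t size,
                                  dma_addr_t *dma_handle)
{
        /* Returns a CPU virtual address and fills in the bus address that
         * the display/GPU block should be programmed with. */
        return dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
}

static void free_scanout_buffer(struct device *dev, size_t size,
                                void *vaddr, dma_addr_t dma_handle)
{
        dma_free_coherent(dev, size, vaddr, dma_handle);
}

With something like this, the carveout size only determines how much memory
CMA sets aside at boot, not how the driver code itself is written.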



Tegra DRM device tree bindings

2012-06-28 Thread Lucas Stach
Hi Thierry,

On Thursday, 28.06.2012, 13:12 +0200, Thierry Reding wrote:
> On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
> > On Wednesday, 27.06.2012, 16:44 +0200, Thierry Reding wrote:
> > > On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
> > > > On Wed, 27 Jun 2012 16:08:10 +0200
> > > > Thierry Reding  wrote:
> > > > 
> > > > > * PGP Signed by an unknown key
> > > > > 
> > > > > On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
> > > > > > On Wed, 27 Jun 2012 07:14:18 +0200
> > > > > > Thierry Reding  wrote:
> > > > > > 
> > > > > > > > Old Signed by an unknown key
> > > > > > > 
> > > > > > > On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > > > > > > > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > > > > > > > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > > > > > > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > > > > > > > Thierry Reding  
> > > > > > > > > wrote:
> > > > > > > > >> ...
> > > > > > > >  I'm not sure I understand how information about the 
> > > > > > > >  carveout would be
> > > > > > > >  obtained from the IOMMU API, though.
> > > > > > > > >>>
> > > > > > > > >>> I think that can be similar with current gart 
> > > > > > > > >>> implementation. Define carveout as:
> > > > > > > > >>>
> > > > > > > > >>> carveout {
> > > > > > > > >>> compatible = "nvidia,tegra20-carveout";
> > > > > > > > >>> size = <0x1000>;
> > > > > > > > >>> };
> > > > > > > > >>>
> > > > > > > > >>> Then create a file such like "tegra-carveout.c" to get 
> > > > > > > > >>> these definitions and
> > > > > > > > >> register itself as platform device's iommu instance.
> > > > > > > > >>
> > > > > > > > >> The carveout isn't a HW object, so it doesn't seem 
> > > > > > > > >> appropriate to define a DT
> > > > > > > > >> node to represent it.
> > > > > > > > > 
> > > > > > > > > Yes. But I think it's better to export the size of carveout 
> > > > > > > > > as a configurable item.
> > > > > > > > > So we need to define this somewhere. How about define 
> > > > > > > > > carveout as a property of gart?
> > > > > > > > 
> > > > > > > > There already exists a way of preventing Linux from using 
> > > > > > > > certain chunks
> > > > > > > > of memory; the /memreserve/ syntax. From a brief look at the 
> > > > > > > > dtc source,
> > > > > > > > it looks like /memreserve/ entries can have labels, which 
> > > > > > > > implies that a
> > > > > > > > property in the GART node could refer to the /memreserve/ entry 
> > > > > > > > by
> > > > > > > > phandle in order to know what memory regions to use.
> > > > > > > 
> > > > > > > Wasn't the whole point of using a carveout supposed to be a 
> > > > > > > replacement
> > > > > > > for the GART?
> > > > > > 
> > > > > > Mostly agree. IIUC, we use both carveout/gart allocated buffers in
> > > > > > android/tegra2.
> > > > > > 
> > > > > > >As such I'd think the carveout should rather be a property
> > > > > > > of the host1x device.
> > > > > > 
> > > > > > Rather than introducing a new property, how about using
> > > > > > "coherent_pool=??M" in the kernel command line if necessary? I think
> > > > > > that this carveout size depends on the system usage/load.
> > > > > 
> > > > > I was hoping that we could get away with using the CMA and perhaps
> > > > > initialize it based on device tree content. I agree that the carveout
> > > > > size depends on the use-case, but I still think it makes sense to
> > > > > specify it on a per-board basis.
> > > > 
> > > > DRM driver doesn't know if it uses CMA or not, because DRM only uses
> > > > DMA API.
> > > 
> > > So how is the DRM supposed to allocate buffers? Does it call the
> > > dma_alloc_from_contiguous() function to do that? I can see how it is
> > > used by arm_dma_ops but how does it end up in the driver?
> > > 
> > As I said before the DMA API is not a good fit for graphics drivers.
> > Most of the DMA buffers used by graphics cores are long lived and big,
> > so we need a special pool to alloc from to avoid eating all contiguous
> > address space, as DMA API does not provide shrinker callbacks for
> > clients using large amount of memory.
> 
> I recall you mentioning TTM as a better alternative several times in the
> past. How does it fit in with this? Does it have the capability of using
> a predefined chunk of contiguous memory as a pool to allocate from?
> 
> One problem that all of these solutions don't address is that not all
> devices below host1x are DRM related. At least for the CSI and VI blocks
> I expect there to be V4L2 drivers eventually, so what we really need is
> to manage allocations outside of the DRM. host1x is the most logical
> choice here.

I think you are right here. We might want to move all those
buffer/memory management in the host1x code and provide contig memory to
the host1x clients from there.

TTM has the ability to manage a chunk of memory for contig allocations.
Also I think 

Tegra DRM device tree bindings

2012-06-28 Thread Thierry Reding
On Wed, Jun 27, 2012 at 11:49:43AM -0600, Stephen Warren wrote:
> On 06/26/2012 11:07 PM, Thierry Reding wrote:
> > On Tue, Jun 26, 2012 at 04:48:14PM -0600, Stephen Warren wrote:
> ...
> > I actually like what you proposed above a lot, so if you don't
> > mind either way I'll go with that proposal. Keeping the connector
> > nodes as children of the outputs has the advantage of being able to
> > reference them if we need it at some point. But it is also
> > redundant in that a single output doesn't usually (never?) drive
> > more than one connector.
> 
> Yes, I believe that each output is 1:1 with (the video portion of) a
> connector. The display controllers obviously aren't 1:1.

Yes, the display controllers are only 1:1 for the RGB outputs. I'll
merge your proposed changes and see if I can come up with some code to
parse it.

> > The same issue will have to be addressed for the CSI and VI nodes,
> > but as I currently use neither of those I don't feel qualified to
> > propose a binding for them. Also for the VI part we're completely
> > missing documentation. Maybe somebody could push this to be
> > released as well?
> 
> I did file a bug noting the request for VI documentation. At this
> point in time, it's too early to say what, if anything, will come of that.

I think we have some raw register documentation for VI but it's next to
impossible to really understand the hardware block just by looking at
the registers.

> > If I understand correctly, most of the host1x children can also be 
> > chained in a processing pipeline to do postprocessing on video
> > input for example. I suppose that's generic and doesn't need to be
> > represented in DT either, right?
> 
> Yes, I believe that's something internal to the driver.

Okay. So I think apart from the carveout topic in the other subthread I
think we've settled on solutions for the remaining points. I'll try to
find some time to work all the changes into a new binding proposal and
get working on the code.

Thierry



Tegra DRM device tree bindings

2012-06-28 Thread Thierry Reding
On Wed, Jun 27, 2012 at 12:02:46PM -0600, Stephen Warren wrote:
> On 06/27/2012 06:44 AM, Hiroshi Doyu wrote:
> ...
> > I think that there are 2 cases:
> > 
> >   (1) discontiguous memory with IOMMU
> >   (2) contiguous memory without IOMMU(called "carveout" in general?)
> ...
> > For (2), although memory is mostly anonymous one, we may need to know
> > how much to allocate, where we only need "size". This size is not from
> > h/w feature, but it depends on the system load/usage. So I think that
> > this size can be passed from kernel command line? For example, we can
> > specify how much contiguous memory is necessary with putting
> > "coherent_pool=??M" in the kernel command line as below:
> > 
> > coherent_pool=nn[KMG]   [ARM,KNL]
> > Sets the size of memory pool for coherent, atomic dma
> > allocations.
> 
> I guess if that's the standard way of initializing CMA, then that's fine.
> 
> It'd be nice if there was a way to specify that from the DT too; that
> way the user/distro/bootloader constructing the kernel command-line
> wouldn't have to remember to add "random" (potentially
> Tegra-/board-specific) extra arguments onto the command-line; the Tegra
> command-line in the upstream kernel is quite clean right now, especially
> compared to the enormous number of options we require downstream :-(

Looking at Documentation/kernel-parameters.txt it seems the canonical
way to initialize CMA is using the "cma" kernel command-line parameter.
For device tree we could extend the "chosen" node to include something
like "carveout = <0x0400>;". Or "contiguous-memory" or whatever.

Thierry
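
For illustration, a sketch of how such a property could be consumed; the
"carveout" property name is only the proposal above, not an existing binding,
and a real implementation would have to parse the flattened tree early enough
to actually reserve the memory for CMA:

#include <linux/of.h>
#include <linux/types.h>

/*
 * Sketch only: look up a hypothetical "carveout" cell in /chosen and return
 * the requested size in bytes, or 0 if nothing was specified.
 */
static u32 carveout_size_from_chosen(void)
{
        struct device_node *chosen;
        u32 size = 0;

        chosen = of_find_node_by_path("/chosen");
        if (!chosen)
                return 0;

        /* of_property_read_u32() leaves 'size' untouched on failure. */
        of_property_read_u32(chosen, "carveout", &size);
        of_node_put(chosen);

        return size;
}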



Tegra DRM device tree bindings

2012-06-28 Thread Thierry Reding
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
> On Wednesday, 27.06.2012, 16:44 +0200, Thierry Reding wrote:
> > On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
> > > On Wed, 27 Jun 2012 16:08:10 +0200
> > > Thierry Reding  wrote:
> > > 
> > > > * PGP Signed by an unknown key
> > > > 
> > > > On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
> > > > > On Wed, 27 Jun 2012 07:14:18 +0200
> > > > > Thierry Reding  wrote:
> > > > > 
> > > > > > > Old Signed by an unknown key
> > > > > > 
> > > > > > On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > > > > > > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > > > > > > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > > > > > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > > > > > > Thierry Reding  wrote:
> > > > > > > >> ...
> > > > > > >  I'm not sure I understand how information about the carveout 
> > > > > > >  would be
> > > > > > >  obtained from the IOMMU API, though.
> > > > > > > >>>
> > > > > > > >>> I think that can be similar with current gart implementation. 
> > > > > > > >>> Define carveout as:
> > > > > > > >>>
> > > > > > > >>> carveout {
> > > > > > > >>> compatible = "nvidia,tegra20-carveout";
> > > > > > > >>> size = <0x1000>;
> > > > > > > >>> };
> > > > > > > >>>
> > > > > > > >>> Then create a file such like "tegra-carveout.c" to get these 
> > > > > > > >>> definitions and
> > > > > > > >> register itself as platform device's iommu instance.
> > > > > > > >>
> > > > > > > >> The carveout isn't a HW object, so it doesn't seem appropriate 
> > > > > > > >> to define a DT
> > > > > > > >> node to represent it.
> > > > > > > > 
> > > > > > > > Yes. But I think it's better to export the size of carveout as 
> > > > > > > > a configurable item.
> > > > > > > > So we need to define this somewhere. How about define carveout 
> > > > > > > > as a property of gart?
> > > > > > > 
> > > > > > > There already exists a way of preventing Linux from using certain 
> > > > > > > chunks
> > > > > > > of memory; the /memreserve/ syntax. From a brief look at the dtc 
> > > > > > > source,
> > > > > > > it looks like /memreserve/ entries can have labels, which implies 
> > > > > > > that a
> > > > > > > property in the GART node could refer to the /memreserve/ entry by
> > > > > > > phandle in order to know what memory regions to use.
> > > > > > 
> > > > > > Wasn't the whole point of using a carveout supposed to be a 
> > > > > > replacement
> > > > > > for the GART?
> > > > > 
> > > > > Mostly agree. IIUC, we use both carveout/gart allocated buffers in
> > > > > android/tegra2.
> > > > > 
> > > > > >As such I'd think the carveout should rather be a property
> > > > > > of the host1x device.
> > > > > 
> > > > > Rather than introducing a new property, how about using
> > > > > "coherent_pool=??M" in the kernel command line if necessary? I think
> > > > > that this carveout size depends on the system usage/load.
> > > > 
> > > > I was hoping that we could get away with using the CMA and perhaps
> > > > initialize it based on device tree content. I agree that the carveout
> > > > size depends on the use-case, but I still think it makes sense to
> > > > specify it on a per-board basis.
> > > 
> > > DRM driver doesn't know if it uses CMA or not, because DRM only uses
> > > DMA API.
> > 
> > So how is the DRM supposed to allocate buffers? Does it call the
> > dma_alloc_from_contiguous() function to do that? I can see how it is
> > used by arm_dma_ops but how does it end up in the driver?
> > 
> As I said before the DMA API is not a good fit for graphics drivers.
> Most of the DMA buffers used by graphics cores are long lived and big,
> so we need a special pool to alloc from to avoid eating all contiguous
> address space, as DMA API does not provide shrinker callbacks for
> clients using large amount of memory.

I recall you mentioning TTM as a better alternative several times in the
past. How does it fit in with this? Does it have the capability of using
a predefined chunk of contiguous memory as a pool to allocate from?

One problem that all of these solutions don't address is that not all
devices below host1x are DRM related. At least for the CSI and VI blocks
I expect there to be V4L2 drivers eventually, so what we really need is
to manage allocations outside of the DRM. host1x is the most logical
choice here.

Perhaps we can put host1x code somewhere below drivers/gpu (mm
subdirectory?), drivers/memory or perhaps some other or new location
that could eventually host similar drivers for other SoCs.

Then again, maybe it'd be easier for now to put everything below the
drivers/gpu/drm/tegra directory and cross that bridge when we get to it.

> > > I think that "coherent_pool" can be used only when the amount of
> > > contiguous memory is short in your system. Otherwise even unnecessary.
> > > 
> > > Could you explain a bit more why you want carveout size on 

Tegra DRM device tree bindings

2012-06-28 Thread Stephen Warren
On 06/28/2012 11:19 AM, Lucas Stach wrote:
...
> CMA is just a way of providing large contiguous address space blocks in
> a dynamic fashion. ...
> 
> TTM though solves more advanced matters, like buffer synchronisation
> between 3D and 2D block of hardware ...
> 
> IMHO the best solution would be to use CMA as a flexible replacement of
> the static carveout area and put TTM on top of this ...

Ah right, thanks for the explanation. That makes sense to me now.


Tegra DRM device tree bindings

2012-06-28 Thread Stephen Warren
On 06/28/2012 05:12 AM, Thierry Reding wrote:
> On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
>> On Wednesday, 27.06.2012, 16:44 +0200, Thierry Reding wrote:
...
>>> In the ideal case I would want to not have a carveout size at
>>> all. However there may be situations where you need to make
>>> sure some driver can allocate a given amount of memory. Having
>>> to specify this using a kernel command-line parameter is
>>> cumbersome because it may require changes to the bootloader or
>>> whatever. So if you know that a particular board always needs
>>> 128 MiB of carveout, then it makes sense to specify it on a
>>> per-board basis.
>> 
>> If we go with CMA, this is a non-issue, as CMA allows to use the
>> contig area for normal allocations and only purges them if it
>> really needs the space for contig allocs.
> 
> CMA certainly sounds like the most simple approach. While it may
> not be suited for 3D graphics or multimedia processing later on, I
> think we could use it at a starting point to get basic framebuffer
> and X support up and running. We can always move to something more
> advanced like TTM later.

I thought the whole purpose of CMA was to act as the infra-structure
to provide buffers to 3D, camera, etc. in particular allowing sharing
of buffers between them. In other words, isn't CMA the memory manager?
If there's some deficiency with CMA for 3D graphics, it seems like
that should be raised with those designing CMA. Or, am I way off base
with my expectations of CMA?


Tegra DRM device tree bindings

2012-06-28 Thread Stephen Warren
On 06/28/2012 12:18 AM, Hiroshi Doyu wrote:
> On Wed, 27 Jun 2012 16:44:14 +0200
> Thierry Reding  wrote:
> 
>>> I think that "coherent_pool" can be used only when the amount of
>>> contiguous memory is short in your system. Otherwise even unnecessary.
>>>
>>> Could you explain a bit more why you want carveout size on per-board basis?
>>
>> In the ideal case I would want to not have a carveout size at all.
>> However there may be situations where you need to make sure some driver
>> can allocate a given amount of memory. Having to specify this using a
>> kernel command-line parameter is cumbersome because it may require
>> changes to the bootloader or whatever. So if you know that a particular
>> board always needs 128 MiB of carveout, then it makes sense to specify
>> it on a per-board basis.
> 
> Hm...I could understand somewhat;) but DT can also specify "bootargs"
> in dts file, which can support per-board-wide spec too, like the above
> sum of carveout needed from all drivers. I just want to avoid
> introducing a new parameter additionally if we can make use of the
> existing mechanism.

The bootargs in the DT file is usually provided by (over-written by) the
bootloader. If we start requiring lots of random kernel command-line
arguments, that makes it more effort for the user of the bootloader
(e.g. distribution bootloader scripts, etc.) to create the kernel
command-line. I'd prefer to avoid that as much as possible. That said,
using a standardized command-line option that is (or will be) used by
all (ARM?) SoCs for the same purpose is reasonable, because there's
commonality there.


Tegra DRM device tree bindings

2012-06-28 Thread Lucas Stach
On Thursday, 28.06.2012, 09:06 +0300, Hiroshi Doyu wrote:
> Hi Lucas,
> 
> On Wed, 27 Jun 2012 17:59:55 +0200
> Lucas Stach  wrote:
> 
> > > > > > Rather than introducing a new property, how about using
> > > > > > "coherent_pool=??M" in the kernel command line if necessary? I think
> > > > > > that this carveout size depends on the system usage/load.
> > > > > 
> > > > > I was hoping that we could get away with using the CMA and perhaps
> > > > > initialize it based on device tree content. I agree that the carveout
> > > > > size depends on the use-case, but I still think it makes sense to
> > > > > specify it on a per-board basis.
> > > > 
> > > > DRM driver doesn't know if it uses CMA or not, because DRM only uses
> > > > DMA API.
> > > 
> > > So how is the DRM supposed to allocate buffers? Does it call the
> > > dma_alloc_from_contiguous() function to do that? I can see how it is
> > > used by arm_dma_ops but how does it end up in the driver?
> > > 
> > As I said before the DMA API is not a good fit for graphics drivers.
> > Most of the DMA buffers used by graphics cores are long lived and big,
> > so we need a special pool to alloc from to avoid eating all contiguous
> > address space, as DMA API does not provide shrinker callbacks for
> > clients using large amount of memory.
> 
> For the contiguous address space shortage issue in the DMA API, I think that
> the DMABUF framework can handle it?
> 
No, DMABUF is only about sharing DMA buffers between different hardware
blocks. It has no address space management at all.

All DRM drivers manage their address space on their own, either through
GEM or TTM. On the desktop the main pool for contiguous memory is on
card VRAM or a area of system memory set aside for use by the DRM
driver. Having a carveout area only for DRM use would be the same thing,
but makes the split between graphics and system memory a bit unflexible.
Currently we have no sane way for integrating DRM memory managers with
the normal DMA API. There was some discussion about DMA pools and/or
shrinkers for DMA clients, but they have not led to any written code.

DMABUF only makes a limited integration between for example V4L and DRM
possible, so they can share buffers. But we still have the situation
that DRM devices allocate from their own pool, for explained reasons and
not use the standard DMA API.



Tegra DRM device tree bindings

2012-06-28 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 20:02:46 +0200
Stephen Warren  wrote:

> On 06/27/2012 06:44 AM, Hiroshi Doyu wrote:
> ...
> > I think that there are 2 cases:
> > 
> >   (1) discontiguous memory with IOMMU
> >   (2) contiguous memory without IOMMU(called "carveout" in general?)
> ...
> > For (2), although memory is mostly anonymous one, we may need to know
> > how much to allocate, where we only need "size". This size is not from
> > h/w feature, but it depends on the system load/usage. So I think that
> > this size can be passed from kernel command line? For example, we can
> > specify how much contiguous memory is necessary with putting
> > "coherent_pool=??M" in the kernel command line as below:
> > 
> > coherent_pool=nn[KMG]   [ARM,KNL]
> > Sets the size of memory pool for coherent, atomic dma
> > allocations.
> 
> I guess if that's the standard way of initializing CMA, then that's fine.
> 
> It'd be nice if there was a way to specify that from the DT too; that
> way the user/distro/bootloader constructing the kernel command-line
> wouldn't have to remember to add "random" (potentially
> Tegra-/board-specific) extra arguments onto the command-line; the Tegra
> command-line in the upstream kernel is quite clean right now, especially
> compared to the enormous number of options we require downstream :-(

DT can specify the kernel command-line as "bootargs" in a dts file,
which the bootloader can still modify after the blob has been loaded into memory.


Tegra DRM device tree bindings

2012-06-28 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 19:56:35 +0200
Stephen Warren  wrote:

> On 06/27/2012 08:29 AM, Hiroshi Doyu wrote:
> > Could you explain a bit more why you want carveout size on per-board basis?
> 
> Different boards have different amounts of memory, and are sometimes
> targeted at different use-cases (e.g. server with simple display buffer,
> vs. consumer-oriented device intended to play games with OpenGL
> allocating lots of textures).

May I ask a bit? If the above requirement can be satisfied by one of the
kernel parameters on the command line, wouldn't that be enough, since DT
can also specify "bootargs"? Or is using bootargs in dts not encouraged?



Tegra DRM device tree bindings

2012-06-28 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 16:44:14 +0200
Thierry Reding  wrote:

> > I think that "coherent_pool" can be used only when the amount of
> > contiguous memory is short in your system. Otherwise even unnecessary.
> > 
> > Could you explain a bit more why you want carveout size on per-board basis?
> 
> In the ideal case I would want to not have a carveout size at all.
> However there may be situations where you need to make sure some driver
> can allocate a given amount of memory. Having to specify this using a
> kernel command-line parameter is cumbersome because it may require
> changes to the bootloader or whatever. So if you know that a particular
> board always needs 128 MiB of carveout, then it makes sense to specify
> it on a per-board basis.

Hm...I could understand somewhat;) but DT can also specify "bootargs"
in dts file, which can support per-board-wide spec too, like the above
sum of carveout needed from all drivers. I just want to avoid
introducing a new parameter additionally if we can make use of the
existing mechanism.


Tegra DRM device tree bindings

2012-06-28 Thread Hiroshi Doyu
Hi Lucas,

On Wed, 27 Jun 2012 17:59:55 +0200
Lucas Stach  wrote:

> > > > > Rather than introducing a new property, how about using
> > > > > "coherent_pool=??M" in the kernel command line if necessary? I think
> > > > > that this carveout size depends on the system usage/load.
> > > > 
> > > > I was hoping that we could get away with using the CMA and perhaps
> > > > initialize it based on device tree content. I agree that the carveout
> > > > size depends on the use-case, but I still think it makes sense to
> > > > specify it on a per-board basis.
> > > 
> > > DRM driver doesn't know if it uses CMA or not, because DRM only uses
> > > DMA API.
> > 
> > So how is the DRM supposed to allocate buffers? Does it call the
> > dma_alloc_from_contiguous() function to do that? I can see how it is
> > used by arm_dma_ops but how does it end up in the driver?
> > 
> As I said before the DMA API is not a good fit for graphics drivers.
> Most of the DMA buffers used by graphics cores are long lived and big,
> so we need a special pool to alloc from to avoid eating all contiguous
> address space, as DMA API does not provide shrinker callbacks for
> clients using large amount of memory.

For the contiguous address space shortage issue in the DMA API, I think that
the DMABUF framework can handle it?

If so, is there any good example for DMABUF used in DRM?


Re: Tegra DRM device tree bindings

2012-06-28 Thread Stephen Warren
On 06/26/2012 11:07 PM, Thierry Reding wrote:
 On Tue, Jun 26, 2012 at 04:48:14PM -0600, Stephen Warren wrote:
...
 I actually like what you proposed above a lot, so if you don't
 mind either way I'll go with that proposal. Keeping the connector
 nodes as children of the outputs has the advantage of being able to
 reference them if we need it at some point. But it is also
 redundant in that a single output doesn't usually (never?) drive
 more than one connector.

Yes, I believe that each output is 1:1 with (the video portion of) a
connector. The display controllers obviously aren't 1:1.

 The same issue will have to be addressed for the CSI and VI nodes,
 but as I currently use neither of those I don't feel qualified to
 propose a binding for them. Also for the VI part we're completely
 missing documentation. Maybe somebody could push this to be
 released as well?

I did file a bug noting the request for VI documentation. At this
point in time, it's too early to say what, if anything, will come of that.

 If I understand correctly, most of the host1x children can also be 
 chained in a processing pipeline to do postprocessing on video
 input for example. I suppose that's generic and doesn't need to be
 represented in DT either, right?

Yes, I believe that's something internal to the driver.


Re: Tegra DRM device tree bindings

2012-06-28 Thread Stephen Warren
On 06/27/2012 08:29 AM, Hiroshi Doyu wrote:
 Could you explain a bit more why you want carveout size on per-board basis?

Different boards have different amounts of memory, and are sometimes
targeted at different use-cases (e.g. server with simple display buffer,
vs. consumer-oriented device intended to play games with OpenGL
allocating lots of textures).


Re: Tegra DRM device tree bindings

2012-06-28 Thread Stephen Warren
On 06/27/2012 06:44 AM, Hiroshi Doyu wrote:
...
 I think that there are 2 cases:
 
   (1) discontiguous memory with IOMMU
   (2) contiguous memory without IOMMU(called carveout in general?)
...
 For (2), although memory is mostly anonymous one, we may need to know
 how much to allocate, where we only need size. This size is not from
 h/w feature, but it depends on the system load/usage. So I think that
 this size can be passed from kernel command line? For example, we can
 specify how much contiguous memory is necessary with putting
 coherent_pool=??M in the kernel command line as below:
 
   coherent_pool=nn[KMG]   [ARM,KNL]
   Sets the size of memory pool for coherent, atomic dma
   allocations.

I guess if that's the standard way of initializing CMA, then that's fine.

It'd be nice if there was a way to specify that from the DT too; that
way the user/distro/bootloader constructing the kernel command-line
wouldn't have to remember to add random (potentially
Tegra-/board-specific) extra arguments onto the command-line; the Tegra
command-line in the upstream kernel is quite clean right now, especially
compared to the enormous number of options we require downstream :-(


Re: Tegra DRM device tree bindings

2012-06-28 Thread Hiroshi Doyu
Hi Lucas,

On Wed, 27 Jun 2012 17:59:55 +0200
Lucas Stach d...@lynxeye.de wrote:

 Rather than introducing a new property, how about using
 coherent_pool=??M in the kernel command line if necessary? I think
 that this carveout size depends on the system usage/load.

I was hoping that we could get away with using the CMA and perhaps
initialize it based on device tree content. I agree that the carveout
size depends on the use-case, but I still think it makes sense to
specify it on a per-board basis.
   
   DRM driver doesn't know if it uses CMA or not, because DRM only uses
   DMA API.
  
  So how is the DRM supposed to allocate buffers? Does it call the
  dma_alloc_from_contiguous() function to do that? I can see how it is
  used by arm_dma_ops but how does it end up in the driver?
  
 As I said before the DMA API is not a good fit for graphics drivers.
 Most of the DMA buffers used by graphics cores are long lived and big,
 so we need a special pool to alloc from to avoid eating all contiguous
 address space, as DMA API does not provide shrinker callbacks for
 clients using large amount of memory.

For the contiguous address space shortage issue in the DMA API, I think that
the DMABUF framework can handle it?

If so, is there any good example for DMABUF used in DRM?


Re: Tegra DRM device tree bindings

2012-06-28 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 16:44:14 +0200
Thierry Reding thierry.red...@avionic-design.de wrote:

  I think that coherent_pool can be used only when the amount of
  contiguous memory is short in your system. Otherwise even unnecessary.
  
  Could you explain a bit more why you want carveout size on per-board basis?
 
 In the ideal case I would want to not have a carveout size at all.
 However there may be situations where you need to make sure some driver
 can allocate a given amount of memory. Having to specify this using a
 kernel command-line parameter is cumbersome because it may require
 changes to the bootloader or whatever. So if you know that a particular
 board always needs 128 MiB of carveout, then it makes sense to specify
 it on a per-board basis.

Hm...I could understand somewhat;) but DT can also specify bootargs
in dts file, which can support per-board-wide spec too, like the above
sum of carveout needed from all drivers. I just want to avoid
introducing a new parameter additionally if we can make use of the
existing mechanism.


Re: Tegra DRM device tree bindings

2012-06-28 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 19:56:35 +0200
Stephen Warren swar...@wwwdotorg.org wrote:

 On 06/27/2012 08:29 AM, Hiroshi Doyu wrote:
  Could you explain a bit more why you want carveout size on per-board basis?
 
 Different boards have different amounts of memory, and are sometimes
 targeted at different use-cases (e.g. server with simple display buffer,
 vs. consumer-oriented device intended to play games with OpenGL
 allocating lots of textures).

May I ask a bit? If the above requirement can be satisfied by one of the
kernel parameters on the command line, wouldn't that be enough, since DT
can also specify bootargs? Or is using bootargs in dts not encouraged?



Re: Tegra DRM device tree bindings

2012-06-28 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 20:02:46 +0200
Stephen Warren swar...@wwwdotorg.org wrote:

 On 06/27/2012 06:44 AM, Hiroshi Doyu wrote:
 ...
  I think that there are 2 cases:
  
(1) discontiguous memory with IOMMU
(2) contiguous memory without IOMMU(called carveout in general?)
 ...
  For (2), although memory is mostly anonymous one, we may need to know
  how much to allocate, where we only need size. This size is not from
  h/w feature, but it depends on the system load/usage. So I think that
  this size can be passed from kernel command line? For example, we can
  specify how much contiguous memory is necessary with putting
  coherent_pool=??M in the kernel command line as below:
  
  coherent_pool=nn[KMG]   [ARM,KNL]
  Sets the size of memory pool for coherent, atomic dma
  allocations.
 
 I guess if that's the standard way of initializing CMA, then that's fine.
 
 It'd be nice if there was a way to specify that from the DT too; that
 way the user/distro/bootloader constructing the kernel command-line
 wouldn't have to remember to add random (potentially
 Tegra-/board-specific) extra arguments onto the command-line; the Tegra
 command-line in the upstream kernel is quite clean right now, especially
 compared to the enormous number of options we require downstream :-(

DT can specify the kernel command-line as bootargs in a dts file,
which the bootloader can still modify after the blob has been loaded into memory.


Re: Tegra DRM device tree bindings

2012-06-28 Thread Thierry Reding
On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
 On Wednesday, 27.06.2012, 16:44 +0200, Thierry Reding wrote:
  On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
   On Wed, 27 Jun 2012 16:08:10 +0200
   Thierry Reding thierry.red...@avionic-design.de wrote:
   
* PGP Signed by an unknown key

On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
 On Wed, 27 Jun 2012 07:14:18 +0200
 Thierry Reding thierry.red...@avionic-design.de wrote:
 
   Old Signed by an unknown key
  
  On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
   On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200
Thierry Reding thierry.red...@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout 
would be
obtained from the IOMMU API, though.
   
I think that can be similar with current gart implementation. 
Define carveout as:
   
carveout {
compatible = nvidia,tegra20-carveout;
size = 0x1000;
};
   
Then create a file such like tegra-carveout.c to get these 
definitions and
register itself as platform device's iommu instance.
   
The carveout isn't a HW object, so it doesn't seem appropriate 
to define a DT
node to represent it.

Yes. But I think it's better to export the size of carveout as 
a configurable item.
So we need to define this somewhere. How about define carveout 
as a property of gart?
   
   There already exists a way of preventing Linux from using certain 
   chunks
   of memory; the /memreserve/ syntax. From a brief look at the dtc 
   source,
   it looks like /memreserve/ entries can have labels, which implies 
   that a
   property in the GART node could refer to the /memreserve/ entry by
   phandle in order to know what memory regions to use.
  
  Wasn't the whole point of using a carveout supposed to be a 
  replacement
  for the GART?
 
 Mostly agree. IIUC, we use both carveout/gart allocated buffers in
 android/tegra2.
 
 As such I'd think the carveout should rather be a property
  of the host1x device.
 
 Rather than introducing a new property, how about using
 coherent_pool=??M in the kernel command line if necessary? I think
 that this carveout size depends on the system usage/load.

I was hoping that we could get away with using the CMA and perhaps
initialize it based on device tree content. I agree that the carveout
size depends on the use-case, but I still think it makes sense to
specify it on a per-board basis.
   
   DRM driver doesn't know if it uses CMA or not, because DRM only uses
   DMA API.
  
  So how is the DRM supposed to allocate buffers? Does it call the
  dma_alloc_from_contiguous() function to do that? I can see how it is
  used by arm_dma_ops but how does it end up in the driver?
  
 As I said before the DMA API is not a good fit for graphics drivers.
 Most of the DMA buffers used by graphics cores are long lived and big,
 so we need a special pool to alloc from to avoid eating all contiguous
 address space, as DMA API does not provide shrinker callbacks for
 clients using large amount of memory.

I recall you mentioning TTM as a better alternative several times in the
past. How does it fit in with this? Does it have the capability of using
a predefined chunk of contiguous memory as a pool to allocate from?

One problem that all of these solutions don't address is that not all
devices below host1x are DRM related. At least for the CSI and VI blocks
I expect there to be V4L2 drivers eventually, so what we really need is
to manage allocations outside of the DRM. host1x is the most logical
choice here.

Perhaps we can put host1x code somewhere below drivers/gpu (mm
subdirectory?), drivers/memory or perhaps some other or new location
that could eventually host similar drivers for other SoCs.

Then again, maybe it'd be easier for now to put everything below the
drivers/gpu/drm/tegra directory and cross that bridge when we get to it.

   I think that coherent_pool can be used only when the amount of
   contiguous memory is short in your system. Otherwise even unnecessary.
   
   Could you explain a bit more why you want carveout size on per-board 
   basis?
  
  In the ideal case I would want to not have a carveout size at all.
  However there may be situations where you need to make sure some driver
  can allocate a given amount of memory. Having to specify this using a
  kernel command-line parameter is cumbersome because it may require
  changes to the bootloader or whatever. So if you know that a particular
  board always needs 128 MiB of carveout, then it makes sense to 

Re: Tegra DRM device tree bindings

2012-06-28 Thread Thierry Reding
On Wed, Jun 27, 2012 at 12:02:46PM -0600, Stephen Warren wrote:
 On 06/27/2012 06:44 AM, Hiroshi Doyu wrote:
 ...
  I think that there are 2 cases:
  
(1) discontiguous memory with IOMMU
(2) contiguous memory without IOMMU(called carveout in general?)
 ...
  For (2), although memory is mostly anonymous one, we may need to know
  how much to allocate, where we only need size. This size is not from
  h/w feature, but it depends on the system load/usage. So I think that
  this size can be passed from kernel command line? For example, we can
  specify how much contiguous memory is necessary with putting
  coherent_pool=??M in the kernel command line as below:
  
  coherent_pool=nn[KMG]   [ARM,KNL]
  Sets the size of memory pool for coherent, atomic dma
  allocations.
 
 I guess if that's the standard way of initializing CMA, then that's fine.
 
 It'd be nice if there was a way to specify that from the DT too; that
 way the user/distro/bootloader constructing the kernel command-line
 wouldn't have to remember to add random (potentially
 Tegra-/board-specific) extra arguments onto the command-line; the Tegra
 command-line in the upstream kernel is quite clean right now, especially
 compared to the enormous number of options we require downstream :-(

Looking at Documentation/kernel-parameters.txt it seems the canonical
way to initialize CMA is using the cma kernel command-line parameter.
For device tree we could extend the chosen node to include something
like carveout = 0x0400;. Or contiguous-memory or whatever.

Thierry




Re: Tegra DRM device tree bindings

2012-06-28 Thread Thierry Reding
On Wed, Jun 27, 2012 at 11:49:43AM -0600, Stephen Warren wrote:
 On 06/26/2012 11:07 PM, Thierry Reding wrote:
  On Tue, Jun 26, 2012 at 04:48:14PM -0600, Stephen Warren wrote:
 ...
  I actually like what you proposed above a lot, so if you don't
  mind either way I'll go with that proposal. Keeping the connector
  nodes as children of the outputs has the advantage of being able to
  reference them if we need it at some point. But it is also
  redundant in that a single output doesn't usually (never?) drive
  more than one connector.
 
 Yes, I believe that each output is 1:1 with (the video portion of) a
 connector. The display controllers obviously aren't 1:1.

Yes, the display controllers are only 1:1 for the RGB outputs. I'll
merge your proposed changes and see if I can come up with some code to
parse it.

  The same issue will have to be addressed for the CSI and VI nodes,
  but as I currently use neither of those I don't feel qualified to
  propose a binding for them. Also for the VI part we're completely
  missing documentation. Maybe somebody could push this to be
  released as well?
 
 I did file a bug noting the request for VI documentation. At this
 point in time, it's too early to say what, if anything, will come of that.

I think we have some raw register documentation for VI but it's next to
impossible to really understand the hardware block just by looking at
the registers.

  If I understand correctly, most of the host1x children can also be 
  chained in a processing pipeline to do postprocessing an video
  input for example. I suppose that's generic and doesn't need to be
  represented in DT either, right?
 
 Yes, I believe that's something internal to the driver.

Okay. So I think apart from the carveout topic in the other subthread I
think we've settled on solutions for the remaining points. I'll try to
find some time to work all the changes into a new binding proposal and
get working on the code.

Thierry




Re: Tegra DRM device tree bindings

2012-06-28 Thread Lucas Stach
Hi Thierry,

On Thursday, 28.06.2012, 13:12 +0200, Thierry Reding wrote:
 On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
  On Wednesday, 27.06.2012, 16:44 +0200, Thierry Reding wrote:
   On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 16:08:10 +0200
Thierry Reding thierry.red...@avionic-design.de wrote:

 * PGP Signed by an unknown key
 
 On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
  On Wed, 27 Jun 2012 07:14:18 +0200
  Thierry Reding thierry.red...@avionic-design.de wrote:
  
Old Signed by an unknown key
   
   On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
 On 06/26/2012 07:46 PM, Mark Zhang wrote:
 On Tue, 26 Jun 2012 12:55:13 +0200
 Thierry Reding thierry.red...@avionic-design.de wrote:
 ...
 I'm not sure I understand how information about the 
 carveout would be
 obtained from the IOMMU API, though.

 I think that can be similar with current gart 
 implementation. Define carveout as:

 carveout {
 compatible = nvidia,tegra20-carveout;
 size = 0x1000;
 };

 Then create a file such like tegra-carveout.c to get 
 these definitions and
 register itself as platform device's iommu instance.

 The carveout isn't a HW object, so it doesn't seem 
 appropriate to define a DT
 node to represent it.
 
 Yes. But I think it's better to export the size of carveout 
 as a configurable item.
 So we need to define this somewhere. How about define 
 carveout as a property of gart?

There already exists a way of preventing Linux from using 
certain chunks
of memory; the /memreserve/ syntax. From a brief look at the 
dtc source,
it looks like /memreserve/ entries can have labels, which 
implies that a
property in the GART node could refer to the /memreserve/ entry 
by
phandle in order to know what memory regions to use.
   
   Wasn't the whole point of using a carveout supposed to be a 
   replacement
   for the GART?
  
  Mostly agree. IIUC, we use both carveout/gart allocated buffers in
  android/tegra2.
  
  As such I'd think the carveout should rather be a property
   of the host1x device.
  
  Rather than introducing a new property, how about using
  coherent_pool=??M in the kernel command line if necessary? I think
  that this carveout size depends on the system usage/load.
 
 I was hoping that we could get away with using the CMA and perhaps
 initialize it based on device tree content. I agree that the carveout
 size depends on the use-case, but I still think it makes sense to
 specify it on a per-board basis.

DRM driver doesn't know if it uses CMA or not, because DRM only uses
DMA API.
   
   So how is the DRM supposed to allocate buffers? Does it call the
   dma_alloc_from_contiguous() function to do that? I can see how it is
   used by arm_dma_ops but how does it end up in the driver?
   
  As I said before the DMA API is not a good fit for graphics drivers.
  Most of the DMA buffers used by graphics cores are long lived and big,
  so we need a special pool to alloc from to avoid eating all contiguous
  address space, as DMA API does not provide shrinker callbacks for
  clients using large amount of memory.
 
 I recall you mentioning TTM as a better alternative several times in the
 past. How does it fit in with this? Does it have the capability of using
 a predefined chunk of contiguous memory as a pool to allocate from?
 
 One problem that all of these solutions don't address is that not all
 devices below host1x are DRM related. At least for the CSI and VI blocks
 I expect there to be V4L2 drivers eventually, so what we really need is
 to manage allocations outside of the DRM. host1x is the most logical
 choice here.

I think you are right here. We might want to move all those
buffer/memory management in the host1x code and provide contig memory to
the host1x clients from there.

TTM has the ability to manage a chunk of memory for contig allocations.
Also I think TTM does not depend too heavily on DRM, so we may even be
able to use TTM as the general allocator for host1x clients, including
VI and others. The more advanced stuff in TTM like swapping and moving
buffers might be a bit of overkill for simple stuff like V4L, where you
basically just want something like: give me a contig buffer and pin it
in address space so it won't ever move, but it should do no harm.

 
 Perhaps we can put host1x code somewhere below drivers/gpu (mm
 subdirectory?), drivers/memory or perhaps some other or new location
 that could 

Re: Tegra DRM device tree bindings

2012-06-28 Thread Stephen Warren
On 06/28/2012 12:18 AM, Hiroshi Doyu wrote:
 On Wed, 27 Jun 2012 16:44:14 +0200
 Thierry Reding thierry.red...@avionic-design.de wrote:
 
 I think that coherent_pool can be used only when the amount of
 contiguous memory is short in your system. Otherwise even unnecessary.

 Could you explain a bit more why you want carveout size on per-board basis?

 In the ideal case I would want to not have a carveout size at all.
 However there may be situations where you need to make sure some driver
 can allocate a given amount of memory. Having to specify this using a
 kernel command-line parameter is cumbersome because it may require
 changes to the bootloader or whatever. So if you know that a particular
 board always needs 128 MiB of carveout, then it makes sense to specify
 it on a per-board basis.
 
 Hm...I could understand somewhat;) but DT can also specify bootargs
 in dts file, which can support per-board-wide spec too, like the above
 sum of carveout needed from all drivers. I just want to avoid
 introducing a new parameter additionally if we can make use of the
 existing mechanism.

The bootargs in the DT file is usually provided by (over-written by) the
bootloader. If we start requiring lots of random kernel command-line
arguments, that makes it more effort for the user of the bootloader
(e.g. distribution bootloader scripts, etc.) to create the kernel
command-line. I'd prefer to avoid that as much as possible. That said,
using a standardized command-line option that is (or will be) used by
all (ARM?) SoCs for the same purpose is reasonable, because there's
commonality there.


Re: Tegra DRM device tree bindings

2012-06-28 Thread Stephen Warren
On 06/28/2012 05:12 AM, Thierry Reding wrote:
 On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
 Am Mittwoch, den 27.06.2012, 16:44 +0200 schrieb Thierry Reding:
...
 In the ideal case I would want to not have a carveout size at
 all. However there may be situations where you need to make
 sure some driver can allocate a given amount of memory. Having
 to specify this using a kernel command-line parameter is
 cumbersome because it may require changes to the bootloader or
 whatever. So if you know that a particular board always needs
 128 MiB of carveout, then it makes sense to specify it on a
 per-board basis.
 
 If we go with CMA, this is a non-issue, as CMA allows to use the
 contig area for normal allocations and only purges them if it
 really needs the space for contig allocs.
 
 CMA certainly sounds like the most simple approach. While it may
 not be suited for 3D graphics or multimedia processing later on, I
 think we could use it at a starting point to get basic framebuffer
 and X support up and running. We can always move to something more
 advanced like TTM later.

I thought the whole purpose of CMA was to act as the infra-structure
to provide buffers to 3D, camera, etc. in particular allowing sharing
of buffers between them. In other words, isn't CMA the memory manager?
If there's some deficiency with CMA for 3D graphics, it seems like
that should be raised with those designing CMA. Or, am I way off base
with my expectations of CMA?


Re: Tegra DRM device tree bindings

2012-06-28 Thread Lucas Stach
Am Donnerstag, den 28.06.2012, 10:51 -0600 schrieb Stephen Warren:
 On 06/28/2012 05:12 AM, Thierry Reding wrote:
  On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
  Am Mittwoch, den 27.06.2012, 16:44 +0200 schrieb Thierry Reding:
 ...
  In the ideal case I would want to not have a carveout size at
  all. However there may be situations where you need to make
  sure some driver can allocate a given amount of memory. Having
  to specify this using a kernel command-line parameter is
  cumbersome because it may require changes to the bootloader or
  whatever. So if you know that a particular board always needs
  128 MiB of carveout, then it makes sense to specify it on a
  per-board basis.
  
  If we go with CMA, this is a non-issue, as CMA allows to use the
  contig area for normal allocations and only purges them if it
  really needs the space for contig allocs.
  
  CMA certainly sounds like the most simple approach. While it may
  not be suited for 3D graphics or multimedia processing later on, I
  think we could use it at a starting point to get basic framebuffer
  and X support up and running. We can always move to something more
  advanced like TTM later.
 
 I thought the whole purpose of CMA was to act as the infra-structure
 to provide buffers to 3D, camera, etc. in particular allowing sharing
 of buffers between them. In other words, isn't CMA the memory manager?
 If there's some deficiency with CMA for 3D graphics, it seems like
 that should be raised with those designing CMA. Or, am I way off base
 with my expectations of CMA?
 
CMA is just a way of providing large contiguous address space blocks in
a dynamic fashion. The problem CMA solves is: we have a system with
relatively low amounts of sysmem (like 512MB), now to ensure we can
always get large contiguous buffers for use by GPU or VIDEO blocks, we
need to set aside a relatively large contiguous pool (like 128MB). So we
are stealing 128MB of memory from the system while we may or may not use
it, which is bad.
Now CMA allows to say: I may need 128MB of contig space, but the system
is free to use it as normal memory as long as I don't really need it. If
the space is really needed, CMA purges pages from the area and may even
swap them out. So yes CMA is a memory allocator for contig memory.
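
For reference, setting such an area aside is a one-liner for the platform
code; a minimal sketch, assuming the dma-contiguous API as merged for 3.5
(exact signature may differ per kernel version):

  #include <linux/dma-contiguous.h>
  #include <linux/init.h>
  #include <linux/kernel.h>

  /*
   * Called from the machine's .reserve() callback at early boot: reserve a
   * 128 MiB CMA area for the given device (e.g. the host1x/DRM platform
   * device). Until someone really allocates from it, the kernel may still
   * use these pages for movable allocations.
   */
  static void __init tegra_drm_reserve_cma(struct device *dev)
  {
          /* base = 0, limit = 0: let CMA place the area wherever it fits */
          if (dma_declare_contiguous(dev, 128 << 20, 0, 0))
                  pr_warn("tegra-drm: failed to reserve CMA area\n");
  }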

TTM though solves more advanced matters, like buffer synchronisation
between 3D and 2D block of hardware or syncing buffer access between GPU
and CPU.
One of the most interesting things of TTM is the ability to purge the
GPU DMA buffers to scattered sysmem or even swap them out, if they are
not currently used by the GPU. It then makes sure to move them in the
contig space again when the GPU really needs them and fix up the GPU
command stream with the new buffer address.

IMHO the best solution would be to use CMA as a flexible replacement of
the static carveout area and put TTM on top of this to solve the needs
of graphics drivers. We certainly don't want to reinvent the wheel
inside CMA. We have solutions for all those things in the kernel right
now, we just have to glue them together in a sane way.

Thanks,
Lucas



Re: Tegra DRM device tree bindings

2012-06-28 Thread Thierry Reding
On Thu, Jun 28, 2012 at 11:33:56AM -0600, Stephen Warren wrote:
 On 06/28/2012 11:19 AM, Lucas Stach wrote:
 ...
  CMA is just a way of providing large contiguous address space blocks in
  a dynamic fashion. ...
  
  TTM though solves more advanced matters, like buffer synchronisation
  between 3D and 2D block of hardware ...
  
  IMHO the best solution would be to use CMA as a flexible replacement of
  the static carveout area and put TTM on top of this ...
 
 Ah right, thanks for the explanation. That makes sense to me now.

Okay. I think that resolves the last open issue. I'll try to get some
more work done on the DT and corresponding code soonish.

For those who don't know yet I've requested the creation of a project
on freedesktop.org for Tegra graphics drivers[0]. I plan to have the DRM
code hosted there once the project has been approved. Furthermore if we
ever get to write a corresponding X driver it can be hosted there as
well. We should also use the wiki for coordination once things get
started.

Thierry

[0]: https://bugs.freedesktop.org/show_bug.cgi?id=51505




RE: Tegra DRM device tree bindings

2012-06-28 Thread Mark Zhang
 Am Donnerstag, den 28.06.2012, 10:51 -0600 schrieb Stephen Warren:
  On 06/28/2012 05:12 AM, Thierry Reding wrote:
   On Wed, Jun 27, 2012 at 05:59:55PM +0200, Lucas Stach wrote:
   Am Mittwoch, den 27.06.2012, 16:44 +0200 schrieb Thierry Reding:
  ...
   In the ideal case I would want to not have a carveout size at all.
   However there may be situations where you need to make sure some
   driver can allocate a given amount of memory. Having to specify
   this using a kernel command-line parameter is cumbersome because
   it may require changes to the bootloader or whatever. So if you
   know that a particular board always needs
   128 MiB of carveout, then it makes sense to specify it on a
   per-board basis.
  
   If we go with CMA, this is a non-issue, as CMA allows to use the
   contig area for normal allocations and only purges them if it
   really needs the space for contig allocs.
  
   CMA certainly sounds like the most simple approach. While it may not
   be suited for 3D graphics or multimedia processing later on, I think
   we could use it at a starting point to get basic framebuffer and X
   support up and running. We can always move to something more
   advanced like TTM later.
 
  I thought the whole purpose of CMA was to act as the infra-structure
  to provide buffers to 3D, camera, etc. in particular allowing sharing
  of buffers between them. In other words, isn't CMA the memory manager?
  If there's some deficiency with CMA for 3D graphics, it seems like
  that should be raised with those designing CMA. Or, am I way off base
  with my expectations of CMA?
 
 CMA is just a way of providing large contiguous address space blocks in a
 dynamic fashion. The problem CMA solves is: we have a system with relatively low
 amounts of sysmem (like 512MB), now to ensure we can always get large
 contiguous buffers for use by GPU or VIDEO blocks, we need to set aside a
 relatively large contiguous pool (like 128MB). So we are stealing 128MB of
 memory from the system while we may or may not use it, which is bad.
 Now CMA allows to say: I may need 128MB of contig space, but the system is
 free to use it as normal memory as long as I don't really need it. If the
 space is really needed, CMA purges pages from the area and may even swap them
 out. So yes CMA is a memory allocator for contig memory.
 
 TTM though solves more advanced matters, like buffer synchronisation between
 3D and 2D block of hardware or syncing buffer access between GPU and CPU.
 One of the most interesting things of TTM is the ability to purge the GPU DMA
 buffers to scattered sysmem or even swap them out, if they are not currently
 used by the GPU. It then makes sure to move them in the contig space again
 when the GPU really needs them and fix up the GPU command stream with the
 new buffer address.
 
 IMHO the best solution would be to use CMA as a flexible replacement of the
 static carveout area and put TTM on top of this to solve the needs of graphics
 drivers. We certainly don't want to reinvent the wheel inside CMA. We have
 solutions for all those things in the kernel right now, we just have to glue
 them together in a sane way.
 

That is a great explanation. Could you also explain the relation between the
IOMMU API and TTM (or GEM)? Terje said the DMABUF API sits on top of the
IOMMU API, so normal device drivers (such as DRM) can forget about the IOMMU
API and just use DMABUF. If so, how are TTM/GEM and the IOMMU related? Or
does TTM/GEM use the DMABUF API, which in turn calls the IOMMU API to do
memory allocation/mapping?


 Thanks,
 Lucas
 


Re: Tegra DRM device tree bindings

2012-06-28 Thread Lucas Stach
Hi all,

I'm not sure what your exact plans are for the direction in which the
DRM driver should head, as I'm still a bit out of the loop as many of
those matters were only discussed internally at NVIDIA or with some NDA
developers. But I'll still try to get into the discussion.

Am Mittwoch, den 27.06.2012, 07:14 +0200 schrieb Thierry Reding:
 On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
  On 06/26/2012 08:32 PM, Mark Zhang wrote:
   On 06/26/2012 07:46 PM, Mark Zhang wrote:
   On Tue, 26 Jun 2012 12:55:13 +0200
   Thierry Reding thierry.red...@avionic-design.de wrote:
   ...
   I'm not sure I understand how information about the carveout would be
   obtained from the IOMMU API, though.
  
   I think that can be similar with current gart implementation. Define 
   carveout as:
  
   carveout {
   compatible = nvidia,tegra20-carveout;
   size = 0x1000;
   };
  
   Then create a file such like tegra-carveout.c to get these 
   definitions and
   register itself as platform device's iommu instance.
  
   The carveout isn't a HW object, so it doesn't seem appropriate to define 
   a DT
   node to represent it.
   
   Yes. But I think it's better to export the size of carveout as a 
   configurable item.
   So we need to define this somewhere. How about define carveout as a 
   property of gart?
  
  There already exists a way of preventing Linux from using certain chunks
  of memory; the /memreserve/ syntax. From a brief look at the dtc source,
  it looks like /memreserve/ entries can have labels, which implies that a
  property in the GART node could refer to the /memreserve/ entry by
  phandle in order to know what memory regions to use.
 
 Wasn't the whole point of using a carveout supposed to be a replacement
 for the GART? As such I'd think the carveout should rather be a property
 of the host1x device.
 
In my understanding carveout is neither a hardware nor software
component. It's just a somewhat special pool of memory. As I pointed out
in one of the older mails, carveout can not completely replace GART.
While normal allocations for graphics use should be done contiguous,
GART allows us to link normal scattered sysram buffers into GPU address
space, which is a nice thing to have.
IMHO if carveout is to be used exclusively by the GPU (i.e. the DRM
driver) it should be a property of the host1x device.
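
To illustrate the point, with the IOMMU API (which the GART driver plugs
into) linking a scatterlist of ordinary pages into one linear range of GPU
addresses is roughly the following; a sketch only, error unwinding and
page-size alignment checks omitted:

  #include <linux/iommu.h>
  #include <linux/scatterlist.h>

  /* Map scattered sysram pages to a contiguous GPU address range at 'iova'. */
  static int gpu_map_sgt(struct iommu_domain *domain, struct sg_table *sgt,
                         unsigned long iova)
  {
          struct scatterlist *sg;
          unsigned int i;
          int err;

          for_each_sg(sgt->sgl, sg, sgt->nents, i) {
                  /* assumes each entry is page aligned */
                  err = iommu_map(domain, iova, sg_phys(sg), sg->length,
                                  IOMMU_READ | IOMMU_WRITE);
                  if (err)
                          return err;

                  iova += sg->length;
          }

          return 0;
  }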

 AIUI what we want to do is have a large contiguous region of memory that
 a central component (host1x) manages as a pool from which clients (DRM,
 V4L, ...) can allocate buffers as needed. Since all of this memory will
 be contiguous anyway there isn't much use for the GART anymore.
 
I think this is the wrong way to go. Having a special memory pool
managed by some driver adds one more allocator to the kernel, which is
clearly not desirable. If we want a special mem region for GPU use, we
should not share this memory pool with other components.

But if we want a mem region for contig allocations used by many
components, which seems to be consensus here, CMA is the way to go. In
this case I think we don't want to bother with the carveout property at
all at the DRM driver level. Such a shared mem region managed by CMA
should be defined at a higher level of the device tree.

Thanks,
Lucas




Re: Tegra DRM device tree bindings

2012-06-28 Thread Lucas Stach
Am Donnerstag, den 28.06.2012, 09:06 +0300 schrieb Hiroshi Doyu:
 Hi Lucas,
 
 On Wed, 27 Jun 2012 17:59:55 +0200
 Lucas Stach d...@lynxeye.de wrote:
 
  Rather than introducing a new property, how about using
  coherent_pool=??M in the kernel command line if necessary? I think
  that this carveout size depends on the system usage/load.
 
 I was hoping that we could get away with using the CMA and perhaps
 initialize it based on device tree content. I agree that the carveout
 size depends on the use-case, but I still think it makes sense to
 specify it on a per-board basis.

DRM driver doesn't know if it uses CMA or not, because DRM only uses
DMA API.
   
   So how is the DRM supposed to allocate buffers? Does it call the
   dma_alloc_from_contiguous() function to do that? I can see how it is
   used by arm_dma_ops but how does it end up in the driver?
   
  As I said before the DMA API is not a good fit for graphics drivers.
  Most of the DMA buffers used by graphics cores are long lived and big,
  so we need a special pool to alloc from to avoid eating all contiguous
  address space, as DMA API does not provide shrinker callbacks for
  clients using large amount of memory.
 
 For the contiguous address space shortage issue in the DMA API, I think that
 the DMABUF framework can handle it?
 
No, DMABUF is only about sharing DMA buffers between different hardware
blocks. It has no address space management at all.

All DRM drivers manage their address space on their own, either through
GEM or TTM. On the desktop the main pool for contiguous memory is on-card
VRAM or an area of system memory set aside for use by the DRM driver.
Having a carveout area only for DRM use would be the same thing, but it
makes the split between graphics and system memory a bit inflexible.
Currently we have no sane way for integrating DRM memory managers with
the normal DMA API. There was some discussion about DMA pools and/or
shrinkers for DMA clients, but they have not led to any written code.

DMABUF only makes a limited integration between for example V4L and DRM
possible, so they can share buffers. But we still have the situation
that DRM devices allocate from their own pool, for explained reasons and
not use the standard DMA API.
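
Just to make that concrete, the importer side of such a limited integration
is basically this (a sketch against the 3.5-era dma-buf API; the exporter and
the actual allocation are still entirely the exporting driver's business):

  #include <linux/dma-buf.h>
  #include <linux/dma-mapping.h>
  #include <linux/err.h>

  /* Import a buffer some other driver exported as a dma-buf fd. */
  static struct sg_table *import_dmabuf(struct device *dev, int fd,
                                        struct dma_buf_attachment **attach_out)
  {
          struct dma_buf *buf;
          struct dma_buf_attachment *attach;
          struct sg_table *sgt;

          buf = dma_buf_get(fd);
          if (IS_ERR(buf))
                  return ERR_CAST(buf);

          attach = dma_buf_attach(buf, dev);
          if (IS_ERR(attach)) {
                  dma_buf_put(buf);
                  return ERR_CAST(attach);
          }

          sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
          if (IS_ERR(sgt)) {
                  dma_buf_detach(buf, attach);
                  dma_buf_put(buf);
                  return sgt;
          }

          *attach_out = attach;
          return sgt;
  }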



Tegra DRM device tree bindings

2012-06-27 Thread Lucas Stach
Am Mittwoch, den 27.06.2012, 16:44 +0200 schrieb Thierry Reding:
> On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
> > On Wed, 27 Jun 2012 16:08:10 +0200
> > Thierry Reding  wrote:
> > 
> > > * PGP Signed by an unknown key
> > > 
> > > On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
> > > > On Wed, 27 Jun 2012 07:14:18 +0200
> > > > Thierry Reding  wrote:
> > > > 
> > > > > > Old Signed by an unknown key
> > > > > 
> > > > > On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > > > > > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > > > > > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > > > > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > > > > > Thierry Reding  wrote:
> > > > > > >> ...
> > > > > >  I'm not sure I understand how information about the carveout 
> > > > > >  would be
> > > > > >  obtained from the IOMMU API, though.
> > > > > > >>>
> > > > > > >>> I think that can be similar with current gart implementation. 
> > > > > > >>> Define carveout as:
> > > > > > >>>
> > > > > > >>> carveout {
> > > > > > >>> compatible = "nvidia,tegra20-carveout";
> > > > > > >>> size = <0x1000>;
> > > > > > >>> };
> > > > > > >>>
> > > > > > >>> Then create a file such like "tegra-carveout.c" to get these 
> > > > > > >>> definitions and
> > > > > > >> register itself as platform device's iommu instance.
> > > > > > >>
> > > > > > >> The carveout isn't a HW object, so it doesn't seem appropriate 
> > > > > > >> to define a DT
> > > > > > >> node to represent it.
> > > > > > > 
> > > > > > > Yes. But I think it's better to export the size of carveout as a 
> > > > > > > configurable item.
> > > > > > > So we need to define this somewhere. How about define carveout as 
> > > > > > > a property of gart?
> > > > > > 
> > > > > > There already exists a way of preventing Linux from using certain 
> > > > > > chunks
> > > > > > of memory; the /memreserve/ syntax. From a brief look at the dtc 
> > > > > > source,
> > > > > > it looks like /memreserve/ entries can have labels, which implies 
> > > > > > that a
> > > > > > property in the GART node could refer to the /memreserve/ entry by
> > > > > > phandle in order to know what memory regions to use.
> > > > > 
> > > > > Wasn't the whole point of using a carveout supposed to be a 
> > > > > replacement
> > > > > for the GART?
> > > > 
> > > > Mostly agree. IIUC, we use both carveout/gart allocated buffers in
> > > > android/tegra2.
> > > > 
> > > > >As such I'd think the carveout should rather be a property
> > > > > of the host1x device.
> > > > 
> > > > Rather than introducing a new property, how about using
> > > > "coherent_pool=??M" in the kernel command line if necessary? I think
> > > > that this carveout size depends on the system usage/load.
> > > 
> > > I was hoping that we could get away with using the CMA and perhaps
> > > initialize it based on device tree content. I agree that the carveout
> > > size depends on the use-case, but I still think it makes sense to
> > > specify it on a per-board basis.
> > 
> > DRM driver doesn't know if it uses CMA or not, because DRM only uses
> > DMA API.
> 
> So how is the DRM supposed to allocate buffers? Does it call the
> dma_alloc_from_contiguous() function to do that? I can see how it is
> used by arm_dma_ops but how does it end up in the driver?
> 
As I said before the DMA API is not a good fit for graphics drivers.
Most of the DMA buffers used by graphics cores are long lived and big,
so we need a special pool to alloc from to avoid eating all contiguous
address space, as DMA API does not provide shrinker callbacks for
clients using large amount of memory.

> > I think that "coherent_pool" can be used only when the amount of
> > contiguous memory is short in your system. Otherwise even unnecessary.
> > 
> > Could you explain a bit more why you want carveout size on per-board basis?
> 
> In the ideal case I would want to not have a carveout size at all.
> However there may be situations where you need to make sure some driver
> can allocate a given amount of memory. Having to specify this using a
> kernel command-line parameter is cumbersome because it may require
> changes to the bootloader or whatever. So if you know that a particular
> board always needs 128 MiB of carveout, then it makes sense to specify
> it on a per-board basis.

If we go with CMA, this is a non-issue, as CMA allows to use the contig
area for normal allocations and only purges them if it really needs the
space for contig allocs.
> 
> Thierry




Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 16:08:10 +0200
Thierry Reding  wrote:

> * PGP Signed by an unknown key
> 
> On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
> > On Wed, 27 Jun 2012 07:14:18 +0200
> > Thierry Reding  wrote:
> > 
> > > > Old Signed by an unknown key
> > > 
> > > On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > > > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > > > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > > > Thierry Reding  wrote:
> > > > >> ...
> > > >  I'm not sure I understand how information about the carveout would 
> > > >  be
> > > >  obtained from the IOMMU API, though.
> > > > >>>
> > > > >>> I think that can be similar with current gart implementation. 
> > > > >>> Define carveout as:
> > > > >>>
> > > > >>> carveout {
> > > > >>> compatible = "nvidia,tegra20-carveout";
> > > > >>> size = <0x1000>;
> > > > >>> };
> > > > >>>
> > > > >>> Then create a file such like "tegra-carveout.c" to get these 
> > > > >>> definitions and
> > > > >> register itself as platform device's iommu instance.
> > > > >>
> > > > >> The carveout isn't a HW object, so it doesn't seem appropriate to 
> > > > >> define a DT
> > > > >> node to represent it.
> > > > > 
> > > > > Yes. But I think it's better to export the size of carveout as a 
> > > > > configurable item.
> > > > > So we need to define this somewhere. How about define carveout as a 
> > > > > property of gart?
> > > > 
> > > > There already exists a way of preventing Linux from using certain chunks
> > > > of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> > > > it looks like /memreserve/ entries can have labels, which implies that a
> > > > property in the GART node could refer to the /memreserve/ entry by
> > > > phandle in order to know what memory regions to use.
> > > 
> > > Wasn't the whole point of using a carveout supposed to be a replacement
> > > for the GART?
> > 
> > Mostly agree. IIUC, we use both carveout/gart allocated buffers in
> > android/tegra2.
> > 
> > >As such I'd think the carveout should rather be a property
> > > of the host1x device.
> > 
> > Rather than introducing a new property, how about using
> > "coherent_pool=??M" in the kernel command line if necessary? I think
> > that this carveout size depends on the system usage/load.
> 
> I was hoping that we could get away with using the CMA and perhaps
> initialize it based on device tree content. I agree that the carveout
> size depends on the use-case, but I still think it makes sense to
> specify it on a per-board basis.

DRM driver doesn't know if it uses CMA or not, because DRM only uses
DMA API. I think that "coherent_pool" can be used only when the amount
of contiguous memory is short in your system. Otherwise even unnecessary.

Could you explain a bit more why you want carveout size on per-board basis?


Tegra DRM device tree bindings

2012-06-27 Thread Thierry Reding
On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
> On Wed, 27 Jun 2012 16:08:10 +0200
> Thierry Reding  wrote:
> 
> > * PGP Signed by an unknown key
> > 
> > On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
> > > On Wed, 27 Jun 2012 07:14:18 +0200
> > > Thierry Reding  wrote:
> > > 
> > > > > Old Signed by an unknown key
> > > > 
> > > > On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > > > > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > > > > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > > > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > > > > Thierry Reding  wrote:
> > > > > >> ...
> > > > >  I'm not sure I understand how information about the carveout 
> > > > >  would be
> > > > >  obtained from the IOMMU API, though.
> > > > > >>>
> > > > > >>> I think that can be similar with current gart implementation. 
> > > > > >>> Define carveout as:
> > > > > >>>
> > > > > >>> carveout {
> > > > > >>> compatible = "nvidia,tegra20-carveout";
> > > > > >>> size = <0x1000>;
> > > > > >>> };
> > > > > >>>
> > > > > >>> Then create a file such like "tegra-carveout.c" to get these 
> > > > > >>> definitions and
> > > > > >> register itself as platform device's iommu instance.
> > > > > >>
> > > > > >> The carveout isn't a HW object, so it doesn't seem appropriate to 
> > > > > >> define a DT
> > > > > >> node to represent it.
> > > > > > 
> > > > > > Yes. But I think it's better to export the size of carveout as a 
> > > > > > configurable item.
> > > > > > So we need to define this somewhere. How about define carveout as a 
> > > > > > property of gart?
> > > > > 
> > > > > There already exists a way of preventing Linux from using certain 
> > > > > chunks
> > > > > of memory; the /memreserve/ syntax. From a brief look at the dtc 
> > > > > source,
> > > > > it looks like /memreserve/ entries can have labels, which implies 
> > > > > that a
> > > > > property in the GART node could refer to the /memreserve/ entry by
> > > > > phandle in order to know what memory regions to use.
> > > > 
> > > > Wasn't the whole point of using a carveout supposed to be a replacement
> > > > for the GART?
> > > 
> > > Mostly agree. IIUC, we use both carveout/gart allocated buffers in
> > > android/tegra2.
> > > 
> > > >As such I'd think the carveout should rather be a property
> > > > of the host1x device.
> > > 
> > > Rather than introducing a new property, how about using
> > > "coherent_pool=??M" in the kernel command line if necessary? I think
> > > that this carveout size depends on the system usage/load.
> > 
> > I was hoping that we could get away with using the CMA and perhaps
> > initialize it based on device tree content. I agree that the carveout
> > size depends on the use-case, but I still think it makes sense to
> > specify it on a per-board basis.
> 
> DRM driver doesn't know if it uses CMA or not, because DRM only uses
> DMA API.

So how is the DRM supposed to allocate buffers? Does it call the
dma_alloc_from_contiguous() function to do that? I can see how it is
used by arm_dma_ops but how does it end up in the driver?
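
For illustration, on the driver side there is nothing CMA-specific at all; a
minimal sketch, assuming an ARM kernel with CONFIG_CMA enabled, where
arm_dma_ops ends up calling dma_alloc_from_contiguous() behind this call for
non-atomic coherent allocations:

  #include <linux/dma-mapping.h>
  #include <linux/gfp.h>

  /* The driver only sees a plain coherent allocation... */
  static void *alloc_scanout_buffer(struct device *dev, size_t size,
                                    dma_addr_t *dma)
  {
          /* ...the CMA area (if any) is used behind this call on ARM. */
          return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
  }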

> I think that "coherent_pool" can be used only when the amount of
> contiguous memory is short in your system. Otherwise even unnecessary.
> 
> Could you explain a bit more why you want carveout size on per-board basis?

In the ideal case I would want to not have a carveout size at all.
However there may be situations where you need to make sure some driver
can allocate a given amount of memory. Having to specify this using a
kernel command-line parameter is cumbersome because it may require
changes to the bootloader or whatever. So if you know that a particular
board always needs 128 MiB of carveout, then it makes sense to specify
it on a per-board basis.

Thierry



Tegra DRM device tree bindings

2012-06-27 Thread Thierry Reding
On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
> On Wed, 27 Jun 2012 07:14:18 +0200
> Thierry Reding  wrote:
> 
> > * PGP Signed by an unknown key
> > 
> > On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > > Thierry Reding  wrote:
> > > >> ...
> > >  I'm not sure I understand how information about the carveout would be
> > >  obtained from the IOMMU API, though.
> > > >>>
> > > >>> I think that can be similar with current gart implementation. Define 
> > > >>> carveout as:
> > > >>>
> > > >>> carveout {
> > > >>> compatible = "nvidia,tegra20-carveout";
> > > >>> size = <0x1000>;
> > > >>> };
> > > >>>
> > > >>> Then create a file such like "tegra-carveout.c" to get these 
> > > >>> definitions and
> > > >> register itself as platform device's iommu instance.
> > > >>
> > > >> The carveout isn't a HW object, so it doesn't seem appropriate to 
> > > >> define a DT
> > > >> node to represent it.
> > > > 
> > > > Yes. But I think it's better to export the size of carveout as a 
> > > > configurable item.
> > > > So we need to define this somewhere. How about define carveout as a 
> > > > property of gart?
> > > 
> > > There already exists a way of preventing Linux from using certain chunks
> > > of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> > > it looks like /memreserve/ entries can have labels, which implies that a
> > > property in the GART node could refer to the /memreserve/ entry by
> > > phandle in order to know what memory regions to use.
> > 
> > Wasn't the whole point of using a carveout supposed to be a replacement
> > for the GART?
> 
> Mostly agree. IIUC, we use both carveout/gart allocated buffers in
> android/tegra2.
> 
> >As such I'd think the carveout should rather be a property
> > of the host1x device.
> 
> Rather than introducing a new property, how about using
> "coherent_pool=??M" in the kernel command line if necessary? I think
> that this carveout size depends on the system usage/load.

I was hoping that we could get away with using the CMA and perhaps
initialize it based on device tree content. I agree that the carveout
size depends on the use-case, but I still think it makes sense to
specify it on a per-board basis.

Thierry



Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 07:14:18 +0200
Thierry Reding  wrote:

> * PGP Signed by an unknown key
> 
> On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > Thierry Reding  wrote:
> > >> ...
> >  I'm not sure I understand how information about the carveout would be
> >  obtained from the IOMMU API, though.
> > >>>
> > >>> I think that can be similar with current gart implementation. Define 
> > >>> carveout as:
> > >>>
> > >>> carveout {
> > >>> compatible = "nvidia,tegra20-carveout";
> > >>> size = <0x1000>;
> > >>> };
> > >>>
> > >>> Then create a file such like "tegra-carveout.c" to get these 
> > >>> definitions and
> > >> register itself as platform device's iommu instance.
> > >>
> > >> The carveout isn't a HW object, so it doesn't seem appropriate to define 
> > >> a DT
> > >> node to represent it.
> > > 
> > > Yes. But I think it's better to export the size of carveout as a 
> > > configurable item.
> > > So we need to define this somewhere. How about define carveout as a 
> > > property of gart?
> > 
> > There already exists a way of preventing Linux from using certain chunks
> > of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> > it looks like /memreserve/ entries can have labels, which implies that a
> > property in the GART node could refer to the /memreserve/ entry by
> > phandle in order to know what memory regions to use.
> 
> Wasn't the whole point of using a carveout supposed to be a replacement
> for the GART?

Mostly agree. IIUC, we use both carveout/gart allocated buffers in
android/tegra2.

>As such I'd think the carveout should rather be a property
> of the host1x device.

Rather than introducing a new property, how about using
"coherent_pool=??M" in the kernel command line if necessary? I think
that this carveout size depends on the system usage/load.

> AIUI what we want to do is have a large contiguous region of memory that
> a central component (host1x) manages as a pool from which clients (DRM,
> V4L, ...) can allocate buffers as needed. Since all of this memory will
> be contiguous anyway there isn't much use for the GART anymore.

Right.

I'd think that the amount of contiguous memory needed might depend on the
system usage/load.


Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 04:48:18 +0200
Stephen Warren  wrote:

> On 06/26/2012 08:32 PM, Mark Zhang wrote:
> >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > On Tue, 26 Jun 2012 12:55:13 +0200
> > Thierry Reding  wrote:
> >> ...
>  I'm not sure I understand how information about the carveout would be
>  obtained from the IOMMU API, though.
> >>>
> >>> I think that can be similar with current gart implementation. Define 
> >>> carveout as:
> >>>
> >>> carveout {
> >>> compatible = "nvidia,tegra20-carveout";
> >>> size = <0x1000>;
> >>> };
> >>>
> >>> Then create a file such like "tegra-carveout.c" to get these definitions 
> >>> and
> >> register itself as platform device's iommu instance.
> >>
> >> The carveout isn't a HW object, so it doesn't seem appropriate to define a 
> >> DT
> >> node to represent it.
> > 
> > Yes. But I think it's better to export the size of carveout as a 
> > configurable item.
> > So we need to define this somewhere. How about define carveout as a 
> > property of gart?
> 
> There already exists a way of preventing Linux from using certain chunks
> of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> it looks like /memreserve/ entries can have labels, which implies that a
> property in the GART node could refer to the /memreserve/ entry by
> phandle in order to know what memory regions to use.

I think that we don't need the starting address for the carveout, but we
do need its size. Carveout memory is just an anonymous, physically
contiguous buffer.


Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 04:32:07 +0200
Mark Zhang  wrote:

> > On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > >>> On Tue, 26 Jun 2012 12:55:13 +0200
> > >>> Thierry Reding  wrote:
> > ...
> > >> I'm not sure I understand how information about the carveout would be
> > >> obtained from the IOMMU API, though.
> > >
> > > I think that can be similar with current gart implementation. Define 
> > > carveout as:
> > >
> > > carveout {
> > > compatible = "nvidia,tegra20-carveout";
> > > size = <0x1000>;
> > > };
> > >
> > > Then create a file such like "tegra-carveout.c" to get these definitions 
> > > and
> > register itself as platform device's iommu instance.
> > 
> > The carveout isn't a HW object, so it doesn't seem appropriate to define a 
> > DT
> > node to represent it.
> > --
> 
> Yes. But I think it's better to export the size of carveout as a configurable 
> item.
> So we need to define this somewhere. How about define carveout as a property 
> of gart?

I agree that the carveout size should be configurable. But it may not be 
related to gart.


Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Tue, 26 Jun 2012 16:00:33 +0200
Thierry Reding  wrote:

> * PGP Signed by an unknown key
> 
> On Tue, Jun 26, 2012 at 04:02:24PM +0300, Hiroshi Doyu wrote:
> > Hi Thierry,
> > 
> > On Tue, 26 Jun 2012 12:55:13 +0200
> > Thierry Reding  wrote:
> > 
> > > > Old Signed by an unknown key
> > > 
> > > Hi,
> > > 
> > > while I haven't got much time to work on the actual code right now, I
> > > think it might still be useful if we could get the device tree binding
> > > to a point where everybody is happy with it. That'll also save me some
> > > time once I get to writing the code because I won't have to redo it over
> > > again. =)
> > > 
> > > So here's the current proposal:
> > > 
> > >   host1x {
> > >   compatible = "nvidia,tegra20-host1x", "simple-bus";
> > >   reg = <0x50000000 0x00024000>;
> > >   interrupts = <0 64 0x04   /* cop syncpt */
> > > 0 65 0x04   /* mpcore syncpt */
> > > 0 66 0x04   /* cop general */
> > > 0 67 0x04>; /* mpcore general */
> > > 
> > >   #address-cells = <1>;
> > >   #size-cells = <1>;
> > > 
> > >   ranges = <0x54000000 0x54000000 0x04000000>;
> > > 
> > >   status = "disabled";
> > > 
> > >   gart = <&gart>;
> > ...
> > > output and hooks up a static EDID block with the LVDS output. There is
> > > also a carveout property which might be a better replacement for the
> > > "crippled" GART on Tegra20. Alternatively the CMA might work just as
> > > well instead.
> > > 
> > > The Plutux can be described like this:
> > > 
> > >   host1x {
> > >   carveout = <0x0e00 0x0200>;
> > 
> > As discussed in the following ML thread previously, the necessary info
> > related to the "gart" would be got from the standard IOMMU API(or
> > something above layers, DMABUF or TTM?). So I don't think that we need
> > to refer to "gart" and "carveout" here in the end.
> > 
> > http://lists.linuxfoundation.org/pipermail/iommu/2012-June/004266.html
> 
> Yes, if IOMMU or some layer above can provide the same information, then
> that is certainly better than explicitly referencing it in the DT.
> 
> I'm not sure I understand how information about the carveout would be
> obtained from the IOMMU API, though.

I think that there are 2 cases:

  (1) discontiguous memory with IOMMU
  (2) contiguous memory without IOMMU(called "carveout" in general?)

For (1), the necessary info should be passed via IOMMU API as
mentioned in the previous reply, like DT->{GART,SMMU}->IOMMU->DRM.

  [PATCH 0/5] IOMMU: Make IOMMU-API ready for GART-like hardware
https://lkml.org/lkml/2012/1/19/170

For (2), although the memory is mostly anonymous, we may need to know
how much to allocate, and for that we only need a "size". This size is not
a h/w feature but depends on the system load/usage. So I think that
this size could be passed on the kernel command line. For example, we can
specify how much contiguous memory is necessary by putting
"coherent_pool=??M" in the kernel command line as below:

coherent_pool=nn[KMG]   [ARM,KNL]
Sets the size of memory pool for coherent, atomic dma
allocations.


Tegra DRM device tree bindings

2012-06-27 Thread Mark Zhang
> On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > On Tue, 26 Jun 2012 12:55:13 +0200 Thierry Reding
> > >  wrote:
> > >> ...
> >  I'm not sure I understand how information about the carveout
> >  would be obtained from the IOMMU API, though.
> > >>>
> > >>> I think that can be similar with current gart implementation. Define
> carveout as:
> > >>>
> > >>> carveout {
> > >>> compatible = "nvidia,tegra20-carveout";
> > >>> size = <0x1000>;
> > >>> };
> > >>>
> > >>> Then create a file such like "tegra-carveout.c" to get these
> > >>> definitions and
> > >> register itself as platform device's iommu instance.
> > >>
> > >> The carveout isn't a HW object, so it doesn't seem appropriate to
> > >> define a DT node to represent it.
> > >
> > > Yes. But I think it's better to export the size of carveout as a 
> > > configurable item.
> > > So we need to define this somewhere. How about define carveout as a
> property of gart?
> >
> > There already exists a way of preventing Linux from using certain
> > chunks of memory; the /memreserve/ syntax. From a brief look at the
> > dtc source, it looks like /memreserve/ entries can have labels, which
> > implies that a property in the GART node could refer to the
> > /memreserve/ entry by phandle in order to know what memory regions to use.
> 
> Wasn't the whole point of using a carveout supposed to be a replacement for 
> the
> GART? As such I'd think the carveout should rather be a property of the host1x
> device.
> 
> AIUI what we want to do is have a large contiguous region of memory that a
> central component (host1x) manages as a pool from which clients (DRM, V4L, 
> ...)
> can allocate buffers as needed. Since all of this memory will be contiguous
> anyway there isn't much use for the GART anymore.
> 

I have the same understanding. We don't need the GART anymore if a carveout is
enabled.
I'm wondering why we need to define a property in the GART or host1x node that
references the global /memreserve/ at all.
We can just define a label for /memreserve/, so we can already distinguish
these memory reservations in the code.

> But maybe I'm misunderstanding.
> 
> Thierry
> 


Tegra DRM device tree bindings

2012-06-27 Thread Stephen Warren
On 06/27/2012 06:44 AM, Hiroshi Doyu wrote:
...
> I think that there are 2 cases:
> 
>   (1) discontiguous memory with IOMMU
>   (2) contiguous memory without IOMMU(called "carveout" in general?)
...
> For (2), although memory is mostly anonymous one, we may need to know
> how much to allocate, where we only need "size". This size is not from
> h/w feature, but it depends on the system load/usage. So I think that
> this size can be passed from kernel command line? For exmaple, we can
> specify how much contiguous memory is necessary with putting
> "coherent_pool=??M" in the kernel command line as below:
> 
>   coherent_pool=nn[KMG]   [ARM,KNL]
>   Sets the size of memory pool for coherent, atomic dma
>   allocations.

I guess if that's the standard way of initializing CMA, then that's fine.

It'd be nice if there was a way to specify that from the DT too; that
way the user/distro/bootloader constructing the kernel command-line
wouldn't have to remember to add "random" (potentially
Tegra-/board-specific) extra arguments onto the command-line; the Tegra
command-line in the upstream kernel is quite clean right now, especially
compared to the enormous number of options we require downstream :-(
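
Something as small as an optional property on the host1x node would already
avoid that; a sketch with a made-up property name (this is not an agreed
binding):

  #include <linux/of.h>

  /* Hypothetical "nvidia,carveout-size" property, read at probe time. */
  static u32 host1x_carveout_size(struct device_node *np)
  {
          u32 size = 32 << 20;    /* fall back to 32 MiB if absent */

          of_property_read_u32(np, "nvidia,carveout-size", &size);

          return size;
  }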


Tegra DRM device tree bindings

2012-06-27 Thread Stephen Warren
On 06/27/2012 08:29 AM, Hiroshi Doyu wrote:
> Could you explain a bit more why you want carveout size on per-board basis?

Different boards have different amounts of memory, and are sometimes
targeted at different use-cases (e.g. server with simple display buffer,
vs. consumer-oriented device intended to play games with OpenGL
allocating lots of textures).


Tegra DRM device tree bindings

2012-06-27 Thread Stephen Warren
On 06/26/2012 11:07 PM, Thierry Reding wrote:
> On Tue, Jun 26, 2012 at 04:48:14PM -0600, Stephen Warren wrote:
...
> I actually like what you proposed above a lot, so if you don't
> mind either way I'll go with that proposal. Keeping the connector
> nodes as children of the outputs has the advantage of being able to
> reference them if we need it at some point. But it is also
> redundant in that a single output doesn't usually (never?) drive
> more than one connector.

Yes, I believe that each output is 1:1 with (the video portion of) a
connector. The display controllers obviously aren't 1:1.

> The same issue will have to be addressed for the CSI and VI nodes,
> but as I currently use neither of those I don't feel qualified to
> propose a binding for them. Also for the VI part we're completely
> missing documentation. Maybe somebody could push this to be
> released as well?

I did file a bug noting the request for VI documentation. At this
point in time, it's too early to say what, if anything, will come of that.

> If I understand correctly, most of the host1x children can also be 
> chained in a processing pipeline to do post-processing on video
> input for example. I suppose that's generic and doesn't need to be
> represented in DT either, right?

Yes, I believe that's something internal to the driver.


Tegra DRM device tree bindings

2012-06-27 Thread Mark Zhang
> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> >>> On Tue, 26 Jun 2012 12:55:13 +0200
> >>> Thierry Reding  wrote:
> ...
> >> I'm not sure I understand how information about the carveout would be
> >> obtained from the IOMMU API, though.
> >
> > I think that can be similar with current gart implementation. Define 
> > carveout as:
> >
> > carveout {
> > compatible = "nvidia,tegra20-carveout";
> > size = <0x1000>;
> > };
> >
> > Then create a file such like "tegra-carveout.c" to get these definitions and
> register itself as platform device's iommu instance.
> 
> The carveout isn't a HW object, so it doesn't seem appropriate to define a DT
> node to represent it.
> --

Yes. But I think it's better to export the size of carveout as a configurable 
item.
So we need to define this somewhere. How about define carveout as a property of 
gart?



Tegra DRM device tree bindings

2012-06-27 Thread Lucas Stach
Hi all,

I'm not sure what your exact plans are for the direction in which the
DRM driver should head, as I'm still a bit out of the loop as many of
those matters were only discussed internally at NVIDIA or with some NDA
developers. But I'll still try to get into the discussion.

Am Mittwoch, den 27.06.2012, 07:14 +0200 schrieb Thierry Reding:
> On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> > On 06/26/2012 08:32 PM, Mark Zhang wrote:
> > >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > > On Tue, 26 Jun 2012 12:55:13 +0200
> > > Thierry Reding  wrote:
> > >> ...
> >  I'm not sure I understand how information about the carveout would be
> >  obtained from the IOMMU API, though.
> > >>>
> > >>> I think that can be similar with current gart implementation. Define 
> > >>> carveout as:
> > >>>
> > >>> carveout {
> > >>> compatible = "nvidia,tegra20-carveout";
> > >>> size = <0x1000>;
> > >>> };
> > >>>
> > >>> Then create a file such like "tegra-carveout.c" to get these 
> > >>> definitions and
> > >> register itself as platform device's iommu instance.
> > >>
> > >> The carveout isn't a HW object, so it doesn't seem appropriate to define 
> > >> a DT
> > >> node to represent it.
> > > 
> > > Yes. But I think it's better to export the size of carveout as a 
> > > configurable item.
> > > So we need to define this somewhere. How about define carveout as a 
> > > property of gart?
> > 
> > There already exists a way of preventing Linux from using certain chunks
> > of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> > it looks like /memreserve/ entries can have labels, which implies that a
> > property in the GART node could refer to the /memreserve/ entry by
> > phandle in order to know what memory regions to use.
> 
> Wasn't the whole point of using a carveout supposed to be a replacement
> for the GART? As such I'd think the carveout should rather be a property
> of the host1x device.
> 
In my understanding carveout is neither a hardware nor software
component. It's just a somewhat special pool of memory. As I pointed out
in one of the older mails, carveout can not completely replace GART.
While normal allocations for graphics use should be done contiguous,
GART allows us to link normal scattered sysram buffers into GPU address
space, which is a nice thing to have.
IMHO if carveout is to be used exclusively by the GPU (i.e. the DRM
driver) it should be a property of the host1x device.

> AIUI what we want to do is have a large contiguous region of memory that
> a central component (host1x) manages as a pool from which clients (DRM,
> V4L, ...) can allocate buffers as needed. Since all of this memory will
> be contiguous anyway there isn't much use for the GART anymore.
> 
I think this is the wrong way to go. Having a special memory pool
managed by some driver adds one more allocator to the kernel, which is
clearly not desirable. If we want a special mem region for GPU use, we
should not share this memory pool with other components.

But if we want a mem region for contig allocations used by many
components, which seems to be consensus here, CMA is the way to go. In
this case I think we don't want to bother with the carveout property at
all at the DRM driver level. Such a shared mem region managed by CMA
should be defined at a higher level of the device tree.

Thanks,
Lucas




Tegra DRM device tree bindings

2012-06-27 Thread Mark Zhang
> > On Tue, 26 Jun 2012 12:55:13 +0200
> > Thierry Reding  wrote:
> >
> > > > Old Signed by an unknown key
> > >
> > > Hi,
> > >
> > > while I haven't got much time to work on the actual code right now,
> > > I think it might still be useful if we could get the device tree
> > > binding to a point where everybody is happy with it. That'll also
> > > save me some time once I get to writing the code because I won't
> > > have to redo it over again. =)
> > >
> > > So here's the current proposal:
> > >
> > >   host1x {
> > >   compatible = "nvidia,tegra20-host1x", "simple-bus";
> > >   reg = <0x50000000 0x00024000>;
> > >   interrupts = <0 64 0x04   /* cop syncpt */
> > > 0 65 0x04   /* mpcore syncpt */
> > > 0 66 0x04   /* cop general */
> > > 0 67 0x04>; /* mpcore general */
> > >
> > >   #address-cells = <1>;
> > >   #size-cells = <1>;
> > >
> > >   ranges = <0x54000000 0x54000000 0x04000000>;
> > >
> > >   status = "disabled";
> > >
> > >   gart = <&gart>;
> > ...
> > > output and hooks up a static EDID block with the LVDS output. There
> > > is also a carveout property which might be a better replacement for
> > > the "crippled" GART on Tegra20. Alternatively the CMA might work
> > > just as well instead.
> > >
> > > The Plutux can be described like this:
> > >
> > >   host1x {
> > >   carveout = <0x0e00 0x0200>;
> >
> > As discussed in the following ML thread previously, the necessary info
> > related to the "gart" would be got from the standard IOMMU API(or
> > something above layers, DMABUF or TTM?). So I don't think that we need
> > to refer to "gart" and "carveout" here in the end.
> >
> > http://lists.linuxfoundation.org/pipermail/iommu/2012-June/004266.html
> 
> Yes, if IOMMU or some layer above can provide the same information, then that
> is certainly better than explicitly referencing it in the DT.
> 
> I'm not sure I understand how information about the carveout would be
> obtained from the IOMMU API, though.

I think that can be similar with current gart implementation. Define carveout 
as: 

carveout {
compatible = "nvidia,tegra20-carveout";
size = <0x1000>;
};

Then create a file such like "tegra-carveout.c" to get these definitions and 
register itself as platform device's iommu instance.

> 
> Thierry
> 


Tegra DRM device tree bindings

2012-06-27 Thread Terje Bergström
On 26.06.2012 20:46, Stephen Warren wrote:

> I'd certainly like to see some upstream discussion re: why exactly we
> have a custom bus type here. What does it do that a regular platform bus
> doesn't do? Are those features something that would be generally useful
> to add to the existing platform bus? Can we instead add hooks into
> platform bus rather than creating a new bus? I recall you saying that
> the nvhost_bus duplicated a lot of code from the existing platform bus.

Yes, the code is largely a copy of platform bus. One big difference is
type safety - all the nvhost_bus APIs take in and give out nvhost_device
or nvhost_driver pointers. At the registration time I also enforce
parent-child relationship to host1x and bus clients.

Another is that I am able to enumerate all host1x related devices. The
latter could of course be done via listing the children of host1x.

And third is that I was able to add version field to device to
distinguish between Tegra2 and Tegra3 3D (for example) and make the bus
use that information for binding driver and device. As the sysfs entry
path is determined by the device name, this allows keeping the old sysfs
APIs.

As you can see, most of the reasons are more about convenience than about
an own bus_type being the only way to implement them.

Terje


Tegra DRM device tree bindings

2012-06-27 Thread Thierry Reding
On Wed, Jun 27, 2012 at 09:13:52AM +0300, Terje Bergström wrote:
> On 26.06.2012 20:46, Stephen Warren wrote:
> 
> > I'd certainly like to see some upstream discussion re: why exactly we
> > have a custom bus type here. What does it do that a regular platform bus
> > doesn't do? Are those features something that would be generally useful
> > to add to the existing platform bus? Can we instead add hooks into
> > platform bus rather than creating a new bus? I recall you saying that
> > the nvhost_bus duplicated a lot of code from the existing platform bus.
> 
> Yes, the code is largely a copy of platform bus. One big difference is
> type safety - all the nvhost_bus APIs take in and give out nvhost_device
> or nvhost_driver pointers. At the registration time I also enforce
> parent-child relationship to host1x and bus clients.

I don't know where NVIDIA wants to go with respect to device trees in
their downstream kernels, but at least for mainline (with DT-only
support on Tegra) the parent-child relationship will be rooted in the
DT anyway.

> Another is that I am able to enumerate all host1x related devices. The
> latter could of course be done via listing the children of host1x.
> 
> And third is that I was able to add version field to device to
> distinguish between Tegra2 and Tegra3 3D (for example) and make the bus
> use that information for binding driver and device. As the sysfs entry
> path is determined by the device name, this allows keeping the old sysfs
> APIs.

Again this can easily be represented by the DT compatible property.

Thierry



Tegra DRM device tree bindings

2012-06-27 Thread Terje Bergström
On 26.06.2012 22:31, Thierry Reding wrote:

> Okay, I see. Does the same apply to the COP interrupts of the host1x
> node in your opinion? I don't know if it makes sense to describe
> something that's not reachable from the CPU. Yet it is defined in the
> GIC.


Hi,

I wasn't sure so I had to check from the hardware people. The COP
interrupts are not usable in a real system. They're usable in our
internal simulation environment.

The client module interrupts are only for flagging error conditions.

Terje


Tegra DRM device tree bindings

2012-06-27 Thread Terje Bergström
On 26.06.2012 22:38, Stephen Warren wrote:

> I would assume this can safely be inferred from the compatible value;
> nvidia,tegra20-host1x v.s. nvidia,tegra30-host1x, and so there's no
> need to represent this in DT. I would assume (and it's certainly just
> an assumption) that there are numerous other small details that are
> different between the two SoCs, and so the driver will need to have
> some table mapping from compatible value to various information
> anyway. Terje, can you confirm/deny this (and hopefully use your
> knowledge of any future chips to guide the decision without giving
> anything away)?


I'll now discuss what is going on in our internal driver, which will be
visible in our public git tree at some point.

Regarding just host1x, I have just parameterized per SoC the following
properties in the driver:
 * number of sync points
 * number of channels
 * number of wait bases
 * number of mlocks

In addition to this, per SoC we parameterize the mapping of sync points
to channels, the names of sync points (for debugging purposes) and
whether each sync point is stream managed or client managed.

For host1x and other devices, we have parameters indicating ability to
power gate, idle times before clock gating and power gating, power
domain ids (also used for resetting), associated clocks and some other
stuff. In the driver parameters we have function pointers for context
management and clock scaling.

Of these, the ability to power gate, the power domain ids and the number
of resources are hardware properties, and the rest are more or less
agreements on how software uses the resources. Context management is on
the borderline - each SoC has its own sequence to save and restore a
context for 3D and MPE.

We will also have different register layouts for host1x in different
SoCs and I have recently made nvhost ready for that.

Everything else except the register layout and the context management
could be encoded into device trees. Those remaining two mean that we
will still need some SoC-specific code in the driver.

Terje
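
As a sketch of what the software-agreement parts could look like if
they were moved into the device tree (all of the property names below
are invented here for illustration, and the values are the ones
commonly quoted for Tegra20 host1x; nothing like this was agreed in the
thread):

        host1x {
                compatible = "nvidia,tegra20-host1x", "simple-bus";
                reg = <0x50000000 0x00024000>;

                /* hypothetical resource-count properties */
                nvidia,num-syncpts = <32>;
                nvidia,num-channels = <8>;
                nvidia,num-waitbases = <8>;
                nvidia,num-mlocks = <16>;
        };

The register layout differences and the context save/restore sequences
would still have to live in the driver, keyed off the compatible value.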


Tegra DRM device tree bindings

2012-06-27 Thread Thierry Reding
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> On 06/26/2012 08:32 PM, Mark Zhang wrote:
> >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > On Tue, 26 Jun 2012 12:55:13 +0200
> > Thierry Reding  wrote:
> >> ...
>  I'm not sure I understand how information about the carveout would be
>  obtained from the IOMMU API, though.
> >>>
> >>> I think that can be similar with current gart implementation. Define 
> >>> carveout as:
> >>>
> >>> carveout {
> >>> compatible = "nvidia,tegra20-carveout";
> >>> size = <0x1000>;
> >>> };
> >>>
> >>> Then create a file such like "tegra-carveout.c" to get these definitions 
> >>> and
> >> register itself as platform device's iommu instance.
> >>
> >> The carveout isn't a HW object, so it doesn't seem appropriate to define a 
> >> DT
> >> node to represent it.
> > 
> > Yes. But I think it's better to export the size of carveout as a 
> > configurable item.
> > So we need to define this somewhere. How about define carveout as a 
> > property of gart?
> 
> There already exists a way of preventing Linux from using certain chunks
> of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> it looks like /memreserve/ entries can have labels, which implies that a
> property in the GART node could refer to the /memreserve/ entry by
> phandle in order to know what memory regions to use.

That doesn't work, unfortunately. The /memreserve/ label isn't even
stored in the DTB. Even DTC throws an error when you try to reference
the /memreserve/ by label.

Thierry



Tegra DRM device tree bindings

2012-06-27 Thread Thierry Reding
On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
> On 06/26/2012 08:32 PM, Mark Zhang wrote:
> >> On 06/26/2012 07:46 PM, Mark Zhang wrote:
> > On Tue, 26 Jun 2012 12:55:13 +0200
> > Thierry Reding  wrote:
> >> ...
>  I'm not sure I understand how information about the carveout would be
>  obtained from the IOMMU API, though.
> >>>
> >>> I think that can be similar with current gart implementation. Define 
> >>> carveout as:
> >>>
> >>> carveout {
> >>> compatible = "nvidia,tegra20-carveout";
> >>> size = <0x1000>;
> >>> };
> >>>
> >>> Then create a file such like "tegra-carveout.c" to get these definitions 
> >>> and
> >> register itself as platform device's iommu instance.
> >>
> >> The carveout isn't a HW object, so it doesn't seem appropriate to define a 
> >> DT
> >> node to represent it.
> > 
> > Yes. But I think it's better to export the size of carveout as a 
> > configurable item.
> > So we need to define this somewhere. How about define carveout as a 
> > property of gart?
> 
> There already exists a way of preventing Linux from using certain chunks
> of memory; the /memreserve/ syntax. From a brief look at the dtc source,
> it looks like /memreserve/ entries can have labels, which implies that a
> property in the GART node could refer to the /memreserve/ entry by
> phandle in order to know what memory regions to use.

Wasn't the whole point of using a carveout supposed to be a replacement
for the GART? As such I'd think the carveout should rather be a property
of the host1x device.

AIUI what we want to do is have a large contiguous region of memory that
a central component (host1x) manages as a pool from which clients (DRM,
V4L, ...) can allocate buffers as needed. Since all of this memory will
be contiguous anyway there isn't much use for the GART anymore.

But maybe I'm misunderstanding.

Thierry
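
As a sketch of that idea (the property name and the values are made up
for illustration; whether the carveout belongs in the device tree at
all is exactly what is being discussed here):

        host1x {
                compatible = "nvidia,tegra20-host1x", "simple-bus";
                reg = <0x50000000 0x00024000>;

                /* hypothetical: base and size of a contiguous pool that
                 * host1x manages on behalf of its clients (DRM, V4L, ...) */
                carveout = <0x0e000000 0x02000000>;
        };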



Tegra DRM device tree bindings

2012-06-27 Thread Thierry Reding
On Tue, Jun 26, 2012 at 04:48:14PM -0600, Stephen Warren wrote:
> On 06/26/2012 01:51 PM, Thierry Reding wrote:
> > On Tue, Jun 26, 2012 at 12:10:42PM -0600, Stephen Warren wrote:
> >> On 06/26/2012 04:55 AM, Thierry Reding wrote:
> >>> Hi,
> >>> 
> >>> while I haven't got much time to work on the actual code right
> >>> now, I think it might still be useful if we could get the
> >>> device tree binding to a point where everybody is happy with
> >>> it. That'll also save me some time once I get to writing the
> >>> code because I won't have to redo it over again. =)
> >>> 
> >>> So here's the current proposal:
> >>> 
> >>> host1x { compatible = "nvidia,tegra20-host1x", "simple-bus"; 
> >>> reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04   /* cop
> >>> syncpt */ 0 65 0x04   /* mpcore syncpt */ 0 66 0x04   /* cop
> >>> general */ 0 67 0x04>; /* mpcore general */
> >>> 
> >>> #address-cells = <1>; #size-cells = <1>;
> >>> 
> >>> ranges = <0x54000000 0x54000000 0x04000000>;
> >>> 
> >>> status = "disabled";
> >> 
> >> The idea behind status="disabled" is that some HW only makes
> >> sense to use on particular boards. This concept really only
> >> applies to HW modules that drive external interfaces on the SoC,
> >> which in turn the board can choose whether to connect anything to
> >> (or whether to even connect to any external pins using the
> >> pinmux, or not).
> >> 
> >> As such, I don't think it makes sense to set status="disabled"
> >> on host1x, nor many other internal-only engines such as say mpe,
> >> epp, i2sp, gr2d, gr3d, dc1, dc2.
> > 
> > What about power management and resource usage? If a board for
> > instance doesn't need gr3d at all it could just leave it at status
> > = "disabled" to not have the corresponding driver loaded and not
> > waste the power and resources.
> 
> The driver should be turning off all the clocks and power if the
> devices aren't actually used (or not turning it on in the first
> place). I guess there will be some slight overhead for the
> device/driver instantiation.

I think it would be great if we provided true runtime power-management
for the host1x children at some point. Then this should all work pretty
much automatically.

> However, in all likelihood, all/most boards will enable this feature
> once it's in place. For very resource-constrained scenarios without
> display, presumably one would be building a custom kernel without the
> display driver enabled, so the DT content for display wouldn't ever
> come into play.

Resource-constrained systems could always choose to override the default
and set the corresponding nodes to status = "disabled". Anyway, I agree
that this functionality will likely be used by most boards so I'll
enable everything except the outputs by default.

> >>> Board DTS files could then extend this with board-specific
> >>> requirements and connectors. The following describes the Medcom
> >>> Wide:
> >> 
> >>> connectors { #address-cells = <1>; #size-cells = <0>;
> >>> 
> >>> };
> >> 
> >> The connector seems to be a property of the individual output
> >> resources. I'd expect to see the connector configuration be a
> >> child of the outputs that a particular board had enabled;
> >> something more like:
> >> 
> >> host1x { rgb { status = "okay";
> >> 
> >> connector@0 { nvidia,edid = /incbin/("tegra-medcom.edid"); }; }; 
> >> hdmi { status = "okay";
> >> 
> >> connector@0 { nvidia,ddc-i2c-bus = <&tegra_i2c1>; }; }; };
> >> 
> >> Perhaps even completely omit the connector node, and put the
> >> properties directly within the rgb/hdmi node itself. After all
> >> the HDMI output really is the connector as far as Tegra goes.
> > 
> > Heh. I seem to remember you objecting to this in a previous
> > series[0] which is actually the reason that I moved them to the
> > top-level in the first place. =)
> > 
> > Thierry
> > 
> > [0]: http://www.spinics.net/lists/linux-tegra/msg05298.html
> 
> I don't think I was objecting to exactly what I wrote above; in that
> email, there were already separate connector nodes, but they were
> placed directly inside the host1x node (at that time called graphics),
> so still rather disconnected from the HW they represent.
> 
> The argument for sharing e.g. an HDMI port between both video and
> audio still exists though. That said, I think now I'd still be
> inclined to put all the connector information for video into the
> hdmi/rgb/tvo nodes. If DT does grow to represent the user-connectors
> at the top-level perhaps in conjunction with drivers/extcon, perhaps
> the hdmi node can point at the extcon node with a phandle, or
> vice-versa, to set up the link between the components of the
> user-visible port.
> 
> Still, the decision here is possibly a little arbitrary; many schemes
> would work. I think at this point I don't care /too/ strongly about
> which is used, so the separate-connectors-at-top-level concept in your
> email is probably OK. I wonder if the hdmi node doesn't need a phandle
> pointing at the connector node though, so they can both find each-other?
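
For reference, the board-level opt-out that Thierry describes earlier
in this message would look roughly like this (a sketch; the gr3d node
name, unit address and size are assumptions based on the Tegra20 memory
map):

        /* tegra20.dtsi: internal engines enabled by default */
        host1x {
                gr3d@54180000 {
                        compatible = "nvidia,tegra20-gr3d";
                        reg = <0x54180000 0x00040000>;
                };
        };

        /* board .dts: a design that never uses 3D opts out again */
        host1x {
                gr3d@54180000 {
                        status = "disabled";
                };
        };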

Re: Tegra DRM device tree bindings

2012-06-27 Thread Stephen Warren
On 06/26/2012 01:51 PM, Thierry Reding wrote:
 On Tue, Jun 26, 2012 at 12:10:42PM -0600, Stephen Warren wrote:
 On 06/26/2012 04:55 AM, Thierry Reding wrote:
 Hi,
 
 while I haven't got much time to work on the actual code right
 now, I think it might still be useful if we could get the
 device tree binding to a point where everybody is happy with
 it. That'll also save me some time once I get to writing the
 code because I won't have to redo it over again. =)
 
 So here's the current proposal:
 
 host1x { compatible = nvidia,tegra20-host1x, simple-bus; 
 reg = <0x50000000 0x00024000>; interrupts = <0 64 0x04   /* cop
 syncpt */ 0 65 0x04   /* mpcore syncpt */ 0 66 0x04   /* cop
 general */ 0 67 0x04>; /* mpcore general */
 
 #address-cells = 1; #size-cells = 1;
 
 ranges = <0x54000000 0x54000000 0x04000000>;
 
 status = disabled;
 
 The idea behind status=disabled is that some HW only makes
 sense to use on particular boards. This concept really only
 applies to HW modules that drive external interfaces on the SoC,
 which in turn the board can choose whether to connect anything to
 (or whether to even connect to any external pins using the
 pinmux, or not).
 
 As such, I don't think it makes sense to set status=disabled
 on host1x, nor many other internal-only engines such as say mpe,
 epp, i2sp, gr2d, gr3d, dc1, dc2.
 
 What about power management and resource usage? If a board for
 instance doesn't need gr3d at all it could just leave it at status
 = disabled to not have the corresponding driver loaded and not
 waste the power and resources.

The driver should be turning off all the clocks and power if the
devices aren't actually used (or not turning it on in the first
place). I guess there will be some slight overhead for the
device/driver instantiation.

However, in all likelihood, all/most boards will enable this feature
once it's in place. For very resource-constrained scenarios without
display, presumably one would be building a custom kernel without the
display driver enabled, so the DT content for display wouldn't ever
come into play.

 Board DTS files could then extend this with board-specific
 requirements and connectors. The following describes the Medcom
 Wide:
 
 connectors { #address-cells = 1; #size-cells = 0;
 
 };
 
 The connector seems to be a property of the individual output
 resources. I'd expect to see the connector configuration be a
 child of the outputs that a particular board had enabled;
 something more like:
 
 host1x { rgb { status = okay;
 
 connector@0 { nvidia,edid = /incbin/(tegra-medcom.edid); }; }; 
 hdmi { status = okay;
 
 connector@0 { nvidia,ddc-i2c-bus = tegra_i2c1; }; }; };
 
 Perhaps even completely omit the connector node, and put the
 properties directly within the rgb/hdmi node itself. After all
 the HDMI output really is the connector as far as Tegra goes.
 
 Heh. I seem to remember you objecting to this in a previous
 series[0] which is actually the reason that I moved them to the
 top-level in the first place. =)
 
 Thierry
 
 [0]: http://www.spinics.net/lists/linux-tegra/msg05298.html

I don't think I was objecting to exactly what I wrote above; in that
email, there were already separate connector nodes, but they were
placed directly inside the host1x node (at that time called graphics),
so still rather disconnected from the HW they represent.

The argument for sharing e.g. an HDMI port between both video and
audio still exists though. That said, I think now I'd still be
inclined to put all the connector information for video into the
hdmi/rgb/tvo nodes. If DT does grow to represent the user-connectors
at the top-level perhaps in conjunction with drivers/extcon, perhaps
the hdmi node can point at the extcon node with a phandle, or
vice-versa, to set up the link between the components of the
user-visible port.

Still, the decision here is possibly a little arbitrary; many schemes
would work. I think at this point I don't care /too/ strongly about
which is used, so the separate-connectors-at-top-level concept in your
email is probably OK. I wonder if the hdmi node doesn't need a phandle
pointing at the connector node though, so they can both find each-other?
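
A sketch of the folded-in variant described here, with the video side
of the connector expressed directly in the output nodes (the property
names are the ones used in the examples earlier in the thread; the
extcon cross-reference is left out because it was not worked out):

        host1x {
                rgb {
                        status = "okay";
                        nvidia,edid = /incbin/("tegra-medcom.edid");
                };

                hdmi {
                        status = "okay";
                        nvidia,ddc-i2c-bus = <&tegra_i2c1>;
                };
        };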


Re: Tegra DRM device tree bindings

2012-06-27 Thread Stephen Warren
On 06/26/2012 07:46 PM, Mark Zhang wrote:
 On Tue, 26 Jun 2012 12:55:13 +0200
 Thierry Reding thierry.red...@avionic-design.de wrote:
...
 I'm not sure I understand how information about the carveout would be
 obtained from the IOMMU API, though.
 
 I think that can be similar with current gart implementation. Define carveout 
 as: 
 
 carveout {
 compatible = nvidia,tegra20-carveout;
 size = 0x1000;
 };
 
 Then create a file such like tegra-carveout.c to get these definitions and 
 register itself as platform device's iommu instance.

The carveout isn't a HW object, so it doesn't seem appropriate to define
a DT node to represent it.


Re: Tegra DRM device tree bindings

2012-06-27 Thread Stephen Warren
On 06/26/2012 08:32 PM, Mark Zhang wrote:
 On 06/26/2012 07:46 PM, Mark Zhang wrote:
 On Tue, 26 Jun 2012 12:55:13 +0200
 Thierry Reding thierry.red...@avionic-design.de wrote:
 ...
 I'm not sure I understand how information about the carveout would be
 obtained from the IOMMU API, though.

 I think that can be similar with current gart implementation. Define 
 carveout as:

 carveout {
 compatible = nvidia,tegra20-carveout;
 size = 0x1000;
 };

 Then create a file such like tegra-carveout.c to get these definitions and
 register itself as platform device's iommu instance.

 The carveout isn't a HW object, so it doesn't seem appropriate to define a DT
 node to represent it.
 
 Yes. But I think it's better to export the size of carveout as a configurable 
 item.
 So we need to define this somewhere. How about define carveout as a property 
 of gart?

There already exists a way of preventing Linux from using certain chunks
of memory; the /memreserve/ syntax. From a brief look at the dtc source,
it looks like /memreserve/ entries can have labels, which implies that a
property in the GART node could refer to the /memreserve/ entry by
phandle in order to know what memory regions to use.
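
In .dts source the suggestion would look roughly like the sketch below
(the reservation values and the simplified gart node are made up for
illustration and this is not the real Tegra20 GART binding). As Thierry
notes elsewhere in the thread, the label is the sticking point: dtc
drops /memreserve/ labels, so a node cannot actually reference the
reservation by phandle.

        /memreserve/ 0x0e000000 0x02000000;     /* keep 32 MiB away from Linux */

        gart: gart@58000000 {
                compatible = "nvidia,tegra20-gart";
                reg = <0x58000000 0x02000000>;
                /* the idea was to point at the /memreserve/ entry from
                 * here, which dtc does not support */
        };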


Re: Tegra DRM device tree bindings

2012-06-27 Thread Thierry Reding
On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
 On Wed, 27 Jun 2012 07:14:18 +0200
 Thierry Reding thierry.red...@avionic-design.de wrote:
 
  
  On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
   On 06/26/2012 08:32 PM, Mark Zhang wrote:
On 06/26/2012 07:46 PM, Mark Zhang wrote:
On Tue, 26 Jun 2012 12:55:13 +0200
Thierry Reding thierry.red...@avionic-design.de wrote:
...
I'm not sure I understand how information about the carveout would be
obtained from the IOMMU API, though.
   
I think that can be similar with current gart implementation. Define 
carveout as:
   
carveout {
compatible = nvidia,tegra20-carveout;
size = 0x1000;
};
   
Then create a file such like tegra-carveout.c to get these 
definitions and
register itself as platform device's iommu instance.
   
The carveout isn't a HW object, so it doesn't seem appropriate to 
define a DT
node to represent it.

Yes. But I think it's better to export the size of carveout as a 
configurable item.
So we need to define this somewhere. How about define carveout as a 
property of gart?
   
   There already exists a way of preventing Linux from using certain chunks
   of memory; the /memreserve/ syntax. From a brief look at the dtc source,
   it looks like /memreserve/ entries can have labels, which implies that a
   property in the GART node could refer to the /memreserve/ entry by
   phandle in order to know what memory regions to use.
  
  Wasn't the whole point of using a carveout supposed to be a replacement
  for the GART?
 
 Mostly agree. IIUC, we use both carveout/gart allocated buffers in
 android/tegra2.
 
 As such I'd think the carveout should rather be a property
  of the host1x device.
 
 Rather than introducing a new property, how about using
 coherent_pool=??M in the kernel command line if necessary? I think
 that this carveout size depends on the system usage/load.

I was hoping that we could get away with using the CMA and perhaps
initialize it based on device tree content. I agree that the carveout
size depends on the use-case, but I still think it makes sense to
specify it on a per-board basis.

Thierry




Re: Tegra DRM device tree bindings

2012-06-27 Thread Thierry Reding
On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
 On Wed, 27 Jun 2012 16:08:10 +0200
 Thierry Reding thierry.red...@avionic-design.de wrote:
 
  
  On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
   On Wed, 27 Jun 2012 07:14:18 +0200
   Thierry Reding thierry.red...@avionic-design.de wrote:
   

On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
 On 06/26/2012 08:32 PM, Mark Zhang wrote:
  On 06/26/2012 07:46 PM, Mark Zhang wrote:
  On Tue, 26 Jun 2012 12:55:13 +0200
  Thierry Reding thierry.red...@avionic-design.de wrote:
  ...
  I'm not sure I understand how information about the carveout 
  would be
  obtained from the IOMMU API, though.
 
  I think that can be similar with current gart implementation. 
  Define carveout as:
 
  carveout {
  compatible = nvidia,tegra20-carveout;
  size = 0x1000;
  };
 
  Then create a file such like tegra-carveout.c to get these 
  definitions and
  register itself as platform device's iommu instance.
 
  The carveout isn't a HW object, so it doesn't seem appropriate to 
  define a DT
  node to represent it.
  
  Yes. But I think it's better to export the size of carveout as a 
  configurable item.
  So we need to define this somewhere. How about define carveout as a 
  property of gart?
 
 There already exists a way of preventing Linux from using certain 
 chunks
 of memory; the /memreserve/ syntax. From a brief look at the dtc 
 source,
 it looks like /memreserve/ entries can have labels, which implies 
 that a
 property in the GART node could refer to the /memreserve/ entry by
 phandle in order to know what memory regions to use.

Wasn't the whole point of using a carveout supposed to be a replacement
for the GART?
   
   Mostly agree. IIUC, we use both carveout/gart allocated buffers in
   android/tegra2.
   
   As such I'd think the carveout should rather be a property
of the host1x device.
   
   Rather than introducing a new property, how about using
   coherent_pool=??M in the kernel command line if necessary? I think
   that this carveout size depends on the system usage/load.
  
  I was hoping that we could get away with using the CMA and perhaps
  initialize it based on device tree content. I agree that the carveout
  size depends on the use-case, but I still think it makes sense to
  specify it on a per-board basis.
 
 DRM driver doesn't know if it uses CMA or not, because DRM only uses
 DMA API.

So how is the DRM supposed to allocate buffers? Does it call the
dma_alloc_from_contiguous() function to do that? I can see how it is
used by arm_dma_ops but how does it end up in the driver?

 I think that coherent_pool can be used only when the amount of
 contiguous memory is short in your system. Otherwise even unnecessary.
 
 Could you explain a bit more why you want carveout size on per-board basis?

In the ideal case I would want to not have a carveout size at all.
However there may be situations where you need to make sure some driver
can allocate a given amount of memory. Having to specify this using a
kernel command-line parameter is cumbersome because it may require
changes to the bootloader or whatever. So if you know that a particular
board always needs 128 MiB of carveout, then it makes sense to specify
it on a per-board basis.

Thierry




Re: Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Tue, 26 Jun 2012 16:00:33 +0200
Thierry Reding thierry.red...@avionic-design.de wrote:

 
 On Tue, Jun 26, 2012 at 04:02:24PM +0300, Hiroshi Doyu wrote:
  Hi Thierry,
  
  On Tue, 26 Jun 2012 12:55:13 +0200
  Thierry Reding thierry.red...@avionic-design.de wrote:
  
   
   Hi,
   
   while I haven't got much time to work on the actual code right now, I
   think it might still be useful if we could get the device tree binding
   to a point where everybody is happy with it. That'll also save me some
   time once I get to writing the code because I won't have to redo it over
   again. =)
   
   So here's the current proposal:
   
  host1x {
  compatible = "nvidia,tegra20-host1x", "simple-bus";
  reg = <0x50000000 0x00024000>;
  interrupts = <0 64 0x04   /* cop syncpt */
    0 65 0x04   /* mpcore syncpt */
    0 66 0x04   /* cop general */
    0 67 0x04>; /* mpcore general */

  #address-cells = <1>;
  #size-cells = <1>;

  ranges = <0x54000000 0x54000000 0x04000000>;

  status = "disabled";

  gart = <&gart>;
  ...
   output and hooks up a static EDID block with the LVDS output. There is
   also a carveout property which might be a better replacement for the
   crippled GART on Tegra20. Alternatively the CMA might work just as
   well instead.
   
   The Plutux can be described like this:
   
 host1x {
  carveout = <0x0e000000 0x02000000>;
  
  As discussed in the following ML thread previously, the necessary info
  related to the gart would be got from the standard IOMMU API(or
  something above layers, DMABUF or TTM?). So I don't think that we need
  to refer to gart and carveout here in the end.
  
  http://lists.linuxfoundation.org/pipermail/iommu/2012-June/004266.html
 
 Yes, if IOMMU or some layer above can provide the same information, then
 that is certainly better than explicitly referencing it in the DT.
 
 I'm not sure I understand how information about the carveout would be
 obtained from the IOMMU API, though.

I think that there are 2 cases:

  (1) discontiguous memory with IOMMU
  (2) contiguous memory without IOMMU (called carveout in general?)

For (1), the necessary info should be passed via the IOMMU API as
mentioned in the previous reply, like DT -> {GART,SMMU} -> IOMMU -> DRM.

  [PATCH 0/5] IOMMU: Make IOMMU-API ready for GART-like hardware
https://lkml.org/lkml/2012/1/19/170

For (2), although the memory is mostly anonymous, we may need to know
how much to allocate, and for that we only need a size. This size is not
a hardware property, but depends on the system load/usage. So I think
that this size can be passed from the kernel command line. For example,
we can specify how much contiguous memory is necessary by putting
coherent_pool=??M on the kernel command line as below:

coherent_pool=nn[KMG]   [ARM,KNL]
Sets the size of memory pool for coherent, atomic dma
allocations.


Re: Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 04:32:07 +0200
Mark Zhang ma...@nvidia.com wrote:

  On 06/26/2012 07:46 PM, Mark Zhang wrote:
   On Tue, 26 Jun 2012 12:55:13 +0200
   Thierry Reding thierry.red...@avionic-design.de wrote:
  ...
   I'm not sure I understand how information about the carveout would be
   obtained from the IOMMU API, though.
  
   I think that can be similar with current gart implementation. Define 
   carveout as:
  
   carveout {
   compatible = nvidia,tegra20-carveout;
   size = 0x1000;
   };
  
   Then create a file such like tegra-carveout.c to get these definitions 
   and
  register itself as platform device's iommu instance.
  
  The carveout isn't a HW object, so it doesn't seem appropriate to define a 
  DT
  node to represent it.
  --
 
 Yes. But I think it's better to export the size of carveout as a configurable 
 item.
 So we need to define this somewhere. How about define carveout as a property 
 of gart?

I agree that the carveout size should be configurable. But it may not be 
related to gart.


Re: Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 04:48:18 +0200
Stephen Warren swar...@nvidia.com wrote:

 On 06/26/2012 08:32 PM, Mark Zhang wrote:
  On 06/26/2012 07:46 PM, Mark Zhang wrote:
  On Tue, 26 Jun 2012 12:55:13 +0200
  Thierry Reding thierry.red...@avionic-design.de wrote:
  ...
  I'm not sure I understand how information about the carveout would be
  obtained from the IOMMU API, though.
 
  I think that can be similar with current gart implementation. Define 
  carveout as:
 
  carveout {
  compatible = nvidia,tegra20-carveout;
  size = 0x1000;
  };
 
  Then create a file such like tegra-carveout.c to get these definitions 
  and
  register itself as platform device's iommu instance.
 
  The carveout isn't a HW object, so it doesn't seem appropriate to define a 
  DT
  node to represent it.
  
  Yes. But I think it's better to export the size of carveout as a 
  configurable item.
  So we need to define this somewhere. How about define carveout as a 
  property of gart?
 
 There already exists a way of preventing Linux from using certain chunks
 of memory; the /memreserve/ syntax. From a brief look at the dtc source,
 it looks like /memreserve/ entries can have labels, which implies that a
 property in the GART node could refer to the /memreserve/ entry by
 phandle in order to know what memory regions to use.

I think that we don't need the starting address for the carveout, but we
do need its size. Carveout memory is just an anonymous, physically
contiguous buffer.


Re: Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 07:14:18 +0200
Thierry Reding thierry.red...@avionic-design.de wrote:

 
 On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
  On 06/26/2012 08:32 PM, Mark Zhang wrote:
   On 06/26/2012 07:46 PM, Mark Zhang wrote:
   On Tue, 26 Jun 2012 12:55:13 +0200
   Thierry Reding thierry.red...@avionic-design.de wrote:
   ...
   I'm not sure I understand how information about the carveout would be
   obtained from the IOMMU API, though.
  
   I think that can be similar with current gart implementation. Define 
   carveout as:
  
   carveout {
   compatible = nvidia,tegra20-carveout;
   size = 0x1000;
   };
  
   Then create a file such like tegra-carveout.c to get these 
   definitions and
   register itself as platform device's iommu instance.
  
   The carveout isn't a HW object, so it doesn't seem appropriate to define 
   a DT
   node to represent it.
   
   Yes. But I think it's better to export the size of carveout as a 
   configurable item.
   So we need to define this somewhere. How about define carveout as a 
   property of gart?
  
  There already exists a way of preventing Linux from using certain chunks
  of memory; the /memreserve/ syntax. From a brief look at the dtc source,
  it looks like /memreserve/ entries can have labels, which implies that a
  property in the GART node could refer to the /memreserve/ entry by
  phandle in order to know what memory regions to use.
 
 Wasn't the whole point of using a carveout supposed to be a replacement
 for the GART?

Mostly agree. IIUC, we use both carveout/gart allocated buffers in
android/tegra2.

As such I'd think the carveout should rather be a property
 of the host1x device.

Rather than introducing a new property, how about using
coherent_pool=??M in the kernel command line if necessary? I think
that this carveout size depends on the system usage/load.

 AIUI what we want to do is have a large contiguous region of memory that
 a central component (host1x) manages as a pool from which clients (DRM,
 V4L, ...) can allocate buffers as needed. Since all of this memory will
 be contiguous anyway there isn't much use for the GART anymore.

Right.

I'd think that the size of the contiguous region might depend on the
system usage/load.


Re: Tegra DRM device tree bindings

2012-06-27 Thread Hiroshi Doyu
On Wed, 27 Jun 2012 16:08:10 +0200
Thierry Reding thierry.red...@avionic-design.de wrote:

 
 On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
  On Wed, 27 Jun 2012 07:14:18 +0200
  Thierry Reding thierry.red...@avionic-design.de wrote:
  
   
   On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
On 06/26/2012 08:32 PM, Mark Zhang wrote:
 On 06/26/2012 07:46 PM, Mark Zhang wrote:
 On Tue, 26 Jun 2012 12:55:13 +0200
 Thierry Reding thierry.red...@avionic-design.de wrote:
 ...
 I'm not sure I understand how information about the carveout would 
 be
 obtained from the IOMMU API, though.

 I think that can be similar with current gart implementation. 
 Define carveout as:

 carveout {
 compatible = nvidia,tegra20-carveout;
 size = 0x1000;
 };

 Then create a file such like tegra-carveout.c to get these 
 definitions and
 register itself as platform device's iommu instance.

 The carveout isn't a HW object, so it doesn't seem appropriate to 
 define a DT
 node to represent it.
 
 Yes. But I think it's better to export the size of carveout as a 
 configurable item.
 So we need to define this somewhere. How about define carveout as a 
 property of gart?

There already exists a way of preventing Linux from using certain chunks
of memory; the /memreserve/ syntax. From a brief look at the dtc source,
it looks like /memreserve/ entries can have labels, which implies that a
property in the GART node could refer to the /memreserve/ entry by
phandle in order to know what memory regions to use.
   
   Wasn't the whole point of using a carveout supposed to be a replacement
   for the GART?
  
  Mostly agree. IIUC, we use both carveout/gart allocated buffers in
  android/tegra2.
  
  As such I'd think the carveout should rather be a property
   of the host1x device.
  
  Rather than introducing a new property, how about using
  coherent_pool=??M in the kernel command line if necessary? I think
  that this carveout size depends on the system usage/load.
 
 I was hoping that we could get away with using the CMA and perhaps
 initialize it based on device tree content. I agree that the carveout
 size depends on the use-case, but I still think it makes sense to
 specify it on a per-board basis.

DRM driver doesn't know if it uses CMA or not, because DRM only uses
DMA API. I think that coherent_pool can be used only when the amount
of contiguous memory is short in your system. Otherwise even unnecessary.

Could you explain a bit more why you want carveout size on per-board basis?


Re: Tegra DRM device tree bindings

2012-06-27 Thread Lucas Stach
Am Mittwoch, den 27.06.2012, 16:44 +0200 schrieb Thierry Reding:
 On Wed, Jun 27, 2012 at 05:29:14PM +0300, Hiroshi Doyu wrote:
  On Wed, 27 Jun 2012 16:08:10 +0200
  Thierry Reding thierry.red...@avionic-design.de wrote:
  
   
   On Wed, Jun 27, 2012 at 03:59:07PM +0300, Hiroshi Doyu wrote:
On Wed, 27 Jun 2012 07:14:18 +0200
Thierry Reding thierry.red...@avionic-design.de wrote:

 
 On Tue, Jun 26, 2012 at 08:48:18PM -0600, Stephen Warren wrote:
  On 06/26/2012 08:32 PM, Mark Zhang wrote:
   On 06/26/2012 07:46 PM, Mark Zhang wrote:
   On Tue, 26 Jun 2012 12:55:13 +0200
   Thierry Reding thierry.red...@avionic-design.de wrote:
   ...
   I'm not sure I understand how information about the carveout 
   would be
   obtained from the IOMMU API, though.
  
   I think that can be similar with current gart implementation. 
   Define carveout as:
  
   carveout {
   compatible = nvidia,tegra20-carveout;
   size = 0x1000;
   };
  
   Then create a file such like tegra-carveout.c to get these 
   definitions and
   register itself as platform device's iommu instance.
  
   The carveout isn't a HW object, so it doesn't seem appropriate 
   to define a DT
   node to represent it.
   
   Yes. But I think it's better to export the size of carveout as a 
   configurable item.
   So we need to define this somewhere. How about define carveout as 
   a property of gart?
  
  There already exists a way of preventing Linux from using certain 
  chunks
  of memory; the /memreserve/ syntax. From a brief look at the dtc 
  source,
  it looks like /memreserve/ entries can have labels, which implies 
  that a
  property in the GART node could refer to the /memreserve/ entry by
  phandle in order to know what memory regions to use.
 
 Wasn't the whole point of using a carveout supposed to be a 
 replacement
 for the GART?

Mostly agree. IIUC, we use both carveout/gart allocated buffers in
android/tegra2.

As such I'd think the carveout should rather be a property
 of the host1x device.

Rather than introducing a new property, how about using
coherent_pool=??M in the kernel command line if necessary? I think
that this carveout size depends on the system usage/load.
   
   I was hoping that we could get away with using the CMA and perhaps
   initialize it based on device tree content. I agree that the carveout
   size depends on the use-case, but I still think it makes sense to
   specify it on a per-board basis.
  
  DRM driver doesn't know if it uses CMA or not, because DRM only uses
  DMA API.
 
 So how is the DRM supposed to allocate buffers? Does it call the
 dma_alloc_from_contiguous() function to do that? I can see how it is
 used by arm_dma_ops but how does it end up in the driver?
 
As I said before, the DMA API is not a good fit for graphics drivers.
Most of the DMA buffers used by graphics cores are long-lived and big,
so we need a special pool to allocate from to avoid eating all of the
contiguous address space, as the DMA API does not provide shrinker
callbacks for clients using large amounts of memory.

  I think that coherent_pool can be used only when the amount of
  contiguous memory is short in your system. Otherwise even unnecessary.
  
  Could you explain a bit more why you want carveout size on per-board basis?
 
 In the ideal case I would want to not have a carveout size at all.
 However there may be situations where you need to make sure some driver
 can allocate a given amount of memory. Having to specify this using a
 kernel command-line parameter is cumbersome because it may require
 changes to the bootloader or whatever. So if you know that a particular
 board always needs 128 MiB of carveout, then it makes sense to specify
 it on a per-board basis.

If we go with CMA, this is a non-issue, as CMA allows the contiguous
area to be used for normal allocations and only purges them if it really
needs the space for contiguous allocations.
 
 Thierry




Tegra DRM device tree bindings

2012-06-26 Thread Thierry Reding
On Tue, Jun 26, 2012 at 12:10:42PM -0600, Stephen Warren wrote:
> On 06/26/2012 04:55 AM, Thierry Reding wrote:
> > Hi,
> > 
> > while I haven't got much time to work on the actual code right now, I
> > think it might still be useful if we could get the device tree binding
> > to a point where everybody is happy with it. That'll also save me some
> > time once I get to writing the code because I won't have to redo it over
> > again. =)
> > 
> > So here's the current proposal:
> > 
> > host1x {
> > compatible = "nvidia,tegra20-host1x", "simple-bus";
> > reg = <0x50000000 0x00024000>;
> > interrupts = <0 64 0x04   /* cop syncpt */
> >   0 65 0x04   /* mpcore syncpt */
> >   0 66 0x04   /* cop general */
> >   0 67 0x04>; /* mpcore general */
> > 
> > #address-cells = <1>;
> > #size-cells = <1>;
> > 
> > ranges = <0x54000000 0x54000000 0x04000000>;
> > 
> > status = "disabled";
> 
> The idea behind status="disabled" is that some HW only makes sense to
> use on particular boards. This concept really only applies to HW modules
> that drive external interfaces on the SoC, which in turn the board can
> choose whether to connect anything to (or whether to even connect to any
> external pins using the pinmux, or not).
> 
> As such, I don't think it makes sense to set status="disabled" on
> host1x, nor many other internal-only engines such as say mpe, epp, i2sp,
> gr2d, gr3d, dc1, dc2.

What about power management and resource usage? If a board for instance
doesn't need gr3d at all it could just leave it at status = "disabled"
to not have the corresponding driver loaded and not waste the power and
resources.

> 
> However it does make sense for the output resources rgb, hdmi, tvo, dsi.
> 
> > /* outputs */
> > rgb {
> > compatible = "nvidia,tegra20-rgb";
> > status = "disabled";
> > };
> ...
> > The rgb node is something that I don't quite know how to handle yet.
> > Since it is really part of the display controller and uses its register
> > space, it isn't quite correct to represent it as a separate device. But
> > we will need a separate node to make it available as a connector, which
> > will become more obvious below.
> 
> Are you referring to the DC_COM_PIN_OUTPUT* registers? Sorry, I'm not at
> all familiar with our display HW yet.
> 
> Some possible solutions spring to mind:
> 
> a) The output nodes don't have to be direct children of host1x. Instead,
> each DC could have an rgb child node that represents its own individual
> output capability.

Yes, that idea had sprung to my mind as well. I rather like it, too.

> b) If the RGB-related registers in DC are completely independent of any
> other DC registers and grouped together well enough, we can just carve a
> chunk out of the DC register space and give that to the RGB node instead:
> 
> i.e. not:
> 
> > dc1: dc@54200000 {
> > compatible = "nvidia,tegra20-dc";
> > reg = <0x54200000 0x00040000>;
> > interrupts = <0 73 0x04>;
> > status = "disabled";
> > };
> 
> 
> but something more like (the completely made up example):
> 
> dc1: dc@54200000 {
> compatible = "nvidia,tegra20-dc";
> reg = <0x5420 0x0002 0x54203000 0x1>;
> interrupts = <0 73 0x04>;
> status = "disabled";
> };
> 
> rgb {
> compatible = "nvidia,tegra20-rgb";
> reg = <0x5422 0x0001>;
> status = "disabled";
> };
> 
> c) The rgb node could simply reference the dc nodes using a phandle, and
> call into the dc driver to obtain RGB configuration services from it:
> 
> rgb {
> compatible = "nvidia,tegra20-rgb";
> status = "disabled";
> nvidia,dcs = < >;
> };
> 
> By the way, if the RGB registers are in the DC, aren't there two
> separate RGB outputs. Certainly the TRM implies that both DCs can be
> driving LCDs, by reducing the width of the LCD signals that each DC uses
> (lower bits-per-pixel, or perhaps DDR encoding on the data lines).

Yes, there are two RGB outputs. Using alternative a) above should be
able to represent this quite well.

> > Board DTS files could then extend this with board-specific requirements
> > and connectors. The following describes the Medcom Wide:
> 
> > connectors {
> > #address-cells = <1>;
> > #size-cells = <0>;
> > 
> > };
> 
> The connector seems to be a property of the individual output resources.
> I'd expect to see the connector configuration be a child of the outputs
> that a particular board had enabled; something more like:
> 
> host1x {
> rgb {
> status = "okay";
> 
> connector@0 {
> nvidia,edid = /incbin/("tegra-medcom.edid");
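
Alternative a) above, sketched out with one rgb child per display
controller (the dc addresses and interrupts are the Tegra20 ones; the
rgb child placement follows Stephen's suggestion and is otherwise
illustrative):

        host1x {
                dc1: dc@54200000 {
                        compatible = "nvidia,tegra20-dc";
                        reg = <0x54200000 0x00040000>;
                        interrupts = <0 73 0x04>;

                        rgb {
                                compatible = "nvidia,tegra20-rgb";
                                status = "disabled";
                        };
                };

                dc2: dc@54240000 {
                        compatible = "nvidia,tegra20-dc";
                        reg = <0x54240000 0x00040000>;
                        interrupts = <0 74 0x04>;

                        rgb {
                                compatible = "nvidia,tegra20-rgb";
                                status = "disabled";
                        };
                };
        };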

Tegra DRM device tree bindings

2012-06-26 Thread Thierry Reding
On Tue, Jun 26, 2012 at 11:46:42AM -0600, Stephen Warren wrote:
> On 06/26/2012 07:01 AM, Terje Bergström wrote:
> > On 26.06.2012 13:55, Thierry Reding wrote:
> ...
> >> An alternative would be to call of_platform_populate() from the host1x
> 
> [alternative to making the host1x node contain compatible="simple-bus".]
> 
> >> driver. This has the advantage that it could integrate better with the
> >> host1x bus implementation that Terje is working on, but it also needs
> >> additional code to tear down the devices when the host1x driver is
> >> unloaded because a module reload would try to create duplicate devices
> >> otherwise.
> > 
> > Yes, we already have a bus_type for nvhost, and we have nvhost_device
> > and nvhost_driver that device from device and device_driver
> > respectively. They all accommodate some host1x client device common
> > behavior and data that we need to store. We use the bus_type also to
> > match each device and driver together, but the matching is version
> > sensitive. For example, Tegra2 3D needs different driver than Tegra3 3D.
> 
> I'd certainly like to see some upstream discussion re: why exactly we
> have a custom bus type here. What does it do that a regular platform bus
> doesn't do? Are those features something that would be generally useful
> to add to the existing platform bus? Can we instead add hooks into
> platform bus rather than creating a new bus? I recall you saying that
> the nvhost_bus duplicated a lot of code from the existing platform bus.

I don't like the idea of duplicating the code either, but host1x does
have some properties that might be convenient to have as part of a bus
abstraction (sync points and channels). This could all be implemented
using a library of functions instead of a bus-type midlayer, though. I
think we've also had that discussion before.

Thierry



Tegra DRM device tree bindings

2012-06-26 Thread Thierry Reding
On Tue, Jun 26, 2012 at 11:43:38AM -0600, Stephen Warren wrote:
> On 06/26/2012 07:41 AM, Thierry Reding wrote:
> > On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
> >> On 26.06.2012 13:55, Thierry Reding wrote:
> ...
> >>> status = "disabled";
> >>> 
> >>> gart = <&gart>;
> >>> 
> >>> /* video-encoding/decoding */ mpe { reg = <0x54040000
> >>> 0x00040000>; interrupts = <0 68 0x04>; status = "disabled"; };
> >> 
> >> 
> >> The client device interrupts are not very interesting, so they
> >> could be left out, too. Display controller related are probably
> >> an exception to this.
> > 
> > If the interrupts aren't used at all we should drop them.
> 
> I disagree here; "used" is most likely something specific to a
> particular OS's drivers. The HW always has the interrupts, and hence
> they should be described in DT.

Okay, I see. Does the same apply to the COP interrupts of the host1x
node in your opinion? I don't know if it makes sense to describe
something that's not reachable from the CPU. Yet it is defined in the
GIC.

Thierry



Tegra DRM device tree bindings

2012-06-26 Thread Thierry Reding
On Tue, Jun 26, 2012 at 11:41:43AM -0600, Stephen Warren wrote:
> On 06/26/2012 08:02 AM, Terje Bergström wrote:
> > On 26.06.2012 16:41, Thierry Reding wrote:
> > 
> >> On Tue, Jun 26, 2012 at 04:01:05PM +0300, Terje Bergström wrote:
> >>> We also assign certain host1x common resources per device by convention,
> >>> f.ex. sync points, channels etc. We currently encode that information in
> >>> the device node (3D uses sync point number X, 2D uses numbers Y and Z).
> >>> The information is not actually describing hardware, as it just
> >>> describes the convention, so I'm not sure if device tree is the proper
> >>> place for it.
> >> Are they configurable? If so I think we should provide for them being
> >> specified in the device tree. They are still hardware resources being
> >> assigned to devices.
> > 
> > Yes, they're configurable, and there's nothing hardware specific in the
> > assignment of a sync point to a particular use. It's all just a software
> > agreement. That's why I'm a bit hesitant on putting it in device trees,
> > which are supposed to only describe hardware.
> 
> So I think that the DT can describe the existence of sync-points
> (presumably include a node for the sync-point HW device if it's
> separate). However, since the usage of each sync-point is entirely
> arbitrary, that seems like something which should be either assigned
> dynamically at run-time, or at least managed/assigned in SW at runtime
> somehow, rather than hard-coded into DT; it's more policy than HW.

The sync-points are part of the host1x device as I understand it. If
their usage is truly generic, then we can probably ignore them safely.
Maybe it'd make sense to carry a property that defines the number of
sync points available for the host1x hardware represented by the DT?

> >>> Either way is fine for me. The full addresses are more familiar to me as
> >>> we tend to use them internally.
> >
> >> Using the OF mechanism for translating the host1x bus addresses,
> >> relative to the host1x base address, to CPU addresses seems "purer", but
> >> either way should work fine.
> > 
> > I'll let you decide, as I don't have a strong opinion either way. I
> > guess whatever is the more common way wins.
> 
> I'd certainly prefer all the nodes to use the full/absolute address.
> That way, the DT will exactly match the addresses in the documentation.

Okay, I'll leave the ranges property as it is now.

Thierry
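
For reference, the addressing convention settled on here - children
keep their full absolute addresses and host1x carries an identity
ranges entry - looks like this (a sketch; the mpe compatible string is
an assumption):

        host1x {
                compatible = "nvidia,tegra20-host1x", "simple-bus";
                reg = <0x50000000 0x00024000>;
                #address-cells = <1>;
                #size-cells = <1>;

                /* 1:1 mapping, so child reg values match the documentation */
                ranges = <0x54000000 0x54000000 0x04000000>;

                mpe@54040000 {
                        compatible = "nvidia,tegra20-mpe";
                        reg = <0x54040000 0x00040000>;
                        interrupts = <0 68 0x04>;
                };
        };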


