Re: [Mesa-dev] [PATCH] winsys/radeon: fix nop packet padding v2.

2014-07-25 Thread Jerome Glisse
On Thu, Jul 24, 2014 at 8:07 PM, Alex Deucher  wrote:

> On Thu, Jul 24, 2014 at 6:28 PM,   wrote:
> > From: Jerome Glisse 
> >
> > The ucode we got for hawaii does not support the 0x1000 special nop
> > packet type 3, and this leads to the gpu reading invalid memory. As packet
> > type 2 still exists, just use packet type 2.
> >
> > Note this only partially fixes the hawaii issues; some zbuffer tiling
> > issues are still present.
> >
> > Changed since v1:
> >   - use packet type 2 instead of packet 3.
>
> We don't need this change if we use the updated firmware in my 3.17
> tree.  Looks like the original hawaii CP ucode didn't support the new
> 0x1000 special case NOP packet.
>
>
I would rather have the nop2 packet solution than yet another "is accel
working" check, for several reasons:
  - 3.16 will be out soon and has the most important fixes
  - the nop2 packet can easily be backported to stable mesa
  - testing for 3.16 is easy

So I think it would be cleaner to just do nop2 and 3.16.
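
For illustration, the nop2 approach amounts to appending one-dword NOPs
until the dword count is a multiple of 8. Below is a minimal sketch, not
the Mesa code: struct toy_cs, toy_cs_pad_to_8() and TOY_PKT2_NOP are
hypothetical names, and the header value assumes a type-2 packet is encoded
as packet type 2 in bits 31:30 with no other bits set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a command stream; not the Mesa struct. */
struct toy_cs {
    uint32_t buf[64];
    unsigned cdw;            /* number of dwords written so far */
};

/* Assumption: type-2 header = packet type 2 in bits 31:30. */
#define TOY_PKT2_NOP 0x80000000u

/* Append one-dword NOPs until cdw is a multiple of 8.
 * Returns the number of dwords added, or -1 if the buffer is full. */
static int toy_cs_pad_to_8(struct toy_cs *cs, uint32_t nop)
{
    int added = 0;

    while (cs->cdw & 7) {
        if (cs->cdw >= sizeof(cs->buf) / sizeof(cs->buf[0]))
            return -1;
        cs->buf[cs->cdw++] = nop;
        added++;
    }
    return added;
}
```

Because every pad entry is a complete one-dword packet, there is no
"one dword left at the end of the buffer" corner case to handle.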



> Alex
>
> >
> > Signed-off-by: Jérôme Glisse 
> > ---
> >  src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 9 ++---
> >  1 file changed, 2 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
> b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
> > index a06ecb2..9ac7d0e 100644
> > --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
> > +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
> > @@ -447,13 +447,8 @@ static void radeon_drm_cs_flush(struct
> radeon_winsys_cs *rcs,
> >  /* pad DMA ring to 8 DWs to meet CP fetch alignment requirements
> >   * r6xx, requires at least 4 dw alignment to avoid a hw bug.
> >   */
> > -if (cs->ws->info.chip_class <= SI) {
> > -while (rcs->cdw & 7)
> > -OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
> > -} else {
> > -while (rcs->cdw & 7)
> > -OUT_CS(&cs->base, 0x1000); /* type3 nop packet */
> > -}
> > +while (rcs->cdw & 7)
> > +OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
> >  break;
> >  case RING_UVD:
> >  while (rcs->cdw & 15)
> > --
> > 1.8.3.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH] winsys/radeon: fix nop packet padding.

2014-07-24 Thread Jerome Glisse
On Thu, Jul 24, 2014 at 05:42:21PM -0400, j.gli...@gmail.com wrote:
> From: Jerome Glisse 
> 
> The gpu packet prefetcher hates the ugly big nop packet, which leads
> to prefetching invalid memory in some cases. Apparently hawaii
> is particularly sensitive to this.
> 
> Note this only partially fixes the hawaii issues; some zbuffer tiling
> issues are still present.

Just to clarify: this patch is almost good to go. There is the cs[MAX_DW-1]
case that needs fixing, and I am pondering how to do that. Also, I have not
tested on bonaire, but I expect that it should only fix things and not
break anything.

Cheers,
Jérôme

> 
> Signed-off-by: Jérôme Glisse 
> ---
>  src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
> b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
> index a06ecb2..502a550 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
> @@ -451,8 +451,22 @@ static void radeon_drm_cs_flush(struct radeon_winsys_cs 
> *rcs,
>  while (rcs->cdw & 7)
>  OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
>  } else {
> -while (rcs->cdw & 7)
> -OUT_CS(&cs->base, 0x1000); /* type3 nop packet */
> +switch (rcs->cdw & 7) {
> +case 0:
> +break;
> +case 7:
> +/* FIXME can this be bad if we are at cs[LAST_DW-1] ? Need to
> + * think of something.
> + */
> +OUT_CS(&cs->base, 0xc0001000);
> +OUT_CS(&cs->base, 0xcafedead);
> +/* Note we fallthrough as this will add another 7 dwords */
> +default:
> +OUT_CS(&cs->base, 0xc0001000 | (((8 - (rcs->cdw & 7)) - 1) 
> << 16));
> +while (rcs->cdw & 7) {
> +OUT_CS(&cs->base, 0xcafedead);
> +}
> +}
>  }
>  break;
>  case RING_UVD:
> -- 
> 1.8.3.1
> 


Re: [Mesa-dev] rules for merging patches to libdrm

2013-11-18 Thread Jerome Glisse
On Mon, Nov 18, 2013 at 05:41:50PM +0100, Thierry Reding wrote:
> On Mon, Nov 18, 2013 at 11:21:36AM -0500, Rob Clark wrote:
> > On Mon, Nov 18, 2013 at 10:23 AM, Thierry Reding
> >  wrote:
> > > On Mon, Nov 18, 2013 at 10:17:47AM -0500, Rob Clark wrote:
> > >> On Mon, Nov 18, 2013 at 8:29 AM, Thierry Reding
> > >>  wrote:
> > >> > On Sat, Nov 09, 2013 at 01:26:24PM -0800, Ian Romanick wrote:
> > >> >> On 11/09/2013 12:11 AM, Dave Airlie wrote:
> > >> >> >>> How does this interact with the rule that kernel interfaces 
> > >> >> >>> require an
> > >> >> >>> open source userspace? Is "here are the mesa/libdrm patches that 
> > >> >> >>> use
> > >> >> >>> it" sufficient to get the kernel interface merged?
> > >> >> >>
> > >> >> >> That's my understanding: open source userspace needs to exist, but 
> > >> >> >> it
> > >> >> >> doesn't need to be merged upstream yet.
> > >> >> >
> > >> >> > Having an opensource userspace and having it committed to a final 
> > >> >> > repo
> > >> >> > are different things, as Daniel said patches on the mesa-list were
> > >> >> > sufficient, they're was no hurry to merge them considering a kernel
> > >> >> > release with the code wasn't close, esp with a 3 month release 
> > >> >> > window
> > >> >> > if the kernel merge window is close to that anyways.
> > >> >> >
> > >> >> >>> libdrm is easy to change and its releases are cheap. What problem 
> > >> >> >>> does
> > >> >> >>> committing code that uses an in-progress kernel interface to 
> > >> >> >>> libdrm
> > >> >> >>> cause? I guess I'm not understanding something.
> > >> >> >>
> > >> >> >
> > >> >> > Releases are cheap, but ABI breaks aren't so you can't just go 
> > >> >> > release
> > >> >> > a libdrm with an ABI for mesa then decide later it was a bad plan.
> > >> >> >
> > >> >> >> Introducing new kernel API usually involves assigning numbers for 
> > >> >> >> things
> > >> >> >> - a new ioctl number, new #defines for bitfield members, and so on.
> > >> >> >>
> > >> >> >> Multiple patches can be in flight at the same time.  For example, 
> > >> >> >> Abdiel
> > >> >> >> and I both defined execbuf2 flags:
> > >> >> >>
> > >> >> >> #define I915_EXEC_RS (1 << 13) (Abdiel's code)
> > >> >> >> #define I915_EXEC_OA (1 << 13) (my code)
> > >> >> >>
> > >> >> >> These obviously conflict.  One of the two will land, and the second
> > >> >> >> patch author will need to switch to (1 << 14) and resubmit.
> > >> >> >>
> > >> >> >> If we both decide to push to libdrm, we might get the order 
> > >> >> >> backwards,
> > >> >> >> or maybe one series won't get pushed at all (in this case, I'm 
> > >> >> >> planning
> > >> >> >> to drop my patch).  Waiting until one lands in the kernel avoids 
> > >> >> >> that
> > >> >> >> problem.  Normally, I believe we copy the kernel headers to 
> > >> >> >> userspace
> > >> >> >> and fix them up a bit.
> > >> >> >>
> > >> >> >> Dave may have other reasons; this is just the one I thought of.
> > >> >> >
> > >> >> > But mostly this, we've been stung by this exact thing happening
> > >> >> > before, and we made the process to stop it from happening again.
> > >> >>
> > >> >> Then in all honestly, commits to libdrm should be controlled by 
> > >> >> either a
> > >> >> single person or a small cabal... just like the kernel and the 
> > >> >> xserver.
> > >> >>  We're clearly in an uncomfortable middle area where we have a 
> > >> >> stringent
> > >> >> set of restrictions but no way to actually enforce them.
> > >> >
> > >> > That doesn't sound like a bad idea at all. It obviously causes more 
> > >> > work
> > >> > for whoever will be the gatekeeper(s).
> > >> >
> > >> > It seems to me that libdrm is currently more of a free-for-all type of
> > >> > project, and whoever merges some new feature required for a particular 
> > >> > X
> > >> > or Mesa driver cuts a new release so that the version number can be 
> > >> > used
> > >> > to track the dependency.
> > >> >
> > >> > I wonder if perhaps tying the libdrm releases more tightly to Linux
> > >> > kernel releases would help. Since there already is a requirement for 
> > >> > new
> > >> > kernel APIs to be merged before the libdrm equivalent can be merged,
> > >> > then having both release cycles in lockstep makes some sense.
> > >>
> > >> Not sure about strictly tying it to kernel releases would be ideal.
> > >> Not *everything* in libdrm is about new kernel APIs.  It tends to be
> > >> the place for things needed both by xorg ddx and mesa driver, which I
> > >> suppose is why it ends up a bit of a free-for-all.
> > >
> > > I didn't mean that every release would need to be tied to the Linux
> > > kernel. But whenever a new Linux kernel release was made, relevant
> > > changes from the public headers could be pulled into libdrm and a
> > > release be made. I could even imagine a matching of version numbers.
> > > libdrm releases could be numbered using the same major and minor as
> > > Linux kernels that they support. Micro version numbers could 

Re: [Mesa-dev] Update: UVD status on loongson 3a platform

2013-09-05 Thread Jerome Glisse
On Thu, Sep 05, 2013 at 03:29:52PM -0400, Jerome Glisse wrote:
> On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
> > Hi all,
> > 
> > This thread is about
> > http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.
> > 
> > We recently found some interesting things about UVD based playback on
> > the loongson 3a platform, and also found a way to fix the problem.
> > 
> > First, we found that memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c
> > caused the problem:
> > * If memcpy is implemented through 16B or 8B load/store instructions,
> > it will normally cause video mosaic. When a memcmp is inserted after the
> > copying code in memcpy, it reports that the src and dest are not equal.
> > * If memcpy uses 1B load/store instructions only, the memcmp after the
> > copying code reports equal.
> > 
> > Then we found that the following changeset fixes our problem:
> > 
> > diff --git a/src/gallium/drivers/radeon/radeon_uvd.c
> > b/src/gallium/drivers/radeon/radeon_uvd.c
> > index 2f98de2..f9599b6 100644
> > --- a/src/gallium/drivers/radeon/radeon_uvd.c
> > +++ b/src/gallium/drivers/radeon/radeon_uvd.c
> > @@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec,
> >unsigned size)
> >  {
> >   buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
> > - RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
> > + RADEON_DOMAIN_GTT);
> >   if (!buffer->buf)
> >   return false;
> > 
> > The VRAM is mapped to an uncached area on our platform, so my
> > question is: what could go wrong while using >4B load/store
> > instructions in the UVD workflow? Any idea?
> > 
> 
> How do you map the VRAM into the user process address space? I.e. do you
> have something like Intel PAT, something like MTRR, or something else?
> 
> In other words, can you map a region of io memory (GPU VRAM in this case)
> into the process address space and mark it as uncached, so that none of
> the accesses to it go through the CPU cache?
> 
> Cheers,
> Jerome

Also it might be that you can't do write combining on your platform,
which would be a major drawback as it is assumed by radeon userspace.
I would need to check the PCIe specification, but write combining is
probably not mandatory, meaning that your architecture might not have
it. This would explain why only memcpy with byte-sized copies works.

Don't think there is any easy way to work around that.
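
The experiment from the quoted report (copy with wide stores versus byte
stores, then compare) can be sketched roughly as below. This is only an
illustration: copy_bytewise(), copy_wide() and first_mismatch() are
hypothetical helpers, and the sketch does not set up the VRAM mapping
itself. On ordinary cached memory both variants should compare equal; the
interesting run is against the uncached VRAM mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes using only 1-byte stores; volatile keeps the compiler
 * from merging them into wider accesses. */
static void copy_bytewise(volatile uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy using 8-byte stores where possible, then a byte-wise tail.
 * Assumes dst is 8-byte aligned. */
static void copy_wide(volatile uint64_t *dst, const uint8_t *src, size_t n)
{
    size_t words = n / 8;

    for (size_t i = 0; i < words; i++) {
        uint64_t w;
        memcpy(&w, src + 8 * i, sizeof(w));   /* unaligned-safe load */
        dst[i] = w;                           /* one 8-byte store */
    }
    copy_bytewise((volatile uint8_t *)dst + 8 * words, src + 8 * words, n % 8);
}

/* Returns the index of the first differing byte, or n if all bytes match. */
static size_t first_mismatch(const volatile uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return i;
    return n;
}
```

If the wide variant mismatches on the VRAM mapping while the byte variant
does not, the index returned by first_mismatch() may help tell whether
whole stores are lost or only parts of them.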

Cheers,
Jerome


Re: [Mesa-dev] Update: UVD status on loongson 3a platform

2013-09-05 Thread Jerome Glisse
On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
> Hi all,
> 
> This thread is about
> http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.
> 
> We recently found some interesting things about UVD based playback on
> the loongson 3a platform, and also found a way to fix the problem.
> 
> First, we found that memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c
> caused the problem:
> * If memcpy is implemented through 16B or 8B load/store instructions,
> it will normally cause video mosaic. When a memcmp is inserted after the
> copying code in memcpy, it reports that the src and dest are not equal.
> * If memcpy uses 1B load/store instructions only, the memcmp after the
> copying code reports equal.
> 
> Then we found that the following changeset fixes our problem:
> 
> diff --git a/src/gallium/drivers/radeon/radeon_uvd.c
> b/src/gallium/drivers/radeon/radeon_uvd.c
> index 2f98de2..f9599b6 100644
> --- a/src/gallium/drivers/radeon/radeon_uvd.c
> +++ b/src/gallium/drivers/radeon/radeon_uvd.c
> @@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec,
>unsigned size)
>  {
>   buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
> - RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
> + RADEON_DOMAIN_GTT);
>   if (!buffer->buf)
>   return false;
> 
> The VRAM is mapped to an uncached area on our platform, so my
> question is: what could go wrong while using >4B load/store
> instructions in the UVD workflow? Any idea?
> 

How do you map the VRAM into the user process address space? I.e. do you
have something like Intel PAT, something like MTRR, or something else?

In other words, can you map a region of io memory (GPU VRAM in this case)
into the process address space and mark it as uncached, so that none of
the accesses to it go through the CPU cache?

Cheers,
Jerome


Re: [Mesa-dev] SIGFPE in libdrm_radeon on evergreen

2013-05-20 Thread Jerome Glisse
On Mon, May 20, 2013 at 5:13 AM, Vadim Girlin  wrote:
> On 05/20/2013 11:27 AM, Dragomir Ivanov wrote:
>>
>> 0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0,
>> surf=0x88d848,
>> level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0)
>>
>> It looks like division by 0. tile_split=0 from the call site.
>
>
> Yes, I'm just not sure why tile_split is 0 here and what is the best way to
> fix it, possibly in fact this is a consequence of some problem in r600g, not
> in the libdrm. Though probably libdrm should handle it more gracefully
> anyway.
>
> Vadim

Just a guess: the ddx is not properly setting tile split on a surface, and
then r600g calls in, trying to rebuild the miptree ... I think I fixed the
issue in the ddx a couple of months ago, but maybe I did not.
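
Whatever the root cause, handling it more gracefully on the libdrm side
could be as simple as rejecting a zero tile split before dividing. A
minimal sketch, with slices_per_tile() being a hypothetical helper rather
than an existing libdrm function:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: compute the value that line 651 of the backtrace
 * computes (tileb / tile_split), but fail with -EINVAL instead of
 * raising SIGFPE when tile_split is 0. */
static int slices_per_tile(unsigned tileb, unsigned tile_split,
                           unsigned *slice_pt)
{
    if (tile_split == 0)
        return -EINVAL;

    *slice_pt = tileb / tile_split;
    return 0;
}
```

The caller would then propagate the error up through the surface init
path instead of crashing the application.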

Cheers,
Jerome

>>
>>
>> On Mon, May 20, 2013 at 4:11 AM, Vadim Girlin 
>> wrote:
>>
>>> Reduced test app attached and below is gdb backtrace. I suspect something
>>> is not initialized properly but I'm not very familiar with this code.
>>>
>>> Vadim
>>>
>>>
>>> Program received signal SIGFPE, Arithmetic exception.
>>> 0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0,
>>> surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536,
>>> start_level=0)
>>>   at radeon_surface.c:651
>>> 651 slice_pt = tileb / tile_split;
>>>
>>> #0  0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0,
>>> surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536,
>>> start_level=0)
>>>   at radeon_surface.c:651
>>> #1  0x76905eea in eg_surface_init_2d_miptrees (surf_man=0x633ea0,
>>> surf=0x88d848) at radeon_surface.c:807
>>> #2  0x76906062 in eg_surface_init (surf_man=0x633ea0,
>>> surf=0x88d848) at radeon_surface.c:863
>>> #3  0x76907fe6 in radeon_surface_init (surf_man=0x633ea0,
>>> surf=0x88d848) at radeon_surface.c:1901
>>> #4  0x7713260b in radeon_drm_winsys_surface_init (rws=0x6339a0,
>>> surf=0x88d848) at radeon_drm_winsys.c:477
>>> #5  0x770a3e1c in r600_setup_surface (screen=0x6340d0,
>>> rtex=0x88d760, pitch_in_bytes_override=0) at r600_texture.c:203
>>> #6  0x770a4774 in r600_texture_create_object (screen=0x6340d0,
>>> base=0x7fffd6d0, pitch_in_bytes_override=0, buf=0x0,
>>> surface=0x7fffc8e0)
>>>   at r600_texture.c:432
>>> #7  0x770a5268 in r600_texture_create (screen=0x6340d0,
>>> templ=0x7fffd6d0) at r600_texture.c:607
>>> #8  0x7708a5bd in r600_resource_create (screen=0x6340d0,
>>> templ=0x7fffd6d0) at r600_resource.c:38
>>> #9  0x77125579 in dri2_drawable_process_buffers
>>> (drawable=0x88af80, buffers=0x88aea0, buffer_count=1, atts=0x88b628,
>>> att_count=2) at dri2.c:283
>>> #10 0x7712590a in dri2_allocate_textures (drawable=0x88af80,
>>> statts=0x88b628, statts_count=2) at dri2.c:404
>>> #11 0x77123e6a in dri_st_framebuffer_validate (stfbi=0x88af80,
>>> statts=0x88b628, count=2, out=0x7fffd840) at dri_drawable.c:81
>>> #12 0x76e461c1 in st_framebuffer_validate (stfb=0x88b1e0,
>>> st=0x883870) at ../../src/mesa/state_tracker/st_manager.c:193
>>>
>>> #13 0x76e472a8 in st_api_make_current (stapi=0x7761b9e0
>>> , stctxi=0x883870, stdrawi=0x88af80, streadi=0x88af80)
>>>   at ../../src/mesa/state_tracker/st_manager.c:721
>>>
>>> #14 0x77122ce8 in dri_make_current (cPriv=0x7fdb70,
>>> driDrawPriv=0x88af40, driReadPriv=0x88af40) at dri_context.c:255
>>> #15 0x76c6ba1f in driBindContext (pcp=0x7fdb70, pdp=0x88af40,
>>> prp=0x88af40) at ../../../../src/mesa/drivers/dri/common/dri_util.c:382
>>>
>>> #16 0x77dc57e3 in dri2_bind_context (context=0x7fd9d0,
>>> old=0x616650, draw=67108873, read=67108873) at dri2_glx.c:172
>>> #17 0x77d8c253 in MakeContextCurrent (dpy=0x602040,
>>> draw=67108873,
>>> read=67108873, gc_user=0x7fd9d0) at glxcurrent.c:269
>>> #18 0x00384e82713c in fgOpenWindow () from /lib64/libglut.so.3
>>> #19 0x00384e825afa in fgCreateWindow () from /lib64/libglut.so.3
>>> #20 0x00384e825b95 in fgCreateMenu () from /lib64/libglut.so.3
>>> #21 0x00384e823cd3 in glutCreateMenu () from /lib64/libglut.so.3
>>> #22 0x00400816 in main (argc=1, argv=0x7fffdf18) at test.c:17
>>>
>>>
>>>
>>>
>>
>


Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV

2013-05-02 Thread Jerome Glisse
On Wed, May 1, 2013 at 1:23 PM, Marek Olšák  wrote:
> This is a funny subject. Originally, we only used SURFACE_SYNC on
> Evergreen, which is what the hw guys recommend using, but then Jerome
> came and rewrote it with no reasonable argument to back it up (what he
> was trying to fix by his cache-flush rework is not fixed to this day),
> such that we now flush and invalidate more caches than we need.

I guess fixing lockup is not reasonable.

Jerome


> FLUSH_AND_INV isn't recommended, because it should be slower in theory
> when streamout is used. Frequent changes of streamout buffers would
> also flush and invalidate the framebuffer cache, which is undesirable.
> Unfortunately, I don't know of any apps using streamout.
>
> This patch looks good. However, once we start seeing apps taking full
> advantage of GL3 and GL4, we will have to switch back to SURFACE_SYNC
> at least for graphics.
>
> Marek
>
> On Fri, Apr 26, 2013 at 7:21 PM, Tom Stellard  wrote:
>> From: Tom Stellard 
>>
>> We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
>> when this flush flag is set, so flushing the dest caches with a
>> SURFACE_SYNC should not be necessary.
>>
>> The motivation for this change is that emitting a SURFACE_SYNC packet with
>> the CB bits set was causing compute shaders to hang on Cayman.
>> ---
>>  src/gallium/drivers/r600/r600_hw_context.c | 28 +---
>>  1 file changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
>> b/src/gallium/drivers/r600/r600_hw_context.c
>> index b4fb3bf..8aebd25 100644
>> --- a/src/gallium/drivers/r600/r600_hw_context.c
>> +++ b/src/gallium/drivers/r600/r600_hw_context.c
>> @@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx)
>> cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
>> cs->buf[cs->cdw++] = 
>> EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
>> if (rctx->chip_class >= EVERGREEN) {
>> -   cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB1_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB2_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB3_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB4_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB5_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB6_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB7_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB8_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB9_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB10_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB11_DEST_BASE_ENA(1) |
>> -   S_0085F0_DB_DEST_BASE_ENA(1) |
>> -   S_0085F0_TC_ACTION_ENA(1) |
>> -   S_0085F0_CB_ACTION_ENA(1) |
>> +   /* We were previously setting the CB and DB bits on
>> +* cp_coher_cntl, but this is unnecessary since
>> +* we are emitting the
>> +* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
>> +* Setting the CB bits was causing lockups when using
>> +* compute on cayman.
>> +*
>> +* XXX: Do even need to emit a surface sync packet 
>> here?
>> +* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
>> +* surface sync was not being emitted with the
>> +* R600_CONTEXT_FLUSH_AND_INV flag.
>> +*/
>> +   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
>> S_0085F0_DB_ACTION_ENA(1) |
>> S_0085F0_SH_ACTION_ENA(1) |
>> S_0085F0_SMX_ACTION_ENA(1) |
>> --
>> 1.8.1.5
>>


Re: [Mesa-dev] [PATCH] r600g: use CP DMA for buffer clears on evergreen+

2013-04-24 Thread Jerome Glisse
On Wed, Apr 24, 2013 at 3:15 PM,   wrote:
> From: Alex Deucher 
>
> Lighter weight than using streamout.  Only evergreen
> and newer asics support embedded data as src with
> CP DMA.
>
> Signed-off-by: Alex Deucher 

Reviewed-by: Jerome Glisse 

> ---
>  src/gallium/drivers/r600/evergreen_hw_context.c |   66 
> +++
>  src/gallium/drivers/r600/evergreend.h   |   42 ++
>  src/gallium/drivers/r600/r600_blit.c|   10 +++-
>  src/gallium/drivers/r600/r600_pipe.h|3 +
>  4 files changed, 119 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
> b/src/gallium/drivers/r600/evergreen_hw_context.c
> index d980c18..7cab879 100644
> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
> @@ -106,3 +106,69 @@ void evergreen_dma_copy(struct r600_context *rctx,
> util_range_add(&rdst->valid_buffer_range, dst_offset,
>dst_offset + size);
>  }
> +
> +/* The max number of bytes to copy per packet. */
> +#define CP_DMA_MAX_BYTE_COUNT ((1 << 21) - 8)
> +
> +void evergreen_cp_dma_clear_buffer(struct r600_context *rctx,
> +  struct pipe_resource *dst, uint64_t offset,
> +  unsigned size, uint32_t clear_value)
> +{
> +   struct radeon_winsys_cs *cs = rctx->rings.gfx.cs;
> +
> +   assert(size);
> +   assert(rctx->screen->has_cp_dma);
> +
> +   offset += r600_resource_va(&rctx->screen->screen, dst);
> +
> +   /* We flush the caches, because we might read from or write
> +* to resources which are bound right now. */
> +   rctx->flags |= R600_CONTEXT_INVAL_READ_CACHES |
> +  R600_CONTEXT_FLUSH_AND_INV |
> +  R600_CONTEXT_FLUSH_AND_INV_CB_META |
> +  R600_CONTEXT_FLUSH_AND_INV_DB_META |
> +  R600_CONTEXT_STREAMOUT_FLUSH |
> +  R600_CONTEXT_WAIT_3D_IDLE;
> +
> +   while (size) {
> +   unsigned sync = 0;
> +   unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT);
> +   unsigned reloc;
> +
> +   r600_need_cs_space(rctx, 10 + (rctx->flags ? 
> R600_MAX_FLUSH_CS_DWORDS : 0), FALSE);
> +
> +   /* Flush the caches for the first copy only. */
> +   if (rctx->flags) {
> +   r600_flush_emit(rctx);
> +   }
> +
> +   /* Do the synchronization after the last copy, so that all 
> data is written to memory. */
> +   if (size == byte_count) {
> +   sync = PKT3_CP_DMA_CP_SYNC;
> +   }
> +
> +   /* This must be done after r600_need_cs_space. */
> +   reloc = r600_context_bo_reloc(rctx, &rctx->rings.gfx,
> + (struct r600_resource*)dst, 
> RADEON_USAGE_WRITE);
> +
> +   r600_write_value(cs, PKT3(PKT3_CP_DMA, 4, 0));
> +   r600_write_value(cs, clear_value);  /* DATA [31:0] */
> +   r600_write_value(cs, sync | PKT3_CP_DMA_SRC_SEL(2));/* 
> CP_SYNC [31] | SRC_SEL[30:29] */
> +   r600_write_value(cs, offset);   /* DST_ADDR_LO [31:0] */
> +   r600_write_value(cs, (offset >> 32) & 0xff);/* 
> DST_ADDR_HI [7:0] */
> +   r600_write_value(cs, byte_count);   /* COMMAND [29:22] | 
> BYTE_COUNT [20:0] */
> +
> +   r600_write_value(cs, PKT3(PKT3_NOP, 0, 0));
> +   r600_write_value(cs, reloc);
> +
> +   size -= byte_count;
> +   offset += byte_count;
> +   }
> +
> +   /* Invalidate the read caches. */
> +   rctx->flags |= R600_CONTEXT_INVAL_READ_CACHES;
> +
> +   util_range_add(&r600_resource(dst)->valid_buffer_range, offset,
> +  offset + size);
> +}
> +
> diff --git a/src/gallium/drivers/r600/evergreend.h 
> b/src/gallium/drivers/r600/evergreend.h
> index 53b68a4..5d72432 100644
> --- a/src/gallium/drivers/r600/evergreend.h
> +++ b/src/gallium/drivers/r600/evergreend.h
> @@ -118,6 +118,48 @@
>  #define PKT3_PREDICATE(x)   (((x) >> 0) & 0x1)
>  #define PKT0(index, count) (PKT_TYPE_S(0) | PKT0_BASE_INDEX_S(index) | 
> PKT_COUNT_S(count))
>
> +#define PKT3_CP_DMA0x41
> +/* 1. header
> + * 2. SRC_ADDR_LO [31:0] or DATA [31:0]
> + * 3. CP_SYNC [31] | SRC_SEL [30:29] | ENGINE [27] | DST_SEL [21:20] | 
> SRC_A

Re: [Mesa-dev] [PATCH] winsys/radeon: add command stream replay dump for faulty lockup

2013-03-27 Thread Jerome Glisse
On Wed, Mar 27, 2013 at 11:27 AM,   wrote:
> From: Jerome Glisse 
>
> Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to
> enable it.
>
> When enabled, after each cs submission the code will try to detect a lockup by
> waiting for one of the buffers of the cs to become idle. After a timeout it
> will consider that the cs triggered a lockup and will write a radeon_lockup.c
> file in the current directory that has all the information for replaying the cs.
>
> To build this file :
> gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm
>
> Signed-off-by: Jerome Glisse 

Maybe I should add the radeon_ctx.h file to the winsys dir, as you need it
to build radeon_lockup.c; I did not want to printf the whole
helper. For example, you can check radeon_lockup.c and radeon_ctx.h
here:
http://people.freedesktop.org/~glisse/rlockup/

Note this is a radeon si verde capture of a 2D tiling case that locks up
(it can be a hard lockup sometimes, so be careful).

Cheers,
Jerome

> ---
>  src/gallium/winsys/radeon/drm/Makefile.sources |   1 +
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c  |  80 ++--
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.h  |   2 +
>  src/gallium/winsys/radeon/drm/radeon_drm_cs.c  |   4 +
>  src/gallium/winsys/radeon/drm/radeon_drm_cs.h  |   6 +
>  src/gallium/winsys/radeon/drm/radeon_drm_cs_dump.c | 135 
> +
>  6 files changed, 191 insertions(+), 37 deletions(-)
>  create mode 100644 src/gallium/winsys/radeon/drm/radeon_drm_cs_dump.c
>
> diff --git a/src/gallium/winsys/radeon/drm/Makefile.sources 
> b/src/gallium/winsys/radeon/drm/Makefile.sources
> index 1d18d61..4ca5ebb 100644
> --- a/src/gallium/winsys/radeon/drm/Makefile.sources
> +++ b/src/gallium/winsys/radeon/drm/Makefile.sources
> @@ -1,4 +1,5 @@
>  C_SOURCES := \
> radeon_drm_bo.c \
> radeon_drm_cs.c \
> +   radeon_drm_cs_dump.c \
> radeon_drm_winsys.c
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
> b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> index f4ac526..5a9493a 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> @@ -391,14 +391,54 @@ static void radeon_bo_destroy(struct pb_buffer *_buf)
>  FREE(bo);
>  }
>
> +void *radeon_bo_do_map(struct radeon_bo *bo)
> +{
> +struct drm_radeon_gem_mmap args = {0};
> +void *ptr;
> +
> +/* Return the pointer if it's already mapped. */
> +if (bo->ptr)
> +return bo->ptr;
> +
> +/* Map the buffer. */
> +pipe_mutex_lock(bo->map_mutex);
> +/* Return the pointer if it's already mapped (in case of a race). */
> +if (bo->ptr) {
> +pipe_mutex_unlock(bo->map_mutex);
> +return bo->ptr;
> +}
> +args.handle = bo->handle;
> +args.offset = 0;
> +args.size = (uint64_t)bo->base.size;
> +if (drmCommandWriteRead(bo->rws->fd,
> +DRM_RADEON_GEM_MMAP,
> +&args,
> +sizeof(args))) {
> +pipe_mutex_unlock(bo->map_mutex);
> +fprintf(stderr, "radeon: gem_mmap failed: %p 0x%08X\n",
> +bo, bo->handle);
> +return NULL;
> +}
> +
> +ptr = os_mmap(0, args.size, PROT_READ|PROT_WRITE, MAP_SHARED,
> +   bo->rws->fd, args.addr_ptr);
> +if (ptr == MAP_FAILED) {
> +pipe_mutex_unlock(bo->map_mutex);
> +fprintf(stderr, "radeon: mmap failed, errno: %i\n", errno);
> +return NULL;
> +}
> +bo->ptr = ptr;
> +pipe_mutex_unlock(bo->map_mutex);
> +
> +return bo->ptr;
> +}
> +
>  static void *radeon_bo_map(struct radeon_winsys_cs_handle *buf,
> struct radeon_winsys_cs *rcs,
> enum pipe_transfer_usage usage)
>  {
>  struct radeon_bo *bo = (struct radeon_bo*)buf;
>  struct radeon_drm_cs *cs = (struct radeon_drm_cs*)rcs;
> -struct drm_radeon_gem_mmap args = {0};
> -void *ptr;
>
>  /* If it's not unsynchronized bo_map, flush CS if needed and then wait. 
> */
>  if (!(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
> @@ -461,41 +501,7 @@ static void *radeon_bo_map(struct 
> radeon_winsys_cs_handle *buf,
>  }
>  }
>
> -/* Return the pointer if it's already mapped. */
> -if (bo->ptr)
> -return bo->ptr;
> -
> -/* Map the buffer. */
> -pipe_mutex_lock(bo->map_mutex);
> -/* Return the pointer if it's already mapped (in case of a race). */
> -if (bo->ptr) 

Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2

2013-03-27 Thread Jerome Glisse
On Wed, Mar 27, 2013 at 4:45 AM, Christian König
 wrote:
> Am 27.03.2013 01:43, schrieb Jerome Glisse:
>
>> On Tue, Mar 26, 2013 at 6:45 PM, Dave Airlie  wrote:
>>>>>>
>>>>>> correctly). But Marek is quite right that this only counts for state
>>>>>> objects
>>>>>> and makes no sense for set_* and draw_* calls (and I'm currently
>>>>>> thinking
>>>>>> how to avoid that and can't come up with a proper solution). Anyway
>>>>>> it's
>>>>>> definitely not an urgent problem for radeonsi.
>>>>>
>>>>> It will be a problem once we actually start caring about performance
>>>>> and, most importantly, the CPU overhead of the driver.
>>>>>
>>>>>> I still think that writing into the command buffers directly (e.g.
>>>>>> without
>>>>>> wrapper functions) is a bad idea, cause that lead to mixing driver
>>>>>> logic
>>>>>> and
>>>>>
>>>>> I'm convinced the exact opposite is a bad idea, because it adds
>>>>> another layer all commands must go through. A layer which brings no
>>>>> advantage. Think about apps which issue 1k-10k draw calls per frame.
>>>>> It's obvious that every byte moved around counts and the key to high
>>>>> framerate is to do (almost) nothing in the driver. It looks like the
>>>>> idea here is to make the driver as slow as possible.
>>>>>
>>>>>> packet building in r600g. For example just try to figure out how the
>>>>>> relocation in NOPs work by reading the source (please keep in mind
>>>>>> that
>>>>>> one
>>>>>> of the primary goals why AMD is supporting this driver is to give a
>>>>>> good
>>>>>> example code for customers who want to implement that stuff on their
>>>>>> own
>>>>>> systems).
>>>>>
>>>>> I'm shocked. Sacrificing performance in the name of making the code
>>>>> nicer for some customers? Seriously? I thought the plan was to make
>>>>> the best graphics driver ever.
>>>>
>>>>
>>>> Well, maybe I'm repeating myself: Performance is not a priority, it's
>>>> only
>>>> nice to have!
>>>>
>>>> Sorry to say so, but if we sacrifice a bit of performance for more code
>>>> readability than that is perfectly ok with me (Don't understand me wrong
>>>> I
>>>> would really prefer to replace the closed source driver today than
>>>> tomorrow,
>>>> it's unfortunately just not what I'm paid for).
>>>>
>>>> On the other hand, we are talking about perfectly optimizeable inline
>>>> functions and/or macros. All I'm saying is that we should structurize
>>>> the
>>>> code a bit more.
>>>
>>> It's okay to take steps in the right direction, but if you start taking
>>> steps away from performance in lieu of code readability then please be
>>> prepared to deal with objections.
>>>
>>> The thing is, in a lot of cases code readability is in the eye of the
>>> beholder. I'm sure Jerome thought r600g was perfectly readable when he
>>> wrote it, but a lot of us didn't, and we spent a lot of time trying to
>>> remove the CPU overheads, not least the amount of time Marek spent. The
>>> thing is, performance is measurable; code readability isn't.
>>>
>>> Dave.
>>
>> Maybe once again you forgot why I did things the way I did them. I
>> explained myself to you back then: I designed r600g for a new kernel
>> API which was violently different from the cs one. My hope was that
>> the other kernel API would be better; it was not, and I never pushed
>> further on that front. So the r600g design was definitely not adapted
>> to the cs ioctl and was not designed with it in mind. History often
>> explains a lot of things and people seem to forget about it.
>>
>> That being said, I too find the code readability argument ironic: if
>> one understands the cs ioctl then the r600g code as it is nowadays makes
>> sense, but the radeonsi code is closer to what r600g used to be. So
>> assuming the same ioctl, I would say that radeonsi should move towards
>> what r600g is nowadays.
>>
>> Anyway, I just wanted to set history straight.
>
>
> Well, I think you hit the point here quite well. May I ask what your kernel
> interface would have looked like?
>
> Christian.

I used to have a branch on fdo with it. Basically, what used to be
r600_hw_context was a nop in gallium and you had state in the kernel (cb,
db, sampler view, sampler, ...): you created the state objects and then
bound them, so almost all of the security checking happened at creation
time and bind time was pretty quick. It was also transaction based.
Relocation was easier too. Anyway, it was a bad API. I know that in the
closed world or in more obscure stacks you can have a kernel API that
doesn't do much security checking and call it a day, which gives you a
lot more freedom in the API.

Cheers,
Jerome
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 6:45 PM, Dave Airlie  wrote:
>>
>>>> correctly). But Marek is quite right that this only counts for state
>>>> objects
>>>> and makes no sense for set_* and draw_* calls (and I'm currently thinking
>>>> how to avoid that and can't come up with a proper solution). Anyway it's
>>>> definitely not an urgent problem for radeonsi.
>>>
>>> It will be a problem once we actually start caring about performance
>>> and, most importantly, the CPU overhead of the driver.
>>>
>>>> I still think that writing into the command buffers directly (e.g.
>>>> without
>>>> wrapper functions) is a bad idea, cause that lead to mixing driver logic
>>>> and
>>>
>>> I'm convinced the exact opposite is a bad idea, because it adds
>>> another layer all commands must go through. A layer which brings no
>>> advantage. Think about apps which issue 1k-10k draw calls per frame.
>>> It's obvious that every byte moved around counts and the key to high
>>> framerate is to do (almost) nothing in the driver. It looks like the
>>> idea here is to make the driver as slow as possible.
>>>
>>>> packet building in r600g. For example just try to figure out how the
>>>> relocation in NOPs work by reading the source (please keep in mind that
>>>> one
>>>> of the primary goals why AMD is supporting this driver is to give a good
>>>> example code for customers who want to implement that stuff on their own
>>>> systems).
>>>
>>> I'm shocked. Sacrificing performance in the name of making the code
>>> nicer for some customers? Seriously? I thought the plan was to make
>>> the best graphics driver ever.
>>
>>
>> Well, maybe I'm repeating myself: Performance is not a priority, it's only
>> nice to have!
>>
>> Sorry to say so, but if we sacrifice a bit of performance for more code
>> readability than that is perfectly ok with me (Don't understand me wrong I
>> would really prefer to replace the closed source driver today than tomorrow,
>> it's unfortunately just not what I'm paid for).
>>
>> On the other hand, we are talking about perfectly optimizeable inline
>> functions and/or macros. All I'm saying is that we should structurize the
>> code a bit more.
>
> It's okay to take steps in the right direction, but if you start taking
> steps away from performance in lieu of code readability then please be
> prepared to deal with objections.
>
> The thing is, in a lot of cases code readability is in the eye of the
> beholder. I'm sure Jerome thought r600g was perfectly readable when he
> wrote it, but a lot of us didn't, and we spent a lot of time trying to
> remove the CPU overheads, not least the amount of time Marek spent. The
> thing is, performance is measurable; code readability isn't.
>
> Dave.

Maybe once again you forgot why I did things the way I did them. I
explained myself to you back then: I designed r600g for a new kernel
API which was violently different from the cs one. My hope was that
the other kernel API would be better; it was not, and I never pushed
further on that front. So the r600g design was definitely not adapted
to the cs ioctl and was not designed with it in mind. History often
explains a lot of things and people seem to forget about it.

That being said, I too find the code readability argument ironic: if
one understands the cs ioctl then the r600g code as it is nowadays makes
sense, but the radeonsi code is closer to what r600g used to be. So
assuming the same ioctl, I would say that radeonsi should move towards
what r600g is nowadays.

Anyway, I just wanted to set history straight.

Cheers,
Jerome


Re: [Mesa-dev] OpenGL ES 3.0 support

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 2:14 PM, Jordan Justen  wrote:
> On Tue, Mar 26, 2013 at 10:34 AM, Jerome Glisse  wrote:
>> On Tue, Mar 26, 2013 at 4:39 AM, violin yanev  wrote:
>>> Thanks for your replies guys!
>>>
>>> The output of eglinfo is:
>>> EGL API version: 1.4
>>> EGL vendor string: Mesa Project
>>> EGL version string: 1.4 (DRI2)
>>> EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2
>>> EGL extensions string:
>>> EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_KHR_image_base
>>> EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_renderbuffer_image
>>> EGL_KHR_surfaceless_context EGL_KHR_create_context
>>> EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
>>> EGL_NV_post_sub_buffer
>>>
>>> So apparently ES3.0 is not a supported API :(
>>>
>>> @Jordan: do you know if one can reenable ES3 on Intel graphics? Is a special
>>> flag expected? I had read a message that Fedora 18 will enable ES3.0 by
>>> default?
>>>
>>> Violin
>>
>> AFAICT fedora won't enable ES3.0 due to patent uncertainty regarding
>> floating point format
>
> This feature should be usable on Intel hardware which is why it was
> enabled by default (for Intel hardware) in 9bdf5be.
>
> -Jordan

The Fedora patch reverts this commit.

Cheers,
Jerome


Re: [Mesa-dev] OpenGL ES 3.0 support

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 4:39 AM, violin yanev  wrote:
> Thanks for your replies guys!
>
> The output of eglinfo is:
> EGL API version: 1.4
> EGL vendor string: Mesa Project
> EGL version string: 1.4 (DRI2)
> EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2
> EGL extensions string:
> EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_KHR_image_base
> EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_renderbuffer_image
> EGL_KHR_surfaceless_context EGL_KHR_create_context
> EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
> EGL_NV_post_sub_buffer
>
> So apparently ES3.0 is not a supported API :(
>
> @Jordan: do you know if one can reenable ES3 on Intel graphics? Is a special
> flag expected? I had read a message that Fedora 18 will enable ES3.0 by
> default?
>
> Violin

AFAICT Fedora won't enable ES3.0 due to patent uncertainty regarding
the floating-point formats. You can, however, build Mesa yourself and
enable the floating-point formats, which would give you ES3.0 support.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 12:40 PM, Marek Olšák  wrote:
> On Tue, Mar 26, 2013 at 3:59 PM, Christian König
>  wrote:
>> Am 26.03.2013 15:34, schrieb Marek Olšák:
>>
>>> Speaking of si_pm4_state, I think it's a horrible mechanism for
>>> anything other than constant state objects (create/bind/delete
>>> functions). For everything else (set/draw functions), you want to emit
>>> directly into the command stream. It's not so different from the bad
>>> state management which r600g used to have (which is now gone). If you
>>> have to call malloc or calloc in a set_* or draw_* function, you're
>>> doing it wrong. Are there plans to change it to something more
>>> efficient (e.g. how r300g and r600g emit non-CSO states right now), or
>>> will it be like this forever?
>>
>>
>> Actually I hoped that r600g sooner or later moves into the same direction
>> some more. The fact that we currently need to malloc every buffer indeed
>> sucks badly, but that is still better than mixing packet generation with
>> driver logic.
>
> I don't understand the last sentence. What mixing? The set_* and
> draw_* commands are supposed to be executed immediately, therefore
> it's reasonable and preferable to write to the CS directly. Having any
> intermediate storage for commands is a waste of time and space.

I agree here. I don't think an uncached bo for the command stream on new
hw would bring a huge perf increase; it will probably just be noise.

>>
>> Also I don't think that emitting directly into the command stream is such a
>> good idea, we sooner or later want that buffer to be a buffer allocated in
>> GART memory. And under this condition it is better to build up the commands
>> in a (heavily cached) system memory and then memcpy then to the destination
>> buffer.
>
> AFAIK, GART memory is cached on non-AGP systems, but even uncached
> access shouldn't be a big issue, because the access pattern is
> sequential and write-only. BTW, I have talked about emitting commands
> into a buffer object with Dave and he thinks it's a bad idea due to
> the map and unmap overhead. Also, we have to disallow writing to
> certain unsafe registers anyway.
>
> Marek

I think Christian is thinking about new hw (> cayman) where we can skip
register checking because of vm and hardware register checking (the hw
CP checks that a register in the user IB is not one of the privileged
registers, and blocks the write and throws an irq if it is). On this kind
of hw you can have the cmd stream in a bo and don't need the map/unmap.
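
The two approaches debated above can be put side by side in a small
sketch. This is an illustration only: struct cmd_stream, struct
staged_state and the helper names are hypothetical and are not the
r600g or radeonsi data structures. It only shows that both paths end up
with the same dwords in the command stream, with the staged path paying
for one extra copy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified command stream: a dword array plus a cursor.
 * Bounds checks are left out for brevity. */
struct cmd_stream {
    uint32_t buf[256];
    unsigned cdw;          /* number of dwords written so far */
};

/* Direct emission: each dword goes straight into the command stream. */
static void emit_direct(struct cmd_stream *cs, const uint32_t *dw, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        cs->buf[cs->cdw++] = dw[i];
}

/* Staged emission: the packet is first built in scratch memory ... */
struct staged_state {
    uint32_t dw[16];
    unsigned ndw;
};

static void staged_add(struct staged_state *st, uint32_t value)
{
    st->dw[st->ndw++] = value;
}

/* ... and later copied into the command stream in one go. */
static void staged_flush(struct cmd_stream *cs, const struct staged_state *st)
{
    memcpy(&cs->buf[cs->cdw], st->dw, st->ndw * sizeof(uint32_t));
    cs->cdw += st->ndw;
}
```

Whether the extra copy matters in practice depends on how the
destination memory is mapped, which is exactly the point under
discussion; the sketch takes no position on that.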

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 6:22 AM, Christian König
 wrote:
> Am 25.03.2013 18:15, schrieb j.gli...@gmail.com:
>
>> From: Jerome Glisse 
>>
>> Same as on r600, trace cs execution by writing the cs offset after each
>> state; this allows pinpointing a lockup inside the command stream and
>> narrows down the scope of lockup investigation.
>>
>> v2: Use WRITE_DATA packet instead of WRITE_MEM
>>
>> Signed-off-by: Jerome Glisse 
>> ---
>>   src/gallium/drivers/radeonsi/r600_hw_context.c | 61
>> ++
>>   src/gallium/drivers/radeonsi/radeonsi_pipe.c   | 22 ++
>>   src/gallium/drivers/radeonsi/radeonsi_pipe.h   | 12 +
>>   src/gallium/drivers/radeonsi/radeonsi_pm4.c| 12 +
>>   src/gallium/drivers/radeonsi/si_state_draw.c   |  7 ++-
>>   src/gallium/drivers/radeonsi/sid.h | 14 ++
>>   6 files changed, 127 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c
>> b/src/gallium/drivers/radeonsi/r600_hw_context.c
>> index bd348f9..967f093 100644
>> --- a/src/gallium/drivers/radeonsi/r600_hw_context.c
>> +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
>> @@ -142,6 +142,12 @@ void si_need_cs_space(struct r600_context *ctx,
>> unsigned num_dw,
>> /* Save 16 dwords for the fence mechanism. */
>> num_dw += 16;
>>   +#if R600_TRACE_CS
>> +   if (ctx->screen->trace_bo) {
>> +   num_dw += R600_TRACE_CS_DWORDS;
>> +   }
>> +#endif
>> +
>> /* Flush if there's not enough space. */
>> if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
>> radeonsi_flush(&ctx->context, NULL, RADEON_FLUSH_ASYNC);
>> @@ -206,9 +212,41 @@ void si_context_flush(struct r600_context *ctx,
>> unsigned flags)
>> /* force to keep tiling flags */
>> flags |= RADEON_FLUSH_KEEP_TILING_FLAGS;
>>   +#if R600_TRACE_CS
>> +   if (ctx->screen->trace_bo) {
>> +   struct r600_screen *rscreen = ctx->screen;
>> +   unsigned i;
>> +
>> +   for (i = 0; i < cs->cdw; i++) {
>> +   fprintf(stderr, "[%4d] [%5d] 0x%08x\n",
>> rscreen->cs_count, i, cs->buf[i]);
>> +   }
>> +   rscreen->cs_count++;
>> +   }
>> +#endif
>> +
>> /* Flush the CS. */
>> ctx->ws->cs_flush(ctx->cs, flags);
>>   +#if R600_TRACE_CS
>> +   if (ctx->screen->trace_bo) {
>> +   struct r600_screen *rscreen = ctx->screen;
>> +   unsigned i;
>> +
>> +   for (i = 0; i < 10; i++) {
>> +   usleep(5);
>> +   if
>> (!ctx->ws->buffer_is_busy(rscreen->trace_bo->buf, RADEON_USAGE_READWRITE)) {
>> +   break;
>> +   }
>> +   }
>> +   if (i == 10) {
>> +   fprintf(stderr, "timeout on cs lockup likely
>> happen at cs %d dw %d\n",
>> +   rscreen->trace_ptr[1],
>> rscreen->trace_ptr[0]);
>> +   } else {
>> +   fprintf(stderr, "cs %d executed in %dms\n",
>> rscreen->trace_ptr[1], i * 5);
>> +   }
>> +   }
>> +#endif
>> +
>> ctx->pm4_dirty_cdwords = 0;
>> ctx->flags = 0;
>>   @@ -665,3 +703,26 @@ void r600_context_draw_opaque_count(struct
>> r600_context *ctx, struct r600_so_tar
>> cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, t->filled_size,
>> RADEON_USAGE_READ);
>> }
>> +
>> +#if R600_TRACE_CS
>> +void r600_trace_emit(struct r600_context *rctx)
>> +{
>> +   struct r600_screen *rscreen = rctx->screen;
>> +   struct radeon_winsys_cs *cs = rctx->cs;
>> +   uint64_t va;
>> +   uint32_t reloc;
>> +
>> +   va = r600_resource_va(&rscreen->screen, (void*)rscreen->trace_bo);
>> +   reloc = r600_context_bo_reloc(rctx, rscreen->trace_bo,
>> RADEON_USAGE_READWRITE);
>> +   cs->buf[cs->cdw++] = PKT3(PKT3_WRITE_DATA, 4, 0);
>> +   cs->buf[cs->cdw++] =
>> PKT3_WRITE_DATA_DST_SEL(PKT3_WRITE_DATA_DST_SEL_MEM_SYNC) |
>> +   PKT3_WRITE_DATA_WR_CONFIRM |
>> +
>> PKT3_WRITE_DATA_ENGINE_SEL(PKT3_WRITE_DATA_ENGINE_SEL_ME);
>> +   cs->buf[cs->cdw+

Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing

2013-03-25 Thread Jerome Glisse
On Mon, Mar 25, 2013 at 1:12 PM, Christian König
 wrote:
> Am 25.03.2013 17:50, schrieb Jerome Glisse:
>
>> On Mon, Mar 25, 2013 at 12:38 PM, Christian König
>>  wrote:
>>>
>>> Am 25.03.2013 17:01, schrieb j.gli...@gmail.com:
>>>
>>>> From: Jerome Glisse 
>>>>
>>>> Same as on r600, trace cs execution by writing the cs offset after each
>>>> state; this allows pinpointing a lockup inside the command stream and
>>>> narrows down the scope of lockup investigation.
>>>>
>>>> Signed-off-by: Jerome Glisse 
>>>
>>>
>>> Could your rewrite this to use an si_pm4_state instead of hand coding it?
>>> It's cleaner and should reduce the needed code quite a bit.
>>>
>>> Christian.
>>
>> Well no, the whole point is to emit inside each si_pm4_state_emit so
>> that you can pinpoint which reg/packet triggers the lockup.
>
>
> Ok, well then it makes no sense that you increment the counter only once per
> flush.
>
> Christian.

The counter is for tracking the cs number (the number of calls to the cs
ioctl), while in r600_emit_trace I emit both the counter and the
cs->cdw value, so that you have both the dword offset of the last trace
that went through and which cs ioctl call it was. The printf of the
command stream prints both so that you can easily pinpoint things.
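
A minimal sketch of that idea, as a CPU-only simulation: everything
here (struct fake_cs, TRACE_MARKER, fake_execute) is made up for
illustration and is not the WRITE_DATA packet the patch actually emits.
After each state a small record holding the dword offset and the cs
counter is appended; the stand-in "GPU" copies the record into a
two-dword trace area as it passes it, so after a hang the trace area
names the last record that was reached.

```c
#include <stdint.h>

#define TRACE_MARKER 0xC0DE0000u  /* hypothetical marker, not a real PM4 header */
#define CS_MAX_DW    64

struct fake_cs {
    uint32_t buf[CS_MAX_DW];
    unsigned cdw;
};

/* Emit ndw placeholder state dwords followed by a three-dword trace record:
 * marker, dword offset of the record, cs counter. Returns 0 on success. */
static int emit_state_with_trace(struct fake_cs *cs, unsigned ndw,
                                 uint32_t cs_count)
{
    if (cs->cdw + ndw + 3 > CS_MAX_DW)
        return -1;
    for (unsigned i = 0; i < ndw; i++)
        cs->buf[cs->cdw++] = 0;            /* placeholder state dword */
    cs->buf[cs->cdw + 0] = TRACE_MARKER;
    cs->buf[cs->cdw + 1] = cs->cdw;        /* where in the cs we are */
    cs->buf[cs->cdw + 2] = cs_count;       /* which cs ioctl call this is */
    cs->cdw += 3;
    return 0;
}

/* Stand-in for the GPU: consume dwords until hang_at is reached.
 * trace[0] receives the dword offset, trace[1] the cs number, matching
 * the order used by the fprintf in the patch. */
static void fake_execute(const struct fake_cs *cs, unsigned hang_at,
                         uint32_t trace[2])
{
    unsigned i = 0;
    while (i < cs->cdw && i < hang_at) {
        if (cs->buf[i] == TRACE_MARKER) {
            if (i + 3 > hang_at)
                break;                     /* hung before the record completed */
            trace[0] = cs->buf[i + 1];
            trace[1] = cs->buf[i + 2];
            i += 3;
        } else {
            i++;
        }
    }
}
```

With three states emitted and a simulated hang in the third one, the
trace area ends up pointing at the record after the second state, which
narrows the search to the dwords that follow it.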

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing

2013-03-25 Thread Jerome Glisse
On Mon, Mar 25, 2013 at 12:38 PM, Christian König
 wrote:
> Am 25.03.2013 17:01, schrieb j.gli...@gmail.com:
>
>> From: Jerome Glisse 
>>
>> Same as on r600, trace cs execution by writing the cs offset after each
>> state; this allows pinpointing a lockup inside the command stream and
>> narrows down the scope of lockup investigation.
>>
>> Signed-off-by: Jerome Glisse 
>
>
> Could your rewrite this to use an si_pm4_state instead of hand coding it?
> It's cleaner and should reduce the needed code quite a bit.
>
> Christian.

Well no, the whole point is to emit inside each si_pm4_state_emit so
that you can pinpoint which reg/packet triggers the lockup.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing

2013-03-25 Thread Jerome Glisse
On Mon, Mar 25, 2013 at 12:17 PM, Michel Dänzer  wrote:
> On Mon, 2013-03-25 at 12:01 -0400, j.gli...@gmail.com wrote:
>> From: Jerome Glisse 
>>
>> Same as on r600, trace cs execution by writing the cs offset after each
>> state; this allows pinpointing a lockup inside the command stream and
>> narrows down the scope of lockup investigation.
>>
>> Signed-off-by: Jerome Glisse 
>
> [...]
>
>> diff --git a/src/gallium/drivers/radeonsi/r600_texture.c 
>> b/src/gallium/drivers/radeonsi/r600_texture.c
>> index 6cafc3d..3d074a3 100644
>> --- a/src/gallium/drivers/radeonsi/r600_texture.c
>> +++ b/src/gallium/drivers/radeonsi/r600_texture.c
>> @@ -550,7 +550,7 @@ struct pipe_resource *si_texture_create(struct 
>> pipe_screen *screen,
>>
>>   if (!(templ->flags & R600_RESOURCE_FLAG_TRANSFER) &&
>>   !(templ->bind & PIPE_BIND_SCANOUT)) {
>> - array_mode = V_009910_ARRAY_2D_TILED_THIN1;
>> + array_mode = V_009910_ARRAY_1D_TILED_THIN1;
>>   }
>>
>>   r = r600_init_surface(rscreen, &surface, templ, array_mode,
>
> What's this hunk doing in here? :)
>
> The rest looks good to me on a quick look.
>

Oops, I did it on top of my 2d tiling stuff.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g/radeonsi: unreference previous fence in flush

2013-03-04 Thread Jerome Glisse
On Mon, Mar 4, 2013 at 2:05 PM, Michel Dänzer  wrote:
> On Mon, 2013-03-04 at 13:17 -0500, j.gli...@gmail.com wrote:
>> From: Jerome Glisse 
>>
>> Some code calling the flush function passes a fence pointer that points
>> to an old fence, which should be unreferenced to avoid leaking the fence.
>>
>> Candidate for 9.1
>>
>> Signed-off-by: Jerome Glisse 
>> ---
>>  src/gallium/drivers/r600/r600_pipe.c | 8 +---
>>  src/gallium/drivers/radeonsi/radeonsi_pipe.c | 9 ++---
>>  2 files changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
>> b/src/gallium/drivers/r600/r600_pipe.c
>> index 78002ae..4bcfc67 100644
>> --- a/src/gallium/drivers/r600/r600_pipe.c
>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>> @@ -145,12 +145,14 @@ static void r600_flush_from_st(struct pipe_context 
>> *ctx,
>>  enum pipe_flush_flags flags)
>>  {
>>   struct r600_context *rctx = (struct r600_context *)ctx;
>> - struct r600_fence **rfence = (struct r600_fence**)fence;
>> + struct r600_fence *rfence;
>>   unsigned fflags;
>>
>>   fflags = flags & PIPE_FLUSH_END_OF_FRAME ? RADEON_FLUSH_END_OF_FRAME : 
>> 0;
>> - if (rfence) {
>> - *rfence = r600_create_fence(rctx);
>> + if (fence) {
>> + rfence = r600_create_fence(rctx);
>> + ctx->screen->fence_reference(ctx->screen, fence,
>> + (struct pipe_fence_handle 
>> *)rfence);
>
> This change increases the reference count of the returned fence from 1
> to 2. I don't think that's correct, but if it is, the change should be
> amended with an explanation why.
>

No, I have uncommitted changes in my tree. I will probably resend with
the xa patchset.
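
For reference, the reference-count concern can be shown with a small
stand-alone sketch. It assumes a generic refcounted object; struct
fence, fence_create and fence_reference below are hypothetical and only
mimic the usual "take a reference on the new value, drop the one on the
old value" semantics. It is not the content of the resent patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted fence; not the real r600_fence layout. */
struct fence {
    int refcount;
};

static struct fence *fence_create(void)
{
    struct fence *f = calloc(1, sizeof(*f));
    if (f)
        f->refcount = 1;          /* the creator owns one reference */
    return f;
}

/* Make *dst point at src: take a reference on src, then drop the
 * reference held on the old *dst (freeing it when it reaches zero). */
static void fence_reference(struct fence **dst, struct fence *src)
{
    struct fence *old = *dst;

    if (src)
        src->refcount++;
    if (old && --old->refcount == 0)
        free(old);
    *dst = src;
}

/* Flush that returns a new fence without leaking the caller's old one. */
static void flush_with_fence(struct fence **out)
{
    struct fence *f = fence_create();   /* refcount == 1 */

    fence_reference(out, f);            /* refcount == 2, old *out released */
    fence_reference(&f, NULL);          /* drop the local reference: back to 1 */
}
```

Without the last call the new fence would be left with a count of 2
while only the caller holds it, which is the situation pointed out in
the review.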

Jerome


Re: [Mesa-dev] [PATCH 1/5] r600g: unify vgt states

2013-03-04 Thread Jerome Glisse
On Wed, Feb 27, 2013 at 6:11 PM, Marek Olšák  wrote:
> The states were split because we thought it caused a hardlock. Now we know
> the hardlock was caused by something else and has since been fixed.

For the series:
Reviewed-by: Jerome Glisse 

> ---
>  src/gallium/drivers/r600/evergreen_state.c   |3 +--
>  src/gallium/drivers/r600/r600_hw_context.c   |1 -
>  src/gallium/drivers/r600/r600_pipe.h |6 --
>  src/gallium/drivers/r600/r600_state.c|3 +--
>  src/gallium/drivers/r600/r600_state_common.c |   22 +++---
>  5 files changed, 9 insertions(+), 26 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index 205bbc5..244989d 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -2615,8 +2615,7 @@ void evergreen_init_state_functions(struct r600_context 
> *rctx)
> r600_init_atom(rctx, 
> &rctx->samplers[PIPE_SHADER_GEOMETRY].views.atom, id++, 
> evergreen_emit_gs_sampler_views, 0);
> r600_init_atom(rctx, 
> &rctx->samplers[PIPE_SHADER_FRAGMENT].views.atom, id++, 
> evergreen_emit_ps_sampler_views, 0);
>
> -   r600_init_atom(rctx, &rctx->vgt_state.atom, id++, 
> r600_emit_vgt_state, 6);
> -   r600_init_atom(rctx, &rctx->vgt2_state.atom, id++, 
> r600_emit_vgt2_state, 3);
> +   r600_init_atom(rctx, &rctx->vgt_state.atom, id++, 
> r600_emit_vgt_state, 7);
>
> if (rctx->chip_class == EVERGREEN) {
> r600_init_atom(rctx, &rctx->sample_mask.atom, id++, 
> evergreen_emit_sample_mask, 3);
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
> b/src/gallium/drivers/r600/r600_hw_context.c
> index 91af6b8..b78b004 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -827,7 +827,6 @@ void r600_begin_new_cs(struct r600_context *ctx)
> ctx->framebuffer.atom.dirty = true;
> ctx->poly_offset_state.atom.dirty = true;
> ctx->vgt_state.atom.dirty = true;
> -   ctx->vgt2_state.atom.dirty = true;
> ctx->sample_mask.atom.dirty = true;
> ctx->scissor.atom.dirty = true;
> ctx->config_state.atom.dirty = true;
> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
> b/src/gallium/drivers/r600/r600_pipe.h
> index 570a284..4cfade1 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -127,10 +127,6 @@ struct r600_vgt_state {
> struct r600_atom atom;
> uint32_t vgt_multi_prim_ib_reset_en;
> uint32_t vgt_multi_prim_ib_reset_indx;
> -};
> -
> -struct r600_vgt2_state {
> -   struct r600_atom atom;
> uint32_t vgt_indx_offset;
>  };
>
> @@ -506,7 +502,6 @@ struct r600_context {
> struct r600_config_stateconfig_state;
> struct r600_stencil_ref_state   stencil_ref;
> struct r600_vgt_state   vgt_state;
> -   struct r600_vgt2_state  vgt2_state;
> struct r600_viewport_state  viewport;
> /* Shaders and shader resources. */
> struct r600_cso_state   vertex_fetch_shader;
> @@ -733,7 +728,6 @@ void r600_emit_cso_state(struct r600_context *rctx, 
> struct r600_atom *atom);
>  void r600_emit_alphatest_state(struct r600_context *rctx, struct r600_atom 
> *atom);
>  void r600_emit_blend_color(struct r600_context *rctx, struct r600_atom 
> *atom);
>  void r600_emit_vgt_state(struct r600_context *rctx, struct r600_atom *atom);
> -void r600_emit_vgt2_state(struct r600_context *rctx, struct r600_atom *atom);
>  void r600_emit_clip_misc_state(struct r600_context *rctx, struct r600_atom 
> *atom);
>  void r600_emit_stencil_ref(struct r600_context *rctx, struct r600_atom 
> *atom);
>  void r600_emit_viewport_state(struct r600_context *rctx, struct r600_atom 
> *atom);
> diff --git a/src/gallium/drivers/r600/r600_state.c 
> b/src/gallium/drivers/r600/r600_state.c
> index bbff6bd..fd3b14e 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -2312,8 +2312,7 @@ void r600_init_state_functions(struct r600_context 
> *rctx)
> r600_init_atom(rctx, 
> &rctx->samplers[PIPE_SHADER_FRAGMENT].views.atom, id++, 
> r600_emit_ps_sampler_views, 0);
> r600_init_atom(rctx, &rctx->vertex_buffer_state.atom, id++, 
> r600_emit_vertex_buffers, 0);
>
> -   r600_init_atom(rctx, &rctx->vgt_state.atom, id++, 
> r600_emit_vgt_state, 6);
> -   r600_init_atom(rctx, &rctx->vgt2_state.atom, id++, 
> r

Re: [Mesa-dev] [PATCH 1/6] r600g: inline r600_pipe_shader function

2013-03-04 Thread Jerome Glisse
On Sun, Mar 3, 2013 at 8:39 AM, Marek Olšák  wrote:
> also change names of other functions, so that they make sense

For the series:
Reviewed-by: Jerome Glisse 

> ---
>  src/gallium/drivers/r600/evergreen_state.c   |4 +-
>  src/gallium/drivers/r600/r600_pipe.h |8 +--
>  src/gallium/drivers/r600/r600_shader.c   |   89 
> --
>  src/gallium/drivers/r600/r600_state.c|4 +-
>  src/gallium/drivers/r600/r600_state_common.c |4 +-
>  5 files changed, 51 insertions(+), 58 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index 97f91df..5c7cd40 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -3311,7 +3311,7 @@ void evergreen_init_atom_start_cs(struct r600_context 
> *rctx)
> eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (32 * 4), 
> 0x01000FFF);
>  }
>
> -void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader)
> +void evergreen_update_ps_state(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader)
>  {
> struct r600_context *rctx = (struct r600_context *)ctx;
> struct r600_pipe_state *rstate = &shader->rstate;
> @@ -3460,7 +3460,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
> struct r600_pipe_shader
> shader->flatshade = rctx->rasterizer->flatshade;
>  }
>
> -void evergreen_pipe_shader_vs(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader)
> +void evergreen_update_vs_state(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader)
>  {
> struct r600_context *rctx = (struct r600_context *)ctx;
> struct r600_pipe_state *rstate = &shader->rstate;
> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
> b/src/gallium/drivers/r600/r600_pipe.h
> index 3eb2968..28c7de3 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -626,8 +626,8 @@ void cayman_init_common_regs(struct r600_command_buffer 
> *cb,
>
>  void evergreen_init_state_functions(struct r600_context *rctx);
>  void evergreen_init_atom_start_cs(struct r600_context *rctx);
> -void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader);
> -void evergreen_pipe_shader_vs(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader);
> +void evergreen_update_ps_state(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader);
> +void evergreen_update_vs_state(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader);
>  void *evergreen_create_db_flush_dsa(struct r600_context *rctx);
>  void *evergreen_create_resolve_blend(struct r600_context *rctx);
>  void *evergreen_create_decompress_blend(struct r600_context *rctx);
> @@ -701,8 +701,8 @@ r600_create_sampler_view_custom(struct pipe_context *ctx,
> unsigned width_first_level, unsigned 
> height_first_level);
>  void r600_init_state_functions(struct r600_context *rctx);
>  void r600_init_atom_start_cs(struct r600_context *rctx);
> -void r600_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader 
> *shader);
> -void r600_pipe_shader_vs(struct pipe_context *ctx, struct r600_pipe_shader 
> *shader);
> +void r600_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader 
> *shader);
> +void r600_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader 
> *shader);
>  void *r600_create_db_flush_dsa(struct r600_context *rctx);
>  void *r600_create_resolve_blend(struct r600_context *rctx);
>  void *r700_create_resolve_blend(struct r600_context *rctx);
> diff --git a/src/gallium/drivers/r600/r600_shader.c 
> b/src/gallium/drivers/r600/r600_shader.c
> index 949191a..7ecab7b 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -58,52 +58,6 @@ issued in the w slot as well.
>  The compiler must issue the source argument to slots z, y, and x
>  */
>
> -static int r600_pipe_shader(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader)
> -{
> -   struct r600_context *rctx = (struct r600_context *)ctx;
> -   struct r600_shader *rshader = &shader->shader;
> -   uint32_t *ptr;
> -   int i;
> -
> -   /* copy new shader */
> -   if (shader->bo == NULL) {
> -   shader->bo = (struct r600_resource*)
> -   pipe_buffer_create(ctx->screen, PIPE_BIND_CUSTOM, 
> PIPE_USAGE_IMMUTABLE, rshader->bc.ndw * 4);
> -   if (shader->bo == NULL) {
> -   return -ENOMEM;
> -   

Re: [Mesa-dev] [PATCH] r600g: allocate FMASK right after the texture, so that it's aligned with it

2013-03-04 Thread Jerome Glisse
On Sun, Mar 3, 2013 at 9:13 AM, Marek Olšák  wrote:
> This avoids the kernel CS checker errors with MSAA textures.

Reviewed-by: Jerome Glisse 

> ---
>  src/gallium/drivers/r600/r600_texture.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/r600/r600_texture.c 
> b/src/gallium/drivers/r600/r600_texture.c
> index 484045e..4825592 100644
> --- a/src/gallium/drivers/r600/r600_texture.c
> +++ b/src/gallium/drivers/r600/r600_texture.c
> @@ -435,8 +435,8 @@ r600_texture_create_object(struct pipe_screen *screen,
> }
>
> if (base->nr_samples > 1 && !rtex->is_depth && !buf) {
> -   r600_texture_allocate_cmask(rscreen, rtex);
> r600_texture_allocate_fmask(rscreen, rtex);
> +   r600_texture_allocate_cmask(rscreen, rtex);
> }
>
> if (!rtex->is_depth && base->nr_samples > 1 &&
> --
> 1.7.10.4
>


Re: [Mesa-dev] [PATCH] winsys/radeon: Only add bo to hash table when creating flink

2013-03-01 Thread Jerome Glisse
On Fri, Mar 1, 2013 at 4:34 PM, Martin Andersson  wrote:
> The problem is that we mix bo handles and flinked names in the hash
> table. Because kms type handles are not flinked they should not be
> added to the hash table. If we do that we will sooner or later
> get a situation where we will overwrite a correct entry because
> the bo handle was the same as a flinked name.
> ---
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
> b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> index 2d41c26..f4ac526 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> @@ -957,16 +957,16 @@ static boolean radeon_winsys_bo_get_handle(struct 
> pb_buffer *buffer,
>
>  bo->flinked = TRUE;
>  bo->flink = flink.name;
> +
> +pipe_mutex_lock(bo->mgr->bo_handles_mutex);
> +util_hash_table_set(bo->mgr->bo_handles, 
> (void*)(uintptr_t)bo->flink, bo);
> +pipe_mutex_unlock(bo->mgr->bo_handles_mutex);
>  }
>  whandle->handle = bo->flink;
>  } else if (whandle->type == DRM_API_HANDLE_TYPE_KMS) {
>  whandle->handle = bo->handle;
>  }
>
> -pipe_mutex_lock(bo->mgr->bo_handles_mutex);
> -util_hash_table_set(bo->mgr->bo_handles, 
> (void*)(uintptr_t)whandle->handle, bo);
> -pipe_mutex_unlock(bo->mgr->bo_handles_mutex);
> -
>  whandle->stride = stride;
>  return TRUE;
>  }
> --
> 1.8.1.4
>

Reviewed-by: Jerome Glisse 
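
The collision described in the commit message can be reproduced with a toy model. This is only a sketch: toy_bo, toy_table and the toy_* helpers are made-up stand-ins, not the winsys bo_handles table or util_hash_table. It assumes KMS handles and flink names are both small 32-bit integers drawn from separate namespaces, so the same number can legitimately appear in both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 64

/* Hypothetical buffer object: two independent 32-bit identifiers. */
struct toy_bo {
    uint32_t handle; /* per-fd KMS handle */
    uint32_t flink;  /* global flink name, 0 if never flinked */
};

/* Hypothetical stand-in for the bo_handles table: key -> bo. */
struct toy_table {
    uint32_t keys[TOY_TABLE_SIZE];
    struct toy_bo *vals[TOY_TABLE_SIZE];
    size_t count;
};

/* Insert or overwrite, like a hash table "set". */
static void toy_table_set(struct toy_table *t, uint32_t key, struct toy_bo *bo)
{
    size_t i;

    for (i = 0; i < t->count; i++) {
        if (t->keys[i] == key) {
            t->vals[i] = bo; /* an existing entry is silently replaced */
            return;
        }
    }
    assert(t->count < TOY_TABLE_SIZE);
    t->keys[t->count] = key;
    t->vals[t->count] = bo;
    t->count++;
}

static struct toy_bo *toy_table_get(const struct toy_table *t, uint32_t key)
{
    size_t i;

    for (i = 0; i < t->count; i++) {
        if (t->keys[i] == key)
            return t->vals[i];
    }
    return NULL;
}

/* Patched behaviour: the table is only touched when a flink name is created. */
static uint32_t toy_export_flink(struct toy_table *t, struct toy_bo *bo,
                                 uint32_t new_name)
{
    if (bo->flink == 0) {
        bo->flink = new_name;
        toy_table_set(t, bo->flink, bo);
    }
    return bo->flink;
}

/* Patched behaviour: exporting a KMS handle leaves the table alone. */
static uint32_t toy_export_kms(const struct toy_bo *bo)
{
    return bo->handle;
}
```

With bo A flinked under name 7 and bo B holding KMS handle 7, the old code path (a toy_table_set keyed on B's handle) would replace A's entry, which is the overwrite the patch avoids.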


Re: [Mesa-dev] r600g: status of my work on the shader optimization

2013-02-26 Thread Jerome Glisse
On Tue, Feb 26, 2013 at 1:05 PM, Stefan Seifert  wrote:
> Good news!
>
> I gave the r600-sb branch a good testing at commit
> 265ae41b1f1d086d35d274c7378c43cddb8215c8 and so far I've not had a single
> lockup in about 1 1/2 hours of flight time!
>
> The downside is that this is with R600_HYPERZ=0. But with HYPERZ enabled, I
> get lockups on master as well, so it would seem your branch is in pretty good
> shape. Testing done on a Radeon HD 5670 using kernel 3.8
>
> Regards,
> Stefan

Is there a bug number for the HyperZ lockups?

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 0/4] r6xx flushing rework and enable CP DMA

2013-02-22 Thread Jerome Glisse
On Fri, Feb 22, 2013 at 2:38 PM,   wrote:
> From: Alex Deucher 
>
> This patch set cleans up the flushing on r6xx in what seems to be
> a logical manner.  The last patch enables CP DMA on r6xx.  No piglit
> regressions on RS780 which I was testing on.
>
> Alex Deucher (4):
>   r600g: add missing emit_flush for R600_CONTEXT_FLUSH_AND_INV case
>   r600g: synchronize streamout buffers on r6xx too (v2)
>   r600g: set additional cp_coher_cntl bits for 6xx/7xx flush (v2)
>   r600g: enable CP DMA on r6xx (v2)
>
>  src/gallium/drivers/r600/r600_blit.c   |3 +--
>  src/gallium/drivers/r600/r600_hw_context.c |   26 +-
>  2 files changed, 18 insertions(+), 11 deletions(-)

For the series:
Reviewed-by: Jerome Glisse 

>
> --
> 1.7.7.5
>


Re: [Mesa-dev] [PATCH] r600g: properly implement S8Z24 depth-stencil format for Evergreen

2013-02-13 Thread Jerome Glisse
On Tue, Feb 12, 2013 at 8:06 PM, Marek Olšák  wrote:
> I should say "fix", but it has never been used until now.
> S8Z24 is the format equivalent to the GL_UNSIGNED_INT_24_8 packing,
> so we'll start to see it more often with st/mesa now making smart decisions
> about formats.
>
> The DB<->CB copy can change the channel ordering for transfers, other than
> that, the internal DB format doesn't really matter.
>
> R600-R700 support is possible except shadow mapping.
> FMT_24_8 is broken if the SAMPLE_C instruction is used (no idea why).
>
> Also the sampler swizzling was broken in theory and the fact it worked was
> a lucky coincidence.
>
> radeonsi might need to port this.

Reviewed-by: Jerome Glisse 
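
For reference, the bit layout being discussed can be written down as a small sketch. It assumes the usual conventions: GL_UNSIGNED_INT_24_8 keeps depth in the upper 24 bits and stencil in the lower 8, which is what S8Z24 (PIPE_FORMAT_S8_UINT_Z24_UNORM) means here, while Z24S8 (PIPE_FORMAT_Z24_UNORM_S8_UINT) is the opposite order. The helper names are made up for illustration and are not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* S8Z24 / GL_UNSIGNED_INT_24_8: depth in bits 31..8, stencil in bits 7..0. */
static uint32_t toy_pack_s8z24(uint32_t z24, uint8_t s8)
{
    assert(z24 <= 0xFFFFFFu);
    return (z24 << 8) | s8;
}

/* Z24S8: depth in bits 23..0, stencil in bits 31..24. */
static uint32_t toy_pack_z24s8(uint32_t z24, uint8_t s8)
{
    assert(z24 <= 0xFFFFFFu);
    return ((uint32_t)s8 << 24) | z24;
}

/* Going from one layout to the other is a rotate right by 8 bits. */
static uint32_t toy_s8z24_to_z24s8(uint32_t v)
{
    return (v >> 8) | (v << 24);
}

static uint32_t toy_s8z24_depth(uint32_t v)
{
    return v >> 8;
}

static uint8_t toy_s8z24_stencil(uint32_t v)
{
    return (uint8_t)(v & 0xFFu);
}
```

The rotate is the kind of channel reordering that the DB<->CB copy is said to handle for transfers; how the hardware does it is not shown here.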

> ---
>  src/gallium/drivers/r600/evergreen_state.c |   13 +++-
>  src/gallium/drivers/r600/r600_state.c  |8 -
>  src/gallium/drivers/r600/r600_texture.c|   44 ++--
>  3 files changed, 47 insertions(+), 18 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index 211c218..c6e29db 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -200,6 +200,8 @@ static uint32_t r600_translate_dbformat(enum pipe_format 
> format)
> return V_028040_Z_16;
> case PIPE_FORMAT_Z24X8_UNORM:
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +   case PIPE_FORMAT_X8Z24_UNORM:
> +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> return V_028040_Z_24;
> case PIPE_FORMAT_Z32_FLOAT:
> case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
> @@ -339,7 +341,7 @@ static uint32_t r600_translate_colorswap(enum pipe_format 
> format)
>
> case PIPE_FORMAT_X8Z24_UNORM:
> case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -   return V_028C70_SWAP_STD;
> +   return V_028C70_SWAP_STD_REV;
>
> case PIPE_FORMAT_R10G10B10A2_UNORM:
> case PIPE_FORMAT_R10G10B10X2_SNORM:
> @@ -1106,6 +1108,11 @@ evergreen_create_sampler_view_custom(struct 
> pipe_context *ctx,
> case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
> pipe_format = PIPE_FORMAT_Z32_FLOAT;
> break;
> +   case PIPE_FORMAT_X8Z24_UNORM:
> +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> +   /* Z24 is always stored like this. */
> +   pipe_format = PIPE_FORMAT_Z24X8_UNORM;
> +   break;
> case PIPE_FORMAT_X24S8_UINT:
> case PIPE_FORMAT_S8X24_UINT:
> case PIPE_FORMAT_X32_S8X24_UINT:
> @@ -1603,6 +1610,8 @@ static void evergreen_init_depth_surface(struct 
> r600_context *rctx,
> switch (surf->base.format) {
> case PIPE_FORMAT_Z24X8_UNORM:
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +   case PIPE_FORMAT_X8Z24_UNORM:
> +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> surf->pa_su_poly_offset_db_fmt_cntl =
> S_028B78_POLY_OFFSET_NEG_NUM_DB_BITS((char)-24);
> break;
> @@ -2179,6 +2188,8 @@ static void evergreen_emit_polygon_offset(struct 
> r600_context *rctx, struct r600
> switch (state->zs_format) {
> case PIPE_FORMAT_Z24X8_UNORM:
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +   case PIPE_FORMAT_X8Z24_UNORM:
> +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> offset_units *= 2.0f;
> break;
> case PIPE_FORMAT_Z16_UNORM:
> diff --git a/src/gallium/drivers/r600/r600_state.c 
> b/src/gallium/drivers/r600/r600_state.c
> index 5322850..d1f6626 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -270,10 +270,6 @@ static uint32_t r600_translate_colorswap(enum 
> pipe_format format)
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> return V_0280A0_SWAP_STD;
>
> -   case PIPE_FORMAT_X8Z24_UNORM:
> -   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -   return V_0280A0_SWAP_STD;
> -
> case PIPE_FORMAT_R10G10B10A2_UNORM:
> case PIPE_FORMAT_R10G10B10X2_SNORM:
> case PIPE_FORMAT_R10SG10SB10SA2U_NORM:
> @@ -440,10 +436,6 @@ static uint32_t r600_translate_colorformat(enum 
> pipe_format format)
> case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> return V_0280A0_COLOR_8_24;
>
> -   case PIPE_FORMAT_X8Z24_UNORM:
> -   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -   return V_0280A0_COLOR_24_8;
> -
> case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
> 

Re: [Mesa-dev] [PATCH 2/2] r600g: fix lockup when hyperz & alpha test are enabled together. v2

2013-02-11 Thread Jerome Glisse
On Mon, Feb 11, 2013 at 6:45 PM,   wrote:
> From: Jerome Glisse 
>
> Seems that enabling alpha test confuses the GPU about the order in
> which it should perform Z testing. So force the order programmed
> through db shader control.
>
> v2: Only force z order when alpha test is enabled
>
> Signed-off-by: Jerome Glisse 
> Reviewed-by: Marek Olšák 

This one does not regress piglit (redwood or rv770) and still fixes
the lockup afaict. If there are no objections I will push tomorrow.

Cheers,
Jerome

> ---
>  src/gallium/drivers/r600/evergreen_state.c | 25 +++--
>  src/gallium/drivers/r600/r600_state.c  | 22 +-
>  2 files changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index 211c218..b710131 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -2251,6 +2251,13 @@ static void evergreen_emit_db_misc_state(struct 
> r600_context *rctx, struct r600_
> if (rctx->db_state.rsurf && rctx->db_state.rsurf->htile_enabled) {
> /* FORCE_OFF means HiZ/HiS are determined by 
> DB_SHADER_CONTROL */
> db_render_override |= 
> S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_OFF);
> +   /* This is to fix a lockup when hyperz and alpha test are 
> enabled at
> +* the same time some how GPU get confuse on which order to 
> pick for
> +* z test
> +*/
> +   if (rctx->alphatest_state.sx_alpha_test_control) {
> +   db_render_override |= 
> S_02800C_FORCE_SHADER_Z_ORDER(1);
> +   }
> } else {
> db_render_override |= 
> S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_DISABLE);
> }
> @@ -3240,7 +3247,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
> struct r600_pipe_shader
> struct r600_context *rctx = (struct r600_context *)ctx;
> struct r600_pipe_state *rstate = &shader->rstate;
> struct r600_shader *rshader = &shader->shader;
> -   unsigned i, exports_ps, num_cout, spi_ps_in_control_0, spi_input_z, 
> spi_ps_in_control_1, db_shader_control;
> +   unsigned i, exports_ps, num_cout, spi_ps_in_control_0, spi_input_z, 
> spi_ps_in_control_1, db_shader_control = 0;
> int pos_index = -1, face_index = -1;
> int ninterp = 0;
> boolean have_linear = FALSE, have_centroid = FALSE, have_perspective 
> = FALSE;
> @@ -3250,7 +3257,6 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
> struct r600_pipe_shader
>
> rstate->nregs = 0;
>
> -   db_shader_control = S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
> for (i = 0; i < rshader->ninput; i++) {
> /* evergreen NUM_INTERP only contains values interpolated 
> into the LDS,
>POSITION goes via GPRs from the SC so isn't counted */
> @@ -3484,6 +3490,21 @@ void evergreen_update_db_shader_control(struct 
> r600_context * rctx)
> 
> V_02880C_EXPORT_DB_FULL) |
> 
> S_02880C_ALPHA_TO_MASK_DISABLE(rctx->framebuffer.cb0_is_integer);
>
> +   /* When alpha test is enabled we can't trust the hw to make the 
> proper
> +* decision on the order in which ztest should be run related to 
> fragment
> +* shader execution.
> +*
> +* If alpha test is enabled perform early z rejection (RE_Z) but 
> don't early
> +* write to the zbuffer. Write to zbuffer is delayed after fragment 
> shader
> +* execution and thus after alpha test so if discarded by the alpha 
> test
> +* the z value is not written.
> +*/
> +   if (rctx->alphatest_state.sx_alpha_test_control) {
> +   db_shader_control |= S_02880C_Z_ORDER(V_02880C_RE_Z);
> +   } else {
> +   db_shader_control |= 
> S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
> +   }
> +
> if (db_shader_control != rctx->db_misc_state.db_shader_control) {
> rctx->db_misc_state.db_shader_control = db_shader_control;
> rctx->db_misc_state.atom.dirty = true;
> diff --git a/src/gallium/drivers/r600/r600_state.c 
> b/src/gallium/drivers/r600/r600_state.c
> index 5322850..8efd4b3 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -1966,6 +1966,13 @@ static void r600_emit_db_misc_state(struct 
> r600_context *rctx, struct r600_atom
> if (rctx->

Re: [Mesa-dev] [PATCH] r600g: add cs memory usage accounting and limit it

2013-01-31 Thread Jerome Glisse
On Wed, Jan 30, 2013 at 10:35 PM, Marek Olšák  wrote:
> On Wed, Jan 30, 2013 at 6:14 PM,   wrote:
>> From: Jerome Glisse 
>>
>> We are now seeing CSes that can go over the vram+gtt size. To avoid
>> failing, flush early any CS that goes over 70% of (gtt+vram) usage.
>> 70% is used to allow for some fragmentation.
>>
>> Signed-off-by: Jerome Glisse 
>> ---
>>  src/gallium/drivers/r600/evergreen_state.c|  4 
>>  src/gallium/drivers/r600/r600.h   |  1 +
>>  src/gallium/drivers/r600/r600_buffer.c|  1 +
>>  src/gallium/drivers/r600/r600_hw_context.c| 12 
>>  src/gallium/drivers/r600/r600_pipe.c  |  3 +++
>>  src/gallium/drivers/r600/r600_pipe.h  | 21 +
>>  src/gallium/drivers/r600/r600_state.c |  3 +++
>>  src/gallium/drivers/r600/r600_state_common.c  | 17 -
>>  src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 11 +++
>>  src/gallium/winsys/radeon/drm/radeon_winsys.h | 10 ++
>>  10 files changed, 82 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
>> b/src/gallium/drivers/r600/evergreen_state.c
>> index be1c427..84f8dce 100644
>> --- a/src/gallium/drivers/r600/evergreen_state.c
>> +++ b/src/gallium/drivers/r600/evergreen_state.c
>> @@ -1668,6 +1668,8 @@ static void evergreen_set_framebuffer_state(struct 
>> pipe_context *ctx,
>> surf = (struct r600_surface*)state->cbufs[i];
>> rtex = (struct r600_texture*)surf->base.texture;
>>
>> +   r600_context_add_resource_size(ctx, 
>> state->cbufs[i]->texture);
>> +
>> if (!surf->color_initialized) {
>> evergreen_init_color_surface(rctx, surf);
>> }
>> @@ -1699,6 +1701,8 @@ static void evergreen_set_framebuffer_state(struct 
>> pipe_context *ctx,
>> if (state->zsbuf) {
>> surf = (struct r600_surface*)state->zsbuf;
>>
>> +   r600_context_add_resource_size(ctx, state->zsbuf->texture);
>> +
>> if (!surf->depth_initialized) {
>> evergreen_init_depth_surface(rctx, surf);
>> }
>> diff --git a/src/gallium/drivers/r600/r600.h 
>> b/src/gallium/drivers/r600/r600.h
>> index a383c90..b9f7d3d 100644
>> --- a/src/gallium/drivers/r600/r600.h
>> +++ b/src/gallium/drivers/r600/r600.h
>> @@ -50,6 +50,7 @@ struct r600_resource {
>>
>> /* Resource state. */
>> unsigneddomains;
>> +   uint64_tsize;
>
> Don't add this. Use r600_resource::buf::size instead, which is already
> initialized.
>
>
>>  };
>>
>>  #define R600_BLOCK_MAX_BO  32
>> diff --git a/src/gallium/drivers/r600/r600_buffer.c 
>> b/src/gallium/drivers/r600/r600_buffer.c
>> index 6df0d91..92f549a 100644
>> --- a/src/gallium/drivers/r600/r600_buffer.c
>> +++ b/src/gallium/drivers/r600/r600_buffer.c
>> @@ -250,6 +250,7 @@ bool r600_init_resource(struct r600_screen *rscreen,
>> break;
>> }
>>
>> +   res->size = size;
>> res->buf = rscreen->ws->buffer_create(rscreen->ws, size, alignment,
>>use_reusable_pool,
>>initial_domain);
>> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
>> b/src/gallium/drivers/r600/r600_hw_context.c
>> index ebafd97..44d3b4d 100644
>> --- a/src/gallium/drivers/r600/r600_hw_context.c
>> +++ b/src/gallium/drivers/r600/r600_hw_context.c
>> @@ -359,6 +359,16 @@ out_err:
>>  void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw,
>> boolean count_draw_in)
>>  {
>> +   if (!ctx->ws->cs_memory_below_limit(ctx->rings.gfx.cs, ctx->vram, 
>> ctx->gtt)) {
>> +   ctx->gtt = 0;
>> +   ctx->vram = 0;
>> +   ctx->rings.gfx.flush(ctx, RADEON_FLUSH_ASYNC);
>> +   return;
>> +   }
>> +   /* all will be accounted once relocation are emited */
>> +   ctx->gtt = 0;
>> +   ctx->vram = 0;
>> +
>> /* The number of dwords we already used in the CS so far. */
>> num_dw += ctx->rings.gfx.cs->cdw;
>>
>> @@ -784,6 +794,8 @@ void r600_begin_new_cs(struct r600_context *ctx)
>>
>> c
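
The 70% rule from the commit message of the patch quoted above can be sketched as follows. This is an illustration under assumptions, not the code from the patch: toy_cs_mem and the toy_* functions are hypothetical, sizes are in bytes, and the threshold is checked in integer arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accounting state for one command stream (CS). */
struct toy_cs_mem {
    uint64_t vram_size; /* total VRAM */
    uint64_t gtt_size;  /* total GTT */
    uint64_t used_vram; /* VRAM referenced by the CS so far */
    uint64_t used_gtt;  /* GTT referenced by the CS so far */
};

/* True if adding (vram, gtt) keeps the CS at or below 70% of vram+gtt. */
static bool toy_cs_memory_below_limit(const struct toy_cs_mem *m,
                                      uint64_t vram, uint64_t gtt)
{
    uint64_t total = m->vram_size + m->gtt_size;
    uint64_t wanted = m->used_vram + m->used_gtt + vram + gtt;

    /* wanted <= 0.7 * total, without floating point */
    return wanted * 10 <= total * 7;
}

/*
 * Account for a new resource. Returns true if the CS had to be flushed
 * first; after a flush the new resource is the only thing accounted.
 */
static bool toy_account(struct toy_cs_mem *m, uint64_t vram, uint64_t gtt)
{
    bool flushed = false;

    if (!toy_cs_memory_below_limit(m, vram, gtt)) {
        /* a real driver would submit the CS here */
        m->used_vram = 0;
        m->used_gtt = 0;
        flushed = true;
    }
    m->used_vram += vram;
    m->used_gtt += gtt;
    return flushed;
}
```

The remaining 30% is headroom for fragmentation: a CS whose buffers nominally fit in vram+gtt can still fail to validate if the free space is not contiguous.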

Re: [Mesa-dev] [PATCH 1/2] radeon/winsys: add dma ring support to winsys

2013-01-08 Thread Jerome Glisse
On Tue, Jan 8, 2013 at 10:15 AM, Marek Olšák  wrote:
> On Mon, Jan 7, 2013 at 9:30 PM,   wrote:
>> From: Jerome Glisse 
>>
>> Signed-off-by: Jerome Glisse 
>> ---
>>  src/gallium/drivers/r300/r300_context.c   |   2 +-
>>  src/gallium/drivers/r600/r600_pipe.c  |   2 +-
>>  src/gallium/drivers/radeonsi/radeonsi_pipe.c  |   2 +-
>>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c |   2 +-
>>  src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 104 +++---
>>  src/gallium/winsys/radeon/drm/radeon_drm_cs.h |   2 +-
>>  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c |   6 ++
>>  src/gallium/winsys/radeon/drm/radeon_winsys.h |  21 -
>>  8 files changed, 100 insertions(+), 41 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r300/r300_context.c 
>> b/src/gallium/drivers/r300/r300_context.c
>> index b498454..f0d738e 100644
>> --- a/src/gallium/drivers/r300/r300_context.c
>> +++ b/src/gallium/drivers/r300/r300_context.c
>> @@ -376,7 +376,7 @@ struct pipe_context* r300_create_context(struct 
>> pipe_screen* screen,
>>   sizeof(struct pipe_transfer), 64,
>>   UTIL_SLAB_SINGLETHREADED);
>>
>> -r300->cs = rws->cs_create(rws);
>> +r300->cs = rws->cs_create(rws, RING_GFX);
>>  if (r300->cs == NULL)
>>  goto fail;
>>
>> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
>> b/src/gallium/drivers/r600/r600_pipe.c
>> index 29ef988..7c4ec44 100644
>> --- a/src/gallium/drivers/r600/r600_pipe.c
>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>> @@ -289,7 +289,7 @@ static struct pipe_context *r600_create_context(struct 
>> pipe_screen *screen, void
>> goto fail;
>> }
>>
>> -   rctx->cs = rctx->ws->cs_create(rctx->ws);
>> +   rctx->cs = rctx->ws->cs_create(rctx->ws, RING_GFX);
>> rctx->ws->cs_set_flush_callback(rctx->cs, r600_flush_from_winsys, 
>> rctx);
>>
>> rctx->uploader = u_upload_create(&rctx->context, 1024 * 1024, 256,
>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c 
>> b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>> index d66e30f..cfa1ff7 100644
>> --- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>> +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>> @@ -222,7 +222,7 @@ static struct pipe_context *r600_create_context(struct 
>> pipe_screen *screen, void
>> case TAHITI:
>> si_init_state_functions(rctx);
>> LIST_INITHEAD(&rctx->active_query_list);
>> -   rctx->cs = rctx->ws->cs_create(rctx->ws);
>> +   rctx->cs = rctx->ws->cs_create(rctx->ws, RING_GFX);
>> rctx->max_db = 8;
>> si_init_config(rctx);
>> break;
>> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
>> b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
>> index 897e962..6daafc3 100644
>> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
>> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
>> @@ -453,7 +453,7 @@ static void *radeon_bo_map(struct 
>> radeon_winsys_cs_handle *buf,
>>  } else {
>>  /* Try to avoid busy-waiting in radeon_bo_wait. */
>>  if (p_atomic_read(&bo->num_active_ioctls))
>> -radeon_drm_cs_sync_flush(cs);
>> +radeon_drm_cs_sync_flush(rcs);
>>  }
>>
>>  radeon_bo_wait((struct pb_buffer*)bo, 
>> RADEON_USAGE_READWRITE);
>> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
>> b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
>> index c5e7f1e..5e2c471 100644
>> --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
>> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
>> @@ -90,6 +90,10 @@
>>  #define RADEON_CS_RING_COMPUTE  1
>>  #endif
>>
>> +#ifndef RADEON_CS_RING_DMA
>> +#define RADEON_CS_RING_DMA  2
>> +#endif
>> +
>>  #ifndef RADEON_CS_END_OF_FRAME
>>  #define RADEON_CS_END_OF_FRAME  0x04
>>  #endif
>> @@ -161,7 +165,7 @@ static void radeon_destroy_cs_context(struct 
>> radeon_cs_context *csc)
>>  DEBUG_GET_ONCE_BOOL_OPTION(thread, "RADEON_THREAD", TRUE)
>>  static PIPE_THREAD_ROUTINE(radeon_drm_cs_emit_ioctl, param);
>>
>> -static struct radeon_winsys_cs *radeon_drm_cs_create(struc

Re: [Mesa-dev] [PATCH 3/3] radeon/winsys: add async dma infrastructure

2013-01-07 Thread Jerome Glisse
On Mon, Jan 7, 2013 at 11:03 AM, Marek Olšák  wrote:
> On Mon, Jan 7, 2013 at 3:45 PM, Christian König  
> wrote:
>> On 07.01.2013 01:24, Marek Olšák wrote:
>>>
>>> On Sun, Jan 6, 2013 at 11:58 PM, Jerome Glisse  wrote:
>>>>
>>>> On Sun, Jan 6, 2013 at 4:00 PM, Marek Olšák  wrote:
>>>>>
>>>>> I agree with Christian. You can use a separate instance of
>>>>> radeon_winsys_cs for the DMA CS. The winsys exposes all the functions
>>>>> you need (except one) for you to coordinate work between 2 command
>>>>> streams in the pipe driver. You may only need to expose one additional
>>>>> winsys function to the driver for synchronization, it's called
>>>>> "radeon_drm_cs_sync_flush". I'm confident that this can be implemented
>>>>> and layered on top of the winsys, presumably with fewer lines of code
>>>>> and cleaner.
>>>>
>>>> The relocation add function needs to access both the dma ring and the
>>>> cs ring no matter on which ring the relocation is added. Doing the
>>>> sync in the pipe driver would increase the code: each call site of
>>>> add_reloc would need to check if the bo is referenced by the other
>>>> ring and flush the other ring if so. That also means there is a
>>>> higher likelihood that someone adding an add_reloc call forgets
>>>> about the flushing.
>>>
>>> Well, in that case, you can define a new set of functions in the pipe
>>> driver, which are layered on top of radeon_winsys_cs and the existing
>>> interface radeon_winsys::cs_*.
>>>
>>> If you want to be super clean, you can add a new module that defines
>>> this command stream pair:
>>>
>>> struct r600_cs_with_dma {
>>> struct radeon_winsys_cs *cs_main, *cs_dma;
>>> };
>>>
>>> And define a set of functions which work with that, reimplementing all
>>> the cs_* functions by calling the existing functions of radeon_winsys.
>>> The pipe driver would then use the new CS functions everywhere instead
>>> of radeon_winsys.
>>>
>>> To me, the best design decision here is not to try to *hack* the
>>> existing winsys code to make it do what you want without giving it
>>> another thought. Adding another layer is preferable, because it keeps
>>> both parts simple and separated.
>>
>>
>> Well thinking about it more and more I don't think add_reloc is the right
>> place to do the sync anyway.
>>
>> Imagine a loop that wants to handle a bunch of buffers, first they are zero
>> cleared and then rendered to. Those buffers are unique, so we can zero clear
>> them all at once. In an ideal world they should all end up in the same DMA
>> command stream.
>>
>> Now comes a buffer that is first rendered to and then copied around (for
>> example), in this moment the DMA command stream needs to be flushed, cause
>> now a new DMA command stream starts that actually needs to run after the
>> rendering command stream.
>>
>> So instead of flushing when we see that a buffer gets added to a command
>> stream we need to remember in which order the command stream needs to get
>> submitted and only flush when this order is going to change.
>
> I agree with all your points. add_reloc is a bad place for
> synchronization for yet another reason: you don't really know anything
> about what the driver is trying to do and what commands and
> relocations are likely to come next, as opposed to e.g. a write
> transfer where you are 100% sure that:
> - the source resource isn't referenced by a CS nor is it busy
> - the destination resource will likely be used pretty soon as a source
> for rendering
>
> In addition to that, I believe that using the async DMA is useless for
> anything but write-only transfers with a staging resource. Every other
> case is always synchronous with rendering and therefore defeats the
> purpose of the *async* DMA. I therefore propose:
>
> 1) Let's not use the async DMA in resource_copy_region *at all*.
>
> 2) Let's replace all the resource_copy_region and copy_buffer
> occurrences in transfer_unmap with async DMA copies. The function
> implementing the copy should do all the necessary synchronization
> between command streams by itself.
>
> Marek

I have v3 coming that should please you.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 3/3] radeon/winsys: add async dma infrastructure

2013-01-05 Thread Jerome Glisse
On Sat, Jan 5, 2013 at 9:49 AM, Christian König  wrote:
> On 04.01.2013 23:19, j.gli...@gmail.com wrote:
> [SNIP]
>
>> diff --git a/src/gallium/drivers/r300/r300_emit.c
>> b/src/gallium/drivers/r300/r300_emit.c
>> index d1ed4b3..c824821 100644
>> --- a/src/gallium/drivers/r300/r300_emit.c
>> +++ b/src/gallium/drivers/r300/r300_emit.c
>> @@ -1184,7 +1184,8 @@ validate:
>>   assert(tex && tex->buf && "cbuf is marked, but NULL!");
>>   r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
>>   RADEON_USAGE_READWRITE,
>> -r300_surface(fb->cbufs[i])->domain);
>> +r300_surface(fb->cbufs[i])->domain,
>> +RADEON_RING_DMA);
>
>
> ??? DMA ring on R300? At least on first glance that looks quite odd, should
> probably be GFX ring instead.

Yeah, it's a cut-and-paste error. I caught that when testing on r3xx.

>
>>   }
>>   /* ...depth buffer... */
>>   if (fb->zsbuf) {
>> @@ -1192,7 +1193,8 @@ validate:
>>   assert(tex && tex->buf && "zsbuf is marked, but NULL!");
>>   r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
>>   RADEON_USAGE_READWRITE,
>> -r300_surface(fb->zsbuf)->domain);
>> +r300_surface(fb->zsbuf)->domain,
>> +RADEON_RING_DMA);
>
>
> Same here and repeats on a couple of more places.
> [SNIP]
>
>
>> diff --git a/src/gallium/winsys/radeon/drm/radeon_winsys.h
>> b/src/gallium/winsys/radeon/drm/radeon_winsys.h
>> index 16536dc..5ff463e 100644
>> --- a/src/gallium/winsys/radeon/drm/radeon_winsys.h
>> +++ b/src/gallium/winsys/radeon/drm/radeon_winsys.h
>> @@ -43,11 +43,13 @@
>>   #include "pipebuffer/pb_buffer.h"
>>   #include "libdrm/radeon_surface.h"
>>   -#define RADEON_MAX_CMDBUF_DWORDS (16 * 1024)
>> +#define RADEON_MAX_CMDBUF_DWORDS(16 * 1024)
>>   -#define RADEON_FLUSH_ASYNC   (1 << 0)
>> -#define RADEON_FLUSH_KEEP_TILING_FLAGS (1 << 1) /* needs DRM 2.12.0 */
>> -#define RADEON_FLUSH_COMPUTE   (1 << 2)
>> +#define RADEON_FLUSH_ASYNC  (1 << 0)
>> +#define RADEON_FLUSH_KEEP_TILING_FLAGS  (1 << 1) /* needs DRM 2.12.0 */
>> +#define RADEON_FLUSH_COMPUTE(1 << 2)
>> +#define RADEON_FLUSH_DMA(1 << 3)
>> +#define RADEON_FLUSH_GFX(1 << 4)
>> /* Tiling flags. */
>>   enum radeon_bo_layout {
>> @@ -137,12 +139,19 @@ enum chip_class {
>>   TAHITI,
>>   };
>>   +enum radeon_ring_type {
>> +RADEON_RING_PM4 = 0,
>> +RADEON_RING_DMA = 1,
>> +};
>> +
>
>
> Don't use PM4 as an identifier here; the PM4 packet format is used for other
> ring types besides GFX/Compute as well, but those rings can't necessarily
> execute GFX/Compute commands.

I was looking for a 3-letter name that encompasses gfx and compute.

>>   struct winsys_handle;
>>   struct radeon_winsys_cs_handle;
>> struct radeon_winsys_cs {
>> -unsigned cdw;  /* Number of used dwords. */
>> -uint32_t *buf; /* The command buffer. */
>> +unsignedcdw;  /* Number of used dwords. */
>> +uint32_t*buf; /* The command buffer. */
>> +unsigneddma_cdw;  /* Number of used dwords. */
>> +uint32_t*dma_buf; /* The command buffer. */
>>   };
>
>
> Why like this? Can't we just have separate instances of the radeon_winsys_cs
> structure for each ring type we are dealing with?
>
> The rest looks quite good,
> Christian.

No, we can't. We need to keep track, at the same time and for the same
context, of the dma ring and the gfx/compute/uvd/other ring. It's the
relocation code that needs that.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 3/3] radeon/winsys: add async dma infrastructure

2013-01-04 Thread Jerome Glisse
On Fri, Jan 4, 2013 at 6:33 PM, Alex Deucher  wrote:
> On Fri, Jan 4, 2013 at 5:19 PM,   wrote:
>> From: Jerome Glisse 
>>
>> The design takes advantage of the fact that the kernel will emit a
>> semaphore when a buffer is referenced by different rings. So the only
>> thing we need to do to enforce synchronization between the dma and
>> gfx/compute rings is to make sure that we never reference the same bo
>> at the same time on the dma and gfx rings.
>>
>> This is achieved by tracking relocations. When we add a relocation
>> to the dma ring for a bo, we first check whether the bo has an active
>> relocation on the gfx ring. If so, we flush the gfx ring. We do the
>> same when adding a bo to the gfx ring: we check that it does not have
>> a relocation on the dma ring, and if it has one we flush the dma ring.
>>
>> This patch also simplifies the helper query function used to know
>> if a bo has pending write/read commands.
>
> Looks good.  A couple of minor comments below.  BTW, any performance gains?
>

No, there aren't many benchmarks that trigger a lot of buffer copies,
AFAICT. Here is a WIP patch for texture copy:
http://people.freedesktop.org/~glisse/0001-r600g-r7xx-use-async-dma-for-resource-copy.patch

So far the kernel mostly rejects the command stream; I need to check what's going on.

Cheers,
Jerome
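
The relocation tracking described in the commit message can be modelled roughly as below. This is a toy sketch, not the winsys code: toy_bo, toy_ctx and the toy_* functions are hypothetical, only two rings are modelled, and a "flush" simply clears the per-ring bookkeeping instead of submitting anything to the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RELOCS 16

enum toy_ring { TOY_RING_GFX = 0, TOY_RING_DMA = 1, TOY_RING_COUNT = 2 };

/* Hypothetical bo: number of pending relocations on each ring. */
struct toy_bo {
    unsigned relocs[TOY_RING_COUNT];
};

/* Hypothetical context owning one command stream per ring. */
struct toy_ctx {
    struct toy_bo *tracked[TOY_RING_COUNT][TOY_MAX_RELOCS];
    size_t num_tracked[TOY_RING_COUNT];
    unsigned flushes[TOY_RING_COUNT];
};

/* Flush one ring: every bo it referenced becomes idle on that ring. */
static void toy_flush(struct toy_ctx *ctx, enum toy_ring ring)
{
    size_t i;

    for (i = 0; i < ctx->num_tracked[ring]; i++)
        ctx->tracked[ring][i]->relocs[ring] = 0;
    ctx->num_tracked[ring] = 0;
    ctx->flushes[ring]++;
}

/*
 * Add a relocation for bo on the given ring. If the bo is still
 * referenced by the other ring, that ring is flushed first, so a bo is
 * never referenced by both unflushed command streams at once.
 */
static void toy_add_reloc(struct toy_ctx *ctx, enum toy_ring ring,
                          struct toy_bo *bo)
{
    enum toy_ring other =
        (ring == TOY_RING_GFX) ? TOY_RING_DMA : TOY_RING_GFX;

    if (bo->relocs[other] != 0)
        toy_flush(ctx, other);

    assert(ctx->num_tracked[ring] < TOY_MAX_RELOCS);
    ctx->tracked[ring][ctx->num_tracked[ring]++] = bo;
    bo->relocs[ring]++;
}
```

Because the check lives inside the add-reloc path, callers cannot forget it. The cost is that a single shared bo forces a flush of the whole other ring.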

> Alex
>
>>
>> Signed-off-by: Jerome Glisse 
>> ---
>>  src/gallium/drivers/r300/r300_emit.c   |  21 +-
>>  src/gallium/drivers/r300/r300_flush.c  |   7 +-
>>  src/gallium/drivers/r600/evergreen_hw_context.c|  39 +++
>>  src/gallium/drivers/r600/evergreend.h  |  16 ++
>>  src/gallium/drivers/r600/r600.h|  13 +
>>  src/gallium/drivers/r600/r600_blit.c   |  94 +--
>>  src/gallium/drivers/r600/r600_hw_context.c |  44 +++-
>>  src/gallium/drivers/r600/r600_pipe.c   |  13 +-
>>  src/gallium/drivers/r600/r600_pipe.h   |   2 +-
>>  src/gallium/drivers/r600/r600_texture.c|   2 +-
>>  src/gallium/drivers/r600/r600d.h   |  16 ++
>>  src/gallium/drivers/radeonsi/r600_hw_context.c |   2 +-
>>  .../drivers/radeonsi/r600_hw_context_priv.h|   2 +-
>>  src/gallium/drivers/radeonsi/r600_texture.c|   2 +-
>>  src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  13 +-
>>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c  |  10 +-
>>  src/gallium/winsys/radeon/drm/radeon_drm_bo.h  |   2 +
>>  src/gallium/winsys/radeon/drm/radeon_drm_cs.c  | 270 +
>>  src/gallium/winsys/radeon/drm/radeon_drm_cs.h  |  40 ++-
>>  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c  |   6 +
>>  src/gallium/winsys/radeon/drm/radeon_winsys.h  |  28 ++-
>>  21 files changed, 509 insertions(+), 133 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r300/r300_emit.c 
>> b/src/gallium/drivers/r300/r300_emit.c
>> index d1ed4b3..c824821 100644
>> --- a/src/gallium/drivers/r300/r300_emit.c
>> +++ b/src/gallium/drivers/r300/r300_emit.c
>> @@ -1184,7 +1184,8 @@ validate:
>>  assert(tex && tex->buf && "cbuf is marked, but NULL!");
>>  r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
>>  RADEON_USAGE_READWRITE,
>> -r300_surface(fb->cbufs[i])->domain);
>> +r300_surface(fb->cbufs[i])->domain,
>> +RADEON_RING_DMA);
>>  }
>>  /* ...depth buffer... */
>>  if (fb->zsbuf) {
>> @@ -1192,7 +1193,8 @@ validate:
>>  assert(tex && tex->buf && "zsbuf is marked, but NULL!");
>>  r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
>>  RADEON_USAGE_READWRITE,
>> -r300_surface(fb->zsbuf)->domain);
>> +r300_surface(fb->zsbuf)->domain,
>> +RADEON_RING_DMA);
>>  }
>>  }
>>  if (r300->textures_state.dirty) {
>> @@ -1204,18 +1206,21 @@ validate:
>>
>>  tex = r300_resource(texstate->sampler_views[i]->base.texture);
>>  r300->rws->cs_add_reloc(r300->cs, tex->cs_buf, 
>> RADEON_USAGE_READ,
>> -tex->domain);
>> +tex->domain,
>> +

Re: [Mesa-dev] [PATCH 1/3] r600g/radeon/winsys: indentation cleanup

2013-01-04 Thread Jerome Glisse
On Fri, Jan 4, 2013 at 5:19 PM,   wrote:
> From: Jerome Glisse 
>
> Signed-off-by: Jerome Glisse 

For the series, piglit shows no regressions on r7xx/evergreen. I still
need to test r3xx/r5xx and SI.

Cheers,
Jerome

> ---
>  src/gallium/drivers/r600/r600_pipe.c  | 18 +-
>  src/gallium/drivers/r600/r600_pipe.h  |  2 +-
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c |  3 +--
>  src/gallium/winsys/radeon/drm/radeon_drm_cs.h |  2 +-
>  4 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
> b/src/gallium/drivers/r600/r600_pipe.c
> index 65dcbf8..e9d5e0a 100644
> --- a/src/gallium/drivers/r600/r600_pipe.c
> +++ b/src/gallium/drivers/r600/r600_pipe.c
> @@ -290,21 +290,21 @@ static struct pipe_context *r600_create_context(struct 
> pipe_screen *screen, void
> rctx->cs = rctx->ws->cs_create(rctx->ws);
> rctx->ws->cs_set_flush_callback(rctx->cs, r600_flush_from_winsys, 
> rctx);
>
> -rctx->uploader = u_upload_create(&rctx->context, 1024 * 1024, 256,
> - PIPE_BIND_INDEX_BUFFER |
> - PIPE_BIND_CONSTANT_BUFFER);
> -if (!rctx->uploader)
> -goto fail;
> +   rctx->uploader = u_upload_create(&rctx->context, 1024 * 1024, 256,
> +   PIPE_BIND_INDEX_BUFFER |
> +   PIPE_BIND_CONSTANT_BUFFER);
> +   if (!rctx->uploader)
> +   goto fail;
>
> rctx->allocator_fetch_shader = u_suballocator_create(&rctx->context, 
> 64 * 1024, 256,
>  0, 
> PIPE_USAGE_STATIC, FALSE);
> -if (!rctx->allocator_fetch_shader)
> -goto fail;
> +   if (!rctx->allocator_fetch_shader)
> +   goto fail;
>
> rctx->allocator_so_filled_size = 
> u_suballocator_create(&rctx->context, 4096, 4,
> -   0, 
> PIPE_USAGE_STATIC, TRUE);
> +   0, 
> PIPE_USAGE_STATIC, TRUE);
>  if (!rctx->allocator_so_filled_size)
> -goto fail;
> +   goto fail;
>
> rctx->blitter = util_blitter_create(&rctx->context);
> if (rctx->blitter == NULL)
> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
> b/src/gallium/drivers/r600/r600_pipe.h
> index 6b7c053..934a6f5 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -408,7 +408,7 @@ struct r600_context {
> struct radeon_winsys    *ws;
> struct radeon_winsys_cs *cs;
> struct blitter_context  *blitter;
> -   struct u_upload_mgr *uploader;
> +   struct u_upload_mgr *uploader;
> struct u_suballocator   *allocator_so_filled_size;
> struct u_suballocator   *allocator_fetch_shader;
> struct util_slab_mempool pool_transfers;
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
> b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> index 07e92c5..897e962 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> @@ -802,8 +802,7 @@ static void radeon_bo_set_tiling(struct pb_buffer *_buf,
>  sizeof(args));
>  }
>
> -static struct radeon_winsys_cs_handle *radeon_drm_get_cs_handle(
> -struct pb_buffer *_buf)
> +static struct radeon_winsys_cs_handle *radeon_drm_get_cs_handle(struct 
> pb_buffer *_buf)
>  {
>  /* return radeon_bo. */
>  return (struct radeon_winsys_cs_handle*)get_radeon_bo(_buf);
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h 
> b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
> index 6336d3a..286eb6a 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
> @@ -33,7 +33,7 @@
>  struct radeon_cs_context {
>  uint32_t    buf[RADEON_MAX_CMDBUF_DWORDS];
>
> -int fd;
> +int fd;
>  struct drm_radeon_cs        cs;
>  struct drm_radeon_cs_chunk  chunks[3];
>  uint64_t                    chunk_array[3];
> --
> 1.7.11.7
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] r600g: texture buffer object + glsl 1.40 enable support

2012-12-19 Thread Jerome Glisse
On Wed, Dec 19, 2012 at 12:33 PM, Tom Stellard  wrote:
> On Sun, Dec 16, 2012 at 08:33:23PM +1000, Dave Airlie wrote:
>> From: Dave Airlie 
>>
>> This adds TBO support to r600g, and with GLSL 1.40 enabled,
>> we now get 3.1 core profiles advertised for r600g.
>>
>> This code is evergreen only so far, but I don't think there is
>> much to make it work on r600/700/cayman other than testing.
>>
>> a) buffer txq is broken like cube map txq, this sucks, fix it the
>> exact same way.
>>
>> b) buffer fetches are done with a vertex clause,
>>
>> c) vertex swizzling offsets are different than texture swizzles,
>> but we still need to use the combiner, so make it configurable.
>>
>> d) add implementation of UCMP.
>>
>> TODO: r600/700/cayman testin
>> Signed-off-by: Dave Airlie 
>> ---
>>  src/gallium/drivers/r600/evergreen_state.c   | 55 
>>  src/gallium/drivers/r600/r600_asm.c  |  2 +-
>>  src/gallium/drivers/r600/r600_asm.h  |  2 +
>>  src/gallium/drivers/r600/r600_pipe.c |  4 +-
>>  src/gallium/drivers/r600/r600_pipe.h | 10 +++-
>>  src/gallium/drivers/r600/r600_shader.c   | 75 
>> 
>>  src/gallium/drivers/r600/r600_shader.h   |  1 +
>>  src/gallium/drivers/r600/r600_state_common.c | 58 +
>>  src/gallium/drivers/r600/r600_texture.c  | 16 --
>>  9 files changed, 204 insertions(+), 19 deletions(-)
>>
>
> [snip]
>
>> diff --git a/src/gallium/drivers/r600/r600_shader.c 
>> b/src/gallium/drivers/r600/r600_shader.c
>> index feb7001..60667e7 100644
>> --- a/src/gallium/drivers/r600/r600_shader.c
>> +++ b/src/gallium/drivers/r600/r600_shader.c
>> @@ -3819,6 +3819,71 @@ static inline unsigned tgsi_tex_get_src_gpr(struct 
>> r600_shader_ctx *ctx,
>>   return ctx->file_offset[inst->Src[index].Register.File] + 
>> inst->Src[index].Register.Index;
>>  }
>>
>> +static int do_vtx_fetch_inst(struct r600_shader_ctx *ctx, boolean 
>> src_requires_loading)
>> +{
>> + struct r600_bytecode_vtx vtx;
>> + struct r600_bytecode_alu alu;
>> + struct tgsi_full_instruction *inst = 
>> &ctx->parse.FullToken.FullInstruction;
>> + int src_gpr, r, i;
>> +
>> + src_gpr = tgsi_tex_get_src_gpr(ctx, 0);
>> + if (src_requires_loading) {
>> + for (i = 0; i < 4; i++) {
>> + memset(&alu, 0, sizeof(struct r600_bytecode_alu));
>> + alu.inst = 
>> CTX_INST(V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_MOV);
>> + r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
>> + alu.dst.sel = ctx->temp_reg;
>> + alu.dst.chan = i;
>> + if (i == 3)
>> + alu.last = 1;
>> + alu.dst.write = 1;
>> + r = r600_bytecode_add_alu(ctx->bc, &alu);
>> + if (r)
>> + return r;
>> + }
>> + src_gpr = ctx->temp_reg;
>> + }
>> +
>> + memset(&vtx, 0, sizeof(vtx));
>> + vtx.inst = 0;
>> + vtx.buffer_id = tgsi_tex_get_src_gpr(ctx, 1) + R600_MAX_CONST_BUFFERS;;
>> + vtx.fetch_type = 2; /* VTX_FETCH_NO_INDEX_OFFSET */
>> + vtx.src_gpr = src_gpr;
>> + vtx.mega_fetch_count = 16;
>> + vtx.dst_gpr = ctx->file_offset[inst->Dst[0].Register.File] + 
>> inst->Dst[0].Register.Index;
>> + vtx.dst_sel_x = (inst->Dst[0].Register.WriteMask & 1) ? 0 : 7; 
>>  /* SEL_X */
>> + vtx.dst_sel_y = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7; 
>>  /* SEL_Y */
>> + vtx.dst_sel_z = (inst->Dst[0].Register.WriteMask & 4) ? 2 : 7; 
>>  /* SEL_Z */
>> + vtx.dst_sel_w = (inst->Dst[0].Register.WriteMask & 8) ? 3 : 7; 
>>  /* SEL_W */
>> + vtx.use_const_fields = 1;
>> + vtx.srf_mode_all = 1;   /* SRF_MODE_NO_ZERO */
>> +
>
> According to the docs, srf_mode_all will be ignored if use_const_fields
> is set.  However, based on my tests while running compute shaders, other
> fields like data_format, which are supposed to be ignored, weren't being
> ignored unless they were set to zero.  So, I think it would be safer
> here to set srf_mode_all to zero and make sure that bit gets set on
> the resource.
>
>
>> + if ((r = r600_bytecode_add_vtx(ctx->bc, &vtx)))
>> + return r;
>> + return 0;
>> +}
>> +
>
> Otherwise, this code for vtx fetch looks good to me.  One problem I ran into
> with vtx fetch instructions while working on compute shaders was that
> the GPU will hang if you write to vtx.src_gpr in the
> instruction group following the vtx fetch.  Here is a simple example:
>
> %T2_X = MOV %ZERO
> %T3_X = VTX_READ_eg %T2_X, 24
> %T2_X = MOV %ZERO
>
> I'm not sure if this happens on all GPU variants, but I was able to
> consistently reproduce this on my SUMO.  You may want to keep an eye
> out for this in case you run into any unexplainable hangs.
>

Did the vtx fetch group have the barrier flag set?

Cheers,
Jerome
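
To make the hazard Tom describes easier to look for, here is a minimal sketch
of a checker that flags a write to the source GPR of a vtx fetch in the
instruction right after it. Everything here is hypothetical (the sketch_*
names and the simplified instruction record are not the r600g bytecode
structs), and it treats each instruction as its own group, which is a
simplification:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record; NOT the r600g structs. */
enum sketch_op { SKETCH_ALU_MOV, SKETCH_VTX_FETCH };

struct sketch_inst {
    enum sketch_op op;
    unsigned dst_gpr;   /* register written by this instruction */
    unsigned src_gpr;   /* register read by this instruction */
};

/*
 * Returns the index of the first instruction that overwrites the source
 * GPR of the vtx fetch immediately preceding it, or -1 if there is none.
 * Assumption: one instruction per group.
 */
static int sketch_find_fetch_src_clobber(const struct sketch_inst *insts,
                                         size_t count)
{
    assert(insts != NULL || count == 0);

    for (size_t i = 1; i < count; i++) {
        if (insts[i - 1].op == SKETCH_VTX_FETCH &&
            insts[i].dst_gpr == insts[i - 1].src_gpr)
            return (int)i;
    }
    return -1;
}
```

Fed the three-instruction example quoted above (T2 written, T3 fetched from
T2, T2 written again), it reports the last MOV.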

Re: [Mesa-dev] [PATCH] r600g: add cs tracing infrastructure for lockup pin pointing

2012-12-19 Thread Jerome Glisse
On Wed, Dec 19, 2012 at 12:17 PM,   wrote:
> From: Jerome Glisse 
>
> This is a build-time option: set R600_TRACE_CS to 1 and it will print
> every cs to stderr, along with the cs trace point value, which gives
> the last offset into a cs processed by the GPU.
>
> Signed-off-by: Jerome Glisse 

For information, this is something I have been using for a while, and I
was getting tired of porting it over and over, so I cleaned it up
into something that I believe is useful. My rdb tools can be used to
annotate the cs output produced by this infrastructure: rdb_annotateib
hd2xxx.rdb dumpfile > dumpfile.readablebyhuman

It gives the last dw processed before the lockup. If you don't have many
applications running at the same time, it has proven to be accurate most
of the time.

Note that you will need the kernel patch I just sent.

Cheers,
Jerome


> ---
>  src/gallium/drivers/r600/r600_hw_context.c  | 41 
> +
>  src/gallium/drivers/r600/r600_hw_context_priv.h |  5 +--
>  src/gallium/drivers/r600/r600_pipe.c| 20 
>  src/gallium/drivers/r600/r600_pipe.h| 16 ++
>  src/gallium/drivers/r600/r600_state_common.c| 26 
>  5 files changed, 106 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
> b/src/gallium/drivers/r600/r600_hw_context.c
> index cdd31a4..6c8cb9d 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -27,6 +27,7 @@
>  #include "r600d.h"
>  #include "util/u_memory.h"
>  #include 
> +#include 
>
>  /* Get backends mask */
>  void r600_get_backend_mask(struct r600_context *ctx)
> @@ -369,6 +370,11 @@ void r600_need_cs_space(struct r600_context *ctx, 
> unsigned num_dw,
> for (i = 0; i < R600_NUM_ATOMS; i++) {
> if (ctx->atoms[i] && ctx->atoms[i]->dirty) {
> num_dw += ctx->atoms[i]->num_dw;
> +#if R600_TRACE_CS
> +   if (ctx->screen->trace_bo) {
> +   num_dw += R600_TRACE_CS_DWORDS;
> +   }
> +#endif
> }
> }
>
> @@ -376,6 +382,11 @@ void r600_need_cs_space(struct r600_context *ctx, 
> unsigned num_dw,
>
> /* The upper-bound of how much space a draw command would 
> take. */
> num_dw += R600_MAX_FLUSH_CS_DWORDS + R600_MAX_DRAW_CS_DWORDS;
> +#if R600_TRACE_CS
> +   if (ctx->screen->trace_bo) {
> +   num_dw += R600_TRACE_CS_DWORDS;
> +   }
> +#endif
> }
>
> /* Count in queries_suspend. */
> @@ -717,7 +728,37 @@ void r600_context_flush(struct r600_context *ctx, 
> unsigned flags)
> }
>
> /* Flush the CS. */
> +#if R600_TRACE_CS
> +   if (ctx->screen->trace_bo) {
> +   struct r600_screen *rscreen = ctx->screen;
> +   unsigned i;
> +
> +   for (i = 0; i < cs->cdw; i++) {
> +   fprintf(stderr, "[%4d] [%5d] 0x%08x\n", 
> rscreen->cs_count, i, cs->buf[i]);
> +   }
> +   rscreen->cs_count++;
> +   }
> +#endif
> ctx->ws->cs_flush(ctx->cs, flags);
> +#if R600_TRACE_CS
> +   if (ctx->screen->trace_bo) {
> +   struct r600_screen *rscreen = ctx->screen;
> +   unsigned i;
> +
> +   for (i = 0; i < 10; i++) {
> +   usleep(5);
> +   if (!ctx->ws->buffer_is_busy(rscreen->trace_bo->buf, 
> RADEON_USAGE_READWRITE)) {
> +   break;
> +   }
> +   }
> +   if (i == 10) {
> +   fprintf(stderr, "timeout on cs lockup likely happen 
> at cs %d dw %d\n",
> +   rscreen->trace_ptr[1], rscreen->trace_ptr[0]);
> +   } else {
> +   fprintf(stderr, "cs %d executed in %dms\n", 
> rscreen->trace_ptr[1], i * 5);
> +   }
> +   }
> +#endif
>
> r600_begin_new_cs(ctx);
>  }
> diff --git a/src/gallium/drivers/r600/r600_hw_context_priv.h 
> b/src/gallium/drivers/r600/r600_hw_context_priv.h
> index 050c472..692e6ec 100644
> --- a/src/gallium/drivers/r600/r600_hw_context_priv.h
> +++ b/src/gallium/drivers/r600/r600_hw_context_priv.h
> @@ -29,8 +29,9 @@
>  #include "r600_pipe.h"
>
>  /* the number of CS dwords for flushing and drawing */
> -#define R

Re: [Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4

2012-12-08 Thread Jerome Glisse
On Sat, Dec 8, 2012 at 7:27 PM, Marek Olšák  wrote:
> Hi Jerome,
>
> I'm okay with the simplification of r600_flush_emit, I'm not so okay
> with some other things. There's also some cruft unrelated to flushing.
>
> 1) R600_CONTEXT_FLUSH could have a better name, because it's not clear
> what it does. (it looks like it only flushed read-only bindings)

GPU_FLUSH?

> 2) Don't use magic numbers when setting cp_coher_cntl unless you want
> to hide something from us / obfuscating the code. :)
>
> 3) The definition of R600_MAX_FLUSH_CS_DWORDS should be updated.

Yes, I haven't recomputed the worst case.

> 4) SURFACE_BASE_UPDATE is emitted twice in emit_framebuffer_state. I
> don't think splitting one packet into two packets doing the same thing
> is needed.

It's needed: without it, a couple of r6xx/r7xx GPUs will lock up after a
couple of hours of stress testing. I wasn't seeing lockups with it.

> 5) RS780 and RS880 don't need SURFACE_BASE_UPDATE for streamout. Their
> streamout hardware was actually copied from R700. Doing "< CHIP_RS780"
> instead of "< CHIP_RV770" was correct. The same for r600_flush_emit.

fglrx mostly does the same on r7xx and r6xx for streamout. As I am not
sure I have any stress test for that, I side with fglrx.

> 6) In r600_context_flush, don't remove the comment about flushing
> framebuffer caches, because it's still done there.
>
> 7) Masking out R600_CONTEXT_FLUSH in r600_context_emit_fence is not
> correct. We should still flush the caches later if they're dirty and
> even if the fence was emitted. You can't see this regression in
> piglit, because we don't have a test for that.

True.

> 8) There's some inconsistent flushing between graphics and compute
> colorbuffer bindings. For graphics, you use (WAIT_IDLE |
> FLUSH_AND_INV), which makes sense. For compute, you use
> R600_CONTEXT_FLUSH (which is used for vertex buffers and the like
> elsewhere, but not colorbuffers).

I haven't paid much attention to the compute side; I should probably look at it.

> And one question:
>
> Why do you set both FLUSH_AND_INV and STREAMOUT_FLUSH on
> Evergreen, while r600 only gets FLUSH_AND_INV? Did you overlook this?

No, I was just matching the fglrx pattern. I don't think I tested without that
change, but it definitely matches fglrx.

Cheers,
Jerome

> Marek
>
> On Thu, Dec 6, 2012 at 8:51 PM,   wrote:
>> From: Jerome Glisse 
>>
>> This brings r600g almost in line with the closed source driver when
>> it comes to flushing and synchronization patterns.
>>
>> Signed-off-by: Jerome Glisse 
>> ---
>>  src/gallium/drivers/r600/evergreen_compute.c   |   8 +-
>>  .../drivers/r600/evergreen_compute_internal.c  |   4 +-
>>  src/gallium/drivers/r600/evergreen_state.c |   4 +-
>>  src/gallium/drivers/r600/r600.h|  16 +--
>>  src/gallium/drivers/r600/r600_hw_context.c | 154 
>> -
>>  src/gallium/drivers/r600/r600_state.c  |  18 ++-
>>  src/gallium/drivers/r600/r600_state_common.c   |  19 ++-
>>  7 files changed, 61 insertions(+), 162 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
>> b/src/gallium/drivers/r600/evergreen_compute.c
>> index 44831a7..33a5910 100644
>> --- a/src/gallium/drivers/r600/evergreen_compute.c
>> +++ b/src/gallium/drivers/r600/evergreen_compute.c
>> @@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer(
>>
>> /* The vertex instructions in the compute shaders use the texture 
>> cache,
>>  * so we need to invalidate it. */
>> -   rctx->flags |= R600_CONTEXT_TEX_FLUSH;
>> +   rctx->flags |= R600_CONTEXT_FLUSH;
>> state->enabled_mask |= 1 << vb_index;
>> state->dirty_mask |= 1 << vb_index;
>> state->atom.dirty = true;
>> @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
>> const uint *block_layout,
>>  */
>> r600_emit_command_buffer(ctx->cs, &ctx->start_compute_cs_cmd);
>>
>> -   ctx->flags |= R600_CONTEXT_CB_FLUSH;
>> +   ctx->flags |= R600_CONTEXT_FLUSH;
>> r600_flush_emit(ctx);
>>
>> /* Emit colorbuffers. */
>> @@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
>> const uint *block_layout,
>>
>> /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 
>> 0x
>>  */
>> -   ctx->flags |= R600_CONTEXT_CB_FLUSH;
>> +   ctx->flags |= R600_CONTEXT_FLUSH;
>> r600_flush_emit(ct

Re: [Mesa-dev] Proposal: allow hidden security bugs on Mesa's Bugzilla

2012-11-30 Thread Jerome Glisse
On Fri, Nov 30, 2012 at 7:43 AM, Benoit Jacob  wrote:
> On 12-11-23 02:21 PM, Benoit Jacob wrote:
>> On 12-11-21 12:48 PM, Chad Versace wrote:
>>> On 11/20/2012 09:29 AM, Benoit Jacob wrote:
>>>
>>>> Any questions?
>>>> Do you support or oppose me asking FD.o admins to allow hidden bugs on
>>>> Mesa's bugzilla?
>>>>
>>>> Benoit
>>> I support this. It seems a sensible proposal for addressing security bugs.
>>>
>> Thanks. I have just sent the request to FD.o admins.
>>
>> Benoit
>
> This option is now turned on on Bugzilla.
>
> See the new checkbox: "Mesa Security Group"
>
> Thanks!
> Benoit
>

How does one get into the security group?

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake

2012-11-09 Thread Jerome Glisse
On Thu, Nov 01, 2012 at 03:13:31AM +0100, Marek Olšák wrote:
> On Thu, Nov 1, 2012 at 2:13 AM, Alex Deucher  wrote:
> > On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák  wrote:
> >> The problem was we set VRAM|GTT for relocations of STATIC resources.
> >> Setting just VRAM increases the framerate 4 times on my machine.
> >>
> >> I rewrote the switch statement and adjusted the domains for window
> >> framebuffers too.
> >
> > Reviewed-by: Alex Deucher 
> >
> > Stable branches?
> 
> Yes, good idea.
> 
> Marek

Btw, as a follow-up on this, I did some experiments with ttm and eviction.
Blocking all vram eviction improves average fps (20-30%) and minimum fps
(40-60%), but it diminishes maximum fps (100%). Overall, blocking eviction
just makes the framerate more consistent.

I then tried several heuristics on the eviction process (not evicting a buffer
if it was used in the last 1ms, 10ms, 20ms, ...; sorting the lru differently
between buffers used for rendering and auxiliary buffers used by the kernel; ...).
None of those heuristics improved anything. I also removed the bo wait in the
eviction pipeline, but still no improvement. I haven't had time to look further,
but the bottom line is that some benchmarks are memory tight and constant
eviction hurts.

(I used Unigine Heaven and Reaction Quake as benchmarks.)

Cheers,
Jerome
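
For reference, the "don't evict a buffer used in the last N ms" heuristic
mentioned above can be sketched as follows. This is only an illustration with
hypothetical names (sketch_bo, sketch_pick_victim); it is not the ttm code,
and as noted above the heuristic did not help in these tests:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer record; real ttm buffer objects look nothing like this. */
struct sketch_bo {
    unsigned long last_use_ms;  /* timestamp of the last use */
};

/*
 * Picks the least recently used buffer whose last use is at least
 * grace_ms in the past. Returns its index, or -1 if every buffer was
 * used too recently to be evicted.
 */
static int sketch_pick_victim(const struct sketch_bo *bos, size_t count,
                              unsigned long now_ms, unsigned long grace_ms)
{
    int victim = -1;

    for (size_t i = 0; i < count; i++) {
        assert(bos[i].last_use_ms <= now_ms);

        /* Used within the grace period: skip it. */
        if (now_ms - bos[i].last_use_ms < grace_ms)
            continue;

        if (victim < 0 ||
            bos[i].last_use_ms < bos[(size_t)victim].last_use_ms)
            victim = (int)i;
    }
    return victim;
}
```

With a grace period of 0 this degenerates to plain lru eviction.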


Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake

2012-10-31 Thread Jerome Glisse
On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák  wrote:
> The problem was we set VRAM|GTT for relocations of STATIC resources.
> Setting just VRAM increases the framerate 4 times on my machine.
>
> I rewrote the switch statement and adjusted the domains for window
> framebuffers too.

Reviewed-by: Jerome Glisse 
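
The placement policy after this patch boils down to a small mapping from usage
to (initial domain, allowed domains). A standalone sketch of that mapping,
mirroring the new switch in the diff below; the enum, the domain bit values
and the sketch_* names are placeholders, not the real PIPE_USAGE_* or
RADEON_DOMAIN_* definitions:

```c
#include <assert.h>

/* Placeholder usage hints standing in for PIPE_USAGE_*. */
enum sketch_usage {
    SKETCH_USAGE_DEFAULT,
    SKETCH_USAGE_STATIC,
    SKETCH_USAGE_IMMUTABLE,
    SKETCH_USAGE_DYNAMIC,
    SKETCH_USAGE_STREAM,
    SKETCH_USAGE_STAGING
};

/* Placeholder domain bits; the values are arbitrary for this sketch. */
#define SKETCH_DOMAIN_GTT  0x2u
#define SKETCH_DOMAIN_VRAM 0x4u

struct sketch_placement {
    unsigned initial_domain;  /* where the buffer is first created */
    unsigned domains;         /* where relocations allow it to live */
};

static struct sketch_placement sketch_pick_domains(enum sketch_usage usage)
{
    struct sketch_placement p;

    switch (usage) {
    case SKETCH_USAGE_STAGING:
        /* Staging buffers stay in GTT. */
        p.initial_domain = SKETCH_DOMAIN_GTT;
        p.domains = SKETCH_DOMAIN_GTT;
        break;
    case SKETCH_USAGE_DYNAMIC:
    case SKETCH_USAGE_STREAM:
        /* Start in GTT, but may be moved to VRAM. */
        p.initial_domain = SKETCH_DOMAIN_GTT;
        p.domains = SKETCH_DOMAIN_GTT | SKETCH_DOMAIN_VRAM;
        break;
    default:
        /* DEFAULT, STATIC, IMMUTABLE: VRAM only, GTT is not listed. */
        p.initial_domain = SKETCH_DOMAIN_VRAM;
        p.domains = SKETCH_DOMAIN_VRAM;
        break;
    }

    /* The initial domain must always be one of the allowed domains. */
    assert((p.domains & p.initial_domain) == p.initial_domain);
    return p;
}
```

The only behavioural change from the old code is the last case, where GTT is
no longer listed in the allowed domains.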

> ---
>  src/gallium/drivers/r600/r600_buffer.c  |   42 
> ---
>  src/gallium/drivers/r600/r600_texture.c |3 ++-
>  2 files changed, 24 insertions(+), 21 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_buffer.c 
> b/src/gallium/drivers/r600/r600_buffer.c
> index f4566ee..116ab51 100644
> --- a/src/gallium/drivers/r600/r600_buffer.c
> +++ b/src/gallium/drivers/r600/r600_buffer.c
> @@ -206,29 +206,31 @@ bool r600_init_resource(struct r600_screen *rscreen,
>  {
> uint32_t initial_domain, domains;
>
> -   /* Staging resources particpate in transfers and blits only
> -* and are used for uploads and downloads from regular
> -* resources.  We generate them internally for some transfers.
> -*/
> -   if (usage == PIPE_USAGE_STAGING) {
> +   switch(usage) {
> +   case PIPE_USAGE_STAGING:
> +   /* Staging resources participate in transfers, i.e. are used
> +* for uploads and downloads from regular resources.
> +* We generate them internally for some transfers.
> +*/
> +   initial_domain = RADEON_DOMAIN_GTT;
> domains = RADEON_DOMAIN_GTT;
> +   break;
> +   case PIPE_USAGE_DYNAMIC:
> +   case PIPE_USAGE_STREAM:
> +   /* Default to GTT, but allow the memory manager to move it to 
> VRAM. */
> initial_domain = RADEON_DOMAIN_GTT;
> -   } else {
> domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
> -
> -   switch(usage) {
> -   case PIPE_USAGE_DYNAMIC:
> -   case PIPE_USAGE_STREAM:
> -   case PIPE_USAGE_STAGING:
> -   initial_domain = RADEON_DOMAIN_GTT;
> -   break;
> -   case PIPE_USAGE_DEFAULT:
> -   case PIPE_USAGE_STATIC:
> -   case PIPE_USAGE_IMMUTABLE:
> -   default:
> -   initial_domain = RADEON_DOMAIN_VRAM;
> -   break;
> -   }
> +   break;
> +   case PIPE_USAGE_DEFAULT:
> +   case PIPE_USAGE_STATIC:
> +   case PIPE_USAGE_IMMUTABLE:
> +   default:
> +   /* Don't list GTT here, because the memory manager would put 
> some
> +* resources to GTT no matter what the initial domain is.
> +* Not listing GTT in the domains improves performance a lot. 
> */
> +   initial_domain = RADEON_DOMAIN_VRAM;
> +   domains = RADEON_DOMAIN_VRAM;
> +   break;
> }
>
> res->buf = rscreen->ws->buffer_create(rscreen->ws, size, alignment, 
> bind, initial_domain);
> diff --git a/src/gallium/drivers/r600/r600_texture.c 
> b/src/gallium/drivers/r600/r600_texture.c
> index 785eeff..2df390d 100644
> --- a/src/gallium/drivers/r600/r600_texture.c
> +++ b/src/gallium/drivers/r600/r600_texture.c
> @@ -421,9 +421,10 @@ r600_texture_create_object(struct pipe_screen *screen,
> return NULL;
> }
> } else if (buf) {
> +   /* This is usually the window framebuffer. We want it in 
> VRAM, always. */
> resource->buf = buf;
> resource->cs_buf = rscreen->ws->buffer_get_cs_handle(buf);
> -   resource->domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
> +   resource->domains = RADEON_DOMAIN_VRAM;
> }
>
> if (rtex->cmask_size) {
> --
> 1.7.9.5
>


Re: [Mesa-dev] R600 tiling halves the frame rate

2012-10-31 Thread Jerome Glisse
On Tue, Oct 30, 2012 at 8:49 PM, Tzvetan Mikov  wrote:
> On 10/30/2012 05:20 PM, Tzvetan Mikov wrote:
>>
>> Thanks a lot! I reproduced the same results here and I think I have
>> figured out what the problem is. The frame buffer is always created in
>> linear mode. The temporary hack included below doubles the performance
>> for me with EGL.
>>
>> Could you please check if it has the same result for you?
>>
>> If it does, what would be the next step to address this? I guess I could
>> try to prepare a real patch to fix this, as soon as I figure the right
>> way to do it... :-) I am new to Mesa, but I am making my way through the
>> code base.
>>
>> regards,
>> Tzvetan
>>
>>
>> commit 10bb3497caba1655022a53a3a04c81be6e122faa
>> Author: Tzvetan Mikov 
>> Date:   Tue Oct 30 17:12:42 2012 -0700
>>
>>  r600_texture.c: HACK to enforce tiling in the default case
>>
>> diff --git a/src/gallium/drivers/r600/r600_texture.c
>> b/src/gallium/drivers/r600/r600_texture.c
>> index 85e4e0c..f415de3 100644
>> --- a/src/gallium/drivers/r600/r600_texture.c
>> +++ b/src/gallium/drivers/r600/r600_texture.c
>> @@ -450,7 +450,7 @@ struct pipe_resource *r600_texture_create(struct
>> pipe_screen *screen,
>>   {
>>   struct r600_screen *rscreen = (struct r600_screen*)screen;
>>   struct radeon_surface surface;
>> -unsigned array_mode = 0;
>> +unsigned array_mode = V_038000_ARRAY_1D_TILED_THIN1;
>>   int r;
>>
>>   if (!(templ->flags & R600_RESOURCE_FLAG_TRANSFER)) {
>>
>>
>
> I just noticed that with this hack the display doesn't look quite right, so
> while it hopefully points in the right direction, the real fix is likely to
> be much more involved. My enthusiasm may have been premature :-)
>
> regards,
> Tzvetan

For it to look right, Mesa needs to call into the kernel to tell it what
the bo tiling format is. We should do that for scanout buffers. This
will fix your issue, and you probably want 2D tiled, not 1D, for scanout.

Cheers,
Jerome


Re: [Mesa-dev] R600 tiling halves the frame rate

2012-10-30 Thread Jerome Glisse
On Tue, Oct 30, 2012 at 10:43 AM, Tzvetan Mikov  wrote:
> On 10/30/2012 07:12 AM, Patrick Baggett wrote:
>> Is your screen refresh rate 70 Hz? Because if so, that means that it's
>> syncing to the vblank on Mesa, and not doing so on the proprietary one.
>
> Unfortunately no. In fact the Gallium EGL/R600 doesn't support flip on
> vsync at all - eglSwapInterval is always 0. The output is a standard
> 60Hz LCD, plus I do get different, (but still low in absolute terms)
> frame rates with different chips. Off the top of my head:
> - HD5430 - 120 FPS
> - HD6450 - 140 FPS
> - HD6460 -  70 FPS
> - HD6750 - 400 FPS
> - HD6760 - 240 FPS
>
> I do think there is something fishy with the page flip though, which I
> am planning to investigate today. It is way too slow - a render loop
> which does nothing but a eglSwapBuffers() (no actual rendering
> whatsoever) runs at only 350 FPS. It should be either 60FPS, or thousands.
>
> regards,
> Tzvetan
>

So I tested it, and it's something inside EGL that leads to this. With the
same program as yours using glut on X11 with 2D tiling enabled, 2D color
tiling has a slight advantage: 140fps vs 137fps (windowed, so there is a blit,
which would account for a huge chunk of the perf difference with fglrx).

However, using EGL I got 70fps with color tiling and 74fps without. So
something in EGL is slowing things down.

Cheers,
Jerome


Re: [Mesa-dev] R600 tiling halves the frame rate

2012-10-26 Thread Jerome Glisse
On Fri, Oct 26, 2012 at 10:26 PM, Tzvetan Mikov  wrote:
>> -Original Message-
>> From: Jerome Glisse
>
>> > Can anyone shed some light on this? Is this by design - e.g. is
>> > this a case of "we know that tiling is currently slower than linear
>> > but the huge payoff is scheduled to arrive in a future revision"?
>> >
>> > Thanks!
>> > Tzvetan
>>
>> No, in all the benchmarks I ran on various GPUs from hd2xxx to hd6xxx,
>> tiling always gave a performance boost of between 5% and 20%.
>
> This is interesting. All I am doing is rotating a big texture on the
> screen. I am using EGL+Gallium, so it is as simple as it gets.
>
> The hack I am using to disable texture tiling is also extremely simple
> (see below). It speeds up the FPS measurably, up to the extreme
> case of doubling it on HD6460.
>
> What am I missing?
>
> Regards,
> Tzvetan
>

Could you provide a simple GL demo, or point to one, that shows the same
behavior with your patch? That way I have something to tell whether I am
reproducing it or not.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: avoid shader needing too many gpr to lockup the gpu

2012-10-26 Thread Jerome Glisse
On Fri, Oct 26, 2012 at 10:01 PM,   wrote:
> From: Jerome Glisse 
>
> On r6xx/r7xx, shader resource management needs to make sure that the
> shader does not go over the gpr register limit. Each specific
> asic has a maximum number of registers that can be split between shader
> stages. For each stage, the shader must not use more registers than the
> limit programmed.
>
> Signed-off-by: Jerome Glisse 

I haven't yet fully tested it on a wide range of GPUs, but it fixes the piglit
cases that were locking up, so one can directly use quick-drivers. I
would mostly like feedback on whether we should print a warning when we
discard a draw command because a shader exceeds the limit.

Note that with this patch the tests that were locking up now fail, but with
a simple patch on top of that (decreasing the clause temp gprs to 2) they
pass.

Regards,
Jerome
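
To spell out the budget arithmetic: the total is the default ps and vs
allocations plus twice the clause temp gprs (the hardware reserves twice that
number), the vs stage gets what it asks for, and ps gets whatever is left. A
rough standalone sketch with hypothetical names (sketch_*). The rule for when
to discard the draw is my reading of the intent here and is an assumption, as
the quoted diff below is cut off before that part:

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_gpr_split {
    unsigned ps;        /* gprs programmed for the pixel stage */
    unsigned vs;        /* gprs programmed for the vertex stage */
    bool discard_draw;  /* true if the shaders cannot fit */
};

/*
 * def_ps/def_vs: default per-stage allocation.
 * clause_temp:   clause temp gprs (the hardware reserves twice this).
 * need_ps/need_vs: gprs used by the currently bound shaders.
 */
static struct sketch_gpr_split sketch_split_gprs(unsigned def_ps,
                                                 unsigned def_vs,
                                                 unsigned clause_temp,
                                                 unsigned need_ps,
                                                 unsigned need_vs)
{
    unsigned reserved = clause_temp * 2;
    unsigned max_gprs = def_ps + def_vs + reserved;
    struct sketch_gpr_split out = { def_ps, def_vs, false };

    /* Both shaders fit in the default split: nothing to adjust. */
    if (need_ps <= def_ps && need_vs <= def_vs)
        return out;

    /* Favour the vs stage; ps gets the remainder of the budget. */
    if (need_vs + reserved > max_gprs ||
        max_gprs - (need_vs + reserved) < need_ps) {
        /* Assumption: when ps cannot fit in the remainder, drop the draw. */
        out.discard_draw = true;
        return out;
    }

    out.vs = need_vs;
    out.ps = max_gprs - (need_vs + reserved);
    assert(out.ps + out.vs + reserved <= max_gprs);
    return out;
}
```

For example, with defaults of 64/64 and 4 clause temp gprs the budget is 136;
a ps shader needing 80 and a vs shader needing 40 fits as 88/40, while 100/60
does not fit.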

> ---
>  src/gallium/drivers/r600/r600_pipe.h |  1 +
>  src/gallium/drivers/r600/r600_state.c| 60 
> +++-
>  src/gallium/drivers/r600/r600_state_common.c | 22 +-
>  3 files changed, 55 insertions(+), 28 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
> b/src/gallium/drivers/r600/r600_pipe.h
> index ff2a5fd..2045af3 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -363,6 +363,7 @@ struct r600_context {
> enum chip_class chip_class;
> boolean has_vertex_cache;
> boolean keep_tiling_flags;
> +   bool            discard_draw;
> unsigned        default_ps_gprs, default_vs_gprs;
> unsigned        r6xx_num_clause_temp_gprs;
> unsigned        backend_mask;
> diff --git a/src/gallium/drivers/r600/r600_state.c 
> b/src/gallium/drivers/r600/r600_state.c
> index 7d07008..43af934 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -2189,30 +2189,54 @@ void r600_init_state_functions(struct r600_context 
> *rctx)
>  /* Adjust GPR allocation on R6xx/R7xx */
>  void r600_adjust_gprs(struct r600_context *rctx)
>  {
> -   unsigned num_ps_gprs = rctx->default_ps_gprs;
> -   unsigned num_vs_gprs = rctx->default_vs_gprs;
> +   unsigned num_ps_gprs = rctx->ps_shader->current->shader.bc.ngpr;
> +   unsigned num_vs_gprs = rctx->vs_shader->current->shader.bc.ngpr;
> +   unsigned new_num_ps_gprs = num_ps_gprs;
> +   unsigned new_num_vs_gprs = num_vs_gprs;
> +   unsigned cur_num_ps_gprs = 
> G_008C04_NUM_PS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_1);
> +   unsigned cur_num_vs_gprs = 
> G_008C04_NUM_VS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_1);
> +   unsigned def_num_ps_gprs = rctx->default_ps_gprs;
> +   unsigned def_num_vs_gprs = rctx->default_vs_gprs;
> +   unsigned def_num_clause_temp_gprs = rctx->r6xx_num_clause_temp_gprs;
> +   /* hardware will reserve twice num_clause_temp_gprs */
> +   unsigned max_gprs = def_num_ps_gprs + def_num_vs_gprs + 
> def_num_clause_temp_gprs * 2;
> unsigned tmp;
> -   int diff;
>
> -   if (rctx->ps_shader->current->shader.bc.ngpr > rctx->default_ps_gprs) 
> {
> -   diff = rctx->ps_shader->current->shader.bc.ngpr - 
> rctx->default_ps_gprs;
> -   num_vs_gprs -= diff;
> -   num_ps_gprs += diff;
> +   /* the sum of all SQ_GPR_RESOURCE_MGMT*.NUM_*_GPRS must <= to 
> max_gprs */
> +   if (new_num_ps_gprs > cur_num_ps_gprs || new_num_vs_gprs > 
> cur_num_vs_gprs) {
> +   /* try to use switch back to default */
> +   if (new_num_ps_gprs > def_num_ps_gprs || new_num_vs_gprs > 
> def_num_vs_gprs) {
> +   /* always privilege vs stage so that at worst we have 
> the
> +* pixel stage producing wrong output (not the vertex
> +* stage) */
> +   new_num_ps_gprs = max_gprs - (new_num_vs_gprs + 
> def_num_clause_temp_gprs * 2);
> +   new_num_vs_gprs = num_vs_gprs;
> +   } else {
> +   new_num_ps_gprs = def_num_ps_gprs;
> +   new_num_vs_gprs = def_num_vs_gprs;
> +   }
> +   } else {
> +   rctx->discard_draw = false;
> +   return;
> }
>
> -   if (rctx->vs_shader->current->shader.bc.ngpr > rctx->default_vs_gprs)
> -   {
> -   diff = rctx->vs_shader->current->shader.bc.ngpr - 
> rctx->default_vs_gprs;
> -   num_ps_

Re: [Mesa-dev] R600 tiling halves the frame rate

2012-10-26 Thread Jerome Glisse
On Fri, Oct 26, 2012 at 8:07 PM, Tzvetan Mikov  wrote:
> Hi,
> I have been running tests with Mesa 9.0 and Radeon R600 (Radeon HD 6460) and I 
> accidentally noticed that a small hack I did to disable texture tiling, 
> actually *doubles* the frame rate. With different chips (e.g. 6750) the 
> difference is less pronounced, but in all cases texture tiling decreased the 
> performance noticeably in my tests.
>
> Can anyone shed some light on this? Is this by design - e.g. is this a case 
> of "we know that tiling is currently slower than linear but the huge payoff 
> is scheduled to arrive in a future revision"?
>
> Thanks!
> Tzvetan

No, in all the benchmarks I ran on various GPUs from hd2xxx to hd6xxx,
tiling always gave a performance boost of between 5% and 20%.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 01/14] r600g: remove the "atom" variable from r600_command_buffer

2012-10-09 Thread Jerome Glisse
On Sun, Oct 07, 2012 at 08:08:03PM +0200, Marek Olšák wrote:
> r600_command_buffer is not an atom.
> 
> The "atoms" have evolved into state slots (or groups of state slots) where
> you can bind states. There is a fixed amount of atoms (state slots)
> in the context.
> 
> The command buffers are nothing like that. They represent states, not state
> slots.
> 
> We could probably give r600_atom a better name someday.

For the series:
Reviewed-by: Jerome Glisse 
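
To illustrate the distinction Marek draws: once the atom member is gone, a
command buffer is just a prebuilt array of dwords plus a count, which gets
copied into the cs as-is. A minimal standalone sketch, with hypothetical
sketch_* names modelled loosely on the r600_init_command_buffer(cb, 256) and
r600_store_value(cb, ...) calls in the diff below; the real struct has more
fields (pkt_flags, for instance):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced version of a command buffer: a state, not a slot. */
struct sketch_command_buffer {
    uint32_t *buf;        /* prebuilt packet dwords */
    unsigned num_dw;      /* dwords stored so far */
    unsigned max_num_dw;  /* capacity in dwords */
};

/* Allocates room for max_num_dw dwords. Returns 0 on success, -1 on failure. */
static int sketch_init_command_buffer(struct sketch_command_buffer *cb,
                                      unsigned max_num_dw)
{
    cb->buf = calloc(max_num_dw, sizeof(uint32_t));
    cb->num_dw = 0;
    cb->max_num_dw = cb->buf ? max_num_dw : 0;
    return cb->buf ? 0 : -1;
}

/* Appends one dword; the caller must not exceed the capacity. */
static void sketch_store_value(struct sketch_command_buffer *cb,
                               uint32_t value)
{
    assert(cb->num_dw < cb->max_num_dw);
    cb->buf[cb->num_dw++] = value;
}

static void sketch_free_command_buffer(struct sketch_command_buffer *cb)
{
    free(cb->buf);
    cb->buf = NULL;
    cb->num_dw = cb->max_num_dw = 0;
}
```

This is also why the check in r600_context_flush can compare cs->cdw directly
against start_cs_cmd.num_dw: the count lives on the command buffer itself.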

> ---
>  src/gallium/drivers/r600/evergreen_compute.c |4 +--
>  src/gallium/drivers/r600/evergreen_state.c   |4 +--
>  src/gallium/drivers/r600/r600_hw_context.c   |4 +--
>  src/gallium/drivers/r600/r600_pipe.h |   44 
> +++---
>  src/gallium/drivers/r600/r600_state.c|2 +-
>  src/gallium/drivers/r600/r600_state_common.c |   13 +---
>  6 files changed, 34 insertions(+), 37 deletions(-)
> 
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
> b/src/gallium/drivers/r600/evergreen_compute.c
> index b7c7345..abd5b3c 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
> const uint *block_layout,
>* See evergreen_init_atom_start_compute_cs() in this file for the list
>* of registers initialized by the start_compute_cs_cmd atom.
>*/
> - r600_emit_atom(ctx, &ctx->start_compute_cs_cmd.atom);
> + r600_emit_command_buffer(ctx->cs, &ctx->start_compute_cs_cmd);
>  
>   ctx->flags |= R600_CONTEXT_CB_FLUSH;
>   r600_flush_emit(ctx);
> @@ -625,7 +625,7 @@ void evergreen_init_atom_start_compute_cs(struct 
> r600_context *ctx)
>   /* since all required registers are initialised in the
>* start_compute_cs_cmd atom, we can EMIT_EARLY here.
>*/
> - r600_init_command_buffer(ctx, cb, 1, 256);
> + r600_init_command_buffer(cb, 256);
>   cb->pkt_flags = RADEON_CP_PACKET3_COMPUTE_MODE;
>  
>   switch (ctx->family) {
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index e35314f..a073021 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -2373,7 +2373,7 @@ static void cayman_init_atom_start_cs(struct 
> r600_context *rctx)
>  {
>   struct r600_command_buffer *cb = &rctx->start_cs_cmd;
>  
> - r600_init_command_buffer(rctx, cb, 0, 256);
> + r600_init_command_buffer(cb, 256);
>  
>   /* This must be first. */
>   r600_store_value(cb, PKT3(PKT3_CONTEXT_CONTROL, 1, 0));
> @@ -2774,7 +2774,7 @@ void evergreen_init_atom_start_cs(struct r600_context 
> *rctx)
>   return;
>   }
>  
> - r600_init_command_buffer(rctx, cb, 0, 256);
> + r600_init_command_buffer(cb, 256);
>  
>   /* This must be first. */
>   r600_store_value(cb, PKT3(PKT3_CONTEXT_CONTROL, 1, 0));
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
> b/src/gallium/drivers/r600/r600_hw_context.c
> index 8245059..723039a 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -815,7 +815,7 @@ void r600_context_flush(struct r600_context *ctx, 
> unsigned flags)
>  {
>   struct radeon_winsys_cs *cs = ctx->cs;
>  
> - if (cs->cdw == ctx->start_cs_cmd.atom.num_dw)
> + if (cs->cdw == ctx->start_cs_cmd.num_dw)
>   return;
>  
>   ctx->timer_queries_suspended = false;
> @@ -875,7 +875,7 @@ void r600_begin_new_cs(struct r600_context *ctx)
>   ctx->flags = 0;
>  
>   /* Begin a new CS. */
> - r600_emit_atom(ctx, &ctx->start_cs_cmd.atom);
> + r600_emit_command_buffer(ctx->cs, &ctx->start_cs_cmd);
>  
>   /* Re-emit states. */
>   r600_atom_dirty(ctx, &ctx->alphatest_state.atom);
> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
> b/src/gallium/drivers/r600/r600_pipe.h
> index 607116f..be7b891 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -59,8 +59,8 @@ struct r600_atom {
>  /* This is an atom containing GPU commands that never change.
>   * This is supposed to be copied directly into the CS. */
>  struct r600_command_buffer {
> - struct r600_atom atom;
>   uint32_t *buf;
> + unsigned num_dw;
>   unsigned max_num_dw;
>   unsigned pkt_flags;
>  };
> @@ -504,6 +504,14 @@ struct r600_context {
>   int last_start_instance;
>  };
>  
> +static INLINE v

Re: [Mesa-dev] [PATCH] r600g: add in-place DB decompression and texturing with DB tiling

2012-10-04 Thread Jerome Glisse
On Wed, Oct 3, 2012 at 5:50 PM, Marek Olšák  wrote:
> The decompression is done in-place and only the compressed tiles are
> decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.
>
> The texture unit is programmed to use non-displayable tiling and depth
> ordering of samples, so that it can fetch the texture in the native DB format.
>
> The latest version of the libdrm surface allocator is required for stencil
> texturing to work. The old one didn't create the mipmap tree correctly.
> We need a separate mipmap tree for stencil, because the stencil mipmap
> offsets are not really depth offsets/4.
>
> The DB->CB copy is still used for transfers.
> ---
>
> I sent the libdrm patches a few minutes ago. I guess I will have to make 
> another libdrm release.
>
> What's good about this is that it improves performance by 4-5% with the 
> 1024x768 resolution in Lightsmark on Evergreen. However, the larger the 
> resolution, the smaller the improvement is (something else becomes the 
> bottleneck). It also reduces the memory requirements for depth textures by 
> 50%, because the "flushed depth texture" isn't needed anymore.
>
> The catch is fetching the 4th stencil mipmap level gives wrong pixels in one 
> not-yet-committed test. What's weird is that all the other mipmaps (both 
> smaller and larger) are fetched correctly. That bug has yet to be fixed, but 
> who is using a stencil buffer with mipmaps anyway? :)

This 4th level might be the usual switching point between 2D tiled and 1D
tiled, i.e. we think the hw is still using 2D while it has switched to 1D
(or the other way around).
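
A rough sketch of that hypothesis, under loud assumptions: the rule below
(a level stays 2D tiled only while it covers at least one macro tile in both
directions) and the macro tile sizes are made up for illustration; the real
rule depends on the chip and the surface allocator. All toy_ names are
hypothetical.

```c
#include <assert.h>

enum toy_tile_mode { TOY_TILE_1D, TOY_TILE_2D };

/* Size of mip level `level` along one axis, never smaller than 1. */
static unsigned toy_minify(unsigned size, unsigned level)
{
    unsigned s;

    assert(level < 32);
    s = size >> level;
    return s ? s : 1;
}

/* Assumed rule: 2D tiling needs at least one full macro tile per axis. */
static enum toy_tile_mode toy_level_mode(unsigned w0, unsigned h0,
                                         unsigned level,
                                         unsigned macro_w, unsigned macro_h)
{
    unsigned w = toy_minify(w0, level);
    unsigned h = toy_minify(h0, level);

    return (w >= macro_w && h >= macro_h) ? TOY_TILE_2D : TOY_TILE_1D;
}

/* First level that falls back to 1D, or num_levels if none does. */
static unsigned toy_first_1d_level(unsigned w0, unsigned h0,
                                   unsigned num_levels,
                                   unsigned macro_w, unsigned macro_h)
{
    unsigned level;

    for (level = 0; level < num_levels; level++) {
        if (toy_level_mode(w0, h0, level, macro_w, macro_h) == TOY_TILE_1D)
            break;
    }
    return level;
}
```

If the driver and the hw disagreed by one level about where this switch
happens, exactly one level would be fetched with the wrong layout while the
larger and smaller ones would still look right, which would match the
symptom described.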

Otherwise reviewed

Cheers,
Jerome

>
>  src/gallium/auxiliary/util/u_blitter.c |3 +-
>  .../drivers/r600/evergreen_compute_internal.c  |6 +-
>  src/gallium/drivers/r600/evergreen_state.c |   92 +++-
>  src/gallium/drivers/r600/evergreend.h  |   10 ++-
>  src/gallium/drivers/r600/r600_blit.c   |   89 ---
>  src/gallium/drivers/r600/r600_pipe.h   |1 +
>  src/gallium/drivers/r600/r600_resource.h   |   10 ++-
>  src/gallium/drivers/r600/r600_state.c  |   13 +--
>  src/gallium/drivers/r600/r600_texture.c|   60 -
>  9 files changed, 216 insertions(+), 68 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_blitter.c 
> b/src/gallium/auxiliary/util/u_blitter.c
> index 4ad7a6b..86109f0 100644
> --- a/src/gallium/auxiliary/util/u_blitter.c
> +++ b/src/gallium/auxiliary/util/u_blitter.c
> @@ -1602,7 +1602,8 @@ void util_blitter_custom_depth_stencil(struct 
> blitter_context *blitter,
> blitter_disable_render_cond(ctx);
>
> /* bind states */
> -   pipe->bind_blend_state(pipe, ctx->blend[PIPE_MASK_RGBA]);
> +   pipe->bind_blend_state(pipe, cbsurf ? ctx->blend[PIPE_MASK_RGBA] :
> + ctx->blend[0]);
> pipe->bind_depth_stencil_alpha_state(pipe, dsa_stage);
> ctx->bind_fs_state(pipe, blitter_get_fs_col(ctx, 0, FALSE));
> pipe->bind_vertex_elements_state(pipe, ctx->velem_state);
> diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
> b/src/gallium/drivers/r600/evergreen_compute_internal.c
> index 496d099..b937135 100644
> --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
> +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
> @@ -480,7 +480,7 @@ void evergreen_set_tex_resource(
>
> unsigned format, endian;
> uint32_t word4 = 0, yuv_format = 0, pitch = 0;
> -   unsigned char swizzle[4], array_mode = 0, tile_type = 0;
> +   unsigned char swizzle[4], array_mode = 0, non_disp_tiling = 0;
> unsigned height, depth;
>
> swizzle[0] = 0;
> @@ -503,7 +503,7 @@ void evergreen_set_tex_resource(
> pitch = align(tmp->surface.level[0].nblk_x *
> util_format_get_blockwidth(tmp->resource.b.b.format), 8);
> array_mode = tmp->array_mode[0];
> -   tile_type = tmp->tile_type;
> +   non_disp_tiling = tmp->non_disp_tiling;
>
> assert(view->base.texture->target != PIPE_TEXTURE_1D_ARRAY);
> assert(view->base.texture->target != PIPE_TEXTURE_2D_ARRAY);
> @@ -513,7 +513,7 @@ void evergreen_set_tex_resource(
> evergreen_emit_raw_value(res,
> 
> (S_03_DIM(r600_tex_dim(view->base.texture->target)) |
> S_03_PITCH((pitch / 8) - 1) |
> -   S_03_NON_DISP_TILING_ORDER(tile_type) |
> +   
> S_03_NON_DISP_TILING_ORDER(non_disp_tiling) |
> S_03_TEX_WIDTH(view->base.texture->width0 
> - 1)));
> evergreen_emit_raw_value(res, (S_030004_TEX_HEIGHT(height - 1) |
> S_030004_TEX_DEPTH(depth - 1) |
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index c126e7d..5a14934 100644
> --

Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-13 Thread Jerome Glisse
On Wed, Sep 12, 2012 at 5:24 PM, Jerome Glisse  wrote:
> On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák  wrote:
>> On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse  wrote:
>>> On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák  wrote:
>>>> Please provide information about the GPU and the test which locks up. I'd
>>>> like to reproduce it. Also please explain what's the cause of the
>>>> lockup if you know it (which registers are not emitted in the correct
>>>> order and how it can fixed).
>>>>
>>>> Marek
>>>>
>>>
>>> For instance
>>> http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh
>>>
>>> will lockup probably any r6xx/r7xx (definitely rv670 & rv770)
>>>
>>> I know that the whole vgt register order is picky and that most of
>>> them need to be emitted before ta_cntl_aux and before cb/db. But the
>>> ordering relative to pa is kind of weird and moving when looking at
>>> fglrx.
>>
>> I tested RS880, which is very similar to RV670, and it didn't hang. I
>> can test RV670 later and if there's any issue, I'll fix it. I'd like
>> this patch to be fixed instead of dropped, that's why I'm asking and I
>> still haven't got a definitive answer how to change the patch, so that
>> it can be pushed. Besides that...
>>
>> Has it ever occured to you that the register ordering is changing in
>> fglrx, because the ordering doesn't matter at all, just like Alex
>> said, and the closed driver devs wrote it that way because they didn't
>> care about the ordering either?
>>
>> I think the lockups you are seeing on r600-r700 are actually caused by
>> something entirely different and it confuses you. See this thread from
>> the comment #9 onwards:
>> https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
>>
>> Marek
>
> This modified version is fine (rv670,rv770, caicos)
> http://people.freedesktop.org/~glisse/0001-r600g-convert-the-remnants-of-VGT-state-into-immedia.patch
>
> Cheers,
> Jerome

This one also works

http://people.freedesktop.org/~glisse/0001-r600g-convert-the-remnants-of-VGT-state-into-immedia.patch

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-12 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák  wrote:
> On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse  wrote:
>> On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák  wrote:
>>> Please provide information about the GPU and the test which locks up. I'd
>>> like to reproduce it. Also please explain what's the cause of the
>>> lockup if you know it (which registers are not emitted in the correct
>>> order and how it can fixed).
>>>
>>> Marek
>>>
>>
>> For instance
>> http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh
>>
>> will lockup probably any r6xx/r7xx (definitely rv670 & rv770)
>>
>> I know that the whole vgt register order is picky and that most of
>> them need to be emitted before ta_cntl_aux and before cb/db. But the
>> ordering relative to pa is kind of weird and moving when looking at
>> fglrx.
>
> I tested RS880, which is very similar to RV670, and it didn't hang. I
> can test RV670 later and if there's any issue, I'll fix it. I'd like
> this patch to be fixed instead of dropped, that's why I'm asking and I
> still haven't got a definitive answer how to change the patch, so that
> it can be pushed. Besides that...
>
> Has it ever occured to you that the register ordering is changing in
> fglrx, because the ordering doesn't matter at all, just like Alex
> said, and the closed driver devs wrote it that way because they didn't
> care about the ordering either?
>
> I think the lockups you are seeing on r600-r700 are actually caused by
> something entirely different and it confuses you. See this thread from
> the comment #9 onwards:
> https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
>
> Marek

This modified version is fine (rv670,rv770, caicos)
http://people.freedesktop.org/~glisse/0001-r600g-convert-the-remnants-of-VGT-state-into-immedia.patch

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 1/2] r600g: add htile support v9

2012-09-12 Thread Jerome Glisse
On Tue, Jul 17, 2012 at 1:58 PM,   wrote:
> From: Jerome Glisse 
>
> htile is used for HiZ and HiS support and fast Z/S clears.
> This commit just adds the htile setup and Fast Z clear.
> We don't take full advantage of HiS with that patch.
>
> v2 really use fast clear; there is still a random issue with some tiles,
>    need to try more flush combinations. Fix depth/stencil
>    texture decompression.
> v3 fix random issue on r6xx/r7xx
> v4 rebase on top of latest mesa, disable CB export when clearing
>    htile surface to avoid wasting bandwidth
> v5 resummarize htile surface when uploading z value. Fix z/stencil
>    decompression; the custom blitter with custom dsa is no longer
>    needed.
> v6 Reorganize render control/override update mechanism, fixing more
>    issues in the process.
> v7 Add nop after depth surface base update to work around some htile
>    flushing issue. Force htile to 8x8 on r6xx/r7xx as other combinations
>    have issues. Do not enable hyperz when flushing/uncompressing
>    depth buffer.
> v8 Fix htile surface, preload and prefetch setup. Only set preload
>    and prefetch on htile surface clear like fglrx. Record depth
>    clear value per level. Support several levels for the htile
>    surface. First depth clear can't be a fast clear.
> v9 Fix comments, properly account for new registers in emit function,
>    disable fast zclear if clearing different layers of a texture
>    array to different values
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Jerome Glisse 

Btw, the v11 version against newer mesa is at:
http://people.freedesktop.org/~glisse/0001-r600g-add-htile-support-v11.patch

Cheers,
Jerome
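
The fast clear rules from the v8/v9 changelog can be sketched roughly as
below. This is my reading of the changelog, not the code from the patch: the
toy_ names, the htile_valid flag and the exact conditions are assumptions.
The idea is that one clear value is recorded per mip level, the first clear
of a level takes the slow path, and a clear that touches only some layers
can only be fast if it uses the value already recorded for that level.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_LEVELS 16

struct toy_depth_tex {
    unsigned num_layers;
    bool     htile_valid[TOY_MAX_LEVELS]; /* set by the first full clear */
    float    depth_clear[TOY_MAX_LEVELS]; /* clear value recorded per level */
};

static bool toy_covers_all_layers(const struct toy_depth_tex *tex,
                                  unsigned first_layer, unsigned last_layer)
{
    return first_layer == 0 && last_layer + 1 == tex->num_layers;
}

/* Decide whether a clear may take the fast (htile only) path. */
static bool toy_can_fast_clear(const struct toy_depth_tex *tex,
                               unsigned level, unsigned first_layer,
                               unsigned last_layer, float value)
{
    assert(level < TOY_MAX_LEVELS);
    if (!tex->htile_valid[level])
        return false;  /* the first depth clear can't be a fast clear */
    if (toy_covers_all_layers(tex, first_layer, last_layer))
        return true;   /* whole level gets one new value */
    /* Partial clear: layers share one recorded value, so it must match.
     * Exact float comparison is intended, clear values are copied as is. */
    return value == tex->depth_clear[level];
}

/* Perform the bookkeeping of a clear; returns true if it was a fast one. */
static bool toy_clear_depth(struct toy_depth_tex *tex, unsigned level,
                            unsigned first_layer, unsigned last_layer,
                            float value)
{
    bool fast = toy_can_fast_clear(tex, level, first_layer, last_layer,
                                   value);

    if (toy_covers_all_layers(tex, first_layer, last_layer)) {
        tex->depth_clear[level] = value;
        tex->htile_valid[level] = true;
    }
    return fast;
}
```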

> ---
>  src/gallium/drivers/r600/evergreen_hw_context.c |6 +
>  src/gallium/drivers/r600/evergreen_state.c  |  102 -
>  src/gallium/drivers/r600/evergreend.h   |4 +
>  src/gallium/drivers/r600/r600_blit.c|   38 +++
>  src/gallium/drivers/r600/r600_hw_context.c  |   25 +
>  src/gallium/drivers/r600/r600_pipe.c|8 ++
>  src/gallium/drivers/r600/r600_pipe.h|   13 ++-
>  src/gallium/drivers/r600/r600_resource.h|7 ++
>  src/gallium/drivers/r600/r600_state.c   |  133 ---
>  src/gallium/drivers/r600/r600_texture.c |  103 ++
>  src/gallium/drivers/r600/r600d.h|6 +
>  11 files changed, 399 insertions(+), 46 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
> b/src/gallium/drivers/r600/evergreen_hw_context.c
> index 081701f..546c884 100644
> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
> @@ -62,6 +62,9 @@ static const struct r600_reg evergreen_context_reg_list[] = 
> {
> {GROUP_FORCE_NEW_BLOCK, 0, 0},
> {R_028058_DB_DEPTH_SIZE, 0, 0},
> {R_02805C_DB_DEPTH_SLICE, 0, 0},
> +   {R_02802C_DB_DEPTH_CLEAR, 0, 0},
> +   {R_028ABC_DB_HTILE_SURFACE, 0, 0},
> +   {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
> {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
> {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
> {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
> @@ -319,6 +322,9 @@ static const struct r600_reg cayman_context_reg_list[] = {
> {GROUP_FORCE_NEW_BLOCK, 0, 0},
> {R_028058_DB_DEPTH_SIZE, 0, 0},
> {R_02805C_DB_DEPTH_SLICE, 0, 0},
> +   {R_02802C_DB_DEPTH_CLEAR, 0, 0},
> +   {R_028ABC_DB_HTILE_SURFACE, 0, 0},
> +   {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
> {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
> {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
> {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index a66387b..214d76b 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -710,13 +710,15 @@ static void *evergreen_create_blend_state(struct 
> pipe_context *ctx,
> }
> blend->cb_target_mask = target_mask;
>
> -   if (target_mask)
> +   if (target_mask) {
> color_control |= S_028808_MODE(V_028808_CB_NORMAL);
> -   else
> +   } else {
> color_control |= S_028808_MODE(V_028808_CB_DISABLE);
> +   }
>
> r600_pipe_state_add_reg(rstate, R_028808_CB_COLOR_CONTROL,
> color_control);
> +
> /* only have dual source on MRT0 */
> blend->dual_src_blend = util_blend_state_is_dual(state, 0);
> for (int i = 0; i < 8; i++) {
> @@ -1668,6 +1670,

Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák  wrote:
> On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse  wrote:
>> On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák  wrote:
>>> Please provide information about the GPU and the test which locks up. I'd
>>> like to reproduce it. Also please explain what's the cause of the
>>> lockup if you know it (which registers are not emitted in the correct
>>> order and how it can fixed).
>>>
>>> Marek
>>>
>>
>> For instance
>> http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh
>>
>> will lockup probably any r6xx/r7xx (definitely rv670 & rv770)
>>
>> I know that the whole vgt register order is picky and that most of
>> them need to be emitted before ta_cntl_aux and before cb/db. But the
>> ordering relative to pa is kind of weird and moving when looking at
>> fglrx.
>
> I tested RS880, which is very similar to RV670, and it didn't hang. I
> can test RV670 later and if there's any issue, I'll fix it. I'd like
> this patch to be fixed instead of dropped, that's why I'm asking and I
> still haven't got a definitive answer how to change the patch, so that
> it can be pushed. Besides that...
>
> Has it ever occured to you that the register ordering is changing in
> fglrx, because the ordering doesn't matter at all, just like Alex
> said, and the closed driver devs wrote it that way because they didn't
> care about the ordering either?

fglrx definitely emits registers according to a certain grouping. The thing
is, there is a bunch of registers that are emitted in 2, 3 or at most 4
different groups from what I have seen. All the other registers are
_always_ emitted as part of the same group, with the whole group being
emitted. The issue I have is understanding those registers that are
emitted in a few different ways and how fglrx chooses between them.
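
To illustrate what "the whole group being emitted" means, here is a small
standalone sketch. It is not the r600g block code: the toy_ names are
hypothetical, and the only thing borrowed from the driver is the idea of a
GROUP_FORCE_NEW_BLOCK sentinel splitting a register list into blocks.
Touching any register of a block re-emits every register of that block, in
list order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GROUP_FORCE_NEW_BLOCK 0u  /* sentinel between blocks */
#define TOY_MAX_BLOCKS 8
#define TOY_MAX_REGS_PER_BLOCK 8

struct toy_block {
    uint32_t regs[TOY_MAX_REGS_PER_BLOCK];
    unsigned nregs;
    bool dirty;
};

struct toy_ctx {
    struct toy_block blocks[TOY_MAX_BLOCKS];
    unsigned nblocks;
};

/* Split a flat register list into blocks at each sentinel. */
static void toy_build_blocks(struct toy_ctx *ctx, const uint32_t *list,
                             size_t n)
{
    struct toy_block *cur = NULL;
    size_t i;

    ctx->nblocks = 0;
    for (i = 0; i < n; i++) {
        if (list[i] == TOY_GROUP_FORCE_NEW_BLOCK) {
            cur = NULL;  /* the next register opens a new block */
            continue;
        }
        if (!cur) {
            assert(ctx->nblocks < TOY_MAX_BLOCKS);
            cur = &ctx->blocks[ctx->nblocks++];
            cur->nregs = 0;
            cur->dirty = false;
        }
        assert(cur->nregs < TOY_MAX_REGS_PER_BLOCK);
        cur->regs[cur->nregs++] = list[i];
    }
}

/* Mark the block containing `reg` dirty; returns false if unknown. */
static bool toy_mark_dirty(struct toy_ctx *ctx, uint32_t reg)
{
    unsigned b, r;

    for (b = 0; b < ctx->nblocks; b++) {
        for (r = 0; r < ctx->blocks[b].nregs; r++) {
            if (ctx->blocks[b].regs[r] == reg) {
                ctx->blocks[b].dirty = true;
                return true;
            }
        }
    }
    return false;
}

/* Emit every register of every dirty block, in list order. */
static size_t toy_emit_dirty(struct toy_ctx *ctx, uint32_t *out,
                             size_t max_out)
{
    size_t n = 0;
    unsigned b, r;

    for (b = 0; b < ctx->nblocks; b++) {
        struct toy_block *blk = &ctx->blocks[b];

        if (!blk->dirty)
            continue;
        for (r = 0; r < blk->nregs; r++) {
            assert(n < max_out);
            out[n++] = blk->regs[r];
        }
        blk->dirty = false;
    }
    return n;
}
```

Writing a register as an immediate write instead takes it out of this
scheme, so both its neighbours and its position in the stream change.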

>
> I think the lockups you are seeing on r600-r700 are actually caused by
> something entirely different and it confuses you. See this thread from
> the comment #9 onwards:
> https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
>
> Marek


Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 3:00 PM, Jerome Glisse  wrote:
> On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák  wrote:
>> On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse  wrote:
>>> On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák  wrote:
>>>> Please provide information about the GPU and the test which locks up. I'd
>>>> like to reproduce it. Also please explain what's the cause of the
>>>> lockup if you know it (which registers are not emitted in the correct
>>>> order and how it can fixed).
>>>>
>>>> Marek
>>>>
>>>
>>> For instance
>>> http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh
>>>
>>> will lockup probably any r6xx/r7xx (definitely rv670 & rv770)
>>>
>>> I know that the whole vgt register order is picky and that most of
>>> them need to be emitted before ta_cntl_aux and before cb/db. But the
>>> ordering relative to pa is kind of weird and moving when looking at
>>> fglrx.
>>
>> I tested RS880, which is very similar to RV670, and it didn't hang. I
>> can test RV670 later and if there's any issue, I'll fix it. I'd like
>> this patch to be fixed instead of dropped, that's why I'm asking and I
>> still haven't got a definitive answer how to change the patch, so that
>> it can be pushed. Besides that...
>>
>> Has it ever occured to you that the register ordering is changing in
>> fglrx, because the ordering doesn't matter at all, just like Alex
>> said, and the closed driver devs wrote it that way because they didn't
>> care about the ordering either?
>>
>> I think the lockups you are seeing on r600-r700 are actually caused by
>> something entirely different and it confuses you. See this thread from
>> the comment #9 onwards:
>> https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
>>
>> Marek
>
> It's simple without that patch no lockup, with it lockup all the time.
> It's just a hard fact, i am not confused about anything, i know for a
> fact that reg grouping/order matter somehow. I run several automated
> tools that compare register value at draw call time btw r600g and
> fglrx while doing hyperz and there was no difference at all, down the
> last bit. One was locking up the other not.
>
> Cheers,
> Jerome

And if you're curious, the good and bad r600g command streams, and the diff
between them, are at:
http://people.freedesktop.org/~glisse/longprim/

If the bad one is emitted before the fbo-stencil test then it locks up;
if it's the good one then there is no lockup.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák  wrote:
> On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse  wrote:
>> On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák  wrote:
>>> Please provide information about the GPU and the test which locks up. I'd
>>> like to reproduce it. Also please explain what's the cause of the
>>> lockup if you know it (which registers are not emitted in the correct
>>> order and how it can fixed).
>>>
>>> Marek
>>>
>>
>> For instance
>> http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh
>>
>> will lockup probably any r6xx/r7xx (definitely rv670 & rv770)
>>
>> I know that the whole vgt register order is picky and that most of
>> them need to be emitted before ta_cntl_aux and before cb/db. But the
>> ordering relative to pa is kind of weird and moving when looking at
>> fglrx.
>
> I tested RS880, which is very similar to RV670, and it didn't hang. I
> can test RV670 later and if there's any issue, I'll fix it. I'd like
> this patch to be fixed instead of dropped, that's why I'm asking and I
> still haven't got a definitive answer how to change the patch, so that
> it can be pushed. Besides that...
>
> Has it ever occured to you that the register ordering is changing in
> fglrx, because the ordering doesn't matter at all, just like Alex
> said, and the closed driver devs wrote it that way because they didn't
> care about the ordering either?
>
> I think the lockups you are seeing on r600-r700 are actually caused by
> something entirely different and it confuses you. See this thread from
> the comment #9 onwards:
> https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
>
> Marek

It's simple: without that patch there is no lockup, with it there is a
lockup every time. It's just a hard fact. I am not confused about anything;
I know for a fact that register grouping/order matters somehow. I ran
several automated tools that compare register values at draw call time
between r600g and fglrx while doing hyperz and there was no difference at
all, down to the last bit. One was locking up, the other was not.
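
For what it's worth, the kind of check meant here can be sketched as
follows. This is not the actual tooling, just a hypothetical illustration
(all toy_ names are made up): two command streams can leave every register
with the same final value and still differ in the order the writes are
emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_reg_write {
    uint32_t offset;
    uint32_t value;
};

/* Last value written to `offset`; returns false if never written. */
static bool toy_final_value(const struct toy_reg_write *s, size_t n,
                            uint32_t offset, uint32_t *value)
{
    bool found = false;
    size_t i;

    assert(value != NULL);
    for (i = 0; i < n; i++) {
        if (s[i].offset == offset) {
            *value = s[i].value;
            found = true;
        }
    }
    return found;
}

/* True if every register written by a ends with the same value in b. */
static bool toy_covers(const struct toy_reg_write *a, size_t na,
                       const struct toy_reg_write *b, size_t nb)
{
    size_t i;

    for (i = 0; i < na; i++) {
        uint32_t va = 0, vb = 0;

        toy_final_value(a, na, a[i].offset, &va);
        if (!toy_final_value(b, nb, a[i].offset, &vb) || va != vb)
            return false;
    }
    return true;
}

/* Same register state at draw time, regardless of emission order. */
static bool toy_same_state(const struct toy_reg_write *a, size_t na,
                           const struct toy_reg_write *b, size_t nb)
{
    return toy_covers(a, na, b, nb) && toy_covers(b, nb, a, na);
}

/* Index of the first write that differs, or -1 if streams are identical. */
static long toy_first_order_diff(const struct toy_reg_write *a, size_t na,
                                 const struct toy_reg_write *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    size_t i;

    for (i = 0; i < n; i++) {
        if (a[i].offset != b[i].offset || a[i].value != b[i].value)
            return (long)i;
    }
    return na == nb ? -1 : (long)n;
}
```

A pair of streams for which toy_same_state() is true while
toy_first_order_diff() is not -1 is exactly the situation described: same
values down to the last bit, different grouping/order.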

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák  wrote:
> Please provide information about the GPU and the test which locks up. I'd
> like to reproduce it. Also please explain what's the cause of the
> lockup if you know it (which registers are not emitted in the correct
> order and how it can fixed).
>
> Marek
>

For instance
http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh

will probably lock up any r6xx/r7xx (definitely rv670 & rv770).

I know that the whole vgt register order is picky and that most of
them need to be emitted before ta_cntl_aux and before cb/db. But the
ordering relative to pa is kind of weird and keeps moving when looking at
fglrx.

Cheers,
Jerome

> On Tue, Sep 11, 2012 at 6:48 PM, Jerome Glisse  wrote:
>> On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák  wrote:
>>
>> NAK this one introduce lockup. As i said in another email register
>> group/order matter and with this patch i get 100% lockup rate in some
>> test case for instance the test case i reference in my other email
>>
>>> ---
>>>  src/gallium/drivers/r600/evergreen_hw_context.c |   16 ---
>>>  src/gallium/drivers/r600/r600.h |7 -
>>>  src/gallium/drivers/r600/r600_hw_context.c  |   15 ++
>>>  src/gallium/drivers/r600/r600_hw_context_priv.h |2 +-
>>>  src/gallium/drivers/r600/r600_pipe.h|8 +++---
>>>  src/gallium/drivers/r600/r600_state_common.c|   34 ---
>>>  6 files changed, 26 insertions(+), 56 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
>>> b/src/gallium/drivers/r600/evergreen_hw_context.c
>>> index 483021f..0c2159a 100644
>>> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
>>> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
>>> @@ -32,10 +32,6 @@ static const struct r600_reg cayman_config_reg_list[] = {
>>> {R_00913C_SPI_CONFIG_CNTL_1, REG_FLAG_ENABLE_ALWAYS | 
>>> REG_FLAG_FLUSH_CHANGE, 0},
>>>  };
>>>
>>> -static const struct r600_reg evergreen_ctl_const_list[] = {
>>> -   {R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0},
>>> -};
>>> -
>>>  static const struct r600_reg evergreen_context_reg_list[] = {
>>> {R_028008_DB_DEPTH_VIEW, 0, 0},
>>> {R_028010_DB_RENDER_OVERRIDE2, 0, 0},
>>> @@ -63,10 +59,6 @@ static const struct r600_reg 
>>> evergreen_context_reg_list[] = {
>>> {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
>>> {R_028350_SX_MISC, 0, 0},
>>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> -   {R_028408_VGT_INDX_OFFSET, 0, 0},
>>> -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
>>> -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
>>> -   {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> {R_02861C_SPI_VS_OUT_ID_0, 0, 0},
>>> {R_028620_SPI_VS_OUT_ID_1, 0, 0},
>>> {R_028624_SPI_VS_OUT_ID_2, 0, 0},
>>> @@ -353,10 +345,6 @@ static const struct r600_reg cayman_context_reg_list[] 
>>> = {
>>> {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
>>> {R_028350_SX_MISC, 0, 0},
>>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> -   {R_028408_VGT_INDX_OFFSET, 0, 0},
>>> -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
>>> -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
>>> -   {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>> {R_02861C_SPI_VS_OUT_ID_0, 0, 0},
>>> {R_028620_SPI_VS_OUT_ID_1, 0, 0},
>>> {R_028624_SPI_VS_OUT_ID_2, 0, 0},
>>> @@ -664,10 +652,6 @@ int evergreen_context_init(struct r600_context *ctx)
>>>
>>> Elements(evergreen_context_reg_list), PKT3_SET_CONTEXT_REG, 
>>> EVERGREEN_CONTEXT_REG_OFFSET);
>>> if (r)
>>> goto out_err;
>>> -   r = r600_context_add_block(ctx, evergreen_ctl_const_list,
>>> -  Elements(evergreen_ctl_const_list), 
>>> PKT3_SET_CTL_CONST, EVERGREEN_CTL_CONST_OFFSET);
>>> -   if (r)
>>> -   goto out_err;
>>>
>>> /* PS loop const */
>>> evergreen_loop_const_init(ctx, 0);
>>> diff --git a/src/gallium/drivers/r600/r600.h 
>>> b/src/gallium/drivers/r600/r600.h
>>> index 6363a03..83d21a4 100644
>>> --- a/src/gallium/drivers/r600/r600.h
>>> +++ b/src/gallium/drivers/r600/r600.h
>>> @@ -228,11 +228,4 @@ void _r600_pipe_state_add_re

Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák  wrote:

NAK, this one introduces a lockup. As I said in another email, register
group/order matters, and with this patch I get a 100% lockup rate in some
test cases, for instance the test case I reference in my other email.

> ---
>  src/gallium/drivers/r600/evergreen_hw_context.c |   16 ---
>  src/gallium/drivers/r600/r600.h |7 -
>  src/gallium/drivers/r600/r600_hw_context.c  |   15 ++
>  src/gallium/drivers/r600/r600_hw_context_priv.h |2 +-
>  src/gallium/drivers/r600/r600_pipe.h|8 +++---
>  src/gallium/drivers/r600/r600_state_common.c|   34 ---
>  6 files changed, 26 insertions(+), 56 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
> b/src/gallium/drivers/r600/evergreen_hw_context.c
> index 483021f..0c2159a 100644
> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
> @@ -32,10 +32,6 @@ static const struct r600_reg cayman_config_reg_list[] = {
> {R_00913C_SPI_CONFIG_CNTL_1, REG_FLAG_ENABLE_ALWAYS | 
> REG_FLAG_FLUSH_CHANGE, 0},
>  };
>
> -static const struct r600_reg evergreen_ctl_const_list[] = {
> -   {R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0},
> -};
> -
>  static const struct r600_reg evergreen_context_reg_list[] = {
> {R_028008_DB_DEPTH_VIEW, 0, 0},
> {R_028010_DB_RENDER_OVERRIDE2, 0, 0},
> @@ -63,10 +59,6 @@ static const struct r600_reg evergreen_context_reg_list[] 
> = {
> {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
> {R_028350_SX_MISC, 0, 0},
> {GROUP_FORCE_NEW_BLOCK, 0, 0},
> -   {R_028408_VGT_INDX_OFFSET, 0, 0},
> -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
> -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
> -   {GROUP_FORCE_NEW_BLOCK, 0, 0},
> {R_02861C_SPI_VS_OUT_ID_0, 0, 0},
> {R_028620_SPI_VS_OUT_ID_1, 0, 0},
> {R_028624_SPI_VS_OUT_ID_2, 0, 0},
> @@ -353,10 +345,6 @@ static const struct r600_reg cayman_context_reg_list[] = 
> {
> {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
> {R_028350_SX_MISC, 0, 0},
> {GROUP_FORCE_NEW_BLOCK, 0, 0},
> -   {R_028408_VGT_INDX_OFFSET, 0, 0},
> -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
> -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
> -   {GROUP_FORCE_NEW_BLOCK, 0, 0},
> {R_02861C_SPI_VS_OUT_ID_0, 0, 0},
> {R_028620_SPI_VS_OUT_ID_1, 0, 0},
> {R_028624_SPI_VS_OUT_ID_2, 0, 0},
> @@ -664,10 +652,6 @@ int evergreen_context_init(struct r600_context *ctx)
>
> Elements(evergreen_context_reg_list), PKT3_SET_CONTEXT_REG, 
> EVERGREEN_CONTEXT_REG_OFFSET);
> if (r)
> goto out_err;
> -   r = r600_context_add_block(ctx, evergreen_ctl_const_list,
> -  Elements(evergreen_ctl_const_list), 
> PKT3_SET_CTL_CONST, EVERGREEN_CTL_CONST_OFFSET);
> -   if (r)
> -   goto out_err;
>
> /* PS loop const */
> evergreen_loop_const_init(ctx, 0);
> diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
> index 6363a03..83d21a4 100644
> --- a/src/gallium/drivers/r600/r600.h
> +++ b/src/gallium/drivers/r600/r600.h
> @@ -228,11 +228,4 @@ void _r600_pipe_state_add_reg(struct r600_context *ctx,
>  #define r600_pipe_state_add_reg_bo(state, offset, value, bo, usage) 
> _r600_pipe_state_add_reg_bo(rctx, state, offset, value, CTX_RANGE_ID(offset), 
> CTX_BLOCK_ID(offset), bo, usage)
>  #define r600_pipe_state_add_reg(state, offset, value) 
> _r600_pipe_state_add_reg(rctx, state, offset, value, CTX_RANGE_ID(offset), 
> CTX_BLOCK_ID(offset))
>
> -static inline void r600_pipe_state_mod_reg(struct r600_pipe_state *state,
> -  uint32_t value)
> -{
> -   state->regs[state->nregs].value = value;
> -   state->nregs++;
> -}
> -
>  #endif
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
> b/src/gallium/drivers/r600/r600_hw_context.c
> index 57dcc7e..122f878 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -233,10 +233,6 @@ static const struct r600_reg r600_config_reg_list[] = {
> {R_008C04_SQ_GPR_RESOURCE_MGMT_1, REG_FLAG_ENABLE_ALWAYS | 
> REG_FLAG_FLUSH_CHANGE, 0},
>  };
>
> -static const struct r600_reg r600_ctl_const_list[] = {
> -   {R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0},
> -};
> -
>  static const struct r600_reg r600_context_reg_list[] = {
> {R_028A4C_PA_SC_MODE_CNTL, 0, 0},
> {GROUP_FORCE_NEW_BLOCK, 0, 0},
> @@ -461,9 +457,6 @@ static const struct r600_reg r600_context_reg_list[] = {
> {GROUP_FORCE_NEW_BLOCK, 0, 0},
> {R_028850_SQ_PGM_RESOURCES_PS, 0, 0},
> {R_028854_SQ_PGM_EXPORTS_PS, 0, 0},
> -   {R_028408_VGT_INDX_OFFSET, 0, 0},
> -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
> -  

Re: [Mesa-dev] [PATCH 00/19] r600g refactoring and cleanups

2012-09-11 Thread Jerome Glisse
On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák  wrote:
> Nothing too exciting. Besides cleanups, there are fine-grained sampler state 
> updates (it emits only the samplers which changed), support for geometry 
> shader resources (because it was easy; I am not working on GS right now), 
> atomization of some states, some fixes and a major cleanup in r600_draw_vbo.
>
> Tested on RS880 and REDWOOD.
>
> Please review.

For the first 18 patches:
Reviewed-by: Jerome Glisse 

NAK for patch 19, see my other reply.

>
> Marek Olšák (19):
>   r600g: consolidate initialization of common state functions
>   r600g: cleanup state function names
>   r600g: put constant buffer state into an array indexed by shader type
>   r600g: consolidate set_sampler_views functions
>   r600g: consolidate set_viewport_state functions
>   r600g: do fine-grained sampler state updates
>   r600g: put sampler states and views into an array indexed by shader type
>   r600g: add support for geometry shader samplers and constant buffers
>   r600g: initialize the first CS just like any other CS
>   r600g: remove unused state ID definitions
>   r600g: atomize stencil ref state
>   r600g: atomize viewport state
>   r600g: atomize blend color
>   r600g: atomize clip state
>   r600g: fix the number of CS dwords of cb_misc_state
>   r600g: fix computing how much space is needed for a draw command
>   r600g: add clip_misc_state for clip registers emitted in draw_vbo
>   r600g: emit the primitive type and associated regs only if the type is 
> changed
>   r600g: convert the remnants of VGT state into immediate register writes
>
>  src/gallium/drivers/r600/evergreen_hw_context.c |  108 +
>  src/gallium/drivers/r600/evergreen_state.c  |  191 +++-
>  src/gallium/drivers/r600/evergreend.h   |2 +
>  src/gallium/drivers/r600/r600.h |8 +-
>  src/gallium/drivers/r600/r600_blit.c|   16 +-
>  src/gallium/drivers/r600/r600_buffer.c  |   31 +-
>  src/gallium/drivers/r600/r600_hw_context.c  |  133 +++---
>  src/gallium/drivers/r600/r600_hw_context_priv.h |3 +-
>  src/gallium/drivers/r600/r600_pipe.c|6 +-
>  src/gallium/drivers/r600/r600_pipe.h|  169 
>  src/gallium/drivers/r600/r600_shader.c  |3 +-
>  src/gallium/drivers/r600/r600_shader.h  |1 -
>  src/gallium/drivers/r600/r600_state.c   |  211 +++--
>  src/gallium/drivers/r600/r600_state_common.c|  526 
> ++-
>  src/gallium/drivers/r600/r600d.h|2 +
>  15 files changed, 615 insertions(+), 795 deletions(-)
>
> Marek
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: simplify flushing

2012-09-10 Thread Jerome Glisse
On Sun, Sep 9, 2012 at 1:03 AM, Marek Olšák  wrote:
> Based on the patch called "simplify and fix flushing and synchronization"
> by Jerome Glisse.
>
> Rebased, removed unneded code, simplified more and cleaned up.
>
> Also, SH_ACTION_ENA is not set when changing shaders (hw doesn't seem
> to need it). It's only used to flush constant buffers.

Looks good. I would still like to do some stress testing and will try
to do that today.
Reviewed-by: Jerome Glisse 
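
The diff below replaces direct cache-invalidation calls with setting bits in a flags field and flushing later. A minimal sketch of that pattern follows, assuming made-up names (flush_ctx, request_flush, flush_emit and the FLUSH_* bits are hypothetical, not the r600g ones), and counting one emission per flush as a simplification:

```c
#include <assert.h>

/* Hypothetical flush request bits; the real driver defines its own. */
enum {
    FLUSH_CB  = 1u << 0,
    FLUSH_TEX = 1u << 1,
    FLUSH_VTX = 1u << 2
};

struct flush_ctx {
    unsigned flags;           /* pending flush requests */
    unsigned packets_emitted; /* counts emissions, for illustration only */
};

/* Record that a flush is needed; nothing is emitted yet. */
static void request_flush(struct flush_ctx *c, unsigned f)
{
    assert(f != 0);
    c->flags |= f;
}

/* Emit all pending requests at once, then clear them. */
static void flush_emit(struct flush_ctx *c)
{
    if (!c->flags)
        return;               /* nothing pending, nothing to emit */
    c->packets_emitted++;     /* a real driver would write packets here */
    c->flags = 0;
}
```

Requesting the same flush several times before a draw costs nothing extra, since the requests only OR bits together.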

> ---
>  src/gallium/drivers/r600/evergreen_compute.c   |   20 +-
>  .../drivers/r600/evergreen_compute_internal.c  |4 +-
>  src/gallium/drivers/r600/evergreen_state.c |7 +-
>  src/gallium/drivers/r600/evergreend.h  |7 +-
>  src/gallium/drivers/r600/r600.h|   18 +-
>  src/gallium/drivers/r600/r600_hw_context.c |  218 
> +---
>  src/gallium/drivers/r600/r600_hw_context_priv.h|3 +-
>  src/gallium/drivers/r600/r600_pipe.c   |2 -
>  src/gallium/drivers/r600/r600_pipe.h   |4 -
>  src/gallium/drivers/r600/r600_state.c  |   21 +-
>  src/gallium/drivers/r600/r600_state_common.c   |   76 ++-
>  src/gallium/drivers/r600/r600d.h   |   12 ++
>  12 files changed, 210 insertions(+), 182 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
> b/src/gallium/drivers/r600/evergreen_compute.c
> index 3533312..1fb63d6 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -96,7 +96,7 @@ static void evergreen_cs_set_vertex_buffer(
> vb->buffer = buffer;
> vb->user_buffer = NULL;
>
> -   r600_inval_vertex_cache(rctx);
> +   rctx->flags |= rctx->has_vertex_cache ? R600_CONTEXT_VTX_FLUSH : 
> R600_CONTEXT_TEX_FLUSH;
> state->enabled_mask |= 1 << vb_index;
> state->dirty_mask |= 1 << vb_index;
> r600_atom_dirty(rctx, &state->atom);
> @@ -332,8 +332,11 @@ static void compute_emit_cs(struct r600_context *ctx, 
> const uint *block_layout,
>  */
> r600_emit_atom(ctx, &ctx->start_compute_cs_cmd.atom);
>
> +   ctx->flags |= R600_CONTEXT_CB_FLUSH;
> +   r600_flush_emit(ctx);
> +
> /* Emit cb_state */
> -cb_state = ctx->states[R600_PIPE_STATE_FRAMEBUFFER];
> +   cb_state = ctx->states[R600_PIPE_STATE_FRAMEBUFFER];
> r600_context_pipe_state_emit(ctx, cb_state, 
> RADEON_CP_PACKET3_COMPUTE_MODE);
>
> /* Set CB_TARGET_MASK  XXX: Use cb_misc_state */
> @@ -384,15 +387,10 @@ static void compute_emit_cs(struct r600_context *ctx, 
> const uint *block_layout,
> /* Emit dispatch state and dispatch packet */
> evergreen_emit_direct_dispatch(ctx, block_layout, grid_layout);
>
> -   /* r600_flush_framebuffer() updates the cb_flush_flags and then
> -* calls r600_emit_atom() on the ctx->surface_sync_cmd.atom, which 
> emits
> -* a SURFACE_SYNC packet via r600_emit_surface_sync().
> -*
> -* XXX r600_emit_surface_sync() hardcodes the CP_COHER_SIZE to
> -* 0x, so we will need to add a field to struct
> -* r600_surface_sync_cmd if we want to manually set this value.
> +   /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 
> 0x
>  */
> -   r600_flush_framebuffer(ctx, true /* Flush now */);
> +   ctx->flags |= R600_CONTEXT_CB_FLUSH;
> +   r600_flush_emit(ctx);
>
>  #if 0
> COMPUTE_DBG("cdw: %i\n", cs->cdw);
> @@ -444,7 +442,7 @@ void evergreen_emit_cs_shader(
> r600_write_value(cs, r600_context_bo_reloc(rctx, 
> shader->shader_code_bo,
> RADEON_USAGE_READ));
>
> -   r600_inval_shader_cache(rctx);
> +   rctx->flags |= R600_CONTEXT_SHADERCONST_FLUSH;
>  }
>
>  static void evergreen_launch_grid(
> diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
> b/src/gallium/drivers/r600/evergreen_compute_internal.c
> index 50a60d3..dc95732 100644
> --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
> +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
> @@ -562,7 +562,7 @@ void evergreen_set_tex_resource(
>  
> util_format_get_blockwidth(tmp->resource.b.b.format) *
>  view->base.texture->width0*height*depth;
>
> -   r600_inval_texture_cache(pipe->ctx);
> +   pipe->ctx->flags |= R600_CONTEXT_TEX_FLUSH;
>
> evergreen_emit_force_reloc(res);
> evergreen_e

Re: [Mesa-dev] [PATCH] r600g: order atom emission

2012-09-07 Thread Jerome Glisse
On Thu, Sep 6, 2012 at 11:32 AM, Alex Deucher  wrote:
> On Thu, Sep 6, 2012 at 10:54 AM, Jerome Glisse  wrote:
>> On Thu, Sep 6, 2012 at 6:20 AM, Dave Airlie  wrote:
>>> On Thu, Sep 6, 2012 at 5:21 PM, Philipp Klaus Krause  wrote:
>>>> On 06.09.2012 07:35, j.gli...@gmail.com wrote:
>>>>> From: Jerome Glisse 
>>>>>
>>>>> To avoid GPU lockup registers must be emited in a specific order
>>>>> (no kidding ...). This patch rework atom emission so order in which
>>>>> atom are emited in respect to each other is always the same. We
>>>>> don't have any informations on what is the correct order so order
>>>>> will need to be infered from fglrx command stream.
>>>>
>>>> Shouldn't this be stated in comments, so the next person who comes along
>>>> and makes a change in this code doesn't inadvertently change the order?
>>>
>>> Also a comment on what ordering matters most, like I suspect this is
>>> just hiding a real issue.
>>>
>>> Dave.
>>
>> No it's not hiding an issue, afaict it's how the hw works. The hw do
>> what some amd document call states validations. So here is how i
>> understand how things happen and i can be completely wrong. Hw process
>> register write in order it receive them and to avoid postponing state
>> validation the hw do state validation while processing register. That
>> means if writing register A trigger state validation that use some
>> field of register B the hw might not redo state validation when
>> register B is latter written. ie only some register trigger the state
>> validation no matter on what they depends on. I believe state
>> validation is only use as pipeline optimization by the hw, so the hw
>> knows it can take some short cut. But in some rare case if short cut
>> are taken for wrong reasons we end up in GPU lockup.
>>
>> No matter if my guess is right or wrong, i know for a fact that
>> register order is important in some situation, that's the hard bottom
>> line, no matter what is the reasons inside the hw.
>>
>> This patch is far from having all the order right, it's just a first
>> step, i am atomizing everything and it's what needed to go forward
>> without regression.
>
> I've talked to the internal hw and sw guys and they said there isn't
> any specific ordering required and the closed driver doesn't impose
> any specific order.  The pipeline doesn't get kicked off until a draw
> command is issued, so I don't see why the state update order would
> matter.  It's possible there are subtle ordering requirements and the
> closed driver just happened to get it right.  There are dependencies
> and hw bug workarounds however.  E.g., some blocks snoop registers
> from other blocks so you need to make sure those dependant registers
> have been initialized before drawing.  I don't know if it's the
> ordering so much as making sure we emit all the necessary state when
> needed.  The closed driver tends to update a lot more state the is
> minimally required for a lot of things.  That said, it probably
> wouldn't hurt to mirror the closed driver more closely.
>
> Alex

I don't know the reason, but which registers are emitted, and alongside
which other registers, definitely matters. All the files I am talking
about in this mail are located at:
http://people.freedesktop.org/~glisse/registerposition/

So if you apply:
0001-r600g-FORCE-LOCKUP-BY-EMITTING-OR-NOT-REGISTER.patch

and run the piglit test as in lockup-longprim.sh, you will lock up the
GPU (I have only tested on r6xx and r7xx so far).

I double-checked with automated tools that every register written by
the command stream of the longprim piglit test is reprogrammed properly
by the fbo test (if you have the constant buffer size patch I sent
earlier).

The only difference between the two command streams is that in one
R_02881C_PA_CL_VS_OUT_CNTL is emitted with each draw and in the other
only once. When it is emitted with each draw, the GPU locks up.

bad command stream: r600g-long-prim-simple-b.txt
good one: r600g-long-prim-simple-g.txt
diff: r600g-long-prim-simple-d.txt

Since the bad one emits more registers, some draw commands are moved to
the second CS.

Emitting some other registers along with PA_CL_VS_OUT_CNTL fixes the
lockup (I don't have a short list), but many other registers behave the
same as PA_CL_VS_OUT_CNTL. So if order does not matter, then register
grouping definitely does. I really wish the hw were less picky about how
command streams are supposed to be formatted. Anyhow, given that we have
no information on which registers need to be emitted together, mimicking
fglrx sounds like the way to go.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: order atom emission v2

2012-09-06 Thread Jerome Glisse
On Thu, Sep 6, 2012 at 8:32 PM, Dave Airlie  wrote:
> On Fri, Sep 7, 2012 at 10:03 AM, Marek Olšák  wrote:
>> On Fri, Sep 7, 2012 at 12:05 AM, Jerome Glisse  wrote:
>>> On Thu, Sep 6, 2012 at 4:10 PM, Marek Olšák  wrote:
>>>> On Thu, Sep 6, 2012 at 8:34 PM, Jerome Glisse  wrote:
>>>>> On Thu, Sep 6, 2012 at 2:29 PM, Marek Olšák  wrote:
>>>>>> This looks good to me. It's funny to see the r300g architecture being
>>>>>> re-implemented in r600g. :)
>>>>>>
>>>>>> There's one optimization that r300g has that this patch doesn't. r300g
>>>>>> keeps the index of the first and the last dirty atom and the loops
>>>>>> over the list of atoms look like this:
>>>>>> for (i = first_dirty; i <= last_dirty; i++)
>>>>>>
>>>>>> And after emission:
>>>>>> first_dirty = some large number;
>>>>>> last_dirty= 0;
>>>>>>
>>>>>> The atoms should be ordered according to how frequently they are
>>>>>> updated (except when the ordering is required by the hw). But most
>>>>>> importantly, if there are no state changes, the loops are trivially
>>>>>> skipped.
>>>>>>
>>>>>> Marek
>>>>>
>>>>> Don't think this optimization is worth it, there won't be much more
>>>>> than 32 atom in the end and it definitely can't be ordered from most
>>>>> frequent to less frequent as some of the stuff need to be at the last
>>>>> being emitted and they are frequent one (primitive type for instance).
>>>>
>>>> I didn't say all atoms *must* be sorted. I meant that some (most?)
>>>> atoms can be sorted, i.e. you can have some atoms at fixed positions
>>>> (like the primitype type or the seamless cubemap state), but you have
>>>> always at least *some* freedom where you put the rest. The ordering I
>>>> had in mind was actually from the least frequent to the most frequent,
>>>> in other words, from the framebuffer (least frequent) to shaders to
>>>> textures to constant buffers to vertex buffers (most frequent).
>>>>
>>>> Of course, the code should document which atoms must have fixed
>>>> positions along with an explanation. The comment that all atom
>>>> positions must not be changed isn't enough, because it's not true.
>>>>
>>>> Marek
>>>
>>> I won't try to find which atom can have complete floating position, i
>>> am just grouping together register that are always emitted together in
>>> fglrx and then i position this group relative to each other according
>>> to fglrx position. That means all atom are always emitted in a
>>> specific order. So there won't be any freedom. The only freedom i can
>>> think of is btw 2 position forced atom and that make the sorting
>>> completely useless and complicated.
>>
>> I'll add the optimization anyway (without sorting). Draw operations
>> without state changes or with only one state update are quite common.
>>
>> Anyway, it was said in the v1 thread that the hardware doesn't need
>> any specific ordering for proper functioning. While it may be
>> beneficial to emit one or two registers earlier than the others,
>> insisting on fixed ordering of all of them is not only limiting, it
>> seems useless and waste of time as well. What I don't understand: Why
>> do you blindly copy everything fglrx *seems* to be doing without any
>> real reason? It does not fix any bug, it does not improve performance,
>> it does not clean up the code... so why? I am all ears.
>
> At the very least, please document a list of lockups this avoids. Less
> magic more text.
>
> Dave.

I am doing all this for hyperz. So if it only fixes hyperz, that makes
me happy enough.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: order atom emission v2

2012-09-06 Thread Jerome Glisse
On Thu, Sep 6, 2012 at 4:10 PM, Marek Olšák  wrote:
> On Thu, Sep 6, 2012 at 8:34 PM, Jerome Glisse  wrote:
>> On Thu, Sep 6, 2012 at 2:29 PM, Marek Olšák  wrote:
>>> This looks good to me. It's funny to see the r300g architecture being
>>> re-implemented in r600g. :)
>>>
>>> There's one optimization that r300g has that this patch doesn't. r300g
>>> keeps the index of the first and the last dirty atom and the loops
>>> over the list of atoms look like this:
>>> for (i = first_dirty; i <= last_dirty; i++)
>>>
>>> And after emission:
>>> first_dirty = some large number;
>>> last_dirty= 0;
>>>
>>> The atoms should be ordered according to how frequently they are
>>> updated (except when the ordering is required by the hw). But most
>>> importantly, if there are no state changes, the loops are trivially
>>> skipped.
>>>
>>> Marek
>>
>> Don't think this optimization is worth it, there won't be much more
>> than 32 atom in the end and it definitely can't be ordered from most
>> frequent to less frequent as some of the stuff need to be at the last
>> being emitted and they are frequent one (primitive type for instance).
>
> I didn't say all atoms *must* be sorted. I meant that some (most?)
> atoms can be sorted, i.e. you can have some atoms at fixed positions
> (like the primitype type or the seamless cubemap state), but you have
> always at least *some* freedom where you put the rest. The ordering I
> had in mind was actually from the least frequent to the most frequent,
> in other words, from the framebuffer (least frequent) to shaders to
> textures to constant buffers to vertex buffers (most frequent).
>
> Of course, the code should document which atoms must have fixed
> positions along with an explanation. The comment that all atom
> positions must not be changed isn't enough, because it's not true.
>
> Marek

I won't try to find which atoms can have a completely floating position.
I am just grouping together registers that are always emitted together
in fglrx, and then I position these groups relative to each other
according to their position in fglrx. That means all atoms are always
emitted in a specific order, so there won't be any freedom. The only
freedom I can think of is between two position-forced atoms, and that
makes the sorting completely useless and complicated.
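
As an illustration of "always emitted in a specific order", here is a minimal sketch in C. It assumes each atom has a fixed slot and that emission walks the slots in order, whatever order the atoms were dirtied in. The atom names and the emit_ctx structure are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical atom ids; the enum order is the fixed emission order. */
enum atom_id {
    ATOM_FRAMEBUFFER,
    ATOM_SHADER,
    ATOM_CLIP,
    ATOM_PRIM_TYPE,
    ATOM_COUNT
};

struct emit_ctx {
    unsigned dirty_mask;          /* bit i set means atom i needs emission */
    enum atom_id log[ATOM_COUNT]; /* order of the last emission, for checking */
    size_t log_len;
};

static void mark_dirty(struct emit_ctx *c, enum atom_id id)
{
    assert((unsigned)id < ATOM_COUNT);
    c->dirty_mask |= 1u << id;
}

/* Walk the slots in index order, so emission order never depends on the
 * order in which state was changed. */
static void emit_dirty(struct emit_ctx *c)
{
    c->log_len = 0;
    for (unsigned id = 0; id < ATOM_COUNT; id++) {
        if (c->dirty_mask & (1u << id))
            c->log[c->log_len++] = (enum atom_id)id;
    }
    c->dirty_mask = 0;
}
```

Dirtying the primitive type before the framebuffer still emits the framebuffer first, because only the slot index decides the order.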

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: order atom emission v2

2012-09-06 Thread Jerome Glisse
On Thu, Sep 6, 2012 at 2:29 PM, Marek Olšák  wrote:
> This looks good to me. It's funny to see the r300g architecture being
> re-implemented in r600g. :)
>
> There's one optimization that r300g has that this patch doesn't. r300g
> keeps the index of the first and the last dirty atom and the loops
> over the list of atoms look like this:
> for (i = first_dirty; i <= last_dirty; i++)
>
> And after emission:
> first_dirty = some large number;
> last_dirty= 0;
>
> The atoms should be ordered according to how frequently they are
> updated (except when the ordering is required by the hw). But most
> importantly, if there are no state changes, the loops are trivially
> skipped.
>
> Marek

I don't think this optimization is worth it. There won't be many more
than 32 atoms in the end, and they definitely can't be ordered from most
frequent to least frequent, as some of them need to be emitted last
and those are frequent ones (primitive type, for instance).
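
For reference, the first/last dirty index scheme from the quote above can be sketched as follows. This is a minimal illustration in C, assuming a flat array of atoms; atom_list, atom_mark_dirty and atom_emit_all are made-up names, not r300g or r600g code.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ATOMS 32            /* hypothetical atom count */

struct atom_list {
    bool dirty[NUM_ATOMS];
    unsigned first_dirty;       /* lowest dirty index, NUM_ATOMS if none */
    unsigned last_dirty;        /* highest dirty index, 0 if none */
    unsigned emitted;           /* counts emissions, for illustration only */
};

static void atom_list_init(struct atom_list *l)
{
    for (unsigned i = 0; i < NUM_ATOMS; i++)
        l->dirty[i] = false;
    l->first_dirty = NUM_ATOMS; /* "some large number" */
    l->last_dirty = 0;
    l->emitted = 0;
}

static void atom_mark_dirty(struct atom_list *l, unsigned id)
{
    assert(id < NUM_ATOMS);
    l->dirty[id] = true;
    if (id < l->first_dirty)
        l->first_dirty = id;
    if (id > l->last_dirty)
        l->last_dirty = id;
}

/* Emit dirty atoms in index order and return how many slots were visited.
 * With no state change, first_dirty > last_dirty and the loop is skipped. */
static unsigned atom_emit_all(struct atom_list *l)
{
    unsigned visited = 0;
    for (unsigned i = l->first_dirty; i <= l->last_dirty && i < NUM_ATOMS; i++) {
        visited++;
        if (l->dirty[i]) {
            l->dirty[i] = false;
            l->emitted++;       /* a real driver would write registers here */
        }
    }
    l->first_dirty = NUM_ATOMS;
    l->last_dirty = 0;
    return visited;
}
```

The loop still visits clean atoms that sit between the first and last dirty index, so the gain depends on how close together the frequently changed atoms are.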

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: order atom emission

2012-09-06 Thread Jerome Glisse
On Thu, Sep 6, 2012 at 11:32 AM, Alex Deucher  wrote:
> On Thu, Sep 6, 2012 at 10:54 AM, Jerome Glisse  wrote:
>> On Thu, Sep 6, 2012 at 6:20 AM, Dave Airlie  wrote:
>>> On Thu, Sep 6, 2012 at 5:21 PM, Philipp Klaus Krause  wrote:
>>>> On 06.09.2012 07:35, j.gli...@gmail.com wrote:
>>>>> From: Jerome Glisse 
>>>>>
>>>>> To avoid GPU lockup registers must be emited in a specific order
>>>>> (no kidding ...). This patch rework atom emission so order in which
>>>>> atom are emited in respect to each other is always the same. We
>>>>> don't have any informations on what is the correct order so order
>>>>> will need to be infered from fglrx command stream.
>>>>
>>>> Shouldn't this be stated in comments, so the next person who comes along
>>>> and makes a change in this code doesn't inadvertently change the order?
>>>
>>> Also a comment on what ordering matters most, like I suspect this is
>>> just hiding a real issue.
>>>
>>> Dave.
>>
>> No it's not hiding an issue, afaict it's how the hw works. The hw do
>> what some amd document call states validations. So here is how i
>> understand how things happen and i can be completely wrong. Hw process
>> register write in order it receive them and to avoid postponing state
>> validation the hw do state validation while processing register. That
>> means if writing register A trigger state validation that use some
>> field of register B the hw might not redo state validation when
>> register B is latter written. ie only some register trigger the state
>> validation no matter on what they depends on. I believe state
>> validation is only use as pipeline optimization by the hw, so the hw
>> knows it can take some short cut. But in some rare case if short cut
>> are taken for wrong reasons we end up in GPU lockup.
>>
>> No matter if my guess is right or wrong, i know for a fact that
>> register order is important in some situation, that's the hard bottom
>> line, no matter what is the reasons inside the hw.
>>
>> This patch is far from having all the order right, it's just a first
>> step, i am atomizing everything and it's what needed to go forward
>> without regression.
>
> I've talked to the internal hw and sw guys and they said there isn't
> any specific ordering required and the closed driver doesn't impose
> any specific order.  The pipeline doesn't get kicked off until a draw
> command is issued, so I don't see why the state update order would
> matter.  It's possible there are subtle ordering requirements and the
> closed driver just happened to get it right.  There are dependencies
> and hw bug workarounds however.  E.g., some blocks snoop registers
> from other blocks so you need to make sure those dependant registers
> have been initialized before drawing.  I don't know if it's the
> ordering so much as making sure we emit all the necessary state when
> needed.  The closed driver tends to update a lot more state the is
> minimally required for a lot of things.  That said, it probably
> wouldn't hurt to mirror the closed driver more closely.
>
> Alex
>

Yeah, it's possible that it's also related to some registers needing to
be re-emitted. I often see fglrx re-emitting a register even though it
emitted it with the same value just before, and some registers are
emitted several times around other register blocks.

Anyhow, this patch is a first step to atomize everything and match the
fglrx register pattern more closely.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: order atom emission

2012-09-06 Thread Jerome Glisse
On Thu, Sep 6, 2012 at 6:20 AM, Dave Airlie  wrote:
> On Thu, Sep 6, 2012 at 5:21 PM, Philipp Klaus Krause  wrote:
>> On 06.09.2012 07:35, j.gli...@gmail.com wrote:
>>> From: Jerome Glisse 
>>>
>>> To avoid GPU lockup registers must be emited in a specific order
>>> (no kidding ...). This patch rework atom emission so order in which
>>> atom are emited in respect to each other is always the same. We
>>> don't have any informations on what is the correct order so order
>>> will need to be infered from fglrx command stream.
>>
>> Shouldn't this be stated in comments, so the next person who comes along
>> and makes a change in this code doesn't inadvertently change the order?
>
> Also a comment on what ordering matters most, like I suspect this is
> just hiding a real issue.
>
> Dave.

No, it's not hiding an issue; afaict it's how the hw works. The hw does
what some AMD documents call state validation. So here is how I
understand things, and I could be completely wrong. The hw processes
register writes in the order it receives them, and to avoid postponing
state validation it does state validation while processing registers.
That means that if writing register A triggers state validation that
uses some field of register B, the hw might not redo state validation
when register B is written later. I.e., only some registers trigger
state validation, no matter what they depend on. I believe state
validation is only used as a pipeline optimization by the hw, so the hw
knows it can take some shortcuts. But in some rare cases, if shortcuts
are taken for the wrong reasons, we end up with a GPU lockup.
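
The guess above can be illustrated with a toy model. This is only a sketch of the hypothesis, not of real hardware: the registers REG_A and REG_B, the toy_hw structure and the "snapshot" behaviour are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not real hardware: writing REG_A "validates" state using the
 * value REG_B holds at that moment; writing REG_B does not revalidate. */
enum { REG_A, REG_B, REG_COUNT };

struct toy_hw {
    uint32_t regs[REG_COUNT];
    uint32_t validated_b;   /* snapshot of REG_B taken when REG_A was written */
};

static void toy_write(struct toy_hw *hw, int reg, uint32_t value)
{
    assert(reg >= 0 && reg < REG_COUNT);
    hw->regs[reg] = value;
    if (reg == REG_A)
        hw->validated_b = hw->regs[REG_B];
}

/* At draw time the toy pipeline is consistent only if the snapshot taken
 * during validation still matches the current value of REG_B. */
static bool toy_draw_is_consistent(const struct toy_hw *hw)
{
    return hw->validated_b == hw->regs[REG_B];
}
```

In this model, writing B then A leaves the pipeline consistent, while writing A then B leaves a stale snapshot, which is the kind of order dependence described above.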

Whether my guess is right or wrong, I know for a fact that register
order is important in some situations. That's the hard bottom line,
whatever the reasons inside the hw.

This patch is far from having all the ordering right. It's just a first
step: I am atomizing everything, and this is what is needed to go
forward without regressions.

Note that I have been told that in the r100/r200 days the same issue
came up and registers needed to be written in some specific order (well,
only some registers matter, but I doubt we have good knowledge of
which).

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 0/7] MSAA on R700 and improvements for Evergreen

2012-08-23 Thread Jerome Glisse
On Wed, Aug 22, 2012 at 9:54 PM, Marek Olšák  wrote:
> This series adds R700 MSAA support along with compression of MSAA 
> colorbuffers for R700 and Evergreen, which should save a lot of bandwidth 
> with MSAA. There are also some minor fixes.
>
> Please review.
>
> Marek Olšák (7):
>   gallium/u_blitter: initialize sample mask in resolve
>   r600g: set CB_TARGET_MASK to 0xf and not 0xff for resolve on evergreen
>   r600g: fix evergreen 8x MSAA sample positions
>   r600g: cleanup names around depth decompression
>   r600g: implement compression for MSAA colorbuffers for evergreen
>   r600g: change programming of CB_SHADER_MASK on r600-r700
>   r600g: implement MSAA for r700

For the series:
Reviewed-by: Jerome Glisse 

What's wrong with r6xx?

>
>  src/gallium/auxiliary/util/u_blitter.c  |   46 
>  src/gallium/auxiliary/util/u_blitter.h  |5 +
>  src/gallium/drivers/r600/evergreen_hw_context.c |   64 ++
>  src/gallium/drivers/r600/evergreen_state.c  |   87 ++--
>  src/gallium/drivers/r600/evergreend.h   |   76 ++-
>  src/gallium/drivers/r600/r600_blit.c|   97 -
>  src/gallium/drivers/r600/r600_hw_context.c  |   16 ++
>  src/gallium/drivers/r600/r600_pipe.c|6 +
>  src/gallium/drivers/r600/r600_pipe.h|   16 +-
>  src/gallium/drivers/r600/r600_resource.h|   14 +-
>  src/gallium/drivers/r600/r600_state.c   |  262 
> +++
>  src/gallium/drivers/r600/r600_state_common.c|   45 +++-
>  src/gallium/drivers/r600/r600_texture.c |  116 +-
>  src/gallium/drivers/r600/r600d.h|   20 ++
>  14 files changed, 770 insertions(+), 100 deletions(-)
>
> Marek


Re: [Mesa-dev] [PATCH 1/2] winsys/radeon: fix VA allocation

2012-08-03 Thread Jerome Glisse
On Fri, Aug 3, 2012 at 11:06 AM, Christian König
 wrote:
> Wait for VA use to end before reusing it.
>
> Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=45018
>
> Signed-off-by: Christian König 

Actually you are right, mesa can't free the va right away; it needs to
wait until the kernel is done with it. But the kernel was severely buggy
too: it never cleared the page table when freeing an object. I attached
kernel patches. I am in the process of testing them.
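
The userspace side of "wait before reusing a va" can be sketched roughly as follows. This is a simplified model in C and not the winsys code: it uses a fixed-size array instead of a list, and fake_is_busy is a stand-in for the kernel busy query. All names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HOLES 8

struct va_hole {
    uint64_t offset;
    uint64_t size;
    uint32_t handle;  /* non-zero while the freed buffer may still be in use */
    bool in_use;      /* slot holds a valid hole */
};

struct va_mgr {
    struct va_hole holes[MAX_HOLES];
    uint64_t va_offset;               /* end of the allocated range */
    bool (*is_busy)(uint32_t handle); /* stand-in for a kernel busy query */
};

/* Stand-in for the kernel busy query; a real winsys would ask the kernel. */
static uint32_t fake_busy_handle;
static bool fake_is_busy(uint32_t handle) { return handle == fake_busy_handle; }

/* Freeing only records a hole tagged with the buffer handle. */
static void va_free(struct va_mgr *m, uint64_t va, uint64_t size, uint32_t handle)
{
    for (size_t i = 0; i < MAX_HOLES; i++) {
        if (!m->holes[i].in_use) {
            m->holes[i] = (struct va_hole){ va, size, handle, true };
            return;
        }
    }
    /* No free slot: the range is simply lost in this sketch. */
}

/* Allocation skips holes whose old buffer is still busy. */
static uint64_t va_alloc(struct va_mgr *m, uint64_t size)
{
    for (size_t i = 0; i < MAX_HOLES; i++) {
        struct va_hole *h = &m->holes[i];
        if (!h->in_use)
            continue;
        if (h->handle) {
            if (m->is_busy(h->handle))
                continue;             /* the GPU may still use this range */
            h->handle = 0;            /* idle now, the hole is reusable */
        }
        if (h->size >= size) {
            uint64_t va = h->offset;
            h->offset += size;
            h->size -= size;
            if (h->size == 0)
                h->in_use = false;
            return va;
        }
    }
    uint64_t va = m->va_offset;       /* no usable hole, grow the range */
    m->va_offset += size;
    return va;
}
```

The key point is that a freed range stays tagged with its buffer handle until the busy check passes, so it cannot be handed out while the old buffer may still be in use.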

Cheers,
Jerome

> ---
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c |   64 
> +
>  1 file changed, 43 insertions(+), 21 deletions(-)
>
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
> b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> index 2626586..0c94461 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> @@ -102,6 +102,7 @@ static INLINE struct radeon_bo *radeon_bo(struct 
> pb_buffer *bo)
>
>  struct radeon_bo_va_hole {
>  struct list_head list;
> +uint32_t handle;
>  uint64_t offset;
>  uint64_t size;
>  };
> @@ -204,7 +205,30 @@ static uint64_t radeon_bomgr_find_va(struct radeon_bomgr 
> *mgr, uint64_t size, ui
>  pipe_mutex_lock(mgr->bo_va_mutex);
>  /* first look for a hole */
>  LIST_FOR_EACH_ENTRY_SAFE(hole, n, &mgr->va_holes, list) {
> +if (hole->handle) {
> +struct drm_radeon_gem_busy busy_args;
> +struct drm_gem_close close_args;
> +
> +memset(&busy_args, 0, sizeof(busy_args));
> +busy_args.handle = hole->handle;
> +if (drmCommandWriteRead(mgr->rws->fd, DRM_RADEON_GEM_BUSY,
> +&busy_args, sizeof(busy_args)) != 0) {
> +continue;
> +}
> +
> +memset(&close_args, 0, sizeof(close_args));
> +close_args.handle = hole->handle;
> +drmIoctl(mgr->rws->fd, DRM_IOCTL_GEM_CLOSE, &close_args);
> +
> +hole->handle = 0;
> +}
>  offset = hole->offset;
> +   if ((offset + hole->size) == mgr->va_offset) {
> +mgr->va_offset = offset;
> +list_del(&hole->list);
> +FREE(hole);
> +continue;
> +   }
>  waste = 0;
>  if (alignment) {
>  waste = offset % alignment;
> @@ -280,23 +304,21 @@ static void radeon_bomgr_force_va(struct radeon_bomgr 
> *mgr, uint64_t va, uint64_
>  pipe_mutex_unlock(mgr->bo_va_mutex);
>  }
>
> -static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va, 
> uint64_t size)
> +static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va,
> + uint64_t size, uint32_t handle)
>  {
> +struct radeon_bo_va_hole *hole;
>  pipe_mutex_lock(mgr->bo_va_mutex);
> -if ((va + size) == mgr->va_offset) {
> -mgr->va_offset = va;
> -} else {
> -struct radeon_bo_va_hole *hole;
>
> -/* FIXME on allocation failure we just lose virtual address space
> - * maybe print a warning
> - */
> -hole = CALLOC_STRUCT(radeon_bo_va_hole);
> -if (hole) {
> -hole->size = size;
> -hole->offset = va;
> -list_add(&hole->list, &mgr->va_holes);
> -}
> +/* FIXME on allocation failure we just lose virtual address space
> + * maybe print a warning
> + */
> +hole = CALLOC_STRUCT(radeon_bo_va_hole);
> +if (hole) {
> +hole->handle = handle;
> +hole->size = size;
> +hole->offset = va;
> +list_add(&hole->list, &mgr->va_holes);
>  }
>  pipe_mutex_unlock(mgr->bo_va_mutex);
>  }
> @@ -320,12 +342,12 @@ static void radeon_bo_destroy(struct pb_buffer *_buf)
>  os_munmap(bo->ptr, bo->base.size);
>
>  if (mgr->va) {
> -radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +radeon_bomgr_free_va(mgr, bo->va, bo->va_size, bo->handle);
> +} else {
> +/* Close object. */
> +args.handle = bo->handle;
> +drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
>  }
> -
> -/* Close object. */
> -args.handle = bo->handle;
> -drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
>  pipe_mutex_destroy(bo->map_mutex);
>  FREE(bo);
>  }
> @@ -540,7 +562,7 @@ static struct pb_buffer *radeon_bomgr_create_bo(struct 
> pb_manager *_mgr,
>  return NULL;
>  }
>  if (va.operation == RADEON_VA_RESULT_VA_EXIST) {
> -radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +radeon_bomgr_free_va(mgr, bo->va, bo->va_size, 0);
>  bo->va = va.offset;
>  radeon_bomgr_force_va(mgr, bo->va, bo->va_size);
>  }
> @@ -865,7 +887,7 @@ done:
>  return NULL;
>  }
>  if (va.operation == RADEON_VA_RESULT_VA_EXIST) {
> -radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +radeon_bomgr_free_va(mgr, bo->va, bo->va_size, 0

Re: [Mesa-dev] [PATCH 1/2] winsys/radeon: fix VA allocation

2012-08-03 Thread Jerome Glisse
On Fri, Aug 3, 2012 at 11:06 AM, Christian König
 wrote:
> Wait for VA use to end before reusing it.
>
> Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=45018
>
> Signed-off-by: Christian König 
> ---
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c |   64 
> +
>  1 file changed, 43 insertions(+), 21 deletions(-)
>
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
> b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> index 2626586..0c94461 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> @@ -102,6 +102,7 @@ static INLINE struct radeon_bo *radeon_bo(struct pb_buffer *bo)
>
>  struct radeon_bo_va_hole {
>      struct list_head list;
> +    uint32_t handle;
>      uint64_t offset;
>      uint64_t size;
>  };
> @@ -204,7 +205,30 @@ static uint64_t radeon_bomgr_find_va(struct radeon_bomgr *mgr, uint64_t size, ui
>      pipe_mutex_lock(mgr->bo_va_mutex);
>      /* first look for a hole */
>      LIST_FOR_EACH_ENTRY_SAFE(hole, n, &mgr->va_holes, list) {
> +        if (hole->handle) {
> +            struct drm_radeon_gem_busy busy_args;
> +            struct drm_gem_close close_args;
> +
> +            memset(&busy_args, 0, sizeof(busy_args));
> +            busy_args.handle = hole->handle;
> +            if (drmCommandWriteRead(mgr->rws->fd, DRM_RADEON_GEM_BUSY,
> +                                    &busy_args, sizeof(busy_args)) != 0) {
> +                continue;
> +            }
> +
> +            memset(&close_args, 0, sizeof(close_args));
> +            close_args.handle = hole->handle;
> +            drmIoctl(mgr->rws->fd, DRM_IOCTL_GEM_CLOSE, &close_args);
> +
> +            hole->handle = 0;
> +        }
>          offset = hole->offset;
> +        if ((offset + hole->size) == mgr->va_offset) {
> +            mgr->va_offset = offset;
> +            list_del(&hole->list);
> +            FREE(hole);
> +            continue;
> +        }
>          waste = 0;
>          if (alignment) {
>              waste = offset % alignment;
> @@ -280,23 +304,21 @@ static void radeon_bomgr_force_va(struct radeon_bomgr *mgr, uint64_t va, uint64_
>      pipe_mutex_unlock(mgr->bo_va_mutex);
>  }
>
> -static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va, uint64_t size)
> +static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va,
> +                                 uint64_t size, uint32_t handle)
>  {
> +    struct radeon_bo_va_hole *hole;
>      pipe_mutex_lock(mgr->bo_va_mutex);
> -    if ((va + size) == mgr->va_offset) {
> -        mgr->va_offset = va;
> -    } else {
> -        struct radeon_bo_va_hole *hole;
>
> -        /* FIXME on allocation failure we just lose virtual address space
> -         * maybe print a warning
> -         */
> -        hole = CALLOC_STRUCT(radeon_bo_va_hole);
> -        if (hole) {
> -            hole->size = size;
> -            hole->offset = va;
> -            list_add(&hole->list, &mgr->va_holes);
> -        }
> +    /* FIXME on allocation failure we just lose virtual address space
> +     * maybe print a warning
> +     */
> +    hole = CALLOC_STRUCT(radeon_bo_va_hole);
> +    if (hole) {
> +        hole->handle = handle;
> +        hole->size = size;
> +        hole->offset = va;
> +        list_add(&hole->list, &mgr->va_holes);
>      }
>      pipe_mutex_unlock(mgr->bo_va_mutex);
>  }
> @@ -320,12 +342,12 @@ static void radeon_bo_destroy(struct pb_buffer *_buf)
>          os_munmap(bo->ptr, bo->base.size);
>
>      if (mgr->va) {
> -        radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +        radeon_bomgr_free_va(mgr, bo->va, bo->va_size, bo->handle);
> +    } else {
> +        /* Close object. */
> +        args.handle = bo->handle;
> +        drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
>      }
> -
> -    /* Close object. */
> -    args.handle = bo->handle;
> -    drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
>      pipe_mutex_destroy(bo->map_mutex);
>      FREE(bo);
>  }
> @@ -540,7 +562,7 @@ static struct pb_buffer *radeon_bomgr_create_bo(struct pb_manager *_mgr,
>              return NULL;
>          }
>          if (va.operation == RADEON_VA_RESULT_VA_EXIST) {
> -            radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +            radeon_bomgr_free_va(mgr, bo->va, bo->va_size, 0);
>              bo->va = va.offset;
>              radeon_bomgr_force_va(mgr, bo->va, bo->va_size);
>          }
> @@ -865,7 +887,7 @@ done:
>              return NULL;
>          }
>          if (va.operation == RADEON_VA_RESULT_VA_EXIST) {
> -            radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +            radeon_bomgr_free_va(mgr, bo->va, bo->va_size, 0);
>              bo->va = va.offset;
>              radeon_bomgr_force_va(mgr, bo->va, bo->va_size);
>          }
> --
> 1.7.9.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> h
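
The deferred-reuse idea in the patch quoted above ("Wait for VA use to
end before reusing it") can be sketched outside of Mesa roughly as
follows. This is only a toy model under stated assumptions, not the
winsys code: va_hole, va_mgr, va_free and va_find are hypothetical
names, and the two bit masks stand in for the DRM_RADEON_GEM_BUSY query
and the DRM_IOCTL_GEM_CLOSE call. Alignment handling and the merge with
va_offset are left out on purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: a freed VA range that the GPU may still be using. */
struct va_hole {
    struct va_hole *next;
    uint32_t handle;      /* buffer handle, 0 once the buffer is closed */
    uint64_t offset;
    uint64_t size;
};

struct va_mgr {
    struct va_hole *holes;
    uint32_t busy_mask;   /* bit h set: handle h is still busy (stand-in for the busy query) */
    uint32_t closed_mask; /* bit h set: handle h was closed (stand-in for the close call) */
};

/* Record a freed range together with the handle that last used it. */
static int va_free(struct va_mgr *mgr, uint64_t va, uint64_t size, uint32_t handle)
{
    struct va_hole *hole;

    assert(handle < 32);  /* limit of this toy model only */
    hole = calloc(1, sizeof(*hole));
    if (!hole)
        return -1;        /* the range is simply lost, as the FIXME in the patch says */
    hole->handle = handle;
    hole->offset = va;
    hole->size = size;
    hole->next = mgr->holes;
    mgr->holes = hole;
    return 0;
}

/* Hand out `size` bytes from the first idle hole that is large enough.
 * Returns 1 and stores the address in *out, or 0 if nothing is usable yet. */
static int va_find(struct va_mgr *mgr, uint64_t size, uint64_t *out)
{
    struct va_hole **link = &mgr->holes;

    while (*link) {
        struct va_hole *hole = *link;

        if (hole->handle) {
            if (mgr->busy_mask & (1u << hole->handle)) {
                link = &hole->next;   /* still in use by the GPU, skip it */
                continue;
            }
            /* Idle now: close the buffer and make the range reusable. */
            mgr->closed_mask |= 1u << hole->handle;
            hole->handle = 0;
        }
        if (hole->size >= size) {
            *out = hole->offset;
            if (hole->size == size) {
                *link = hole->next;   /* hole fully consumed */
                free(hole);
            } else {
                hole->offset += size; /* keep the remainder as a smaller hole */
                hole->size -= size;
            }
            return 1;
        }
        link = &hole->next;
    }
    return 0;
}
```

The point of the model is only the ordering: the handle stays open
while the range sits in the hole list, and it is closed at the moment
the range is first found idle, never earlier.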

Re: [Mesa-dev] [PATCH 1/2] r600g: add htile support v9

2012-07-30 Thread Jerome Glisse
On Sun, Jul 29, 2012 at 1:50 PM, Marek Olšák  wrote:
> On Tue, Jul 17, 2012 at 7:58 PM,   wrote:
>> From: Jerome Glisse 
>>
>> htile is used for HiZ and HiS support and fast Z/S clears.
>> This commit just adds the htile setup and Fast Z clear.
>> We don't take full advantage of HiS with that patch.
>>
>> v2 really use fast clear, still random issues with some tiles,
>>    need to try more flush combinations, fix depth/stencil
>>    texture decompression
>> v3 fix random issue on r6xx/r7xx
>> v4 rebase on top of latest mesa, disable CB export when clearing
>>    htile surface to avoid wasting bandwidth
>> v5 resummarize htile surface when uploading z value. Fix z/stencil
>>    decompression, the custom blitter with custom dsa is no longer
>>    needed.
>> v6 Reorganize render control/override update mechanism, fixing more
>>    issues in the process.
>> v7 Add nop after depth surface base update to work around some htile
>>    flushing issue. Force htile to 8x8 on r6xx/r7xx as other
>>    combinations have issues. Do not enable hyperz when
>>    flushing/uncompressing depth buffer.
>> v8 Fix htile surface, preload and prefetch setup. Only set preload
>>    and prefetch on htile surface clear like fglrx. Record depth
>>    clear value per level. Support several levels for the htile
>>    surface. First depth clear can't be a fast clear.
>> v9 Fix comments, properly account new register in emit function,
>>    disable fast zclear if clearing different layers of a texture
>>    array to different values
>>
>> Signed-off-by: Pierre-Eric Pelloux-Prayer 
>> Signed-off-by: Alex Deucher 
>> Signed-off-by: Jerome Glisse 
>> ---
>>  src/gallium/drivers/r600/evergreen_hw_context.c |6 +
>>  src/gallium/drivers/r600/evergreen_state.c  |  102 -
>>  src/gallium/drivers/r600/evergreend.h   |4 +
>>  src/gallium/drivers/r600/r600_blit.c|   38 +++
>>  src/gallium/drivers/r600/r600_hw_context.c  |   25 +
>>  src/gallium/drivers/r600/r600_pipe.c|8 ++
>>  src/gallium/drivers/r600/r600_pipe.h|   13 ++-
>>  src/gallium/drivers/r600/r600_resource.h|7 ++
>>  src/gallium/drivers/r600/r600_state.c   |  133 
>> ---
>>  src/gallium/drivers/r600/r600_texture.c |  103 ++
>>  src/gallium/drivers/r600/r600d.h|6 +
>>  11 files changed, 399 insertions(+), 46 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
>> b/src/gallium/drivers/r600/evergreen_hw_context.c
>> index 081701f..546c884 100644
>> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
>> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
>> @@ -62,6 +62,9 @@ static const struct r600_reg evergreen_context_reg_list[] 
>> = {
>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>> {R_028058_DB_DEPTH_SIZE, 0, 0},
>> {R_02805C_DB_DEPTH_SLICE, 0, 0},
>> +   {R_02802C_DB_DEPTH_CLEAR, 0, 0},
>> +   {R_028ABC_DB_HTILE_SURFACE, 0, 0},
>> +   {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
>> {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>> {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>> {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
>> @@ -319,6 +322,9 @@ static const struct r600_reg cayman_context_reg_list[] = 
>> {
>> {GROUP_FORCE_NEW_BLOCK, 0, 0},
>> {R_028058_DB_DEPTH_SIZE, 0, 0},
>> {R_02805C_DB_DEPTH_SLICE, 0, 0},
>> +   {R_02802C_DB_DEPTH_CLEAR, 0, 0},
>> +   {R_028ABC_DB_HTILE_SURFACE, 0, 0},
>> +   {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
>> {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>> {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>> {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
>> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
>> b/src/gallium/drivers/r600/evergreen_state.c
>> index a66387b..214d76b 100644
>> --- a/src/gallium/drivers/r600/evergreen_state.c
>> +++ b/src/gallium/drivers/r600/evergreen_state.c
>> @@ -710,13 +710,15 @@ static void *evergreen_create_blend_state(struct 
>> pipe_context *ctx,
>> }
>> blend->cb_target_mask = target_mask;
>>
>> -   if (target_mask)
>> +   if (target_mask) {
>> color_control |= S_028808_MODE(V_028808_CB_NORMAL);
>> -   else
>> +   } else {
>> color_control |= S_028808_MODE(V_028808_CB_DISABLE);
>> +   }
>>
>>

Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-23 Thread Jerome Glisse
On Mon, Jul 23, 2012 at 5:28 PM, Marek Olšák  wrote:
> On Mon, Jul 23, 2012 at 4:25 PM, Jerome Glisse  wrote:
>> On Sun, Jul 22, 2012 at 8:58 PM, Marek Olšák  wrote:
>>> On Fri, Jul 20, 2012 at 4:54 AM, Jerome Glisse  wrote:
>>>> On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák  wrote:
>>>>> On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse  wrote:
>>>>>> On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák  wrote:
>>>>>>> I actually care a lot about lockups. Well, you are complaing about
>>>>>>> lockups, yet you have quite obvious bugs in your hyperz code, so let's
>>>>>>> fix them first. (I wouldn't even try and run the hyperz code in its
>>>>>>> current state. Please don't take that personally.) Then, if the
>>>>>>> lockups persist, we can start looking into *what* fixes them. You seem
>>>>>>> to think that this patch helps a lot, but you don't say why. Aren't
>>>>>>> you interested in what sequence of GPU commands helps? If I am
>>>>>>> counting correctly, there are 7 changes in behavior in this patch. It
>>>>>>> should be pretty easy to nail down the few that help, document them
>>>>>>> (like /* these two lines fix a lockup with hyperz */), and discard the
>>>>>>> rest. The documenting part is very important, so that the other
>>>>>>> developers won't break your code accidentally.
>>>>>>>
>>>>>>> Marek
>>>>>>>
>>>>>>
>>>>>> You haven't even try hyperz and you say i have an obvious bug, that's
>>>>>> kind of funny, but you would not know why. I try pretty much all of
>>>>>
>>>>> Oh come on, I already told you about all the bugs I found in the
>>>>> hyperz patch. You now know them too, and so does everybody else
>>>>> reading mesa-dev.
>>>>>
>>>>> Marek
>>>>>
>>>>
>>>> None of the issue you pointed out showed in piglit, none of them did
>>>> have impact on things like openarena, nexuiz, doomIII, lightmark, ...
>>>> so no issue you pointed does not cripple the hyperz patch, it's
>>>> working quite well for many things. Before you extrapolate, yes issue
>>>> you pointed out have impact in backward use of GL but none the less i
>>>> addressed them and i can tell you it does help a bit with lockup.
>>>
>>> I have no doubt that it helps with your lockups and I also have no
>>> doubt that the piece of code that helps can be bisected. I have
>>> mentioned 7 changes in the patch which are questionable, so the
>>> bisection should ideally take 3 steps. After we find the change which
>>> helps (and document it), we can discard the rest. That should give us
>>> the same stability as this patch does, but without unnecessary code
>>> which does cost GPU cycles (regardless of whether it is measurable on
>>> a particular machine or not).
>>>
>>> By the way, in draw_vbo, the emit functions should be called after
>>> r600_need_cs_space. Otherwise the command stream may overflow.
>>>
>>> Marek
>>
>> Again i haven't found a combination other than the outcome of the full
>> patch that helps more. So be my guest bisect on rv610, rv635, rv670,
>> rv710, rv740, rv770.
>
> So your patch doesn't fix any issue with evergreen? That's great.
> Thanks for keeping that to yourself. It's always a pleasure working
> with you. :) Now that we know the truth, the questionable changes to
> the evergreen code can be discarded freely.

As usual you make the worst assumption about me.

Cheers,
Jerome

> Concerning older chipsets, I can do the bisection only on rs880, rv670
> and rv730. That will have to suffice. One way or another, every single
> change must be done for a *reason* and that reason should be
> documented if it's not obvious. Please give me all the necessary
> information, so that I can start bisecting. That is what lockups your
> patch fixes and where (name apps or tests, a specific place in a game,
> etc.) on what chipsets and whether hyperz is enabled.
>
> It is very likely that all the changes I questioned in my first email
> do not make any difference with regard to lockups, because there are
> also other changes in your patch which may help too and which I fully
> agree with.
>
> Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
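
The "7 changes, so 3 steps" figure quoted above is the worst case of
repeated halving, and it assumes that a single change is responsible
for the improvement. A small sketch of that arithmetic, with
bisect_steps and subset_count as hypothetical helper names; the second
function shows how many combinations exist if, as argued in the
replies, the changes only help together:

```c
#include <assert.h>

/* Worst-case number of test runs needed to isolate one responsible
 * change among n candidates by halving the candidate set each run. */
static unsigned bisect_steps(unsigned n)
{
    unsigned steps = 0;

    assert(n > 0);
    while (n > 1) {
        n = (n + 1) / 2;   /* worst case: the change is in the larger half */
        steps++;
    }
    return steps;
}

/* Number of on/off combinations of n independent changes, which is the
 * search space when no single change explains the result. */
static unsigned subset_count(unsigned n)
{
    assert(n < 32);
    return 1u << n;
}
```

With these definitions bisect_steps(7) is 3 and subset_count(7) is 128,
which is roughly the gap between the two positions in this thread.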


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-23 Thread Jerome Glisse
On Mon, Jul 23, 2012 at 5:28 PM, Marek Olšák  wrote:
> On Mon, Jul 23, 2012 at 4:25 PM, Jerome Glisse  wrote:
>> On Sun, Jul 22, 2012 at 8:58 PM, Marek Olšák  wrote:
>>> On Fri, Jul 20, 2012 at 4:54 AM, Jerome Glisse  wrote:
>>>> On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák  wrote:
>>>>> On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse  wrote:
>>>>>> On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák  wrote:
>>>>>>> I actually care a lot about lockups. Well, you are complaing about
>>>>>>> lockups, yet you have quite obvious bugs in your hyperz code, so let's
>>>>>>> fix them first. (I wouldn't even try and run the hyperz code in its
>>>>>>> current state. Please don't take that personally.) Then, if the
>>>>>>> lockups persist, we can start looking into *what* fixes them. You seem
>>>>>>> to think that this patch helps a lot, but you don't say why. Aren't
>>>>>>> you interested in what sequence of GPU commands helps? If I am
>>>>>>> counting correctly, there are 7 changes in behavior in this patch. It
>>>>>>> should be pretty easy to nail down the few that help, document them
>>>>>>> (like /* these two lines fix a lockup with hyperz */), and discard the
>>>>>>> rest. The documenting part is very important, so that the other
>>>>>>> developers won't break your code accidentally.
>>>>>>>
>>>>>>> Marek
>>>>>>>
>>>>>>
>>>>>> You haven't even try hyperz and you say i have an obvious bug, that's
>>>>>> kind of funny, but you would not know why. I try pretty much all of
>>>>>
>>>>> Oh come on, I already told you about all the bugs I found in the
>>>>> hyperz patch. You now know them too, and so does everybody else
>>>>> reading mesa-dev.
>>>>>
>>>>> Marek
>>>>>
>>>>
>>>> None of the issue you pointed out showed in piglit, none of them did
>>>> have impact on things like openarena, nexuiz, doomIII, lightmark, ...
>>>> so no issue you pointed does not cripple the hyperz patch, it's
>>>> working quite well for many things. Before you extrapolate, yes issue
>>>> you pointed out have impact in backward use of GL but none the less i
>>>> addressed them and i can tell you it does help a bit with lockup.
>>>
>>> I have no doubt that it helps with your lockups and I also have no
>>> doubt that the piece of code that helps can be bisected. I have
>>> mentioned 7 changes in the patch which are questionable, so the
>>> bisection should ideally take 3 steps. After we find the change which
>>> helps (and document it), we can discard the rest. That should give us
>>> the same stability as this patch does, but without unnecessary code
>>> which does cost GPU cycles (regardless of whether it is measurable on
>>> a particular machine or not).
>>>
>>> By the way, in draw_vbo, the emit functions should be called after
>>> r600_need_cs_space. Otherwise the command stream may overflow.
>>>
>>> Marek
>>
>> Again i haven't found a combination other than the outcome of the full
>> patch that helps more. So be my guest bisect on rv610, rv635, rv670,
>> rv710, rv740, rv770.
>
> So your patch doesn't fix any issue with evergreen? That's great.
> Thanks for keeping that to yourself. It's always a pleasure working
> with you. :) Now that we know the truth, the questionable changes to
> the evergreen code can be discarded freely.

No, it helps on evergreen too; redwood, juniper, turks and barts are the
only ones I tested with. Evergreen is in a slightly better position, but
when it comes to lockups there are no good metrics.

> Concerning older chipsets, I can do the bisection only on rs880, rv670
> and rv730. That will have to suffice. One way or another, every single
> change must be done for a *reason* and that reason should be
> documented if it's not obvious. Please give me all the necessary
> information, so that I can start bisecting. That is what lockups your
> patch fixes and where (name apps or tests, a specific place in a game,
> etc.) on what chipsets and whether hyperz is enabled.

Sorry, no such thing. It just helps: pick something, test with and
without the patch, and you will see that it locks up less often with it.
I did not make any of the changes in isolation to fix a single case;
it's just that with all the changes together it helps. But of course you
assume that I am dumb, that I spent no time testing, and that I just put
together some random things.

> It is very likely that all the changes I questioned in my first email
> do not make any difference with regard to lockups, because there are
> also other changes in your patch which may help too and which I fully
> agree with.
>
> Marek

As I said, it's a package deal: I did not find a solution, but I did
find something that improved things overall.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-23 Thread Jerome Glisse
On Sun, Jul 22, 2012 at 8:58 PM, Marek Olšák  wrote:
> On Fri, Jul 20, 2012 at 4:54 AM, Jerome Glisse  wrote:
>> On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák  wrote:
>>> On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse  wrote:
>>>> On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák  wrote:
>>>>> I actually care a lot about lockups. Well, you are complaing about
>>>>> lockups, yet you have quite obvious bugs in your hyperz code, so let's
>>>>> fix them first. (I wouldn't even try and run the hyperz code in its
>>>>> current state. Please don't take that personally.) Then, if the
>>>>> lockups persist, we can start looking into *what* fixes them. You seem
>>>>> to think that this patch helps a lot, but you don't say why. Aren't
>>>>> you interested in what sequence of GPU commands helps? If I am
>>>>> counting correctly, there are 7 changes in behavior in this patch. It
>>>>> should be pretty easy to nail down the few that help, document them
>>>>> (like /* these two lines fix a lockup with hyperz */), and discard the
>>>>> rest. The documenting part is very important, so that the other
>>>>> developers won't break your code accidentally.
>>>>>
>>>>> Marek
>>>>>
>>>>
>>>> You haven't even try hyperz and you say i have an obvious bug, that's
>>>> kind of funny, but you would not know why. I try pretty much all of
>>>
>>> Oh come on, I already told you about all the bugs I found in the
>>> hyperz patch. You now know them too, and so does everybody else
>>> reading mesa-dev.
>>>
>>> Marek
>>>
>>
>> None of the issue you pointed out showed in piglit, none of them did
>> have impact on things like openarena, nexuiz, doomIII, lightmark, ...
>> so no issue you pointed does not cripple the hyperz patch, it's
>> working quite well for many things. Before you extrapolate, yes issue
>> you pointed out have impact in backward use of GL but none the less i
>> addressed them and i can tell you it does help a bit with lockup.
>
> I have no doubt that it helps with your lockups and I also have no
> doubt that the piece of code that helps can be bisected. I have
> mentioned 7 changes in the patch which are questionable, so the
> bisection should ideally take 3 steps. After we find the change which
> helps (and document it), we can discard the rest. That should give us
> the same stability as this patch does, but without unnecessary code
> which does cost GPU cycles (regardless of whether it is measurable on
> a particular machine or not).
>
> By the way, in draw_vbo, the emit functions should be called after
> r600_need_cs_space. Otherwise the command stream may overflow.
>
> Marek

Again, I haven't found any combination that helps more than the full
patch. So be my guest: bisect on rv610, rv635, rv670, rv710, rv740,
rv770.

Cheers,
Jerome
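
The remark quoted above about draw_vbo (emit only after
r600_need_cs_space, otherwise the command stream may overflow) comes
down to reserving room for everything a draw will write before writing
any of it. A minimal sketch of that ordering, assuming a tiny
fixed-size buffer; cmd_stream, cs_need_space, cs_emit and draw are
hypothetical names for illustration, not the r600g functions:

```c
#include <assert.h>
#include <stdint.h>

#define CS_MAX_DW 16   /* hypothetical capacity, tiny on purpose */

struct cmd_stream {
    uint32_t buf[CS_MAX_DW];
    unsigned cdw;       /* dwords currently used */
    unsigned flushes;   /* how many times the stream was submitted */
};

/* Submit and reset the stream (the submission itself is not modelled). */
static void cs_flush(struct cmd_stream *cs)
{
    cs->cdw = 0;
    cs->flushes++;
}

/* Make sure ndw more dwords fit, flushing first if they do not. */
static void cs_need_space(struct cmd_stream *cs, unsigned ndw)
{
    assert(ndw <= CS_MAX_DW);
    if (cs->cdw + ndw > CS_MAX_DW)
        cs_flush(cs);
}

/* Append one dword; the caller must already have reserved the space. */
static void cs_emit(struct cmd_stream *cs, uint32_t dw)
{
    assert(cs->cdw < CS_MAX_DW);   /* would be an overflow otherwise */
    cs->buf[cs->cdw++] = dw;
}

/* State and draw packets are reserved as one unit, then written. */
static void draw(struct cmd_stream *cs, unsigned state_dw, unsigned draw_dw)
{
    unsigned i;

    cs_need_space(cs, state_dw + draw_dw);
    for (i = 0; i < state_dw; i++)
        cs_emit(cs, 0);
    for (i = 0; i < draw_dw; i++)
        cs_emit(cs, 0);
}
```

If the state dwords were emitted before the cs_need_space call, the
flush could land between state and draw, or the assert in cs_emit could
fire, which is the overflow described in the quoted mail.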


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-19 Thread Jerome Glisse
On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák  wrote:
> On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse  wrote:
>> On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák  wrote:
>>> I actually care a lot about lockups. Well, you are complaing about
>>> lockups, yet you have quite obvious bugs in your hyperz code, so let's
>>> fix them first. (I wouldn't even try and run the hyperz code in its
>>> current state. Please don't take that personally.) Then, if the
>>> lockups persist, we can start looking into *what* fixes them. You seem
>>> to think that this patch helps a lot, but you don't say why. Aren't
>>> you interested in what sequence of GPU commands helps? If I am
>>> counting correctly, there are 7 changes in behavior in this patch. It
>>> should be pretty easy to nail down the few that help, document them
>>> (like /* these two lines fix a lockup with hyperz */), and discard the
>>> rest. The documenting part is very important, so that the other
>>> developers won't break your code accidentally.
>>>
>>> Marek
>>>
>>
>> You haven't even try hyperz and you say i have an obvious bug, that's
>> kind of funny, but you would not know why. I try pretty much all of
>
> Oh come on, I already told you about all the bugs I found in the
> hyperz patch. You now know them too, and so does everybody else
> reading mesa-dev.
>
> Marek
>

None of the issues you pointed out showed up in piglit, and none of them
had an impact on things like openarena, nexuiz, doomIII, lightsmark, ...
So no, the issues you pointed out do not cripple the hyperz patch; it's
working quite well for many things. Before you extrapolate: yes, the
issues you pointed out have an impact on backward use of GL, but
nonetheless I addressed them, and I can tell you it does help a bit
with lockups.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-19 Thread Jerome Glisse
On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák  wrote:
> I actually care a lot about lockups. Well, you are complaing about
> lockups, yet you have quite obvious bugs in your hyperz code, so let's
> fix them first. (I wouldn't even try and run the hyperz code in its
> current state. Please don't take that personally.) Then, if the
> lockups persist, we can start looking into *what* fixes them. You seem
> to think that this patch helps a lot, but you don't say why. Aren't
> you interested in what sequence of GPU commands helps? If I am
> counting correctly, there are 7 changes in behavior in this patch. It
> should be pretty easy to nail down the few that help, document them
> (like /* these two lines fix a lockup with hyperz */), and discard the
> rest. The documenting part is very important, so that the other
> developers won't break your code accidentally.
>
> Marek
>

You haven't even tried hyperz and you say I have an obvious bug; that's
kind of funny, but you would not know why. I tried pretty much all of
the things my patch does, in isolation and in combination with each
other, and the only way I got an improvement is with something similar
to this patch. Remove one thing and I can find programs that are more
likely to lock up.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-19 Thread Jerome Glisse
On Thu, Jul 19, 2012 at 9:00 PM, Marek Olšák  wrote:
> On Fri, Jul 20, 2012 at 1:34 AM, Jerome Glisse  wrote:
>> On Thu, Jul 19, 2012 at 6:07 PM, Marek Olšák  wrote:
>>> I have these issues with the patch:
>>>
>>> 1) On GPUs without a vertex cache, you flush the texture cache every
>>> draw operation. Are you kidding?
>>
>> Show me one app with perf regression due to that ? Or just go look at
>> what fglrx is doing.
>
> I don't believe that fglrx unconditionally emits SURFACE_SYNC with
> TC_ACTION_ENA before every DRAW packet. I just don't buy that. It's
> too stupid to be true. And considering that it wasn't needed before,
> it's not needed now either. Please give me some other argument than
> just fglrx.

No, fglrx doesn't set it for each draw; fglrx sets it if any of a bunch
of registers is touched. Given that right now we pretty much always
touch one of those registers between draws, it just turns out that my
patch triggers the flush between each draw.

>>
>>> 2) All colorbuffers / streamout buffers are flushed, even those which
>>> are not enabled. E.g. instead of flushing only CB0 when there is only
>>> one, this code flushes all of them. Why? This either needs an
>>> explanation or it should only flush the buffers which are enabled
>>> (like the old code did).
>>
>> fglrx + no perf regression ...
>
> The "no perf regression" argument doesn't apply here, because it just
> might not be the bottleneck now. I'm willing to step aside from this
> one issue though.

I am just trying to stick to the fglrx pattern.


>>
>>> 3) Please explain:
>>> - why you added PS_PARTIAL_FLUSH in r600_texture_barrier and
>>> r600_set_framebuffer_state.
>>
>> fglrx is doing something similar
>
> But not exactly the same thing, right? So there's no reason for it to be 
> there.

It's hard to do exactly as fglrx does, as the pattern is evading me: no
matter how many different app command streams I look at, I always find
an exception to the rule I am formulating.

>>
>>> - why you added CACHE_FLUSH_AND_INV_EVENT in set_framebuffer_state for
>>> R700 and evergreen.
>>
>> fglrx ...
>>
>>> - why you applied the CB flush workarounds meant for RV6xx to all R600
>>> and R700 chipsets.
>>
>> fglrx ...
>>
>>> - why the streamout workaround for RV6xx (S_0085F0_DEST_BASE_0_ENA) is
>>> applied to all R600, R700, and evergreen chipsets.
>>
>> didn't hurt thought fglrx is not using that at all but i did not
>> wanted to remove it
>
> Well, you didn't remove it. You added it for those other chipsets.
> That's a difference. You don't even know what you did there, do you?
> :) All the things I mentioned are either half-assed or added for no
> reason. Fglrx might do all sorts of stupid things or for its own
> reasons, but that doesn't mean it's automatically good for us. Besides
> that, it's almost impossible to figure out why a CS was built up
> exactly the way it was without access to the driver code and to its 
> developers.

Oh yeah, I don't have a fucking clue, I am fucking clueless, I am just a
fool who writes fucking random lines of code and has no fucking idea
of what I am doing. Of course you know better, please enlighten me.

I am totally on board with fglrx doing stupid things, and yet it does
not lock up ... so one of those stupid things is important, and until
someone figures out which one, I would rather do more stupid things and
not lock up than try to pretend that flushing is a bottleneck with the
driver right now.

>>
>>> - why R600_CONTEXT_FLUSH_AND_INV emits SURFACE_SYNC on evergreen,
>>> resulting in emission of SURFACE_SYNC twice in a row in most
>>> situations.
>>
>> fglrx is doing that and without that lockup ...
>
> Hm, now you're talking. So do you need:
>
> FLUSH_AND_INV +
> SURFACE_SYNC (COHER_CNTL = ~0)
>
> or do you need:
>
> FLUSH_AND_INV +
> SURFACE_SYNC (COHER_CNTL = ~0) +
> SURFACE_SYNC (COHER_CNTL = according to flags)
>
> for it not to lock up?

Flush-and-inv is always followed by a surface sync, with a few
exceptions (on which I am not clear), but there is always a surface
sync before a draw after a flush-and-inv.
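
The rule stated here (after a flush-and-invalidate event there is
always a surface sync before the next draw) can be written down as a
small checker over an abstract command list. This is only a sketch of
the rule as described in this mail; the enum values and the
sync_before_draw name are hypothetical and do not correspond to real
packet encodings:

```c
#include <assert.h>
#include <stddef.h>

/* Abstract command kinds; not real packet opcodes. */
enum cmd { CMD_FLUSH_AND_INV, CMD_SURFACE_SYNC, CMD_DRAW, CMD_OTHER };

/* Returns 1 if every draw that comes after a flush-and-invalidate has a
 * surface sync somewhere between the two, and 0 otherwise. */
static int sync_before_draw(const enum cmd *cmds, size_t n)
{
    int pending = 0;   /* a flush was seen and no sync has followed yet */
    size_t i;

    for (i = 0; i < n; i++) {
        switch (cmds[i]) {
        case CMD_FLUSH_AND_INV:
            pending = 1;
            break;
        case CMD_SURFACE_SYNC:
            pending = 0;
            break;
        case CMD_DRAW:
            if (pending)
                return 0;
            break;
        default:
            break;
        }
    }
    return 1;
}
```

A checker like this says nothing about which COHER_CNTL bits the sync
needs, which is the open question in the exchange above.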

>>
>>> Flushing has always worked without all the changes (1, 2, 3) mentioned
>>> above, so please if you don't have a reasonable explanation, revert to
>>> the old behavior.
>>
>> Well if you have a better solution please show me ...
>
> I already showed you in the first reply. If you are unwilling to
> change your patches even a little bit, I'll happily take them over
> from you.
>
> Marek

Oh, I will change them, just not the way you like. I am trying to avoid
lockups; you obviously don't give a shit about that.

Cheers,
Jerome


Re: [Mesa-dev] r600g: hyperz

2012-07-14 Thread Jerome Glisse
On Sat, Jul 14, 2012 at 9:56 AM, Alex Deucher  wrote:
> On Fri, Jul 13, 2012 at 8:11 PM, Jerome Glisse  wrote:
>> On Fri, Jul 13, 2012 at 8:08 PM, Marek Olšák  wrote:
>>> Hi Jerome,
>>>
>>> I couldn't open the patch, because freedesktop.org doesn't seem to
>>> work for me today, it always times out.
>>>
>>> Anyway, non-working code shouldn't be merged into Mesa master, because
>>> it decreases the quality of the driver and is a pain to maintain. As
>>> as I said in another email, merging non-working code on purpose is a
>>> very bad idea. Please don't do it.
>>>
>>> Marek
>>
>> Code works, no regression, but if you enable hyperz get ready to
>> experience lockup, likelyhood depends on what you are doing.
>>
>> So no i don't consider this a non working code. It does work and
>> doesn't regress.
>
> Is it just 6xx/7xx that locks or also evergreen?  Also even if we
> don't turn on hyperz, it probably makes sense to always have an htile
> buffer bound as the htile cache (and backing htile buffer) is used for
> Z/S compression, culling, fast ops, etc. in addition to HiZ/S if a Z
> or S buffer is bound.
>
> Alex

Just enabling the htile surface is enough to trigger the lockup, thus we
can't bind the htile buffer. Quite frankly I don't know how much of an
issue evergreen is; I pretty much stuck with r6xx/r7xx as they were
always locking up with my test case. Though I have been able to lock up
evergreen, I did have the feeling that it was a lot less likely to
happen.

Basically, to trigger the lockup you have to switch between a lot of
depth surfaces/htile surfaces; if you just have a single depth buffer
you will be fine. Thus most use cases will just work properly.

Cheers,
Jerome


Re: [Mesa-dev] r600g: hyperz

2012-07-13 Thread Jerome Glisse
On Fri, Jul 13, 2012 at 8:08 PM, Marek Olšák  wrote:
> Hi Jerome,
>
> I couldn't open the patch, because freedesktop.org doesn't seem to
> work for me today, it always times out.
>
> Anyway, non-working code shouldn't be merged into Mesa master, because
> it decreases the quality of the driver and is a pain to maintain. As
> as I said in another email, merging non-working code on purpose is a
> very bad idea. Please don't do it.
>
> Marek

The code works, with no regression, but if you enable hyperz get ready
to experience lockups; the likelihood depends on what you are doing.

So no, I don't consider this non-working code. It does work and
doesn't regress.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: improve flushed depth texture handling v2

2012-07-10 Thread Jerome Glisse
On Tue, Jul 10, 2012 at 5:16 PM, Marek Olšák  wrote:
> On Tue, Jul 10, 2012 at 10:00 PM, Jerome Glisse  wrote:
>> On Tue, Jul 10, 2012 at 2:10 PM, Marek Olšák  wrote:
>>> On Tue, Jul 10, 2012 at 6:40 AM, Vadim Girlin  wrote:
>>>> On Sat, 2012-07-07 at 01:48 +0200, Marek Olšák wrote:
>>>>> On Wed, Jun 27, 2012 at 1:34 AM, Vadim Girlin  
>>>>> wrote:
>>>>> > Use r600_resource_texture::flushed_depth_texture for GPU access, and
>>>>> > allocate it in the VRAM. For transfers we'll allocate untiled texture
>>>>> > in the GTT and store it in the r600_transfer::staging.
>>>>> >
>>>>> > Improves performance when flushed depth texture is frequently used by
>>>>> > the GPU (about 30% for Lightsmark).
>>>>> >
>>>>> > Signed-off-by: Vadim Girlin 
>>>>> > ---
>>>>> >
>>>>> > Fixes fbo-clear-formats, fbo-generatemipmap-formats, no regressions on
>>>>> > evergreen
>>>>>
>>>>> Hi,
>>>>>
>>>>> is there any reason this patch hasn't been committed yet?
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> I have some doubts because it was benchmarked by phoronix and there were
>>>> regressions, though I suspect that something is wrong with the results:
>>>>
>>>> http://www.phoronix.com/scan.php?page=article&item=amd_r600g_texdepth&num=4
>>>>
>>>> I was going to look into it but had no time yet. I'd like to be sure
>>>> that there are no regressions before committing.
>>>
>>> Well, there's nothing wrong with your patch. I wouldn't trust
>>> benchmarks run with the Unity desktop so much. I myself had to switch
>>> from Unity 2D to Xfce just to get consistent results when testing
>>> performance.
>>>
>>> Now that your patch separates flushing for texturing and transfers, I
>>> think we could make it a little bit faster by implementing an in-place
>>> flush for texturing (that is without having to allocate another
>>> resource).
>>>
>>> Marek
>>
>> In-place flushes are useful for the case where you know you won't reuse
>> the depth buffer as a depth buffer, or if you know the next operation will
>> be a glClear on the depth buffer. What I am worried about is that
>> recompression might not work in place; for it to work you need to have
>> db decompressed into db tiling format and not cb tiling format.
>
> The case where the depth is not reused is the most common one. It
> might even be the only one in practice. Depth textures are most
> commonly used for shadow mapping, which is the not-reusing case. They
> can also be used to implement deferred rendering (though that's not
> very common), which means the same as shadow mapping for us. Actually,
> no graphics algorithm comes to mind that would do write-texture-write
> with the same depth buffer.
>
> Marek

I am not saying it's not the most common one, I am saying that
recompressing might be more complex (recompress to a different buffer
then copy back to the original buffer, or copy the buffer and uncompress
from the copy).

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: improve flushed depth texture handling v2

2012-07-10 Thread Jerome Glisse
On Tue, Jul 10, 2012 at 2:10 PM, Marek Olšák  wrote:
> On Tue, Jul 10, 2012 at 6:40 AM, Vadim Girlin  wrote:
>> On Sat, 2012-07-07 at 01:48 +0200, Marek Olšák wrote:
>>> On Wed, Jun 27, 2012 at 1:34 AM, Vadim Girlin  wrote:
>>> > Use r600_resource_texture::flushed_depth_texture for GPU access, and
>>> > allocate it in the VRAM. For transfers we'll allocate untiled texture in the
>>> > GTT and store it in the r600_transfer::staging.
>>> >
>>> > Improves performance when flushed depth texture is frequently used by the
>>> > GPU (about 30% for Lightsmark).
>>> >
>>> > Signed-off-by: Vadim Girlin 
>>> > ---
>>> >
>>> > Fixes fbo-clear-formats, fbo-generatemipmap-formats, no regressions on
>>> > evergreen
>>>
>>> Hi,
>>>
>>> is there any reason this patch hasn't been committed yet?
>>>
>>
>> Hi,
>>
>> I have some doubts because it was benchmarked by phoronix and there were
>> regressions, though I suspect that something is wrong with the results:
>>
>> http://www.phoronix.com/scan.php?page=article&item=amd_r600g_texdepth&num=4
>>
>> I was going to look into it but had no time yet. I'd like to be sure
>> that there are no regressions before committing.
>
> Well, there's nothing wrong with your patch. I wouldn't trust
> benchmarks run with the Unity desktop so much. I myself had to switch
> from Unity 2D to Xfce just to get consistent results when testing
> performance.
>
> Now that your patch separates flushing for texturing and transfers, I
> think we could make it a little bit faster by implementing an in-place
> flush for texturing (that is without having to allocate another
> resource).
>
> Marek

In-place flushes are useful for the case where you know you won't reuse
the depth buffer as a depth buffer, or if you know the next operation will
be a glClear on the depth buffer. What I am worried about is that
recompression might not work in place; for it to work you need to have
db decompressed into db tiling format and not cb tiling format.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 0/2] [RFC] r600g: improve handling of the shader exports

2012-06-26 Thread Jerome Glisse
On Tue, Jun 26, 2012 at 5:45 AM, Vadim Girlin  wrote:
> On Fri, 2012-06-22 at 14:24 -0400, Jerome Glisse wrote:
>> On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin  wrote:
>> >  r600g: avoid unnecessary shader exports
>> >  r600g: enable DUAL_EXPORT mode when possible
>> >
>> > First patch fixes the lockups with DUAL_EXPORT mode for me, also AFAICS it
>> > fixes some depth/stencil tests, though I'm not sure why, haven't looked
>> > into it (possibly unexpected color exports were written over the depth
>> > exports).
>> >
>> > Second patch enables DUAL_EXPORT mode when possible, giving about 40%
>> > improvement with the results of the "fill" demo (on juniper). Also it sets
>> > DB_SOURCE_FORMAT to the EXPORT_DB_TWO when in DUAL_EXPORT mode, though I'm 
>> > not sure yet if it has any effect on performance.
>> >
>> > I haven't tried to implement the same for pre-evergreen cards - I can't test it
>> > anyway without r600 hw, but I guess it shouldn't be hard. AFAIK there will be
>> > additional requirements for DUAL_EXPORT mode for r6xx (it's documented in the
>> > R6xx_3D_Registers.pdf).
>> >
>> > There are no regressions with piglit on evergreen (juniper).
>>
>> r6xx/r7xx version WIP not working (well not improving perf)
>> http://people.freedesktop.org/~glisse/0003-r600g-enable-DUAL_EXPORT-mode-when-possible-on-r6xx-.patch
>
> AFAIK you've fixed that already, do you have any regressions with dual
> export on r6xx/7xx? There are some issues reported on rv770 with patch 1
> - http://lists.freedesktop.org/archives/mesa-dev/2012-June/023229.html
>
> Vadim
>

Yeah, I have updated patches here that fix the regression. Will send
shortly once I am confident.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 0/2] [RFC] r600g: improve handling of the shader exports

2012-06-22 Thread Jerome Glisse
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin  wrote:
>  r600g: avoid unnecessary shader exports
>  r600g: enable DUAL_EXPORT mode when possible
>
> First patch fixes the lockups with DUAL_EXPORT mode for me, also AFAICS it
> fixes some depth/stencil tests, though I'm not sure why, haven't looked
> into it (possibly unexpected color exports were written over the depth
> exports).
>
> Second patch enables DUAL_EXPORT mode when possible, giving about 40%
> improvement with the results of the "fill" demo (on juniper). Also it sets
> DB_SOURCE_FORMAT to the EXPORT_DB_TWO when in DUAL_EXPORT mode, though I'm 
> not sure yet if it has any effect on performance.
>
> I haven't tried to implement the same for pre-evergreen cards - I can't test it
> anyway without r600 hw, but I guess it shouldn't be hard. AFAIK there will be
> additional requirements for DUAL_EXPORT mode for r6xx (it's documented in the
> R6xx_3D_Registers.pdf).
>
> There are no regressions with piglit on evergreen (juniper).

r6xx/r7xx version, WIP, not working (well, not improving perf):
http://people.freedesktop.org/~glisse/0003-r600g-enable-DUAL_EXPORT-mode-when-possible-on-r6xx-.patch

Cheers,
Jerome

>
>  src/gallium/drivers/r600/evergreen_state.c   |   56 
> --
>  src/gallium/drivers/r600/evergreend.h        |    7 
>  src/gallium/drivers/r600/r600_pipe.h         |    5 +++
>  src/gallium/drivers/r600/r600_shader.c       |   25 ++--
>  src/gallium/drivers/r600/r600_shader.h       |    7 +++-
>  src/gallium/drivers/r600/r600_state_common.c |    7 +++-
>  6 files changed, 88 insertions(+), 19 deletions(-)
>
> --
> 1.7.10.4
>


Re: [Mesa-dev] [PATCH 2/2] r600g: enable DUAL_EXPORT mode when possible

2012-06-22 Thread Jerome Glisse
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin  wrote:
> It seems DUAL_EXPORT on evergreen may be enabled when all CBs use 16-bit 
> export
> mode (EXPORT_4C_16BPC), also there should be at least one CB, and the PS
> shouldn't export depth/stencil.
>
> Signed-off-by: Vadim Girlin 
Reviewed-by: Jerome Glisse 
> ---
>  src/gallium/drivers/r600/evergreen_state.c   |   46 
> ++
>  src/gallium/drivers/r600/evergreend.h        |    7 
>  src/gallium/drivers/r600/r600_pipe.h         |    5 +++
>  src/gallium/drivers/r600/r600_state_common.c |    3 ++
>  4 files changed, 55 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index 3fe95e1..bddb67e 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -1458,7 +1458,6 @@ static void evergreen_cb(struct r600_context *rctx, 
> struct r600_pipe_state *rsta
>             (desc->channel[i].size < 17 &&
>              desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT))) {
>                color_info |= S_028C70_SOURCE_FORMAT(V_028C70_EXPORT_4C_16BPC);
> -               rctx->export_16bpc = true;
>        } else {
>                rctx->export_16bpc = false;
>        }
> @@ -1661,6 +1660,7 @@ static void evergreen_set_framebuffer_state(struct 
> pipe_context *ctx,
>        struct r600_context *rctx = (struct r600_context *)ctx;
>        struct r600_pipe_state *rstate = CALLOC_STRUCT(r600_pipe_state);
>        uint32_t tl, br;
> +       int i;
>
>        if (rstate == NULL)
>                return;
> @@ -1674,10 +1674,16 @@ static void evergreen_set_framebuffer_state(struct 
> pipe_context *ctx,
>
>        /* build states */
>        rctx->have_depth_fb = 0;
> +       rctx->export_16bpc = true;
>        rctx->nr_cbufs = state->nr_cbufs;
> -       for (int i = 0; i < state->nr_cbufs; i++) {
> +       for (i = 0; i < state->nr_cbufs; i++) {
>                evergreen_cb(rctx, rstate, state, i);
>        }
> +
> +       for (; i < 8 ; i++) {
> +               r600_pipe_state_add_reg(rstate, R_028C70_CB_COLOR0_INFO + i * 
> 0x3C, 0);
> +       }
> +
>        if (state->zsbuf) {
>                evergreen_db(rctx, rstate, state);
>        }
> @@ -2585,6 +2591,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
> struct r600_pipe_shader
>        int ninterp = 0;
>        boolean have_linear = FALSE, have_centroid = FALSE, have_perspective = 
> FALSE;
>        unsigned spi_baryc_cntl, sid, tmp, idx = 0;
> +       unsigned z_export = 0, stencil_export = 0;
>
>        rstate->nregs = 0;
>
> @@ -2633,13 +2640,16 @@ void evergreen_pipe_shader_ps(struct pipe_context 
> *ctx, struct r600_pipe_shader
>
>        for (i = 0; i < rshader->noutput; i++) {
>                if (rshader->output[i].name == TGSI_SEMANTIC_POSITION)
> -                       db_shader_control |= S_02880C_Z_EXPORT_ENABLE(1);
> +                       z_export = 1;
>                if (rshader->output[i].name == TGSI_SEMANTIC_STENCIL)
> -                       db_shader_control |= 
> S_02880C_STENCIL_EXPORT_ENABLE(1);
> +                       stencil_export = 1;
>        }
>        if (rshader->uses_kill)
>                db_shader_control |= S_02880C_KILL_ENABLE(1);
>
> +       db_shader_control |= S_02880C_Z_EXPORT_ENABLE(z_export);
> +       db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export);
> +
>        exports_ps = 0;
>        for (i = 0; i < rshader->noutput; i++) {
>                if (rshader->output[i].name == TGSI_SEMANTIC_POSITION ||
> @@ -2711,8 +2721,9 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
> struct r600_pipe_shader
>        r600_pipe_state_add_reg(rstate,
>                                R_02884C_SQ_PGM_EXPORTS_PS,
>                                exports_ps);
> -       r600_pipe_state_add_reg(rstate, R_02880C_DB_SHADER_CONTROL,
> -                               db_shader_control);
> +
> +       shader->db_shader_control = db_shader_control;
> +       shader->ps_depth_export = z_export | stencil_export;
>
>        shader->sprite_coord_enable = rctx->sprite_coord_enable;
>        if (rctx->rasterizer)
> @@ -2798,3 +2809,26 @@ void *evergreen_create_db_flush_dsa(struct 
> r600_context *rctx)
>        /* Don't set the 'is_flush' flag in r600_pipe_dsa, evergreen doesn't 
> need it. */
>        return rstate;
>  }
> +
> +void evergreen_update_dual_export_state(struct r600_context * rctx)
> +{
> +       unsigned dual_export = rctx->export_
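
The enabling condition stated in the commit message above (all CBs in 16-bit
export mode, at least one CB, no depth/stencil export from the PS) can be
sketched as a small predicate. This is only an illustration: the struct and
function names below are hypothetical and are not taken from the truncated
patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the state the condition depends on. */
struct dual_export_inputs {
    unsigned nr_cbufs;      /* number of bound color buffers */
    bool all_cbs_16bpc;     /* every bound CB uses EXPORT_4C_16BPC */
    bool ps_depth_export;   /* pixel shader exports depth or stencil */
};

/* Returns true when the three conditions from the commit message all hold. */
static bool can_use_dual_export(const struct dual_export_inputs *in)
{
    return in->nr_cbufs > 0 &&
           in->all_cbs_16bpc &&
           !in->ps_depth_export;
}
```

Because the predicate mixes framebuffer state and shader state, it would need
to be re-evaluated whenever either of them changes.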

Re: [Mesa-dev] [PATCH 1/2] r600g: avoid unnecessary shader exports

2012-06-22 Thread Jerome Glisse
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin  wrote:
> In some cases TGSI shader has more color outputs than the number of CBs,
> so it seems we need to limit the number of color exports. This requires
> different shader variants depending on the nr_cbufs, but on the other hand
> we are doing less exports, which are very costly.
>
> Signed-off-by: Vadim Girlin 
Reviewed-by: Jerome Glisse 

> ---
>  src/gallium/drivers/r600/evergreen_state.c   |   10 +++---
>  src/gallium/drivers/r600/r600_shader.c       |   25 ++---
>  src/gallium/drivers/r600/r600_shader.h       |    7 ++-
>  src/gallium/drivers/r600/r600_state_common.c |    4 ++--
>  4 files changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index b618ca8..3fe95e1 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -2641,18 +2641,14 @@ void evergreen_pipe_shader_ps(struct pipe_context 
> *ctx, struct r600_pipe_shader
>                db_shader_control |= S_02880C_KILL_ENABLE(1);
>
>        exports_ps = 0;
> -       num_cout = 0;
>        for (i = 0; i < rshader->noutput; i++) {
>                if (rshader->output[i].name == TGSI_SEMANTIC_POSITION ||
>                    rshader->output[i].name == TGSI_SEMANTIC_STENCIL)
>                        exports_ps |= 1;
> -               else if (rshader->output[i].name == TGSI_SEMANTIC_COLOR) {
> -                       if (rshader->fs_write_all)
> -                               num_cout = rshader->nr_cbufs;
> -                       else
> -                               num_cout++;
> -               }
>        }
> +
> +       num_cout = rshader->nr_ps_color_exports;
> +
>        exports_ps |= S_02884C_EXPORT_COLORS(num_cout);
>        if (!exports_ps) {
>                /* always at least export 1 component per pixel */
> diff --git a/src/gallium/drivers/r600/r600_shader.c 
> b/src/gallium/drivers/r600/r600_shader.c
> index 63b9a03..782113b 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -801,6 +801,12 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
>                                ctx->cv_output = i;
>                                break;
>                        }
> +               } else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
> +                       switch (d->Semantic.Name) {
> +                       case TGSI_SEMANTIC_COLOR:
> +                               ctx->shader->nr_ps_max_color_exports++;
> +                               break;
> +                       }
>                }
>                break;
>        case TGSI_FILE_CONSTANT:
> @@ -1153,8 +1159,10 @@ static int r600_shader_from_tgsi(struct r600_context * 
> rctx, struct r600_pipe_sh
>        ctx.colors_used = 0;
>        ctx.clip_vertex_write = 0;
>
> +       shader->nr_ps_color_exports = 0;
> +       shader->nr_ps_max_color_exports = 0;
> +
>        shader->two_side = (ctx.type == TGSI_PROCESSOR_FRAGMENT) && 
> rctx->two_side;
> -       shader->nr_cbufs = rctx->nr_cbufs;
>
>        /* register allocations */
>        /* Values [0,127] correspond to GPR[0..127].
> @@ -1289,6 +1297,9 @@ static int r600_shader_from_tgsi(struct r600_context * 
> rctx, struct r600_pipe_sh
>                }
>        }
>
> +       if (shader->fs_write_all && rctx->chip_class >= EVERGREEN)
> +               shader->nr_ps_max_color_exports = 8;
> +
>        if (ctx.fragcoord_input >= 0) {
>                if (ctx.bc->chip_class == CAYMAN) {
>                        for (j = 0 ; j < 4; j++) {
> @@ -1528,10 +1539,17 @@ static int r600_shader_from_tgsi(struct r600_context 
> * rctx, struct r600_pipe_sh
>                        break;
>                case TGSI_PROCESSOR_FRAGMENT:
>                        if (shader->output[i].name == TGSI_SEMANTIC_COLOR) {
> +                               /* never export more colors than the number 
> of CBs */
> +                               if (next_pixel_base >= rctx->nr_cbufs) {
> +                                       /* skip export */
> +                                       j--;
> +                                       continue;
> +                               }
>                                output[j].array_base = next_pixel_base++;
>                                output[j].type = 
> V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
> +                               shader->nr_ps_color_exports++;
>                  
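
The rule described in the commit message above (never export more colors than
there are CBs) amounts to clamping the shader's color output count. A minimal
sketch, assuming a hypothetical helper name that does not appear in the patch:

```c
/* Hypothetical helper: number of color exports actually emitted when the
 * shader declares 'shader_color_outputs' colors and 'nr_cbufs' color
 * buffers are bound. Outputs beyond nr_cbufs are skipped. */
static unsigned nr_color_exports(unsigned shader_color_outputs,
                                 unsigned nr_cbufs)
{
    return shader_color_outputs < nr_cbufs ? shader_color_outputs
                                           : nr_cbufs;
}
```

Since the result depends on nr_cbufs, the compiled shader is only valid for a
given number of bound CBs, which is why the commit message mentions needing
different shader variants.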

Re: [Mesa-dev] [PATCH 1/4] mesa: Add support for GL_ARB_base_instance

2012-06-19 Thread Jerome Glisse
On Tue, Jun 19, 2012 at 4:46 PM, Jerome Glisse  wrote:
> On Mon, Jun 18, 2012 at 8:33 PM, Fredrik Höglund  wrote:
>> On Tuesday 19 June 2012, Brian Paul wrote:
>>> On 06/18/2012 02:50 PM, Fredrik Höglund wrote:
>>> > Reviewed-by: Brian Paul
>>> > ---
>>> >
>>> > v2: Change baseinstance to base_instance in _mesa_prims
>>> >      and to baseInstance in the vbo_exec functions.
>>> >
>>> >   src/mapi/glapi/gen/ARB_base_instance.xml |   40 +++
>>> >   src/mapi/glapi/gen/Makefile              |    1 +
>>> >   src/mapi/glapi/gen/gl_API.xml            |    3 +-
>>> >   src/mesa/main/dd.h                       |   10 +++
>>> >   src/mesa/main/dlist.c                    |   45 
>>> >   src/mesa/main/extensions.c               |    1 +
>>> >   src/mesa/main/mtypes.h                   |    1 +
>>> >   src/mesa/main/vtxfmt.c                   |    3 +
>>> >   src/mesa/vbo/vbo.h                       |    1 +
>>> >   src/mesa/vbo/vbo_exec_api.c              |    1 +
>>> >   src/mesa/vbo/vbo_exec_array.c            |  114 
>>> > +++---
>>> >   src/mesa/vbo/vbo_save_api.c              |    2 +
>>> >   src/mesa/vbo/vbo_split_inplace.c         |    6 +-
>>> >   13 files changed, 216 insertions(+), 12 deletions(-)
>>> >   create mode 100644 src/mapi/glapi/gen/ARB_base_instance.xml
>>>
>>> Looks good.  Do you need me to commit/push these for you?
>>
>> Yeah, I don't have commit access, so please do.
>>
>> Fredrik
>>
>
> This breaks the gallium drivers; nothing renders with it.
>
> Cheers,
> Jerome

Well, never mind, git clean -fdX did the trick. Sorry for the noise.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 1/4] mesa: Add support for GL_ARB_base_instance

2012-06-19 Thread Jerome Glisse
On Mon, Jun 18, 2012 at 8:33 PM, Fredrik Höglund  wrote:
> On Tuesday 19 June 2012, Brian Paul wrote:
>> On 06/18/2012 02:50 PM, Fredrik Höglund wrote:
>> > Reviewed-by: Brian Paul
>> > ---
>> >
>> > v2: Change baseinstance to base_instance in _mesa_prims
>> >      and to baseInstance in the vbo_exec functions.
>> >
>> >   src/mapi/glapi/gen/ARB_base_instance.xml |   40 +++
>> >   src/mapi/glapi/gen/Makefile              |    1 +
>> >   src/mapi/glapi/gen/gl_API.xml            |    3 +-
>> >   src/mesa/main/dd.h                       |   10 +++
>> >   src/mesa/main/dlist.c                    |   45 
>> >   src/mesa/main/extensions.c               |    1 +
>> >   src/mesa/main/mtypes.h                   |    1 +
>> >   src/mesa/main/vtxfmt.c                   |    3 +
>> >   src/mesa/vbo/vbo.h                       |    1 +
>> >   src/mesa/vbo/vbo_exec_api.c              |    1 +
>> >   src/mesa/vbo/vbo_exec_array.c            |  114 
>> > +++---
>> >   src/mesa/vbo/vbo_save_api.c              |    2 +
>> >   src/mesa/vbo/vbo_split_inplace.c         |    6 +-
>> >   13 files changed, 216 insertions(+), 12 deletions(-)
>> >   create mode 100644 src/mapi/glapi/gen/ARB_base_instance.xml
>>
>> Looks good.  Do you need me to commit/push these for you?
>
> Yeah, I don't have commit access, so please do.
>
> Fredrik
>

This breaks the gallium drivers; nothing renders with it.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: Unify SURFACE_SYNC packet emission for 3D and compute

2012-06-19 Thread Jerome Glisse
On Tue, Jun 19, 2012 at 2:06 PM, Tom Stellard  wrote:
> On Tue, Jun 19, 2012 at 07:57:50PM +0200, Marek Olšák wrote:
>> Hi Tom,
>>
>> This adds new calls to r600_inval_xxx_cache, which just sets the
>> dirty flag in the atom "surface_sync_cmd" to true, but I couldn't find
>> where the compute code calls r600_emit_atom. The proper way to emit
>> dirty atoms is in r600_state_common.c:843-845.
>>
>
> The compute code is calling r600_flush_framebuffer() from
> compute_emit_cs, which is what calls r600_emit_atom() for
> surface_sync_cmd.
>
> -Tom

I am heavily refactoring all this for hyperz, but I can rebase once I
have it working.

Cheers,
Jerome


Re: [Mesa-dev] Clarifications w.r.t MSAA

2012-06-12 Thread Jerome Glisse
On Tue, Jun 12, 2012 at 1:34 PM, Christoph Bumiller
 wrote:
> On 06/12/2012 06:52 PM, Jerome Glisse wrote:
>> On Tue, Jun 12, 2012 at 8:39 AM, Christoph Bumiller
>>  wrote:
>>> On 06/12/2012 02:25 PM, Olivier Galibert wrote:
>>>> On Tue, Jun 12, 2012 at 01:50:08PM +0200, Christoph Bumiller wrote:
>>>>>> First question: how many depths should be computed, and for which
>>>>>> coordinates? Which of these values is associated with which sample?
>>>>>
>>>>> One for each sample point. The depth buffer will be multisampled as well.
>>>>> Coverage sampling (CSAA) where you have extra coverage samples that do
>>>>> NOT (necessarily) correspond to color sample locations are not covered
>>>>> by the GL spec, it's vendor-specific.
>>>>
>>>> Ok.  So that means that if the shader writes z, you have to do full
>>>> supersampling then.
>>>>
>>>
>>> No, I don't think that's the case. You get per-sample depth values if
>>> you use fixed-pipe depth, but shader-computed depth should simply be
>>> replicated (to all samples covered by the shader invocation), like color
>>> outputs.
>>
>> I don't think that's how it works; each sample will have its color and
>> depth value no matter if fixed pipeline or not. When resolving the
>
> Sorry, "fixed-pipe" was misleading, I meant the z-value from the
> rasterizer (which can be regarded as fixed functionality), not "without
> (custom) shaders".
>
> If the shader is only invoked once for each fragment (i.e.
> MinSampleShading == 1), all the samples that belong to that fragment
> will share the same color and depth values.
>

So I think we agree, but according to the spec MinSampleShading=1 means the
fragment shader is run once for each sample. The MinSampleShading value
(MIN_SAMPLE_SHADING_VALUE_ARB) is a fraction of the sample count, so if you
have an 8-sample surface and you set MinSampleShading to 0.5, the fragment
shader is invoked for at least 4 samples. Note that according to the spec an
implementation might ignore the fraction and only cover the case
MinSampleShading==1.
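
As a rough illustration of the arithmetic, ARB_sample_shading gives the minimum
number of samples shaded separately as
max(ceil(MIN_SAMPLE_SHADING_VALUE_ARB * samples), 1). The helper below is a
sketch of that formula only; its name is made up and it is not Mesa code.

```c
/* Hypothetical helper: minimum number of samples that must be shaded
 * separately, i.e. max(ceil(value * samples), 1), with the value
 * clamped to [0, 1] first. */
static unsigned min_shaded_samples(float min_sample_shading, unsigned samples)
{
    float v = min_sample_shading;
    float x;
    unsigned n;

    if (v < 0.0f)
        v = 0.0f;
    if (v > 1.0f)
        v = 1.0f;

    x = v * (float)samples;
    n = (unsigned)x;        /* truncate toward zero */
    if ((float)n < x)       /* round up when there is a fractional part */
        n++;

    return n > 1 ? n : 1;
}
```

With 8 samples, a value of 0.5 gives 4 and a value of 1.0 gives 8; an
implementation is free to shade more samples than this minimum.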

Cheers,
Jerome


Re: [Mesa-dev] Clarifications w.r.t MSAA

2012-06-12 Thread Jerome Glisse
On Tue, Jun 12, 2012 at 8:39 AM, Christoph Bumiller
 wrote:
> On 06/12/2012 02:25 PM, Olivier Galibert wrote:
>> On Tue, Jun 12, 2012 at 01:50:08PM +0200, Christoph Bumiller wrote:
>>>> First question: how many depths should be computed, and for which
>>>> coordinates? Which of these values is associated with which sample?
>>>
>>> One for each sample point. The depth buffer will be multisampled as well.
>>> Coverage sampling (CSAA) where you have extra coverage samples that do
>>> NOT (necessarily) correspond to color sample locations are not covered
>>> by the GL spec, it's vendor-specific.
>>
>> Ok.  So that means that if the shader writes z, you have to do full
>> supersampling then.
>>
>
> No, I don't think that's the case. You get per-sample depth values if
> you use fixed-pipe depth, but shader-computed depth should simply be
> replicated (to all samples covered by the shader invocation), like color
> outputs.

I don't think that's how it works; each sample will have its own color and
depth value, whether fixed pipeline or not. When resolving the
msaa surface, you only use the samples that cover the surface to make
the average.

Anyway that's my understanding.

Cheers,
Jerome


Re: [Mesa-dev] Four questions about DRI1 drivers

2012-03-01 Thread Jerome Glisse
On Thu, 2012-03-01 at 13:56 -0600, Patrick Baggett wrote:
> Now I'm curious. Is it the case that every DRI1 driver could be a DRI2
> driver with enough effort? Not talking about emulating hardware
> features.
> 
> 
> Patrick

DRI2 imposes nothing on hw capabilities. So any hw can do DRI2, even hw
without a 3d engine (see virtual gem for instance).

Cheers,
Jerome




Re: [Mesa-dev] [PATCH 00/12] R600g: cleanups and rework of queries

2012-02-22 Thread Jerome Glisse
On Tue, Feb 21, 2012 at 7:55 PM, Marek Olšák  wrote:

> Hi everyone,
>
> Besides the cleanups, there are fixes for create_context fail paths and
> rework of queries. The rework is the most important, because it eliminates
> buffer_map calls (and therefore buffer_wait) in begin_query.
>
> There are no piglit regressions on Evergreen.
>
> Please review.
>
>
Reviewed.

Do you test with 2d tiling on or off ?

Cheers,
Jerome


[Mesa-dev] [PATCH] r600g tiling final

2012-02-03 Thread Jerome Glisse
Hi,

So the tiling work is, I believe, done. I have run piglit across a wide range
of hw and sw combinations. Bottom line: new mesa on top of either an old
kernel or an old ddx won't regress anything. New mesa on top of a proper
kernel will get you 2D tiling for textures and anything allocated by
mesa, and if you have a proper DDX with the option ColorTiling2D enabled you
will also get 2D tiling for the front buffer and depth/stencil buffer.

For libdrm you need the latest master. I will do a libdrm release
on Monday. Afterward I will commit mesa & ddx with the proper autoconf
voodoo to check for the new libdrm.

kernel patches :
http://people.freedesktop.org/~glisse/tiling/0001-drm-radeon-kms-add-support-for-streamout-v7.patch
http://people.freedesktop.org/~glisse/tiling/0001-drm-radeon-add-support-for-evergreen-ni-tiling-infor.patch

mesa patch:
http://people.freedesktop.org/~glisse/tiling/0001-r600g-add-support-for-common-surface-allocator-for-t.patch

ddx patch:
http://people.freedesktop.org/~glisse/tiling/0001-r600-evergreen-use-common-surface-allocator-for-tili.patch

Link to piglit test:
http://people.freedesktop.org/~glisse/tiling/cayman/changes.html
http://people.freedesktop.org/~glisse/tiling/cedar/changes.html
http://people.freedesktop.org/~glisse/tiling/redwood/changes.html
http://people.freedesktop.org/~glisse/tiling/juniper/changes.html
http://people.freedesktop.org/~glisse/tiling/fusion/changes.html
http://people.freedesktop.org/~glisse/tiling/rv770/changes.html
http://people.freedesktop.org/~glisse/tiling/rv710/changes.html
http://people.freedesktop.org/~glisse/tiling/rv635/changes.html
http://people.freedesktop.org/~glisse/tiling/rv610/changes.html

The first column (GPU name) is unpatched mesa, unpatched ddx, unpatched kernel.

The second column (surf0-ddx0) is patched mesa and patched ddx with 2D tiling
disabled and the new mesa code path disabled (basically a check that nothing
regresses in the old code path).

The third column is patched mesa and unpatched ddx, using the new mesa code
path. This checks that mesa on top of old userspace doesn't break anything.

The fourth column is patched mesa, patched ddx, unpatched kernel. This checks
that new mesa on top of an old kernel works properly.

The fifth column is everything patched, with 2D tiling enabled everywhere.


Note that a few tests just randomly switch from pass to fail (fbo-sys-blit*,
read-front, ...).

I also extensively tested the old userspace on top of the new kernel for
evergreen to make sure that the command checker doesn't regress anything. It
does reject some command streams, but those were wrong and never successfully
completed, so there is no regression in piglit (basically the second column).

Fusion doesn't have an unpatched kernel run, as things keep locking up for me
with the unpatched kernel.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 00/15] R600g cleanup and rework of cache flushing

2012-01-30 Thread Jerome Glisse
On Mon, Jan 30, 2012 at 09:23:03PM +0100, Marek Olšák wrote:
> Hi everyone,
> 
> This patch series is a follow-up to the previous one ("Remove all uses of the 
> register mask"). First, it cleans up some code and merges r600_context into 
> r600_pipe_context. The split of functionality between the two "contexts" made 
> absolutely no sense.
> 
> Next, it adds a new mechanism for emitting states. It's largely inspired by 
> r300g and it's really simple, yet robust. (some people should seriously learn 
> what "polymorphism" means and how it's used to write software before even 
> writing drivers, because I feel like I am the only one making use of it in 
> r600g, which is really a shame /rant) It can be used to schedule *any* 
> commands for execution before the next draw operation, not just register 
> updates. We'll use that more often in the future. For now, it's only used for 
> cache flushes.
> 
> Finally, this series completely reworks cache flushes. The problem with the 
> old code was that the flags "last_flush" and "binding", which were stored in 
> resource structs, were possible causes of race conditions. Not only does this 
> new code fix that, it also simplifies the whole thing. The flushes are done 
> explicitly when states are changed according to this scheme:
> bind_shader -> r600_inval_shader_cache
> set_constant_buffer -> r600_inval_shader_cache
> bind_vertex_elements -> r600_inval_shader_cache (for the fetch shader)
> bind_vertex_buffers -> r600_inval_vertex_cache
> bind_sampler_views -> r600_inval_texture_cache
> set_framebuffer -> r600_flush_framebuffer
> flush -> r600_flush_framebuffer
> 
> Besides that, SURFACE_SYNC is called at most once between draw operations and 
> flushes the whole memory range. The inval/flush functions only accumulate the 
> flush flags.
> 
> The rework also fixes flushes on RV670. The fbo-drawbuffers test no longer 
> causes issues. Flushing CB1_DEST_BASE was not enough, DEST_BASE_0 must be 
> flushed as well. This fixes 21 piglit tests on RV670. The flushing seems to 
> be fixed finally, but the piglit results are not yet up to par with RV730.
> 
> All this code has been tested on RV670, RV730, and REDWOOD.
> 

It makes no sense and it's over-engineered only if you forget the initial
design decision, which was for a new kernel API that closely matched
what r600g had.

But I agree that against the cs ioctl this design is just painful.

Anyway looks good from quick review.
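
For reference, the flush scheme quoted above (the inval/flush functions only
accumulate flags, and at most one SURFACE_SYNC is emitted between draws) can be
sketched roughly like this. The flag names, struct, and functions are
hypothetical and are not the r600g ones.

```c
#include <stdint.h>

/* Hypothetical flag bits, one per kind of cache to flush or invalidate. */
enum {
    FLUSH_SHADER_CACHE  = 1u << 0,
    FLUSH_VERTEX_CACHE  = 1u << 1,
    FLUSH_TEXTURE_CACHE = 1u << 2,
    FLUSH_FRAMEBUFFER   = 1u << 3
};

struct flush_state {
    uint32_t pending;        /* flags accumulated since the last sync */
    unsigned syncs_emitted;  /* stands in for writing a sync packet */
};

/* Called from state-binding functions: only records what must be flushed. */
static void request_flush(struct flush_state *s, uint32_t flags)
{
    s->pending |= flags;
}

/* Called once before a draw: emits at most one sync covering every
 * accumulated flag, then clears them. Returns the flags that were emitted. */
static uint32_t emit_pending_sync(struct flush_state *s)
{
    uint32_t flags = s->pending;

    if (flags) {
        s->syncs_emitted++;
        s->pending = 0;
    }
    return flags;
}
```

Several state changes between two draws collapse into a single sync, and a
draw with no pending flags emits nothing.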

Cheers,
Jerome


Re: [Mesa-dev] [RFC] r600-r800 2D tiling

2012-01-16 Thread Jerome Glisse
On Mon, Jan 16, 2012 at 12:08:17PM +, Simon Farnsworth wrote:
> (resending due to my inability to work my e-mail client - I neither cc'd
> Jerome, nor used the correct identity, so the original appears to be held in
> moderation).
> 
> On Thursday 12 January 2012, Jerome Glisse  wrote:
> > Hi,
> > 
> > I don't cross post as i am pretty sure all interested people are reading
> > this mailing-list.
> > 
> > Attached is kernel, libdrm, ddx, mesa/r600g patches to enable 2D tiling
> > on r600 to cayman. I haven't yet done a full regression testing but 2D
> > tiling seems to work ok. I would like to get feedback on 2 things :
> > 
> > - the kernel API
> 
> I notice that you don't expose all the available Evergreen parameters to
> user control (TILE_SPLIT_BYTES, NUM_BANKS are both currently fixed by the
> kernel). Is this deliberate?
> 
> It looks like it's leftovers from a previous attempt to force Evergreen's
> flexible 2D tiling to behave like R600's fixed-by-hardware 2D tiling.

I need to add tile split to the kernel API. Num banks is not really a
surface parameter: it exists per surface, but it needs to be set to the
same value as the global one. I think it might only be useful in the
multi-GPU case with different GPUs (but that's just a wild guess).

> 
> > - using libdrm/radeon as common place for surface allocation
> > 
> > The second question especially impacts the layering/abstraction of gallium
> > btw winsys as it makes libdrm/radeon_surface API a part of the winsys.
> > The ddx doesn't need as much knowledge as mesa (pretty much the whole
> > mipmap tree is pointless to the ddx). So anyone have strong feeling
> > about moving the whole mipmap tree computation to this common code ?
> >
> I'm in favour - it means that all the code relating to the details of how
> modern Radeons tile surfaces is in one place.
> 
> I've looked at the API you introduce to handle this, and it should be very
> easy to port to a non-libdrm platform - the only element of the API that's
> currently tied to libdrm is radeon_surface_manager_new, so a new platform
> shouldn't struggle to adapt it.

I am in the process of reworking the API a bit, but it will stay very
close, and only the surface manager creator will have DRM-specific code.

> I do have one question; how are you intending to handle passing the tiling
> parameters from the DDX to Mesa for GLX_EXT_texture_from_pixmap? Right now,
> it works because the DDX uses the surface manager's defaults for tiling, as
> does Mesa; I would expect Mesa to read out the parameters as set in the
> kernel and use those.
> 
> At a future date, I can envisage the DDX wanting to choose a different
> tiling layout for DRI2 buffers, or XComposite backing pixmaps (e.g. because
> someone's benchmarked it and found that choosing something beyond the bare
> minimum that meets constraints improves performance); it would be a shame if
> we can't do this because Mesa's not flexible enough.

We don't use DRI2 to communicate tiling info; we go through the kernel for
that. So the ddx calls the set_tiling ioctl and mesa calls get_tiling. I
haven't hooked up the mesa side to extract the various eg values yet; right
now it works because both ddx and mesa use the same surface allocator
parameters, so they end up picking the same values for the various eg
fields. Again, I am working on this. Hopefully it should be completely done
this week.
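
The set_tiling/get_tiling hand-off described above can be sketched as
follows. This is only an illustration under stated assumptions: the ex_*
names, the bit layout and the in-memory "kernel" table are all
hypothetical and are not the radeon kernel ABI. The point is that the
consumer reads back what the producer stored instead of recomputing it
from shared defaults.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define EX_TILING_MACRO      0x1u
#define EX_BANKW_SHIFT       8
#define EX_BANKH_SHIFT       12
#define EX_TILE_SPLIT_SHIFT  24
#define EX_FIELD_MASK        0xfu

#define EX_MAX_BUFFERS 16

struct ex_tiling {
    unsigned macro_tiled;
    unsigned bank_w;
    unsigned bank_h;
    unsigned tile_split;  /* encoded value, 0..15 */
};

/* Stand-in for the per-buffer state the kernel would keep. */
static uint32_t ex_kernel_flags[EX_MAX_BUFFERS];

static uint32_t ex_pack(const struct ex_tiling *t)
{
    uint32_t flags = t->macro_tiled ? EX_TILING_MACRO : 0;

    flags |= (t->bank_w & EX_FIELD_MASK) << EX_BANKW_SHIFT;
    flags |= (t->bank_h & EX_FIELD_MASK) << EX_BANKH_SHIFT;
    flags |= (t->tile_split & EX_FIELD_MASK) << EX_TILE_SPLIT_SHIFT;
    return flags;
}

static struct ex_tiling ex_unpack(uint32_t flags)
{
    struct ex_tiling t;

    t.macro_tiled = (flags & EX_TILING_MACRO) ? 1 : 0;
    t.bank_w = (flags >> EX_BANKW_SHIFT) & EX_FIELD_MASK;
    t.bank_h = (flags >> EX_BANKH_SHIFT) & EX_FIELD_MASK;
    t.tile_split = (flags >> EX_TILE_SPLIT_SHIFT) & EX_FIELD_MASK;
    return t;
}

/* What the ddx side would do: record the layout it picked. */
static void ex_set_tiling(unsigned handle, const struct ex_tiling *t)
{
    assert(handle < EX_MAX_BUFFERS);
    ex_kernel_flags[handle] = ex_pack(t);
}

/* What the mesa side would do: read the layout back instead of guessing. */
static struct ex_tiling ex_get_tiling(unsigned handle)
{
    assert(handle < EX_MAX_BUFFERS);
    return ex_unpack(ex_kernel_flags[handle]);
}
```

With this shape the ddx is free to pick a non-default layout for a given
buffer, which is the flexibility asked for in the quoted question.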

Cheers,
Jerome


Re: [Mesa-dev] [RFC] r600-r800 2D tiling

2012-01-13 Thread Jerome Glisse
On Fri, Jan 13, 2012 at 11:59:28AM +0100, Michel Dänzer wrote:
> On Don, 2012-01-12 at 14:50 -0500, Jerome Glisse wrote: 
> > 
> > Attached is kernel, libdrm, ddx, mesa/r600g patches to enable 2D tiling
> > on r600 to cayman. I haven't yet done a full regression testing but 2D
> > tiling seems to work ok. I would like to get feedback on 2 things :
> > 
> > - the kernel API
> > - using libdrm/radeon as common place for surface allocation
> 
> I generally like the idea of centralizing this in libdrm_radeon.
> 
> 
> > The second question especially impacts the layering/abstraction of gallium
> > btw winsys as it makes libdrm/radeon_surface API a part of the winsys.
> 
> That's unfortunate, but then again the Radeon Gallium drivers have never
> been very clean in this regard. I guess the first one to want to use
> them on a non-DRM platform gets to clean that up. :)
> 
> 
> > To test you need to set ColorTiling2D to true in your xorg.conf, plan
> > is to get mesa 8.0 and newer with proper support for 2D tiling and
> > in 1 year, to move ColorTiling2D default value from false to true.
> > (assumption is that by then we could assume that someone with a working
> > ddx would also have a supported mesa)
> 
> Sounds good.
> 
> Note that the Mesa and X driver changes need to either continue building
> and working with older libdrm_radeon, or bump the libdrm_radeon version
> requirement in configure.ac.

The plan is to release an updated libdrm before committing to mesa, at
which point I will try to dust off my configure.ac foo.

I updated the patches; they are now at:
http://people.freedesktop.org/~glisse/tiling/

For them to work you need the ddx option, and for mesa you need to set
R600_TILING=1 and R600_SURF=1. I will remove this once I am confident that
it works across various GPUs without regression.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: add support for virtual address space on cayman v8

2012-01-08 Thread Jerome Glisse
On Sat, Jan 7, 2012 at 8:08 PM, Marek Olšák  wrote:
> On Fri, Jan 6, 2012 at 4:42 PM,   wrote:
>> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
>> b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
>> index ccf9c4f..8ef0c18 100644
>> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
>> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
>> @@ -30,6 +30,7 @@
>>  #include "util/u_hash_table.h"
>>  #include "util/u_memory.h"
>>  #include "util/u_simple_list.h"
>> +#include "util/u_double_list.h"
>>  #include "os/os_thread.h"
>>  #include "os/os_mman.h"
>>
>> @@ -67,6 +68,12 @@ static INLINE struct radeon_bo *radeon_bo(struct 
>> pb_buffer *bo)
>>     return (struct radeon_bo *)bo;
>>  }
>>
>> +struct radeon_bo_va_hole {
>> +    struct list_head list;
>> +    uint64_t         offset;
>> +    uint64_t         size;
>> +};
>> +
>>  struct radeon_bomgr {
>>     /* Base class. */
>>     struct pb_manager base;
>> @@ -77,6 +84,11 @@ struct radeon_bomgr {
>>     /* List of buffer handles and its mutex. */
>>     struct util_hash_table *bo_handles;
>>     pipe_mutex bo_handles_mutex;
>> +
>> +    /* is virtual address supported */
>> +    bool va;
>> +    unsigned va_offset;
>> +    struct list_head va_holes;
>>  };
>>
>>  static INLINE struct radeon_bomgr *radeon_bomgr(struct pb_manager *mgr)
>> @@ -151,9 +163,85 @@ static boolean radeon_bo_is_busy(struct pb_buffer *_buf,
>>     }
>>  }
>>
>> +static uint64_t radeon_bomgr_find_va(struct radeon_bomgr *mgr, uint64_t 
>> size)
>> +{
>> +    struct radeon_bo_va_hole *hole, *n;
>> +    uint64_t offset = 0;
>> +
>> +    pipe_mutex_lock(mgr->bo_handles_mutex);
>
> radeon_bomgr::bo_handles_mutex should only guard accesses to
> radeon_bomgr::bo_handles. I don't see a reason to reuse it. Could you
> please add another mutex for the va_* stuff?
>
>> +    /* first look for a hole */
>> +    LIST_FOR_EACH_ENTRY_SAFE(hole, n, &mgr->va_holes, list) {
>> +        if (hole->size == size) {
>> +            offset = hole->offset;
>> +            list_del(&hole->list);
>> +            FREE(hole);
>> +            pipe_mutex_unlock(mgr->bo_handles_mutex);
>> +            return offset;
>> +        }
>> +        if (hole->size > size) {
>> +            offset = hole->offset;
>> +            hole->size -= size;
>> +            hole->offset += size;
>> +            pipe_mutex_unlock(mgr->bo_handles_mutex);
>> +            return offset;
>> +        }
>> +    }
>> +
>> +    offset = mgr->va_offset;
>> +    mgr->va_offset += size;
>> +    pipe_mutex_unlock(mgr->bo_handles_mutex);
>> +    return offset;
>> +}
>> +
>> +static void radeon_bomgr_force_va(struct radeon_bomgr *mgr, uint64_t va, 
>> uint64_t size)
>> +{
>> +    pipe_mutex_lock(mgr->bo_handles_mutex);
>> +    if (va >= mgr->va_offset) {
>> +        mgr->va_offset = va + size;
>> +    } else {
>> +        struct radeon_bo_va_hole *hole, *n;
>> +        uint64_t stmp, etmp;
>> +
>> +        /* free all hole that fall into the range
>> +         * NOTE that we might loose virtual address space
>> +         */
>> +        LIST_FOR_EACH_ENTRY_SAFE(hole, n, &mgr->va_holes, list) {
>> +            stmp = hole->offset;
>> +            etmp = stmp + hole->size;
>> +            if (va >= stmp && va < etmp) {
>> +                list_del(&hole->list);
>> +                FREE(hole);
>> +            }
>> +        }
>> +    }
>> +    pipe_mutex_unlock(mgr->bo_handles_mutex);
>> +}
>> +
>> +static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va, 
>> uint64_t size)
>> +{
>> +    pipe_mutex_lock(mgr->bo_handles_mutex);
>> +    if ((va + size) == mgr->va_offset) {
>> +        mgr->va_offset = va;
>> +    } else {
>> +        struct radeon_bo_va_hole *hole;
>> +
>> +        /* FIXME on allocation failure we just loose virtual address space
>> +         * maybe print a warning
>> +         */
>> +        hole = CALLOC_STRUCT(radeon_bo_va_hole);
>> +        if (hole) {
>> +            hole->size = size;
>> +            hole->offset = va;
>> +            list_add(&hole->list, &mgr->va_holes);
>> +        }
>> +    }
>> +    pipe_mutex_unlock(mgr->bo_handles_mutex);
>> +}
>> +
>>  static void radeon_bo_destroy(struct pb_buffer *_buf)
>>  {
>>     struct radeon_bo *bo = radeon_bo(_buf);
>> +    struct radeon_bomgr *mgr = bo->mgr;
>>     struct drm_gem_close args;
>>
>>     memset(&args, 0, sizeof(args));
>> @@ -168,6 +256,10 @@ static void radeon_bo_destroy(struct pb_buffer *_buf)
>>     if (bo->ptr)
>>         os_munmap(bo->ptr, bo->base.size);
>>
>> +    if (mgr->va) {
>> +        radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
>> +    }
>> +
>>     /* Close object. */
>>     args.handle = bo->handle;
>>     drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
>> @@ -343,6 +435,7 @@ static struct pb_buffer *radeon_bomgr_create_bo(struct 
>> pb_manager *_mgr,
>>     struct radeon_bo *bo;
>>     struct drm_radeon_gem_create args;
>>     struct radeon_bo_desc *rdesc = (struct radeon_bo_desc*)desc;
>> +    int r;
>>
>>     memset(&
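
The review comment above asks for a dedicated mutex for the va_* state.
A minimal standalone sketch of the same hole-reuse allocator with its own
lock could look like this. It is not the patch itself: the ex_* names are
hypothetical, a plain singly linked list and pthread mutex replace the
gallium list and pipe_mutex helpers, and va_offset is kept 64-bit here as
an assumption so that it matches the offsets being returned.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct ex_va_hole {
    struct ex_va_hole *next;
    uint64_t offset;
    uint64_t size;
};

struct ex_va_mgr {
    pthread_mutex_t va_mutex;   /* guards va_offset and holes only */
    uint64_t va_offset;         /* first address never handed out */
    struct ex_va_hole *holes;   /* freed ranges below va_offset */
};

static void ex_va_init(struct ex_va_mgr *mgr)
{
    pthread_mutex_init(&mgr->va_mutex, NULL);
    mgr->va_offset = 0;
    mgr->holes = NULL;
}

/* Reuse the first hole that is large enough, else grow the range. */
static uint64_t ex_va_alloc(struct ex_va_mgr *mgr, uint64_t size)
{
    struct ex_va_hole **link, *hole;
    uint64_t offset;

    assert(size > 0);
    pthread_mutex_lock(&mgr->va_mutex);
    for (link = &mgr->holes; (hole = *link) != NULL; link = &hole->next) {
        if (hole->size < size)
            continue;
        offset = hole->offset;
        if (hole->size == size) {
            /* exact fit: unlink and release the hole */
            *link = hole->next;
            free(hole);
        } else {
            /* larger hole: carve the request off its start */
            hole->offset += size;
            hole->size -= size;
        }
        pthread_mutex_unlock(&mgr->va_mutex);
        return offset;
    }
    offset = mgr->va_offset;
    mgr->va_offset += size;
    pthread_mutex_unlock(&mgr->va_mutex);
    return offset;
}

static void ex_va_free(struct ex_va_mgr *mgr, uint64_t va, uint64_t size)
{
    pthread_mutex_lock(&mgr->va_mutex);
    if (va + size == mgr->va_offset) {
        /* range sits at the top: just shrink the used area */
        mgr->va_offset = va;
    } else {
        struct ex_va_hole *hole = calloc(1, sizeof(*hole));
        /* on allocation failure the range is simply lost, as in the patch */
        if (hole) {
            hole->offset = va;
            hole->size = size;
            hole->next = mgr->holes;
            mgr->holes = hole;
        }
    }
    pthread_mutex_unlock(&mgr->va_mutex);
}
```

Keeping the lock separate means a thread walking the hole list never
blocks a lookup in the buffer-handle table, and vice versa.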

Re: [Mesa-dev] [PATCH 0/1] Delete i965g

2011-11-29 Thread Jerome Glisse
On Tue, Nov 29, 2011 at 10:12 AM, Jose Fonseca  wrote:
> The bulk is there but there are a few places missing.
>
> I'll update those, do some sanity checks and commit.
>
> Jose

Is there a good reason to delete i965g? Maybe some people are interested in it.

Cheers,
Jerome


Re: [Mesa-dev] Why failover module are not used?

2011-09-23 Thread Jerome Glisse
On Fri, Sep 23, 2011 at 3:18 AM,   wrote:
> Hi all,
>
>    In our mesa code, there is a pipe driver named failover which is not used
> at all.  I think the failover pipe driver is a good solution of the hardware
> without full capability to support GL2.0. But why it’s discarded? It’s
> because fallback solution isn’t needed for almost all hardware or because
> there is critical bug to stop using it?
>
>    Any answer will be appreciated.
>
>
>
> Thanks.
>
> Best Regards,
>
> Jacob He

I think it was decided that it's better not to lie about hw capabilities
and to have the hw driver reject unsupported shaders/features.

Regards,
Jerome


Re: [Mesa-dev] [PATCH 3/3] r600g: implement fragment and vertex color clamp

2011-06-27 Thread Jerome Glisse
On Mon, Jun 27, 2011 at 8:38 AM, Roland Scheidegger  wrote:
> Am 25.06.2011 00:22, schrieb Vadim Girlin:
>> On 06/24/2011 11:38 PM, Jerome Glisse wrote:
>>> On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin
>>> wrote:
>>>> Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440
>>>>
>>>> Signed-off-by: Vadim Girlin
>>>
>>> As discussed previously, there is better to handle this. I think best
>>> solution is to always add the instruction and to conditionally execute
>>> them thanks to the boolean constant. If this reveal to have a too big
>>> impact on shader, other solution i see is adding a cf block with those
>>> instructions and to enable or disable that block (cf_nop) and reupload
>>> shader that would avoid a rebuild.
>>
>> I know its not optimal to do a full rebuild, but rebuild is needed only
>> when the application will use the same shader in different clamping
>> states. It won't be a problem if the application doesn't change clamping
>> state or if it changes the state but uses each shader in one state only.
>> So assuming that typical app will not use one shader in both states, it
>> shouldn't be a problem. Is this assumption wrong? I'm not really sure
>> because I have no much experience in this. But if it's wrong then it's
>> probably better for performance to build and cache both versions.
> I tend to think you're right apps probably don't want to use the same
> shader both with and without clamping.

Well, if boolean blocks (see the COND field set to SQ_CF_COND_BOOL in
SQ_CF_WORD1) are free from a perf point of view, then I think it's best
to have one shader with the clamp instruction inside the boolean-enabled
block. Only a benchmark can tell.
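
For comparison, the alternative raised in the quoted text (build and cache
both versions of a shader, so that switching clamp state never triggers a
second rebuild) can be sketched as below. This is an illustration only:
the ex_* types are hypothetical and the "build" is a stub that counts
compiles instead of generating r600 bytecode.

```c
#include <assert.h>
#include <stdbool.h>

struct ex_shader_variant {
    bool built;
    bool clamp_color;   /* whether the clamp instructions were appended */
};

struct ex_shader {
    struct ex_shader_variant variants[2]; /* index 0: no clamp, 1: clamp */
    unsigned build_count;                 /* number of (expensive) compiles */
};

/* Stub for the expensive step: translate the shader for one clamp state. */
static void ex_build_variant(struct ex_shader *s, bool clamp)
{
    struct ex_shader_variant *v = &s->variants[clamp ? 1 : 0];

    v->clamp_color = clamp;
    v->built = true;
    s->build_count++;
}

/* Return the variant for the current clamp state, building it on first use. */
static const struct ex_shader_variant *
ex_get_variant(struct ex_shader *s, bool clamp)
{
    struct ex_shader_variant *v = &s->variants[clamp ? 1 : 0];

    if (!v->built)
        ex_build_variant(s, clamp);
    return v;
}
```

The cost is memory for a second copy of each shader that is actually used
in both states; a shader used in only one state is still built once.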

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 0/3] r600g patches

2011-06-24 Thread Jerome Glisse
On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin  wrote:
> #1 fixes slots order for x & y writes in the LIT implementation.
> Without this patch "fp-lit-mask" piglit test fails after patch 3. It seems
> wrong order causes wrong PV.* values for the next instruction.
>
> #2 reduces unneeded calls to r600_spi_update.
>
> #3 implements color clamping in shaders by adding "MOV_SAT R,R"
> instructions for each color output before export. Shaders are rebuilt when
> clamping state changes.
>
> Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440
>
> There are no regressions with r600.tests on evergreen with these patches.
>
>  r600g: LIT: fix x&y slots order
>  r600g: optimize spi update
>  r600g: implement fragment and vertex color clamp
>
>  src/gallium/drivers/r600/evergreen_state.c   |    2 +
>  src/gallium/drivers/r600/r600_pipe.c         |    2 +-
>  src/gallium/drivers/r600/r600_pipe.h         |    8 +++-
>  src/gallium/drivers/r600/r600_shader.c       |   74 
> --
>  src/gallium/drivers/r600/r600_shader.h       |    1 +
>  src/gallium/drivers/r600/r600_state.c        |    2 +
>  src/gallium/drivers/r600/r600_state_common.c |   40 --
>  7 files changed, 106 insertions(+), 23 deletions(-)
>

Pushed the series, thanks.
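
For readers unfamiliar with the clamp in patch #3: a saturating move
applied to a color output limits each component to the range [0, 1]. The
effect on one RGBA value can be sketched on the CPU as below; the ex_*
helpers are hypothetical and only model the arithmetic, not the shader
instruction encoding.

```c
#include <assert.h>

/* Clamp one component to [0, 1], as a saturating move would. */
static float ex_saturate(float x)
{
    if (x < 0.0f)
        return 0.0f;
    if (x > 1.0f)
        return 1.0f;
    return x;
}

/* Apply the clamp to every component of a color output. */
static void ex_clamp_color(float rgba[4])
{
    int i;

    for (i = 0; i < 4; i++)
        rgba[i] = ex_saturate(rgba[i]);
}
```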

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 3/3] r600g: implement fragment and vertex color clamp

2011-06-24 Thread Jerome Glisse
On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin  wrote:
> Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440
>
> Signed-off-by: Vadim Girlin 

As discussed previously, there is a better way to handle this. I think the
best solution is to always add the instructions and to conditionally
execute them thanks to the boolean constant. If this turns out to have too
big an impact on the shader, the other solution I see is adding a cf block
with those instructions, enabling or disabling that block (cf_nop), and
reuploading the shader, which would avoid a rebuild.

But as an interim solution I think this patch is ok.

Cheers,
Jerome

> ---
>  src/gallium/drivers/r600/evergreen_state.c   |    2 +
>  src/gallium/drivers/r600/r600_pipe.c         |    2 +-
>  src/gallium/drivers/r600/r600_pipe.h         |    7 +++-
>  src/gallium/drivers/r600/r600_shader.c       |   52 
> +++---
>  src/gallium/drivers/r600/r600_shader.h       |    1 +
>  src/gallium/drivers/r600/r600_state.c        |    2 +
>  src/gallium/drivers/r600/r600_state_common.c |   30 ++-
>  7 files changed, 87 insertions(+), 9 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index f86e4d4..dfe7896 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -256,6 +256,8 @@ static void *evergreen_create_rs_state(struct 
> pipe_context *ctx,
>        }
>
>        rstate = &rs->rstate;
> +       rs->clamp_vertex_color = state->clamp_vertex_color;
> +       rs->clamp_fragment_color = state->clamp_fragment_color;
>        rs->flatshade = state->flatshade;
>        rs->sprite_coord_enable = state->sprite_coord_enable;
>
> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
> b/src/gallium/drivers/r600/r600_pipe.c
> index 38801d6..12599bf 100644
> --- a/src/gallium/drivers/r600/r600_pipe.c
> +++ b/src/gallium/drivers/r600/r600_pipe.c
> @@ -377,6 +377,7 @@ static int r600_get_param(struct pipe_screen* pscreen, 
> enum pipe_cap param)
>        case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
>        case PIPE_CAP_SM3:
>        case PIPE_CAP_SEAMLESS_CUBE_MAP:
> +       case PIPE_CAP_FRAGMENT_COLOR_CLAMP_CONTROL:
>                return 1;
>
>        /* Supported except the original R600. */
> @@ -392,7 +393,6 @@ static int r600_get_param(struct pipe_screen* pscreen, 
> enum pipe_cap param)
>        /* Unsupported features. */
>        case PIPE_CAP_STREAM_OUTPUT:
>        case PIPE_CAP_PRIMITIVE_RESTART:
> -       case PIPE_CAP_FRAGMENT_COLOR_CLAMP_CONTROL:
>        case PIPE_CAP_TGSI_INSTANCEID:
>        case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
>        case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER:
> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
> b/src/gallium/drivers/r600/r600_pipe.h
> index 63ddd39..dc9aad0 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -88,6 +88,8 @@ struct r600_pipe_sampler_view {
>
>  struct r600_pipe_rasterizer {
>        struct r600_pipe_state          rstate;
> +       boolean                         clamp_vertex_color;
> +       boolean                         clamp_fragment_color;
>        boolean                         flatshade;
>        unsigned                        sprite_coord_enable;
>        float                           offset_units;
> @@ -125,6 +127,7 @@ struct r600_pipe_shader {
>        struct r600_bo                  *bo;
>        struct r600_bo                  *bo_fetch;
>        struct r600_vertex_element      vertex_elements;
> +       struct tgsi_token               *tokens;
>  };
>
>  struct r600_pipe_sampler_state {
> @@ -202,6 +205,8 @@ struct r600_pipe_context {
>        struct pipe_query               *saved_render_cond;
>        unsigned                        saved_render_cond_mode;
>        /* shader information */
> +       boolean                         clamp_vertex_color;
> +       boolean                         clamp_fragment_color;
>        boolean                         spi_dirty;
>        unsigned                        sprite_coord_enable;
>        boolean                         flatshade;
> @@ -265,7 +270,7 @@ void r600_init_query_functions(struct r600_pipe_context 
> *rctx);
>  void r600_init_context_resource_functions(struct r600_pipe_context *r600);
>
>  /* r600_shader.c */
> -int r600_pipe_shader_create(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader, const struct tgsi_token *tokens);
> +int r600_pipe_shader_create(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader);
>  void r600_pipe_shader_destroy(struct pipe_context *ctx, struct 
> r600_pipe_shader *shader);
>  int r600_find_vs_semantic_index(struct r600_shader *vs,
>                                struct r600_shader *ps, int id);
> diff --git a/src/gallium/drivers/r600/r600_shader.c 
> b/src/gallium/drivers/r600/r600_shader.c
> index 904cc69..2e5d4a6 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ 

Re: [Mesa-dev] [PATCH] linker: Reject shaders that use too many varyings

2011-06-23 Thread Jerome Glisse
On Thu, Jun 23, 2011 at 10:38 AM, Roland Scheidegger  wrote:
> Am 23.06.2011 16:09, schrieb Jerome Glisse:
>> On Wed, Jun 22, 2011 at 10:49 PM, Alex Deucher  wrote:
>>> On Wed, Jun 22, 2011 at 10:12 PM, Roland Scheidegger  
>>> wrote:
>>>> Am 21.06.2011 20:59, schrieb Sven Arvidsson:
>>>>> This change broke a whole lot of stuff on r600g, for example Unigine
>>>>> Heaven:
>>>>>
>>>>>       shader uses too many varying components (36 > 32)
>>>>
>>>> It looks like the r600g driver claims to only support 10 varyings, which
>>>> the state tracker reduces to 8 (as it subtracts the supposedly included
>>>> color varyings).
>>>> At first sight I can't quite see why it's limited to 10, all r600 chips
>>>> should be able to handle 32 (dx10 requirement) but of course the driver
>>>> might not (mesa itself is limited to 16 it seems). If it worked just
>>>> fine before that suggests it indeed works just fine with more...
>>>> Someone more familiar with the driver should be able to tell if it's
>>>> safe to increase the limit to 32 (the state tracker will cap it to 16).
>>>
>>> The hardware definitely supports 32.  I'm not sure why it's currently
>>> set to 10; I don't see any limitations in the code off hand.
>>>
>>> Alex
>>
>> IIRC it's just cut & paste from r300g it can be safely bump
>
> Ok Marek bumped it to 34. That seems to be lying too I don't think it
> could handle 32 generic inputs and 2 colors. But there's no way to
> really express that right now.
>
> Roland
>

Also, IIRC r6xx/r7xx needs special code for handling more than 16
varyings; I can't remember if we had proper code for that.
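
The arithmetic behind the "36 > 32" error quoted above can be sketched as
follows, using only the numbers given in this thread: the driver limit
includes 2 color varyings, the state tracker caps the remainder at 16, and
each varying holds 4 components. The ex_* names are hypothetical and the
function is a model of the described behaviour, not the linker code.

```c
#include <assert.h>

#define EX_COMPONENTS_PER_VARYING 4u
#define EX_COLOR_VARYINGS         2u
#define EX_STATE_TRACKER_CAP      16u

/* Components available to the linker for a given driver-reported limit. */
static unsigned ex_max_varying_components(unsigned driver_limit)
{
    unsigned generic = driver_limit > EX_COLOR_VARYINGS
                     ? driver_limit - EX_COLOR_VARYINGS : 0;

    if (generic > EX_STATE_TRACKER_CAP)
        generic = EX_STATE_TRACKER_CAP;
    return generic * EX_COMPONENTS_PER_VARYING;
}

/* Nonzero if a shader using this many components would pass the check. */
static int ex_link_ok(unsigned used_components, unsigned driver_limit)
{
    return used_components <= ex_max_varying_components(driver_limit);
}
```

With a driver limit of 10 this yields 8 varyings, i.e. 32 components, so a
shader using 36 is rejected; with the limit bumped to 34 the cap of 16
applies and the same shader fits.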

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] linker: Reject shaders that use too many varyings

2011-06-23 Thread Jerome Glisse
On Wed, Jun 22, 2011 at 10:49 PM, Alex Deucher  wrote:
> On Wed, Jun 22, 2011 at 10:12 PM, Roland Scheidegger  
> wrote:
>> Am 21.06.2011 20:59, schrieb Sven Arvidsson:
>>> This change broke a whole lot of stuff on r600g, for example Unigine
>>> Heaven:
>>>
>>>       shader uses too many varying components (36 > 32)
>>
>> It looks like the r600g driver claims to only support 10 varyings, which
>> the state tracker reduces to 8 (as it subtracts the supposedly included
>> color varyings).
>> At first sight I can't quite see why it's limited to 10, all r600 chips
>> should be able to handle 32 (dx10 requirement) but of course the driver
>> might not (mesa itself is limited to 16 it seems). If it worked just
>> fine before that suggests it indeed works just fine with more...
>> Someone more familiar with the driver should be able to tell if it's
>> safe to increase the limit to 32 (the state tracker will cap it to 16).
>
> The hardware definitely supports 32.  I'm not sure why it's currently
> set to 10; I don't see any limitations in the code off hand.
>
> Alex

IIRC it's just cut & paste from r300g; it can be safely bumped.

Cheers,
Jerome


Re: [Mesa-dev] Status of the GLSL->TGSI translator

2011-06-16 Thread Jerome Glisse
On Thu, Jun 16, 2011 at 10:08 AM, Brian Paul  wrote:
> On 06/15/2011 03:38 PM, Bryan Cain wrote:
>>
>> My work on the GLSL IR to TGSI translator I announced on the list this
>> April is now at the point where I think it is ready to be merged into
>> Mesa.  It is stable and doesn't regress any piglit tests on softpipe or
>> nv50.
>>
>> It adds native integer support as required by GLSL 1.30, although it is
>> currently disabled for all drivers since GLSL 1.30 support is not
>> complete yet and most Gallium drivers haven't implemented the TGSI
>> integer opcodes.  (This would be a good time for Gallium driver
>> developers to add support for TGSI's integer opcodes, which are
>> currently only implemented in softpipe.)
>>
>> Developing this necessitated significant changes elsewhere in Mesa, and
>> some small changes in Gallium.  This means that some of the commits in
>> my branch probably need to be reviewed by the developers of those
>> components.
>>
>> If I had commit access to Mesa, I would create a branch for this work in
>> the main Mesa repository.  But since I am still waiting on my
>> freedesktop.org account to be created, I have pushed the latest version
>> to the "glsl-to-tgsi" branch of my personal Mesa repository on GitHub:
>>
>> Git clone URL: git://github.com/Plombo/mesa.git
>> Web interface for viewing commits:
>> https://github.com/Plombo/mesa/commits/glsl-to-tgsi
>>
>> Hopefully my freedesktop.org account will be created soon (I have
>> already had my account request approved), so that I can push this to a
>> branch in the central repository.
>
> Looks like nice work, Bryan.
>
> Just a few minor questions/comments for now:
>
> 1. The st_fragment/vertex/geometry_program structs now have a glsl_to_tgsi
> field.  I did a grep, but I couldn't find where that field is assigned.  Can
> you clue me in?
>
> 2. The above mentioned program structs contains an old Mesa instruction
> program AND/OR(?) a GLSL IR.  Do both types of representations co-exist
> sometimes?  Perhaps you could update the comments on those structs to
> explain that.
>
> 3. Kind of a follow-on: for glDrawPixels and glBitmap we take the original
> program code (in Mesa form) and prepend extra instructions for fetching the
> fragment color or doing the fragment kill.  Do we always have the Mesa
> instructions for this?  It seems we don't normally want to generate Mesa
> instructions all the time but we still need them sometimes.

I must be missing something, but why do we need to take the original
program for those?

Cheers,
Jerome

