Re: [Mesa-dev] [PATCH] glsl: propagate max_array_access through function calls

2013-08-28 Thread Dominik Behr
Ah it is by design. Sentinels are special nodes with no payload.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Marek Olšák
Yeah, st/mesa also compiles shaders on the first use, so we've got 3
places to fix: Wine, st/mesa, the driver.

Marek

On Wed, Aug 28, 2013 at 2:07 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On 08/28/2013 02:59 AM, Marek Olšák wrote:

 First, you won't really see any significant continual difference in
 frame rate no matter how many shader variants you have unless you are
 very CPU-bound. The problem is shader compilation on the first use,
 that's where you get a big hiccup. Try Skyrim for example: You have to
 first look around and see every object that's around you and get
 unpleasant stuttering before you can actually go on and play the game.
 Yes, this is also Wine's fault that it compiles shaders on the first use
 too, but we don't have to be as bad as Wine, do we? Valve also
 reported shader recompilations on the first use being a serious issue
 with open source drivers.


 I perfectly understand that deferred compilation is exactly the problem that
 makes the games freeze due to shader compilation on first use when something
 new appears on the screen, but I don't think we can solve this problem in
 the *driver* by trying to compile early, because AFAICS currently the
 shaders are passed to the driver too late anyway, and this happens not only
 with wine. E.g. when I run Heaven in a window with MESA_GLSL=dump
 R600_DEBUG=ps,vs, so that I can see Heaven's window and console output at
 the same time, what I see is that most of GL dumps happen while Heaven shows
 splash screen with loading progress, but most of the driver's dumps appear
 on the first frame and few more times during benchmark. It looks like
 compilation is deferred somewhere in the stack before the driver, or am I
 missing something?

 Vadim




 Marek

 On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com
 wrote:

 On 08/28/2013 12:43 AM, Marek Olšák wrote:


 Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
 with a Mesa driver that likes to compile shader variants on first use?
 It's HORRIBLE.



 I don't think that shader variants are bad, but it's definitely bad when we
 are compiling variants that are never used. Currently glxgears compiles 18
 ps/vs shaders. In my branch with initial GS support [1] I switched handling
 of the shaders to deferred compilation, that is, shaders are compiled only
 before the actual draw. I found later that it's not really required for GS,
 but IIRC this change results in only 5 shaders being compiled for glxgears
 instead of 18. It seems most of the useless variants are results of state
 changes between creation of the shader state (initial compilation) and
 actual draw call.

 I had some concerns about increased overhead with those changes, and it's
 actually noticeable with drawoverhead demo, but I didn't see any regressions
 with a few real apps that I tested, e.g. glxgears even showed slightly
 better performance with these changes. Probably I also implemented it in a
 not very optimal way (I was mostly concentrated on GS support) and the
 overhead can be reduced.

 One more thing is duplicate shaders, I've analyzed shader dumps from Unigine
 Heaven 3.0 some time ago and found that from about 320 compiled shaders,
 only about 180 (50%) were unique, others were duplicates (detected by
 comparing the bytecode dumps for them in an automated way), maybe they had
 different shader keys (which still resulted in the same bytecode), but I
 suspect duplicate pipe shaders were also involved. Unfortunately I didn't
 have time to investigate it more thoroughly since then.

 So my point is that we don't really need to eliminate shader variants, first
 we need to eliminate compilation of unused variants and duplicate shaders.
 Also we might want to consider offloading of the compilation to separate
 thread(s) and caching of shader binaries between runs.
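
The duplicate detection described above (comparing compiled bytecode) can be sketched as a content-addressed lookup. This is only a minimal illustration, not r600g code: `shader_cache`, `cached_binary`, `hash_bytecode` and `cache_lookup_or_add` are hypothetical names, and the fixed-size linear table is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64

/* One compiled shader binary (hypothetical layout). */
struct cached_binary {
    const uint32_t *code;   /* borrowed pointer to the bytecode */
    size_t ndw;             /* length in 32-bit words */
    uint32_t hash;          /* hash of the bytecode contents */
};

struct shader_cache {
    struct cached_binary slot[CACHE_SLOTS];
    unsigned count;
};

/* FNV-1a over the raw bytes of the bytecode. */
static uint32_t hash_bytecode(const uint32_t *code, size_t ndw)
{
    const unsigned char *p = (const unsigned char *)code;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < ndw * sizeof(uint32_t); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the slot holding identical bytecode, adding a new slot on a miss.
 * *is_dup is set to 1 when an identical binary was already present. */
static int cache_lookup_or_add(struct shader_cache *c, const uint32_t *code,
                               size_t ndw, int *is_dup)
{
    uint32_t h = hash_bytecode(code, ndw);
    for (unsigned i = 0; i < c->count; i++) {
        const struct cached_binary *b = &c->slot[i];
        if (b->hash == h && b->ndw == ndw &&
            memcmp(b->code, code, ndw * sizeof(uint32_t)) == 0) {
            *is_dup = 1;
            return (int)i;
        }
    }
    assert(c->count < CACHE_SLOTS);
    c->slot[c->count].code = code;
    c->slot[c->count].ndw = ndw;
    c->slot[c->count].hash = h;
    *is_dup = 0;
    return (int)c->count++;
}
```

Keying on the bytecode rather than on the shader key catches the case mentioned above, where different keys still produce the same binary.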

 Vadim

   [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders



 What the patch does is probably the right solution. At least
 alpha-test state changes don't cause shader recompilation and
 re-binding, which also negatively affects performance. Ideally we
 shouldn't depend on the framebuffer state at all, but we need to
 emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
 should always be fine with key.nr_cbufs forced to 8 for any shader
 without that property. I expect app developers to do the right thing
 and not write outputs they don't need.

 Marek

 On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com
 wrote:


 Not that I'm qualified to review r600 code, but couldn't you create
 different shader variants depending on whether you need alpha test? At
 least I would assume shader exports aren't free.

 Roland

 On 27.08.2013 19:56, Vadim Girlin wrote:


 We need to export at least one color if the shader writes it,
 even when nr_cbufs==0.

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
 ---

 Tested on evergreen with multiple combinations of backends 

Re: [Mesa-dev] [PATCH] glx: make the interval of LIBGL_SHOW_FPS adjustable

2013-08-28 Thread Marek Olšák
Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Wed, Aug 28, 2013 at 6:14 AM, Chia-I Wu olva...@gmail.com wrote:
 LIBGL_SHOW_FPS=1 makes GLX print FPS every second while other values do
 nothing.  Extend it so that LIBGL_SHOW_FPS=N will print the FPS every N
 seconds.
 ---
  src/glx/dri2_glx.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

 diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
 index c54edac..54fc21c 100644
 --- a/src/glx/dri2_glx.c
 +++ b/src/glx/dri2_glx.c
 @@ -95,7 +95,7 @@ struct dri2_screen {
 void *driver;
 int fd;

 -   Bool show_fps;
 +   int show_fps_interval;
  };

  struct dri2_context
 @@ -764,6 +764,8 @@ unsigned dri2GetSwapEventType(Display* dpy, XID drawable)

  static void show_fps(struct dri2_drawable *draw)
  {
 +   const int interval =
 +  ((struct dri2_screen *) draw->base.psc)->show_fps_interval;
 struct timeval tv;
 uint64_t current_time;

 @@ -772,7 +774,7 @@ static void show_fps(struct dri2_drawable *draw)

 draw->frames++;

 -   if (draw->previous_time + 100 <= current_time) {
 +   if (draw->previous_time + interval * 100 <= current_time) {
 if (draw->previous_time) {
   fprintf(stderr, "libGL: FPS = %.1f\n",
   ((uint64_t)draw->frames * 100) /
 @@ -859,7 +861,7 @@ dri2SwapBuffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor,
  target_msc, divisor, remainder);
  }

 -if (psc->show_fps) {
 +if (psc->show_fps_interval) {
 show_fps(priv);
  }

 @@ -1283,7 +1285,9 @@ dri2CreateScreen(int screen, struct glx_display * priv)
 free(deviceName);

 tmp = getenv("LIBGL_SHOW_FPS");
 -   psc->show_fps = tmp && strcmp(tmp, "1") == 0;
 +   psc->show_fps_interval = (tmp) ? atoi(tmp) : 0;
 +   if (psc->show_fps_interval < 0)
 +  psc->show_fps_interval = 0;

 return &psc->base;

 --
 1.8.4.rc3
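
The interval logic of the patch can be tried in isolation with a small stand-alone sketch. It is not the libGL code: `parse_fps_interval`, `fps_counter` and `fps_tick` are hypothetical names, the timestamp is passed in instead of being read with gettimeofday(), and the factor 1000000 assumes microsecond timestamps (the constants in the archived diff appear to be truncated).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a LIBGL_SHOW_FPS-style value: unset or negative means disabled (0). */
static int parse_fps_interval(const char *value)
{
    int interval = value ? atoi(value) : 0;
    return interval < 0 ? 0 : interval;
}

struct fps_counter {
    uint64_t previous_time;  /* microseconds; 0 means no sample taken yet */
    unsigned frames;         /* frames counted since previous_time */
};

/* Call once per buffer swap. Returns 1 and stores the frame rate in *fps
 * when at least interval_s seconds have passed since the last report. */
static int fps_tick(struct fps_counter *c, int interval_s, uint64_t now_us,
                    double *fps)
{
    int report = 0;

    assert(interval_s > 0);
    c->frames++;
    if (c->previous_time + (uint64_t)interval_s * 1000000u <= now_us) {
        if (c->previous_time) {
            *fps = (double)c->frames * 1000000.0 /
                   (double)(now_us - c->previous_time);
            report = 1;
        }
        c->frames = 0;
        c->previous_time = now_us;
    }
    return report;
}
```

As in the patch, an interval of 0 is meant to disable reporting, so a caller would only invoke `fps_tick` when the parsed interval is non-zero.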



[Mesa-dev] glsl: memory leak in parsing extension statements?

2013-08-28 Thread Aras Pranckevicius
Hi,

Looking at the code, is there a potential memory leak in the GLSL parser wrt
extension statements?

glsl_lexer.ll has:
<PP>[_a-zA-Z][_a-zA-Z0-9]* {
  yylval->identifier = strdup(yytext);
  return IDENTIFIER;
}

i.e. it calls strdup on the token (there's one other place that calls strdup),
whereas most regular identifiers use ralloc_strdup for easier memory
management.


glsl_parser.yy has this:

  extension_statement:
   EXTENSION any_identifier COLON any_identifier EOL
   {
  if (!_mesa_glsl_process_extension($2, & @2, $4, & @4, state)) {
 YYERROR;
  }
   }
   ;


which looks like it processes the extension identifiers, but never
frees the memory.
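
One way to picture the ownership problem is a small stand-alone C sketch in which the parser action takes ownership of both heap-allocated tokens and releases them on every path. The names here are hypothetical: `dup_token` stands in for the lexer's strdup() call, `process_extension` for `_mesa_glsl_process_extension()` (whose real signature also takes locations and parser state), and `handle_extension_statement` for the grammar action.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for strdup(yytext): returns a heap copy owned by the caller. */
static char *dup_token(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    assert(copy != NULL);
    memcpy(copy, text, len);
    return copy;
}

/* Hypothetical stand-in for the real extension handler; it only validates
 * the behavior keyword and does not keep either pointer. */
static bool process_extension(const char *name, const char *behavior)
{
    (void)name;
    return strcmp(behavior, "require") == 0 ||
           strcmp(behavior, "enable") == 0 ||
           strcmp(behavior, "warn") == 0 ||
           strcmp(behavior, "disable") == 0;
}

/* Grammar-action sketch: owns both tokens and frees them whether or not
 * processing succeeded, so the error path does not leak either. */
static bool handle_extension_statement(char *name, char *behavior)
{
    bool ok = process_extension(name, behavior);
    free(name);
    free(behavior);
    return ok;
}
```

Allocating the tokens with ralloc_strdup against a parser-lifetime context, as the other identifiers do, would be the alternative that avoids explicit free() calls.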




-- 

Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info


[Mesa-dev] obtain def-use chain in glsl s-expression

2013-08-28 Thread Liu Xin

Hi, Mesa community,

I am not familiar with S-expressions or other Lisp-like languages. I 
am working on a GLSL IR transformation. For example, I want to change a 
variable into an array of the same type.


So far, I can find the definition of a variable. How can I update all 
uses of this variable in the S-expression? I think all uses of this variable 
are in the form of ir_dereference_variable. The difficulty is how to 
collect the def-use chain using the hierarchical visitor. I think some 
optimizer authors must have had the same problem. Could you point me to a 
pass that solves it, so I can use it as a reference?


thanks,
--lx



Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Christian König
Well, for this discussion let's just assume that we fixed the delay in 
the upper layers of the stack and the driver sees the shader code as 
soon as the application does (if I understood it correctly, Vadim has 
just volunteered for the job).


Also let's assume that shaders are small and that having a lot of shader 
variants around after they are compiled isn't bad.


In this case probably the best solution is to compile early and try to 
make the shaders as state-invariant as possible, e.g. don't do 
optimizations like getting rid of extra exports for the case where we 
don't need the alpha test, or, if it's just a dependency on a boolean, 
have both variants covered by the bytecode and use a bit constant to 
choose between the two, etc.


As a second step the driver should create an optimized version of the 
shader in a background thread once we know all the state that is/was 
active when the shader is used.


Of course you need a bit of heuristics for this, because sometimes it is 
better to switch between shader variants and other times it is better to 
have one variant covering all the different states and just use bit 
constants to choose between them.
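
The trade-off between per-state variants and one state-invariant shader can be made concrete with a variant lookup keyed on the state that affects code generation. This is a rough sketch under stated assumptions, not r600g code: `variant_key`, `shader`, `normalize_key` and `get_variant` are hypothetical, the compile step is reduced to a counter, and the invariant mode simply folds every key to a single value (nr_cbufs forced to 8, alpha test left to a run-time constant).

```c
#include <assert.h>

#define MAX_VARIANTS 8

/* State that would change the generated code (hypothetical subset). */
struct variant_key {
    unsigned nr_cbufs;
    unsigned alpha_test;
};

struct shader {
    struct variant_key keys[MAX_VARIANTS];
    unsigned num_variants;
    unsigned compiles;       /* how often the expensive compile step ran */
    int state_invariant;     /* 1: one variant covers all states */
};

/* In state-invariant mode every key maps to the same variant. */
static struct variant_key normalize_key(const struct shader *s,
                                        struct variant_key key)
{
    if (s->state_invariant) {
        key.nr_cbufs = 8;    /* always export all color buffers */
        key.alpha_test = 0;  /* chosen by a constant at run time instead */
    }
    return key;
}

/* Returns the index of the variant for this state, compiling on a miss. */
static unsigned get_variant(struct shader *s, struct variant_key key)
{
    key = normalize_key(s, key);
    for (unsigned i = 0; i < s->num_variants; i++) {
        if (s->keys[i].nr_cbufs == key.nr_cbufs &&
            s->keys[i].alpha_test == key.alpha_test)
            return i;
    }
    assert(s->num_variants < MAX_VARIANTS);
    s->keys[s->num_variants] = key;
    s->compiles++;           /* stand-in for the real compilation */
    return s->num_variants++;
}
```

With per-state keys each new state combination costs one compile at first use; in invariant mode the single compile can happen as soon as the shader is created, which is the "compile early" step described above.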


Just some thoughts on this topic,
Christian.

PS: My mail server is once more driving me nuts, please ignore the extra 
copy if you get this mail twice.


On 28.08.2013 02:07, Vadim Girlin wrote:

On 08/28/2013 02:59 AM, Marek Olšák wrote:

First, you won't really see any significant continual difference in
frame rate no matter how many shader variants you have unless you are
very CPU-bound. The problem is shader compilation on the first use,
that's where you get a big hiccup. Try Skyrim for example: You have to
first look around and see every object that's around you and get
unpleasant stuttering before you can actually go on and play the game.
Yes, this is also Wine's fault that it compiles shaders on the first use
too, but we don't have to be as bad as Wine, do we? Valve also
reported shader recompilations on the first use being a serious issue
with open source drivers.


I perfectly understand that deferred compilation is exactly the 
problem that makes the games freeze due to shader compilation on first 
use when something new appears on the screen, but I don't think we can 
solve this problem in the *driver* by trying to compile early, because 
AFAICS currently the shaders are passed to the driver too late anyway, 
and this happens not only with wine. E.g. when I run Heaven in a 
window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see 
Heaven's window and console output at the same time, what I see is 
that most of GL dumps happen while Heaven shows splash screen with 
loading progress, but most of the driver's dumps appear on the first 
frame and few more times during benchmark. It looks like compilation 
is deferred somewhere in the stack before the driver, or am I missing 
something?


Vadim




Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin 
vadimgir...@gmail.com wrote:

On 08/28/2013 12:43 AM, Marek Olšák wrote:


Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.



I don't think that shader variants are bad, but it's definitely bad when we
are compiling variants that are never used. Currently glxgears compiles 18
ps/vs shaders. In my branch with initial GS support [1] I switched handling
of the shaders to deferred compilation, that is, shaders are compiled only
before the actual draw. I found later that it's not really required for GS,
but IIRC this change results in only 5 shaders being compiled for glxgears
instead of 18. It seems most of the useless variants are results of state
changes between creation of the shader state (initial compilation) and
actual draw call.

I had some concerns about increased overhead with those changes, and it's
actually noticeable with drawoverhead demo, but I didn't see any regressions
with a few real apps that I tested, e.g. glxgears even showed slightly
better performance with these changes. Probably I also implemented it in a
not very optimal way (I was mostly concentrated on GS support) and the
overhead can be reduced.

One more thing is duplicate shaders, I've analyzed shader dumps from Unigine
Heaven 3.0 some time ago and found that from about 320 compiled shaders,
only about 180 (50%) were unique, others were duplicates (detected by
comparing the bytecode dumps for them in an automated way), maybe they had
different shader keys (which still resulted in the same bytecode), but I
suspect duplicate pipe shaders were also involved. Unfortunately I didn't
have time to investigate it more thoroughly since then.

So my point is that we don't really need to eliminate shader variants, first
we need to eliminate compilation of unused variants and duplicate shaders.
Also we might want to consider offloading of the compilation to separate
thread(s) and caching of shader binaries 

Re: [Mesa-dev] tgsi dump and parsing

2013-08-28 Thread Jose Fonseca
- Original Message -
 On Wed, Aug 28, 2013 at 3:32 PM, Dave Airlie airl...@gmail.com wrote:
  IMM[0] FLT32 { 0x, 0x, 0x, 0x }  # 1.0, 3.0, 2.0, 4.0
 
  If you use %.9g instead of %.4f then floating point numbers will be
  preserved without loss of precision.
 
 
  I see a -nan in my tests that doesn't get reparsed so I expect hex is
  still better.
 
 
  oops to list as well this time, sorry.
 
 Just in case you are wondering, it's
 tests/shaders/glsl-const-builtin-inversesqrt.shader_test and
 tests/shaders/glsl-const-builtin-normalize.shader_test
 that throw up the -nan in the dumps.

We could teach tgsi_parse to understand `nan` too.

We could also have a new tgsi_compare() function that, instead of doing a bare 
memcmp, would scan the tokens and account for the ambiguity of NaNs in IMM 
FLT32.
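
The NaN-aware comparison suggested here could look roughly like the following. It is a sketch only: `imm_flt32_equal` and `is_nan_bits` are hypothetical helpers operating on raw 32-bit immediate words, not part of the TGSI API, and the code assumes IEEE 754 binary32 encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 binary32 NaN: exponent bits all ones and a non-zero mantissa. */
static bool is_nan_bits(uint32_t u)
{
    return (u & 0x7f800000u) == 0x7f800000u && (u & 0x007fffffu) != 0;
}

/* Compare two arrays of FLT32 immediates given as raw words. Identical words
 * match, and any NaN matches any other NaN regardless of sign or payload,
 * which a plain memcmp would report as a difference. */
static bool imm_flt32_equal(const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] == b[i])
            continue;
        if (is_nan_bits(a[i]) && is_nan_bits(b[i]))
            continue;
        return false;
    }
    return true;
}
```

This keeps the textual dump readable while still letting a dump/parse round trip be checked, at the cost of not distinguishing NaN payloads.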


I just feel a bit awkward that we have `IMM[x] INT32 {...}` and `IMM[x] FLT32 
{...}` but end up dumping floats as integers. The whole point of INT32/FLT32 is 
to allow humans to read the numbers, because it is just syntactic sugar: by 
definition a shader must behave precisely the same way regardless of whether 
the IMMs are declared INT32 or FLT32, as in TGSI the type is not defined by the 
arguments but rather by the opcodes.

Also, editing IMM FLT32 by hand will be much harder -- you'll need to convert 
floats to their integer representation, as the floats in the comment will 
likely be ignored.
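
Both representations under discussion are easy to compare in a few lines of C. The sketch below assumes IEEE 754 binary32 floats; `float_bits`, `roundtrips_g9` and `roundtrips_f4` are hypothetical helper names. It shows that printing with %.9g and parsing the text back preserves the bit pattern of a finite float, that %.4f in general does not, and what the raw word for a hand-edited hex immediate would be.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Raw bit pattern of a float, i.e. what a hex dump of the immediate shows. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Print with 9 significant digits, parse back, compare bit patterns. */
static int roundtrips_g9(float f)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%.9g", (double)f);
    assert(n > 0 && (size_t)n < sizeof buf);
    return float_bits(strtof(buf, NULL)) == float_bits(f);
}

/* Same check with 4 digits after the decimal point. */
static int roundtrips_f4(float f)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%.4f", (double)f);
    assert(n > 0 && (size_t)n < sizeof buf);
    return float_bits(strtof(buf, NULL)) == float_bits(f);
}
```

NaNs are the case this does not cover: how they print and parse varies, which is what prompted the hex proposal in the first place.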


To me, that seems to trade a concrete advantage (the usability of the TGSI 
textual representation) for the much more dubious advantage of perfect 
bit-for-bit reversibility between the binary and text forms of TGSI shaders.


That said, I don't feel strongly either way. 


Jose


Re: [Mesa-dev] [PATCH V2 03/15] mesa: Add a clone function to mesa hash

2013-08-28 Thread Brian Paul

On 08/27/2013 08:39 PM, Timothy Arceri wrote:

V2: const qualify table parameter

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
  src/mesa/main/hash.c |   28 
  src/mesa/main/hash.h |3 +++
  2 files changed, 31 insertions(+)



Reviewed-by: Brian Paul bri...@vmware.com

Do you need someone to commit/push your patches for you?

-Brian



Re: [Mesa-dev] [PATCH V2 03/15] mesa: Add a clone function to mesa hash

2013-08-28 Thread Timothy Arceri
- Original Message -

From: Brian Paul bri...@vmware.com

On 08/27/2013 08:39 PM, Timothy Arceri wrote:
 V2: const qualify table parameter

 Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
 ---
   src/mesa/main/hash.c |   28 
   src/mesa/main/hash.h |    3 +++
   2 files changed, 31 insertions(+)


Reviewed-by: Brian Paul bri...@vmware.com

Do you need someone to commit/push your patches for you?

-Brian

Hi Brian,

Yes I need someone to commit for me.

Tim



Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Henri Verbeet
On 28 August 2013 12:17, Marek Olšák mar...@gmail.com wrote:
 Yeah, st/mesa also compiles shaders on the first use, so we've got 3
 places to fix: Wine, st/mesa, the driver.

For what it's worth, while Wine definitely has some room for
improvement in this regard, in some cases we don't get the shaders any
earlier from the application either.


[Mesa-dev] [PATCH 1/2] gallivm: refactor num_lods handling

2013-08-28 Thread sroland
From: Roland Scheidegger srol...@vmware.com

This is just preparation for per-pixel (or per-quad in case of multiple quads)
min/mag filter, since some assumptions about the number of miplevels being
equal to the number of lods no longer hold true.
This change does not change behavior yet (though theoretically, when forcing
the per-element path, it might be slower with different min/mag filters, since
the code now respects this setting even when there are no mip maps; in that
case some lod calcs will be done per-element even though ultimately the same
filter is still used for all pixels).
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c |  126 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample.h |   13 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_aos.c |   20 +--
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  141 -
 4 files changed, 169 insertions(+), 131 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index 89d7249..e1cfd78 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -217,7 +217,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
struct lp_build_context *float_size_bld = &bld->float_size_in_bld;
struct lp_build_context *float_bld = &bld->float_bld;
struct lp_build_context *coord_bld = &bld->coord_bld;
-   struct lp_build_context *levelf_bld = &bld->levelf_bld;
+   struct lp_build_context *rho_bld = &bld->lodf_bld;
const unsigned dims = bld->dims;
LLVMValueRef ddx_ddy[2];
LLVMBuilderRef builder = bld->gallivm->builder;
@@ -231,7 +231,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
LLVMValueRef first_level, first_level_vec;
unsigned length = coord_bld->type.length;
unsigned num_quads = length / 4;
-   boolean rho_per_quad = levelf_bld->type.length != length;
+   boolean rho_per_quad = rho_bld->type.length != length;
unsigned i;
LLVMValueRef i32undef = LLVMGetUndef(LLVMInt32TypeInContext(gallivm->context));
LLVMValueRef rho_xvec, rho_yvec;
@@ -259,18 +259,18 @@ lp_build_rho(struct lp_build_sample_context *bld,
*/
   if (rho_per_quad) {
  rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
- levelf_bld->type, cube_rho, 0);
+ rho_bld->type, cube_rho, 0);
   }
   else {
  rho = lp_build_swizzle_scalar_aos(coord_bld, cube_rho, 0, 4);
   }
   if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
- rho = lp_build_sqrt(levelf_bld, rho);
+ rho = lp_build_sqrt(rho_bld, rho);
   }
   /* Could optimize this for single quad just skip the broadcast */
   cubesize = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
-levelf_bld->type, float_size, index0);
-  rho = lp_build_mul(levelf_bld, cubesize, rho);
+rho_bld->type, float_size, index0);
+  rho = lp_build_mul(rho_bld, cubesize, rho);
}
else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
   LLVMValueRef ddmax[3], ddx[3], ddy[3];
@@ -311,9 +311,9 @@ lp_build_rho(struct lp_build_sample_context *bld,
  * otherwise would also need different code to per-pixel lod case.
  */
 rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
-levelf_bld->type, rho, 0);
+rho_bld->type, rho, 0);
  }
- rho = lp_build_sqrt(levelf_bld, rho);
+ rho = lp_build_sqrt(rho_bld, rho);
 
   }
   else {
@@ -329,7 +329,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
  * rho_vec contains per-pixel rho, convert to scalar per quad.
  */
 rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
-levelf_bld->type, rho, 0);
+rho_bld->type, rho, 0);
  }
   }
}
@@ -404,7 +404,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
 
  if (rho_per_quad) {
 rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
-levelf_bld->type, rho, 0);
+rho_bld->type, rho, 0);
  }
  else {
 /*
@@ -416,7 +416,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
  */
 rho = lp_build_swizzle_scalar_aos(coord_bld, rho, 0, 4);
  }
- rho = lp_build_sqrt(levelf_bld, rho);
+ rho = lp_build_sqrt(rho_bld, rho);
   }
   else {
  ddx_ddy[0] = lp_build_abs(coord_bld, ddx_ddy[0]);
@@ -497,7 +497,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
 }
 if (rho_per_quad) {
rho = 

[Mesa-dev] [PATCH 2/2] gallivm: don't calculate square root of rho if we use accurate rho method

2013-08-28 Thread sroland
From: Roland Scheidegger srol...@vmware.com

While a sqrt here and there shouldn't hurt much (depending on the cpu), it is
possible to omit it completely, since rho is only used for calculating lod and
there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for
calculating lod, this means we get a simple mul instead of sqrt (in the case of
nearest mip filter we in fact don't need to replace the sqrt with anything
at all). Only in one not very useful path does this not work (combined
brilinear calculation of int level and fractional lod; accurate rho calc with
brilinear filtering seems odd anyway).
Apart from being faster, as an added bonus this should increase our crappy
fractional accuracy of lod, since fast_log2 is only good for ~3 bits and this
should increase accuracy by one bit (though it is not used if the dimension is
just one, as we'd need an extra mul there since we never had the squared rho in
the first place).
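
The identity used for the integer level can be checked with scalar C. Since log2(x) == 0.5*log2(x^2), the rounded level floor(log2(rho) + 0.5) equals floor((floor(log2(rho^2)) + 1) / 2), so the exponent of the squared value is enough. The helpers below (`float_exponent`, `floor_half`, `ilog2_from_squared`) are hypothetical scalar stand-ins for what the patch builds as LLVM IR; they assume IEEE 754 binary32 and positive normal inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* floor(log2(x)) of a positive normal float, read from the exponent field. */
static int float_exponent(float x)
{
    uint32_t u;
    assert(x > 0.0f);
    memcpy(&u, &x, sizeof u);
    return (int)((u >> 23) & 0xffu) - 127;
}

/* floor(a / 2) that also rounds down for negative a, matching an
 * arithmetic shift right by one. */
static int floor_half(int a)
{
    return (a >= 0) ? a / 2 : -((-a + 1) / 2);
}

/* Nearest integer log2 of rho computed from rho squared, without a sqrt:
 * floor(log2(rho) + 0.5) == floor((floor(log2(rho^2)) + 1) / 2). */
static int ilog2_from_squared(float rho_squared)
{
    return floor_half(float_exponent(rho_squared) + 1);
}
```

For example, rho = 3 gives rho^2 = 9 with exponent 3, and (3 + 1) / 2 = 2, which agrees with floor(log2(3) + 0.5) = floor(2.08) = 2.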
---
 src/gallium/auxiliary/gallivm/lp_bld_arit.c   |   20 +--
 src/gallium/auxiliary/gallivm/lp_bld_arit.h   |3 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample.c |   76 +
 3 files changed, 56 insertions(+), 43 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 09107ff..c295e22 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -3381,7 +3381,8 @@ lp_build_fast_log2(struct lp_build_context *bld,
  */
 LLVMValueRef
 lp_build_ilog2(struct lp_build_context *bld,
-   LLVMValueRef x)
+   LLVMValueRef x,
+   boolean x_is_squared)
 {
LLVMBuilderRef builder = bld->gallivm->builder;
LLVMValueRef sqrt2 = lp_build_const_vec(bld->gallivm, bld->type, M_SQRT2);
@@ -3391,11 +3392,20 @@ lp_build_ilog2(struct lp_build_context *bld,
 
assert(lp_check_value(bld->type, x));
 
-   /* x * 2^(0.5)   i.e., add 0.5 to the log2(x) */
-   x = LLVMBuildFMul(builder, x, sqrt2, "");
+   if (x_is_squared) {
+  struct lp_type i_type = lp_int_type(bld->type);
+  LLVMValueRef one = lp_build_const_int_vec(bld->gallivm, i_type, 1);
+  /* ipart = log2(x) + 0.5 = 0.5*(log2(x^2) + 1.0) */
+  ipart = lp_build_extract_exponent(bld, x, 1);
+  ipart = LLVMBuildAShr(builder, ipart, one, "");
+   }
 
-   /* ipart = floor(log2(x) + 0.5)  */
-   ipart = lp_build_extract_exponent(bld, x, 0);
+   else {
+  /* x * 2^(0.5)   i.e., add 0.5 to the log2(x) */
+  x = LLVMBuildFMul(builder, x, sqrt2, "");
+  /* ipart = floor(log2(x) + 0.5)  */
+  ipart = lp_build_extract_exponent(bld, x, 0);
+   }
 
return ipart;
 }
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.h b/src/gallium/auxiliary/gallivm/lp_bld_arit.h
index d98025e..931175c 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.h
@@ -323,7 +323,8 @@ lp_build_fast_log2(struct lp_build_context *bld,
 
 LLVMValueRef
 lp_build_ilog2(struct lp_build_context *bld,
-   LLVMValueRef x);
+   LLVMValueRef x,
+   boolean x_is_squared);
 
 void
 lp_build_exp2_approx(struct lp_build_context *bld,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index e1cfd78..c34833a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -232,6 +232,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
unsigned length = coord_bld->type.length;
unsigned num_quads = length / 4;
boolean rho_per_quad = rho_bld->type.length != length;
+   boolean no_rho_opt = (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) && (dims > 1);
unsigned i;
LLVMValueRef i32undef = LLVMGetUndef(LLVMInt32TypeInContext(gallivm->context));
LLVMValueRef rho_xvec, rho_yvec;
@@ -264,12 +265,13 @@ lp_build_rho(struct lp_build_sample_context *bld,
   else {
  rho = lp_build_swizzle_scalar_aos(coord_bld, cube_rho, 0, 4);
   }
-  if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
- rho = lp_build_sqrt(rho_bld, rho);
-  }
   /* Could optimize this for single quad just skip the broadcast */
   cubesize = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
 rho_bld->type, float_size, index0);
+  if (no_rho_opt) {
+ /* skipping sqrt hence returning rho squared */
+ cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
+  }
   rho = lp_build_mul(rho_bld, cubesize, rho);
}
else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
@@ -281,7 +283,11 @@ lp_build_rho(struct lp_build_sample_context *bld,
  floatdim = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
coord_bld->type, float_size, indexi);
 
- if ((gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) && (dims > 1)) {
+ /*
+  * note that 

Re: [Mesa-dev] obtain def-use chain in glsl s-expression

2013-08-28 Thread Kenneth Graunke

On 08/27/2013 11:34 PM, Liu Xin wrote:

Hi, Mesa community,

I am not familiar with S-expression or other forms of lisp languages.


That's OK - the IR has no resemblance to actual Scheme or Lisp 
programming.  We just print and read the () syntax because it's simple.



I am working on GLSL IR transformation.  for example, i want to change a
variable to a array of same type.

By now , i can find the definition of a variable. How can i update all
uses of this variable in S-expression? I think all uses of this variable
are in the form of ir_dereference_variable.


That's right.  ir_dereference_variable is an actual use of a variable.


The difficulty is how to  collect d-u chain using hierarchical visitor.


Yeah...sadly, the compiler doesn't have UD chains.  Ian was working on 
those a few years back, but the code never landed.



I think some optimizer
authors  must have the same problem. Could you give me a pass which
solved my problem?  so I can take reference.


You might look at opt_array_splitting.  It uses two visitors:

First, ir_array_reference_visitor walks over the IR and finds variables 
it might want to transform, storing those in a hash table.  As a second 
pass, ir_array_splitting_visitor walks over the IR and actually 
transforms things.


ir_array_splitting_visitor is also an ir_rvalue visitor, which is useful 
for transforming expression trees.  You get passed an ir_rvalue ** 
pointer, and can replace a whole subexpression tree with something else. 
 In your case, you'll probably find ir_dereference_variables and 
replace them with ir_dereference_arrays.  (In the printed IR, replace 
(var_ref foo) with (array_ref (var_ref new_foo_array) ...subscript...).)
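
The two-pass approach can be illustrated on a toy expression tree in plain C. This is not Mesa's IR: `node`, `new_node`, `count_uses` and `rewrite_uses` are hypothetical, and the first pass only counts uses where a real pass would record candidate variables in a hash table. The second pass takes a `struct node **` so it can swap a whole subtree in place, in the spirit of the `ir_rvalue **` handed to an rvalue visitor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum node_kind { NODE_VAR_REF, NODE_ARRAY_REF, NODE_CONST, NODE_ADD };

struct node {
    enum node_kind kind;
    const char *name;       /* NODE_VAR_REF: variable name */
    int value;              /* NODE_CONST: constant value */
    struct node *child[2];  /* NODE_ADD: operands; NODE_ARRAY_REF: array, index */
};

static struct node *new_node(enum node_kind kind, const char *name, int value,
                             struct node *a, struct node *b)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->name = name;
    n->value = value;
    n->child[0] = a;
    n->child[1] = b;
    return n;
}

/* Pass 1: find uses of `name` (here: just count them). */
static unsigned count_uses(const struct node *n, const char *name)
{
    if (!n)
        return 0;
    if (n->kind == NODE_VAR_REF)
        return strcmp(n->name, name) == 0;
    return count_uses(n->child[0], name) + count_uses(n->child[1], name);
}

/* Pass 2: replace every (var_ref name) with
 * (array_ref (var_ref array_name) (constant index)). */
static void rewrite_uses(struct node **slot, const char *name,
                         const char *array_name, int index)
{
    struct node *n = *slot;
    if (!n)
        return;
    if (n->kind == NODE_VAR_REF) {
        if (strcmp(n->name, name) == 0) {
            n->name = array_name;  /* reuse the node as the array operand */
            *slot = new_node(NODE_ARRAY_REF, NULL, 0, n,
                             new_node(NODE_CONST, NULL, index, NULL, NULL));
        }
        return;
    }
    rewrite_uses(&n->child[0], name, array_name, index);
    rewrite_uses(&n->child[1], name, array_name, index);
}
```

Separating the analysis pass from the rewrite pass means the decision of which variables to transform is complete before any node is modified.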


Good luck!

--Ken


Re: [Mesa-dev] [PATCH] draw: fix segfaults with aaline and aapoint stages disabled

2013-08-28 Thread Jose Fonseca


- Original Message -
 There are drivers not using these optional stages.
 
 Broken by a3ae5dc7dd5c2f8893f86a920247e690e550ebd4.
 
 Cc: mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/auxiliary/draw/draw_context.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
 index d1fac0c..641dd82 100644
 --- a/src/gallium/auxiliary/draw/draw_context.c
 +++ b/src/gallium/auxiliary/draw/draw_context.c
 @@ -564,8 +564,10 @@ draw_prepare_shader_outputs(struct draw_context *draw)
 draw_remove_extra_vertex_attribs(draw);
 draw_prim_assembler_prepare_outputs(draw->ia);
 draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
 -   draw_aapoint_prepare_outputs(draw, draw->pipeline.aapoint);
 -   draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);
 +   if (draw->pipeline.aapoint)
 +  draw_aapoint_prepare_outputs(draw, draw->pipeline.aapoint);
 +   if (draw->pipeline.aaline)
 +  draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);
  }
  
  /**
 --
 1.8.1.2
 
 

Reviewed-by: Jose Fonseca jfons...@vmware.com


[Mesa-dev] Vendor-neutral OpenGL dispatching library

2013-08-28 Thread Brian Nguyen
Last September, Andy Ritger proposed updating the Linux OpenGL ABI to allow for
multiple vendors to co-exist within a single process and OpenGL applications to
dispatch commands to different vendors with per-context granularity. The current
proposal [1] calls for a vendor-neutral API library which acts as an
intermediate layer between the application and OpenGL vendor implementations
that manages this dispatching.
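
The per-context dispatching idea can be sketched in a few lines of C. This is purely illustrative and is not the libglvnd interface: `vendor_dispatch`, `context`, `make_current` and `api_get_string` are hypothetical names, only one entry point is shown, and the current context is a plain static where a real library would use thread-local storage.

```c
#include <assert.h>
#include <stddef.h>

/* Table of entry points supplied by one vendor implementation. */
struct vendor_dispatch {
    const char *(*get_string)(void);
};

/* A context remembers which vendor created it. */
struct context {
    const struct vendor_dispatch *dispatch;
};

/* Simplification: a real implementation would keep this per thread. */
static struct context *current_context;

static const char *vendor_a_get_string(void) { return "vendor A"; }
static const char *vendor_b_get_string(void) { return "vendor B"; }

static const struct vendor_dispatch vendor_a = { vendor_a_get_string };
static const struct vendor_dispatch vendor_b = { vendor_b_get_string };

static void make_current(struct context *ctx)
{
    current_context = ctx;
}

/* Entry point exported by the neutral layer: forwards to whichever vendor
 * owns the context that is current at the time of the call. */
static const char *api_get_string(void)
{
    assert(current_context != NULL && current_context->dispatch != NULL);
    return current_context->dispatch->get_string();
}
```

The application links only against the neutral entry points, so contexts from two vendors can coexist in one process and the call is routed by whichever context is current.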

I have written a work-in-progress library based on this proposal which
implements this API library for GLX. This library leverages some code from
Mesa's glapi module to handle TLS and core OpenGL dispatching, as well as the
BSD-licensed uthash library [2] and the X.org Xserver's list.h [3]. The library
source can be found at this location:

http://github.com/NVIDIA/libglvnd

In this repository, the file README.md describes the library's code organization
and architecture as well as remaining open issues and implementation TODOs.
What do people think about this?  We are hoping to gather feedback to help
facilitate discussion of the implementation of the new ABI during XDC 2013.
Any concerns, suggestions, or other comments would be much appreciated.

Thanks,
Brian

[1] 
https://github.com/aritger/linux-opengl-abi-proposal/blob/master/linux-opengl-abi-proposal.txt
[2] http://troydhanson.github.io/uthash/
[3] 
http://cgit.freedesktop.org/xorg/xserver/tree/include/list.h?id=74469895e39fa38337f59edd64c4031ab9bb51d8







Re: [Mesa-dev] tgsi dump and parsing

2013-08-28 Thread Jose Fonseca
Yes, if we change the representation, we should keep backwards compatibility in
TGSI text parsing.

Jose

- Original Message -
 There are some TGSI shaders parsed by tgsi_text_translate which
 declare floating-point immediates. Any incompatible change to the
 parser would break them.
 
 Marek
 
 On Wed, Aug 28, 2013 at 3:57 PM, Jose Fonseca jfons...@vmware.com wrote:
  - Original Message -
  On Wed, Aug 28, 2013 at 3:32 PM, Dave Airlie airl...@gmail.com wrote:
   IMM[0] FLT32 { 0x, 0x, 0x, 0x }  # 1.0, 3.0, 2.0, 4.0
  
   If you use %.9g instead of %.4f then floating point numbers will be
   preserved without loss of precision.
  
  
   I see a -nan in my tests that doesn't get reparsed so I expect hex is
   still better.
  
  
   oops to list as well this time, sorry.
 
  Just in case you are wondering, it's
  tests/shaders/glsl-const-builtin-inversesqrt.shader_test and
  tests/shaders/glsl-const-builtin-normalize.shader_test
  that throw up the -nan in the dumps.
 
  We could teach tgsi_parse to understand `nan` too.
 
  We could also have a new tgsi_compare() function that, instead of doing a
  bare memcmp, it would scan the tokens, and account for the ambiguity of
  NaNs in IMM FLT32.
 
 
  I just feel a bit awkward that we have `IMM[x] INT32 {...}` and `IMM[x]
  FLT32 {...}` but end up dumping floats as integers. The whole point of
  INT32/FLT32 is to allow humans to read the numbers, because it is just
  syntactic sugar: by definition a shader must behave precisely the same way
  regardless of whether the IMMs are declared INT32 or FLT32, as in TGSI the
  type is not defined by the arguments but rather by the opcodes.
 
  Also, editing IMM FLT32 by hand will be much harder -- you'll need to
  convert floats to their integer representation, as the floats in the comment
  will likely be ignored.
 
 
  To me, it seems that would be trading off a concrete advantage -- the
  usability of the TGSI textual representation --, for this much more
  dubious advantage of perfect bit-by-bit reversibility of TGSI
  binary-text shaders.
 
 
  That said, I don't feel strongly either way.
 
 
  Jose
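
The two suggestions in this thread, printing floats with %.9g and comparing immediates in a NaN-aware way, can be sketched in C as follows. This is only an illustration under stated assumptions: `float_roundtrips_9g` and `imm_flt32_equal` are hypothetical helper names, not existing TGSI functions, and the proposed tgsi_compare() would have to walk the token stream rather than compare two floats.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print a float with 9 significant digits, parse the text back, and
 * check that the bit pattern survived the text round trip. */
bool float_roundtrips_9g(float value)
{
    char text[64];
    float parsed;
    uint32_t before, after;

    snprintf(text, sizeof(text), "%.9g", value);
    parsed = strtof(text, NULL);

    memcpy(&before, &value, sizeof(before));
    memcpy(&after, &parsed, sizeof(after));
    return before == after;
}

/* Compare two FLT32 immediates bit for bit, except that any NaN is
 * treated as equal to any other NaN (sign and payload are ignored). */
bool imm_flt32_equal(float x, float y)
{
    uint32_t a, b;

    if (isnan(x) && isnan(y))
        return true;

    memcpy(&a, &x, sizeof(a));
    memcpy(&b, &y, sizeof(b));
    return a == b;
}
```

Nine significant digits are enough to round-trip any finite single-precision value, so only NaNs need the special case in the comparison.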


Re: [Mesa-dev] [PATCH 8/8] i965: Avoid flushing the batch for every blorp op.

2013-08-28 Thread Paul Berry
On 27 August 2013 15:21, Eric Anholt e...@anholt.net wrote:

 This brings over the batch-wrap-prevention and aperture space checking
 code from the normal brw_draw.c path, so that we don't need to flush the
 batch every time.

 There's a risk here if the intel_emit_post_sync_nonzero_flush() call isn't
 high enough up in the state emit sequences -- before, we implicitly had
 one at the batch flush before any state was emitted, so Mesa's workaround
 emits didn't really matter.

 Improves cairo-gl performance by 13.7733% +/- 1.74876% (n=30/32)
 Improves minecraft apitrace performance by 1.03183% +/- 0.482297% (n=90).
 Reduces low-resolution GLB 2.7 performance by 1.17553% +/- 0.432263% (n=88)
 Reduces Lightsmark performance by 3.70246% +/- 0.322432% (n=126)
 No statistically significant performance difference on unigine tropics
 (n=10)
 No statistically significant performance difference on openarena (n=755)

 The two apps that are hurt happen to include stalls on busy buffer
 objects, so I think this is an effect of missing out on an opportune
 flush.
 ---
  src/mesa/drivers/dri/i965/brw_blorp.cpp  | 50
 
  src/mesa/drivers/dri/i965/brw_blorp.h|  4 ---
  src/mesa/drivers/dri/i965/gen6_blorp.cpp | 12 
  src/mesa/drivers/dri/i965/gen7_blorp.cpp |  1 -
  4 files changed, 50 insertions(+), 17 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp
 b/src/mesa/drivers/dri/i965/brw_blorp.cpp
 index 1576ff2..c566d1d 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
 @@ -21,6 +21,7 @@
   * IN THE SOFTWARE.
   */

 +#include <errno.h>
  #include "intel_batchbuffer.h"
  #include "intel_fbo.h"

 @@ -191,6 +192,26 @@ intel_hiz_exec(struct brw_context *brw, struct
 intel_mipmap_tree *mt,
  void
  brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params)
  {
 +   struct gl_context *ctx = &brw->ctx;
 +   uint32_t estimated_max_batch_usage = 1500;
 +   bool check_aperture_failed_once = false;
 +
 +   /* Flush the sampler and render caches.  We definitely need to flush
 the
 +* sampler cache so that we get updated contents from the render cache
 for
 +* the glBlitFramebuffer() source.  Also, we are sometimes warned in
 the
 +* docs to flush the cache between reinterpretations of the same
 surface
 +* data with different formats, which blorp does for stencil and depth
 +* data.
 +*/
 +   intel_batchbuffer_emit_mi_flush(brw);
 +
 +retry:
 +   intel_batchbuffer_require_space(brw, estimated_max_batch_usage, false);
 +   intel_batchbuffer_save_state(brw);
 +   drm_intel_bo *saved_bo = brw->batch.bo;
 +   uint32_t saved_used = brw->batch.used;
 +   uint32_t saved_state_batch_offset = brw->batch.state_batch_offset;
 +
 switch (brw->gen) {
 case 6:
gen6_blorp_exec(brw, params);
 @@ -204,6 +225,35 @@ brw_blorp_exec(struct brw_context *brw, const
 brw_blorp_params *params)
break;
 }


Would it be feasible to add an assertion here to verify that the amount of
batch space actually used by this blorp call is less than or equal to
estimated_max_batch_usage?  That would give me a lot of increased
confidence that the magic number 1500 is correct.

With the added assertion, the series is:

Reviewed-by: Paul Berry stereotype...@gmail.com
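
For readers following along, the reserve-then-verify pattern under discussion can be sketched in isolation. This is a toy model, not the i965 code: `toy_batch`, `toy_flush`, `toy_exec` and `toy_emit_small_op` are hypothetical names, and usage is counted in plain bytes rather than in batch dwords plus state-batch offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy batch buffer: just a byte counter with a fixed capacity. */
struct toy_batch {
    uint32_t used;         /* bytes emitted since the last flush */
    uint32_t capacity;     /* total bytes available */
    uint32_t flush_count;  /* how many times the batch was flushed */
};

typedef void (*toy_emit_fn)(struct toy_batch *batch);

static void toy_flush(struct toy_batch *batch)
{
    batch->used = 0;
    batch->flush_count++;
}

/* Example operation that emits 100 bytes of commands. */
void toy_emit_small_op(struct toy_batch *batch)
{
    batch->used += 100;
}

/* Reserve estimated_max_usage bytes up front (flushing first if they do
 * not fit), run the operation, then assert that the estimate held. */
bool toy_exec(struct toy_batch *batch, uint32_t estimated_max_usage,
              toy_emit_fn emit)
{
    uint32_t saved_used, saved_flushes;

    if (estimated_max_usage > batch->capacity)
        return false;

    if (batch->capacity - batch->used < estimated_max_usage)
        toy_flush(batch);

    saved_used = batch->used;
    saved_flushes = batch->flush_count;

    emit(batch);

    /* The operation must not have wrapped the batch ... */
    assert(batch->flush_count == saved_flushes);
    /* ... and must have stayed within the reserved space. */
    assert(batch->used - saved_used <= estimated_max_usage);
    return true;
}
```

The second assertion is what keeps a magic number like 1500 honest: any operation that outgrows the estimate trips it in a debug build.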


 +   /* Make sure we didn't wrap the batch unintentionally, and make sure we
 +* reserved enough space that a wrap will never happen.
 +*/
 +   assert(brw->batch.bo == saved_bo);
 +   assert((brw->batch.used - saved_used) * 4 +
 +  (saved_state_batch_offset - brw->batch.state_batch_offset) <
 +  estimated_max_batch_usage);
 +   /* Shut up compiler warnings on release build */
 +   (void)saved_bo;
 +   (void)saved_used;
 +   (void)saved_state_batch_offset;
 +
 +   /* Check if the blorp op we just did would make our batch likely to
 fail to
 +* map all the BOs into the GPU at batch exec time later.  If so,
 flush the
 +* batch and try again with nothing else in the batch.
 +*/
 +   if (dri_bufmgr_check_aperture_space(&brw->batch.bo, 1)) {
 +  if (!check_aperture_failed_once) {
 + check_aperture_failed_once = true;
 + intel_batchbuffer_reset_to_saved(brw);
 + intel_batchbuffer_flush(brw);
 + goto retry;
 +  } else {
 + int ret = intel_batchbuffer_flush(brw);
 + WARN_ONCE(ret == -ENOSPC,
 +   "i965: blorp emit exceeded available aperture space\n");
 +  }
 +   }
 +
 if (unlikely(brw->always_flush_batch))
intel_batchbuffer_flush(brw);

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
 b/src/mesa/drivers/dri/i965/brw_blorp.h
 index dceb4fc..e03e27f 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp.h
 +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
 @@ -370,10 +370,6 @@ void
  gen6_blorp_init(struct brw_context *brw);

  void
 -gen6_blorp_emit_batch_head(struct brw_context *brw,
 -   const brw_blorp_params *params);
 -
 

[Mesa-dev] [PATCH] radeonsi: Do not suspend timer queries

2013-08-28 Thread Niels Ole Salscheider
Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de
---
 src/gallium/drivers/radeonsi/r600.h|  1 +
 src/gallium/drivers/radeonsi/r600_hw_context.c | 28 ++
 src/gallium/drivers/radeonsi/r600_query.c  |  7 +--
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  2 +-
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  4 ++--
 src/gallium/drivers/radeonsi/si_state_draw.c   |  2 +-
 6 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600.h 
b/src/gallium/drivers/radeonsi/r600.h
index ce0468d..ac3b2f1 100644
--- a/src/gallium/drivers/radeonsi/r600.h
+++ b/src/gallium/drivers/radeonsi/r600.h
@@ -102,6 +102,7 @@ void si_context_emit_fence(struct r600_context *ctx, struct 
si_resource *fence,
unsigned offset, unsigned value);
 
 void r600_context_draw_opaque_count(struct r600_context *ctx, struct 
r600_so_target *t);
+bool si_is_timer_query(unsigned type);
 bool si_query_needs_begin(unsigned type);
 void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean 
count_draw_in);
 
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 59b2d70..f050b3b 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -110,6 +110,13 @@ err:
return;
 }
 
+bool si_is_timer_query(unsigned type)
+{
+   return type == PIPE_QUERY_TIME_ELAPSED ||
+   type == PIPE_QUERY_TIMESTAMP ||
+   type == PIPE_QUERY_TIMESTAMP_DISJOINT;
+}
+
 bool si_query_needs_begin(unsigned type)
 {
return type != PIPE_QUERY_TIMESTAMP;
@@ -139,7 +146,7 @@ void si_need_cs_space(struct r600_context *ctx, unsigned 
num_dw,
}
 
/* Count in queries_suspend. */
-   num_dw += ctx->num_cs_dw_queries_suspend;
+   num_dw += ctx->num_cs_dw_nontimer_queries_suspend;
 
/* Count in streamout_end at the end of CS. */
num_dw += ctx-num_cs_dw_streamout_end;
@@ -211,7 +218,7 @@ void si_context_flush(struct r600_context *ctx, unsigned 
flags)
return;
 
/* suspend queries */
-   if (ctx->num_cs_dw_queries_suspend) {
+   if (ctx->num_cs_dw_nontimer_queries_suspend) {
r600_context_queries_suspend(ctx);
queries_suspended = true;
}
@@ -506,7 +513,9 @@ void r600_query_begin(struct r600_context *ctx, struct 
r600_query *query)
cs->buf[cs->cdw++] = PKT3(PKT3_NOP, 0, 0);
cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, query->buffer, 
RADEON_USAGE_WRITE);
 
-   ctx->num_cs_dw_queries_suspend += query->num_cs_dw;
+   if (!si_is_timer_query(query->type)) {
+   ctx->num_cs_dw_nontimer_queries_suspend += query->num_cs_dw;
+   }
 }
 
 void r600_query_end(struct r600_context *ctx, struct r600_query *query)
@@ -565,7 +574,10 @@ void r600_query_end(struct r600_context *ctx, struct 
r600_query *query)
cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, query->buffer, 
RADEON_USAGE_WRITE);
 
query->results_end = (query->results_end + query->result_size) % 
query->buffer->b.b.width0;
-   ctx->num_cs_dw_queries_suspend -= query->num_cs_dw;
+
+   if (si_query_needs_begin(query->type) && 
!si_is_timer_query(query->type)) {
+   ctx->num_cs_dw_nontimer_queries_suspend -= query->num_cs_dw;
+   }
 }
 
 void r600_query_predication(struct r600_context *ctx, struct r600_query 
*query, int operation,
@@ -712,19 +724,19 @@ void r600_context_queries_suspend(struct r600_context 
*ctx)
 {
struct r600_query *query;
 
-   LIST_FOR_EACH_ENTRY(query, &ctx->active_query_list, list) {
+   LIST_FOR_EACH_ENTRY(query, &ctx->active_nontimer_query_list, list) {
r600_query_end(ctx, query);
}
-   assert(ctx->num_cs_dw_queries_suspend == 0);
+   assert(ctx->num_cs_dw_nontimer_queries_suspend == 0);
 }
 
 void r600_context_queries_resume(struct r600_context *ctx)
 {
struct r600_query *query;
 
-   assert(ctx->num_cs_dw_queries_suspend == 0);
+   assert(ctx->num_cs_dw_nontimer_queries_suspend == 0);
 
-   LIST_FOR_EACH_ENTRY(query, &ctx->active_query_list, list) {
+   LIST_FOR_EACH_ENTRY(query, &ctx->active_nontimer_query_list, list) {
r600_query_begin(ctx, query);
}
 }
diff --git a/src/gallium/drivers/radeonsi/r600_query.c 
b/src/gallium/drivers/radeonsi/r600_query.c
index 927577c..aa51e74 100644
--- a/src/gallium/drivers/radeonsi/r600_query.c
+++ b/src/gallium/drivers/radeonsi/r600_query.c
@@ -50,7 +50,10 @@ static void r600_begin_query(struct pipe_context *ctx, 
struct pipe_query *query)
memset(&rquery->result, 0, sizeof(rquery->result));
rquery->results_start = rquery->results_end;
r600_query_begin(rctx, (struct r600_query *)query);
-   LIST_ADDTAIL(&rquery->list, &rctx->active_query_list);
+
+   if (!si_is_timer_query(rquery->type)) {
+   

[Mesa-dev] [PATCH] radeon/uvd: fix MPEG2/4 ref frame index limit

2013-08-28 Thread Christian König
From: Christian König christian.koe...@amd.com

Otherwise the first few frames have an incorrect reference index.

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/drivers/radeon/radeon_uvd.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index f3652a6..3e00977 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -493,8 +493,8 @@ uint8_t pquant
 /* extract the frame number from a referenced video buffer */
 static uint32_t get_ref_pic_idx(struct ruvd_decoder *dec, struct 
pipe_video_buffer *ref)
 {
-   uint32_t min = dec->frame_number - NUM_MPEG2_REFS;
-   uint32_t max = dec->frame_number - 1;
+   uint32_t min = MAX2(dec->frame_number, NUM_MPEG2_REFS) - NUM_MPEG2_REFS;
+   uint32_t max = MAX2(dec->frame_number, 1) - 1;
uintptr_t frame;
 
/* seems to be the most sane fallback */
-- 
1.7.9.5
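
To see why the unclamped limits go wrong for the first few frames, here is a small standalone C sketch of the arithmetic. It is an illustration only: `NUM_REFS` is set to 6 as an assumed stand-in for NUM_MPEG2_REFS, and the three helper functions are hypothetical names that do not exist in radeon_uvd.c.

```c
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Assumed value standing in for NUM_MPEG2_REFS. */
#define NUM_REFS 6u

/* Old lower limit: wraps around to a huge value while
 * frame_number < NUM_REFS, because the subtraction is unsigned. */
uint32_t ref_min_unclamped(uint32_t frame_number)
{
    return frame_number - NUM_REFS;
}

/* New lower limit: clamps to 0 until enough frames have been decoded. */
uint32_t ref_min_clamped(uint32_t frame_number)
{
    return MAX2(frame_number, NUM_REFS) - NUM_REFS;
}

/* New upper limit: clamps to 0 for the very first frame. */
uint32_t ref_max_clamped(uint32_t frame_number)
{
    return MAX2(frame_number, 1u) - 1u;
}
```

With frame_number == 2, the unclamped minimum is 2 - 6 modulo 2^32 = 0xFFFFFFFC, so the valid range is empty; the clamped version yields 0 instead.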



[Mesa-dev] [PATCH 1/6] r600g,radeonsi: remove unused variables

2013-08-28 Thread Marek Olšák
---
 src/gallium/drivers/r600/r600_pipe.h | 3 ---
 src/gallium/drivers/radeonsi/radeonsi_pipe.h | 5 -
 2 files changed, 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 21d68c9..1564cc3 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -417,9 +417,6 @@ struct r600_fence_block {
struct list_head head;
 };
 
-#define R600_CONSTANT_ARRAY_SIZE 256
-#define R600_RESOURCE_ARRAY_SIZE 160
-
 struct r600_constbuf_state
 {
struct r600_atom atom;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index f9e4999..cd5a4f7 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -102,8 +102,6 @@ struct r600_textures_info {
uint32_t depth_texture_mask; /* which textures are depth */
uint32_t compressed_colortex_mask;
unsigned n_samplers;
-   bool samplers_dirty;
-   bool is_array_sampler[NUM_TEX_UNITS];
 };
 
 struct r600_fence {
@@ -120,9 +118,6 @@ struct r600_fence_block {
struct list_head head;
 };
 
-#define R600_CONSTANT_ARRAY_SIZE 256
-#define R600_RESOURCE_ARRAY_SIZE 160
-
 struct r600_constbuf_state
 {
struct pipe_constant_buffer cb[2];
-- 
1.8.1.2



[Mesa-dev] [PATCH 2/6] radeonsi: cleanup initialization of SGPR shader parameters

2013-08-28 Thread Marek Olšák
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 32 +++---
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 2b1928a..13bc92c 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -1349,7 +1349,7 @@ static void create_function(struct si_shader_context 
*si_shader_ctx)
struct lp_build_tgsi_context *bld_base = 
&si_shader_ctx->radeon_bld.soa.bld_base;
struct gallivm_state *gallivm = bld_base->base.gallivm;
LLVMTypeRef params[20], f32, i8, i32, v2i32, v3i32;
-   unsigned i;
+   unsigned i, last_sgpr, num_params;
 
i8 = LLVMInt8TypeInContext(gallivm->context);
i32 = LLVMInt32TypeInContext(gallivm->context);
@@ -1361,17 +1361,21 @@ static void create_function(struct si_shader_context 
*si_shader_ctx)
params[SI_PARAM_SAMPLER] = params[SI_PARAM_CONST];
params[SI_PARAM_RESOURCE] = LLVMPointerType(LLVMVectorType(i8, 32), 
CONST_ADDR_SPACE);
 
-   if (si_shader_ctx->type == TGSI_PROCESSOR_VERTEX) {
-   params[SI_PARAM_VERTEX_BUFFER] = params[SI_PARAM_SAMPLER];
+   switch (si_shader_ctx->type) {
+   case TGSI_PROCESSOR_VERTEX:
+   params[SI_PARAM_VERTEX_BUFFER] = params[SI_PARAM_CONST];
params[SI_PARAM_START_INSTANCE] = i32;
+   last_sgpr = SI_PARAM_START_INSTANCE;
params[SI_PARAM_VERTEX_ID] = i32;
params[SI_PARAM_DUMMY_0] = i32;
params[SI_PARAM_DUMMY_1] = i32;
params[SI_PARAM_INSTANCE_ID] = i32;
-   radeon_llvm_create_func(&si_shader_ctx->radeon_bld, params, 9);
+   num_params = SI_PARAM_INSTANCE_ID+1;
+   break;
 
-   } else {
+   case TGSI_PROCESSOR_FRAGMENT:
params[SI_PARAM_PRIM_MASK] = i32;
+   last_sgpr = SI_PARAM_PRIM_MASK;
params[SI_PARAM_PERSP_SAMPLE] = v2i32;
params[SI_PARAM_PERSP_CENTER] = v2i32;
params[SI_PARAM_PERSP_CENTROID] = v2i32;
@@ -1388,18 +1392,20 @@ static void create_function(struct si_shader_context 
*si_shader_ctx)
params[SI_PARAM_ANCILLARY] = f32;
params[SI_PARAM_SAMPLE_COVERAGE] = f32;
params[SI_PARAM_POS_FIXED_PT] = f32;
-   radeon_llvm_create_func(&si_shader_ctx->radeon_bld, params, 20);
+   num_params = SI_PARAM_POS_FIXED_PT+1;
+   break;
+
+   default:
+   assert(0 && "unimplemented shader");
+   return;
}
 
+   assert(num_params <= Elements(params));
+   radeon_llvm_create_func(&si_shader_ctx->radeon_bld, params, num_params);
radeon_llvm_shader_type(si_shader_ctx->radeon_bld.main_fn, 
si_shader_ctx->type);
-   for (i = SI_PARAM_CONST; i <= SI_PARAM_VERTEX_BUFFER; ++i) {
-   LLVMValueRef P = 
LLVMGetParam(si_shader_ctx->radeon_bld.main_fn, i);
-   LLVMAddAttribute(P, LLVMInRegAttribute);
-   }
 
-   if (si_shader_ctx->type == TGSI_PROCESSOR_VERTEX) {
-   LLVMValueRef P = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn,
- SI_PARAM_START_INSTANCE);
+   for (i = 0; i <= last_sgpr; ++i) {
+   LLVMValueRef P = 
LLVMGetParam(si_shader_ctx->radeon_bld.main_fn, i);
LLVMAddAttribute(P, LLVMInRegAttribute);
}
 
-- 
1.8.1.2



[Mesa-dev] [PATCH 6/6] radeonsi: simplify and improve flushing

2013-08-28 Thread Marek Olšák
This mimics r600g. The R600_CONTEXT_xxx flags are added to rctx->b.flags
and si_emit_cache_flush emits the packets. That's it. The shared radeon code
tells us when the streamout cache should be flushed, so we have to check
the flags anyway.

There is a new atom cache_flush, because caches must be flushed *after*
resource descriptors are changed in memory.

Functional changes:

* Write caches are flushed at the end of CS and read caches are flushed
  at its beginning.

* Sampler view states are removed from si_state, they only held the flush
  flags.

* Every time a shader is changed, the I cache is flushed. Is this needed?
  Due to a hw bug, this also flushes the K cache.

* The WRITE_DATA packet is changed to use TC, which fixes a rendering issue
  in openarena. I'm not sure how TC interacts with CP DMA, but for now it
  seems to work better than any other solution I tried. (BTW CIK allows us
  to use TC for CP DMA.)

* Flush the K cache instead of the texture cache when updating resource
  descriptors (due to a hw bug, this also flushes the I cache).
  I think the K cache flush is correct here, but I'm not sure if the texture
  cache should be flushed too (probably not considering we use TC
  for WRITE_DATA, but we don't use TC for CP DMA).

* The number of resource contexts is decreased to 16. With all of these cache
  changes, 4 doesn't work, but 8 works, which suggests I'm actually doing
  the right thing here and the pipeline isn't drained during flushes.
---
 src/gallium/drivers/radeon/r600_pipe_common.h  |   1 +
 src/gallium/drivers/radeonsi/r600.h|   3 -
 src/gallium/drivers/radeonsi/r600_hw_context.c |  45 +++---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |   4 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |   8 +-
 src/gallium/drivers/radeonsi/radeonsi_pm4.c|  11 ---
 src/gallium/drivers/radeonsi/radeonsi_pm4.h|   2 -
 src/gallium/drivers/radeonsi/si_commands.c |   9 --
 src/gallium/drivers/radeonsi/si_descriptors.c  |  16 ++--
 src/gallium/drivers/radeonsi/si_state.c|  46 +-
 src/gallium/drivers/radeonsi/si_state.h|   9 +-
 src/gallium/drivers/radeonsi/si_state_draw.c   | 111 -
 12 files changed, 125 insertions(+), 140 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 4b993ee..bd13488 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -42,6 +42,7 @@
 #define R600_CONTEXT_INV_VERTEX_CACHE  (1 << 0)
 #define R600_CONTEXT_INV_TEX_CACHE (1 << 1)
 #define R600_CONTEXT_INV_CONST_CACHE   (1 << 2)
+#define R600_CONTEXT_INV_SHADER_CACHE  (1 << 3)
 /* read-write caches */
 #define R600_CONTEXT_STREAMOUT_FLUSH   (1 << 8)
 #define R600_CONTEXT_FLUSH_AND_INV (1 << 9)
diff --git a/src/gallium/drivers/radeonsi/r600.h 
b/src/gallium/drivers/radeonsi/r600.h
index ebadd97..46cfb14 100644
--- a/src/gallium/drivers/radeonsi/r600.h
+++ b/src/gallium/drivers/radeonsi/r600.h
@@ -69,9 +69,6 @@ struct r600_query {
struct list_head list;
 };
 
-#define R600_CONTEXT_DST_CACHES_DIRTY  (1 << 1)
-#define R600_CONTEXT_CHECK_EVENT_FLUSH (1 << 2)
-
 struct r600_context;
 struct r600_screen;
 
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 5631bdb..5826349 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -150,7 +150,7 @@ void si_need_cs_space(struct r600_context *ctx, unsigned 
num_dw,
}
 
/* Count in framebuffer cache flushes at the end of CS. */
-   num_dw += 7; /* one SURFACE_SYNC and CACHE_FLUSH_AND_INV (r6xx-only) */
+   num_dw += ctx->atoms.cache_flush->num_dw;
 
/* Save 16 dwords for the fence mechanism. */
num_dw += 16;
@@ -167,37 +167,6 @@ void si_need_cs_space(struct r600_context *ctx, unsigned 
num_dw,
}
 }
 
-static void r600_flush_framebuffer(struct r600_context *ctx)
-{
-   struct si_pm4_state *pm4;
-
-   if (!(ctx->flags & R600_CONTEXT_DST_CACHES_DIRTY))
-   return;
-
-   pm4 = si_pm4_alloc_state(ctx);
-
-   if (pm4 == NULL)
-   return;
-
-   si_cmd_surface_sync(pm4, S_0085F0_CB0_DEST_BASE_ENA(1) |
-   S_0085F0_CB1_DEST_BASE_ENA(1) |
-   S_0085F0_CB2_DEST_BASE_ENA(1) |
-   S_0085F0_CB3_DEST_BASE_ENA(1) |
-   S_0085F0_CB4_DEST_BASE_ENA(1) |
-   S_0085F0_CB5_DEST_BASE_ENA(1) |
-   S_0085F0_CB6_DEST_BASE_ENA(1) |
-   S_0085F0_CB7_DEST_BASE_ENA(1) |
-   S_0085F0_DB_ACTION_ENA(1) |
-   S_0085F0_DB_DEST_BASE_ENA(1));
-   si_cmd_flush_and_inv_cb_meta(pm4);
-

[Mesa-dev] [PATCH 0/6] radeonsi: Minor cleanups and improvements

2013-08-28 Thread Marek Olšák
This series contains the changes my transform feedback work depends on, but 
there are some useful fixes too, making it worth committing earlier.

The last patch is the most important one, because it fixes the problems we had 
with the emission of resource descriptors, which forced us to use 256 resource 
contexts as a workaround. Further testing has shown that even 256 wasn't 
enough. With that patch, we only need 8 or 16 contexts as originally expected.

I also made the first step towards sharing code between r600g and radeonsi and 
it's what made this series so big:

54 files changed, 2448 insertions(+), 2532 deletions(-)

Please review.

Marek


[Mesa-dev] [PATCH 5/6] radeonsi: convert constant buffers to si_descriptors

2013-08-28 Thread Marek Olšák
There is a new class si_buffer_resources, which should be good enough for
implementing any kind of buffer bindings (constant buffers, vertex buffers,
streamout buffers, shader storage buffers, etc.)

I don't even keep a copy of pipe_constant_buffer - we don't need it.

The main motivation behind this is to have a well-tested infrastructure
for setting up streamout buffers.
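
The storage layout this class relies on, one contiguous block of 4-dword descriptors plus an array of pointers into it, can be sketched on its own in C. The sketch uses plain calloc/free and hypothetical names (`toy_buffer_descs`, `toy_buffer_descs_init`, `toy_buffer_descs_release`); it is not the si_buffer_resources code itself and leaves out the pipe_resource references and the descriptor upload.

```c
#include <stdint.h>
#include <stdlib.h>

#define DWORDS_PER_DESC 4

struct toy_buffer_descs {
    unsigned num_buffers;
    uint32_t *desc_storage;  /* num_buffers * 4 dwords, contiguous */
    uint32_t **desc_data;    /* desc_data[i] points at descriptor i */
};

/* Allocate the flat storage and the array of per-descriptor pointers.
 * Returns 0 on success, -1 on allocation failure. A count of zero is
 * not handled specially, since calloc may then return NULL. */
int toy_buffer_descs_init(struct toy_buffer_descs *descs, unsigned num_buffers)
{
    unsigned i;

    descs->num_buffers = num_buffers;
    descs->desc_storage = calloc(num_buffers,
                                 sizeof(uint32_t) * DWORDS_PER_DESC);
    descs->desc_data = calloc(num_buffers, sizeof(uint32_t *));

    if (descs->desc_storage == NULL || descs->desc_data == NULL) {
        free(descs->desc_storage);
        free(descs->desc_data);
        descs->desc_storage = NULL;
        descs->desc_data = NULL;
        return -1;
    }

    /* Build the "array of arrays" view over the flat storage. */
    for (i = 0; i < num_buffers; i++)
        descs->desc_data[i] = &descs->desc_storage[i * DWORDS_PER_DESC];

    return 0;
}

void toy_buffer_descs_release(struct toy_buffer_descs *descs)
{
    free(descs->desc_data);
    free(descs->desc_storage);
    descs->desc_data = NULL;
    descs->desc_storage = NULL;
    descs->num_buffers = 0;
}
```

Keeping the descriptors contiguous means the whole set can be copied in one go, while the pointer array satisfies a consumer that expects one pointer per descriptor.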
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.h  |  10 +-
 src/gallium/drivers/radeonsi/si_descriptors.c | 143 +-
 src/gallium/drivers/radeonsi/si_state.c   |  42 
 src/gallium/drivers/radeonsi/si_state.h   |  15 ++-
 src/gallium/drivers/radeonsi/si_state_draw.c  |  80 ++
 5 files changed, 162 insertions(+), 128 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index ef531fb..e6e99c7 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -115,13 +115,6 @@ struct r600_fence_block {
struct list_head head;
 };
 
-struct r600_constbuf_state
-{
-   struct pipe_constant_buffer cb[2];
-   uint32_t enabled_mask;
-   uint32_t dirty_mask;
-};
-
 #define SI_NUM_ATOMS(rctx) 
(sizeof((rctx)->atoms)/sizeof((rctx)->atoms.array[0]))
 #define SI_NUM_SHADERS (PIPE_SHADER_FRAGMENT+1)
 
@@ -138,6 +131,7 @@ struct r600_context {
 
union {
struct {
+   struct r600_atom *const_buffers[SI_NUM_SHADERS];
struct r600_atom *sampler_views[SI_NUM_SHADERS];
};
struct r600_atom *array[0];
@@ -164,7 +158,7 @@ struct r600_context {
/* shader information */
unsigned sprite_coord_enable;
unsigned export_16bpc;
-   struct r600_constbuf_state  constbuf_state[PIPE_SHADER_TYPES];
+   struct si_buffer_resources  const_buffers[SI_NUM_SHADERS];
struct r600_textures_info   samplers[SI_NUM_SHADERS];
struct r600_resource *border_color_table;
unsigned border_color_offset;
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index db0da75..2983d75 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -32,7 +32,7 @@
 
 #define SI_NUM_CONTEXTS 256
 
-static const uint32_t null_desc[8]; /* zeros */
+static uint32_t null_desc[8]; /* zeros */
 
 /* Set this if you want the 3D engine to wait until CP DMA is done.
  * It should be set on the last CP DMA packet. */
@@ -170,7 +170,7 @@ static void si_emit_shader_pointer(struct r600_context 
*rctx,
 
 static void si_emit_descriptors(struct r600_context *rctx,
struct si_descriptors *desc,
-   const uint32_t **descriptors)
+   uint32_t **descriptors)
 {
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
uint64_t va_base;
@@ -325,6 +325,135 @@ void si_set_sampler_view(struct r600_context *rctx, 
unsigned shader,
si_update_descriptors(&views->desc);
 }
 
+/* BUFFER RESOURCES */
+
+static void si_emit_buffer_resources(struct r600_context *rctx, struct 
r600_atom *atom)
+{
+   struct si_buffer_resources *buffers = (struct si_buffer_resources*)atom;
+
+   si_emit_descriptors(rctx, &buffers->desc, buffers->desc_data);
+}
+
+static void si_init_buffer_resources(struct r600_context *rctx,
+struct si_buffer_resources *buffers,
+unsigned num_buffers, unsigned shader,
+unsigned shader_userdata_index,
+enum radeon_bo_usage shader_usage)
+{
+   int i;
+
+   buffers->num_buffers = num_buffers;
+   buffers->shader_usage = shader_usage;
+   buffers->buffers = CALLOC(num_buffers, sizeof(struct pipe_resource*));
+   buffers->desc_storage = CALLOC(num_buffers, sizeof(uint32_t) * 4);
+
+   /* si_emit_descriptors only accepts an array of arrays.
+* This adds such an array. */
+   buffers->desc_data = CALLOC(num_buffers, sizeof(uint32_t*));
+   for (i = 0; i < num_buffers; i++) {
+   buffers->desc_data[i] = &buffers->desc_storage[i*4];
+   }
+
+   si_init_descriptors(rctx, &buffers->desc,
+   si_get_shader_user_data_base(shader) +
+   shader_userdata_index*4, 4, num_buffers,
+   si_emit_buffer_resources);
+}
+
+static void si_release_buffer_resources(struct si_buffer_resources *buffers)
+{
+   int i;
+
+   for (i = 0; i < Elements(buffers->buffers); i++) {
+   pipe_resource_reference(&buffers->buffers[i], NULL);
+   }
+
+   FREE(buffers-buffers);
+   

Re: [Mesa-dev] [PATCH] radeonsi: Do not suspend timer queries

2013-08-28 Thread Marek Olšák
Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Wed, Aug 28, 2013 at 6:42 PM, Niels Ole Salscheider
niels_...@salscheider-online.de wrote:
 Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de
 ---
  src/gallium/drivers/radeonsi/r600.h|  1 +
  src/gallium/drivers/radeonsi/r600_hw_context.c | 28 
 ++
  src/gallium/drivers/radeonsi/r600_query.c  |  7 +--
  src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  2 +-
  src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  4 ++--
  src/gallium/drivers/radeonsi/si_state_draw.c   |  2 +-
  6 files changed, 30 insertions(+), 14 deletions(-)

 diff --git a/src/gallium/drivers/radeonsi/r600.h 
 b/src/gallium/drivers/radeonsi/r600.h
 index ce0468d..ac3b2f1 100644
 --- a/src/gallium/drivers/radeonsi/r600.h
 +++ b/src/gallium/drivers/radeonsi/r600.h
 @@ -102,6 +102,7 @@ void si_context_emit_fence(struct r600_context *ctx, 
 struct si_resource *fence,
 unsigned offset, unsigned value);

  void r600_context_draw_opaque_count(struct r600_context *ctx, struct 
 r600_so_target *t);
 +bool si_is_timer_query(unsigned type);
  bool si_query_needs_begin(unsigned type);
  void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean 
 count_draw_in);

 diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
 b/src/gallium/drivers/radeonsi/r600_hw_context.c
 index 59b2d70..f050b3b 100644
 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c
 +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
 @@ -110,6 +110,13 @@ err:
 return;
  }

 +bool si_is_timer_query(unsigned type)
 +{
 +   return type == PIPE_QUERY_TIME_ELAPSED ||
 +   type == PIPE_QUERY_TIMESTAMP ||
 +   type == PIPE_QUERY_TIMESTAMP_DISJOINT;
 +}
 +
  bool si_query_needs_begin(unsigned type)
  {
 return type != PIPE_QUERY_TIMESTAMP;
 @@ -139,7 +146,7 @@ void si_need_cs_space(struct r600_context *ctx, unsigned 
 num_dw,
 }

 /* Count in queries_suspend. */
 -   num_dw += ctx->num_cs_dw_queries_suspend;
 +   num_dw += ctx->num_cs_dw_nontimer_queries_suspend;

 /* Count in streamout_end at the end of CS. */
 num_dw += ctx-num_cs_dw_streamout_end;
 @@ -211,7 +218,7 @@ void si_context_flush(struct r600_context *ctx, unsigned 
 flags)
 return;

 /* suspend queries */
 -   if (ctx->num_cs_dw_queries_suspend) {
 +   if (ctx->num_cs_dw_nontimer_queries_suspend) {
 r600_context_queries_suspend(ctx);
 queries_suspended = true;
 }
 @@ -506,7 +513,9 @@ void r600_query_begin(struct r600_context *ctx, struct 
 r600_query *query)
 cs->buf[cs->cdw++] = PKT3(PKT3_NOP, 0, 0);
 cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, query->buffer, 
 RADEON_USAGE_WRITE);

 -   ctx->num_cs_dw_queries_suspend += query->num_cs_dw;
 +   if (!si_is_timer_query(query->type)) {
 +   ctx->num_cs_dw_nontimer_queries_suspend += query->num_cs_dw;
 +   }
  }

  void r600_query_end(struct r600_context *ctx, struct r600_query *query)
 @@ -565,7 +574,10 @@ void r600_query_end(struct r600_context *ctx, struct 
 r600_query *query)
 cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, query->buffer, 
 RADEON_USAGE_WRITE);

 query->results_end = (query->results_end + query->result_size) % 
 query->buffer->b.b.width0;
 -   ctx->num_cs_dw_queries_suspend -= query->num_cs_dw;
 +
 +   if (si_query_needs_begin(query->type) && 
 !si_is_timer_query(query->type)) {
 +   ctx->num_cs_dw_nontimer_queries_suspend -= query->num_cs_dw;
 +   }
  }

  void r600_query_predication(struct r600_context *ctx, struct r600_query 
 *query, int operation,
 @@ -712,19 +724,19 @@ void r600_context_queries_suspend(struct r600_context 
 *ctx)
  {
 struct r600_query *query;

 -   LIST_FOR_EACH_ENTRY(query, &ctx->active_query_list, list) {
 +   LIST_FOR_EACH_ENTRY(query, &ctx->active_nontimer_query_list, list) {
 r600_query_end(ctx, query);
 }
 -   assert(ctx->num_cs_dw_queries_suspend == 0);
 +   assert(ctx->num_cs_dw_nontimer_queries_suspend == 0);
  }

  void r600_context_queries_resume(struct r600_context *ctx)
  {
 struct r600_query *query;

 -   assert(ctx->num_cs_dw_queries_suspend == 0);
 +   assert(ctx->num_cs_dw_nontimer_queries_suspend == 0);

 -   LIST_FOR_EACH_ENTRY(query, &ctx->active_query_list, list) {
 +   LIST_FOR_EACH_ENTRY(query, &ctx->active_nontimer_query_list, list) {
 r600_query_begin(ctx, query);
 }
  }
 diff --git a/src/gallium/drivers/radeonsi/r600_query.c 
 b/src/gallium/drivers/radeonsi/r600_query.c
 index 927577c..aa51e74 100644
 --- a/src/gallium/drivers/radeonsi/r600_query.c
 +++ b/src/gallium/drivers/radeonsi/r600_query.c
 @@ -50,7 +50,10 @@ static void r600_begin_query(struct pipe_context *ctx, 
 struct pipe_query *query)
 memset(&rquery->result, 0, sizeof(rquery->result));

Re: [Mesa-dev] [PATCH 13/22] i965/gs: Implement support for geometry shader surfaces.

2013-08-28 Thread Paul Berry
On 26 August 2013 15:12, Paul Berry stereotype...@gmail.com wrote:

 This patch implements pull constant upload, binding table upload, and
 surface setup for geometry shaders, by re-using vertex shader code
 that was generalized in previous patches.

 Based on work by Eric Anholt e...@anholt.net.
 ---
  src/mesa/drivers/dri/i965/Makefile.sources   |   1 +
  src/mesa/drivers/dri/i965/brw_context.h  |   2 +
  src/mesa/drivers/dri/i965/brw_gs_surface_state.c | 123
 +++
  src/mesa/drivers/dri/i965/brw_state.h|   3 +
  src/mesa/drivers/dri/i965/brw_state_upload.c |   3 +
  5 files changed, 132 insertions(+)
  create mode 100644 src/mesa/drivers/dri/i965/brw_gs_surface_state.c

 diff --git a/src/mesa/drivers/dri/i965/Makefile.sources
 b/src/mesa/drivers/dri/i965/Makefile.sources
 index 290cd93..81a16ff 100644
 --- a/src/mesa/drivers/dri/i965/Makefile.sources
 +++ b/src/mesa/drivers/dri/i965/Makefile.sources
 @@ -63,6 +63,7 @@ i965_FILES = \
 brw_gs.c \
 brw_gs_emit.c \
 brw_gs_state.c \
 +   brw_gs_surface_state.c \
 brw_interpolation_map.c \
 brw_lower_texture_gradients.cpp \
 brw_misc_state.c \
 diff --git a/src/mesa/drivers/dri/i965/brw_context.h
 b/src/mesa/drivers/dri/i965/brw_context.h
 index 35193a6..622b5c8 100644
 --- a/src/mesa/drivers/dri/i965/brw_context.h
 +++ b/src/mesa/drivers/dri/i965/brw_context.h
 @@ -148,6 +148,7 @@ enum brw_state_id {
 BRW_STATE_BATCH,
 BRW_STATE_INDEX_BUFFER,
 BRW_STATE_VS_CONSTBUF,
 +   BRW_STATE_GS_CONSTBUF,
 BRW_STATE_PROGRAM_CACHE,
 BRW_STATE_STATE_BASE_ADDRESS,
 BRW_STATE_VUE_MAP_VS,
 @@ -185,6 +186,7 @@ enum brw_state_id {
  /** \see brw.state.depth_region */
  #define BRW_NEW_INDEX_BUFFER   (1 << BRW_STATE_INDEX_BUFFER)
  #define BRW_NEW_VS_CONSTBUF(1 << BRW_STATE_VS_CONSTBUF)
 +#define BRW_NEW_GS_CONSTBUF(1 << BRW_STATE_GS_CONSTBUF)
  #define BRW_NEW_PROGRAM_CACHE  (1 << BRW_STATE_PROGRAM_CACHE)
  #define BRW_NEW_STATE_BASE_ADDRESS (1 << BRW_STATE_STATE_BASE_ADDRESS)
  #define BRW_NEW_VUE_MAP_VS (1 << BRW_STATE_VUE_MAP_VS)
 diff --git a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
 b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
 new file mode 100644
 index 000..d3d48ff
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
 @@ -0,0 +1,123 @@
 +/*
 + * Copyright © 2013 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the "Software"),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 + * IN THE SOFTWARE.
 + */
 +
 +#include "main/mtypes.h"
 +#include "program/prog_parameter.h"
 +
 +#include "brw_context.h"
 +#include "brw_state.h"
 +
 +
 +/* Creates a new GS constant buffer reflecting the current GS program's
 + * constants, if needed by the GS program.
 + *
 + * Otherwise, constants go through the CURBEs using the brw_constant_buffer
 + * state atom.
 + */
 +static void
 +brw_upload_gs_pull_constants(struct brw_context *brw)
 +{
 +   struct brw_vec4_context_base *vec4_ctx = &brw->gs.base;
 +
 +   /* BRW_NEW_GEOMETRY_PROGRAM */
 +   struct brw_geometry_program *gp =
 +  (struct brw_geometry_program *) brw->geometry_program;
 +
 +   if (!gp)
 +  return;
 +
 +   /* CACHE_NEW_GS_PROG */
 +   const struct brw_vec4_prog_data *prog_data = &brw->gs.prog_data->base;
 +
 +   /* _NEW_PROGRAM_CONSTANTS */
 +   brw_upload_vec4_pull_constants(brw, BRW_NEW_GS_CONSTBUF, &gp->program.Base,
 +  vec4_ctx, prog_data);
 +}
 +
 +const struct brw_tracked_state brw_gs_pull_constants = {
 +   .dirty = {
 +  .mesa = (_NEW_PROGRAM_CONSTANTS),
 +  .brw = (BRW_NEW_BATCH | BRW_NEW_GEOMETRY_PROGRAM),
 +  .cache = CACHE_NEW_GS_PROG,
 +   },
 +   .emit = brw_upload_gs_pull_constants,
 +};
 +
 +static void
 +brw_upload_gs_ubo_surfaces(struct brw_context *brw)
 +{
 +   struct gl_context *ctx = &brw->ctx;
 +   struct brw_vec4_context_base *vec4_ctx = 

[Mesa-dev] [PATCH] vbo: Implement new gs prim types in vbo_count_tessellated_primitives.

2013-08-28 Thread Paul Berry
---
 src/mesa/vbo/vbo_exec.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/vbo/vbo_exec.c b/src/mesa/vbo/vbo_exec.c
index 9c20bde..aa2c7b0 100644
--- a/src/mesa/vbo/vbo_exec.c
+++ b/src/mesa/vbo/vbo_exec.c
@@ -149,6 +149,18 @@ vbo_count_tessellated_primitives(GLenum mode, GLuint count,
case GL_QUADS:
   num_primitives = (count / 4) * 2;
   break;
+   case GL_LINES_ADJACENCY:
+  num_primitives = count / 4;
+  break;
+   case GL_LINE_STRIP_ADJACENCY:
+  num_primitives = count >= 4 ? count - 3 : 0;
+  break;
+   case GL_TRIANGLES_ADJACENCY:
+  num_primitives = count / 6;
+  break;
+   case GL_TRIANGLE_STRIP_ADJACENCY:
+  num_primitives = count >= 6 ? (count - 4) / 2 : 0;
+  break;
default:
   assert(!"Unexpected primitive type in count_tessellated_primitives");
   num_primitives = 0;
-- 
1.8.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
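
For anyone checking the arithmetic in the vbo patch above, here is a minimal standalone sketch of the four adjacency cases. The enum and the function name are made up for illustration (they are not Mesa's or GL's identifiers); only the formulas are taken from the patch. Lists consume 4 or 6 vertices per primitive, while strips need 4 or 6 vertices for the first primitive and then 1 or 2 more for each additional one.

```c
#include <assert.h>

/* Hypothetical stand-ins for the GL_*_ADJACENCY enums. */
enum adj_mode {
   ADJ_LINES,
   ADJ_LINE_STRIP,
   ADJ_TRIANGLES,
   ADJ_TRIANGLE_STRIP
};

/* Number of primitives produced by `count` vertices in the given mode. */
static unsigned
count_adjacency_primitives(enum adj_mode mode, unsigned count)
{
   switch (mode) {
   case ADJ_LINES:
      /* 4 vertices per line; leftover vertices are dropped. */
      return count / 4;
   case ADJ_LINE_STRIP:
      /* First line needs 4 vertices, each further vertex adds one line. */
      return count >= 4 ? count - 3 : 0;
   case ADJ_TRIANGLES:
      /* 6 vertices per triangle. */
      return count / 6;
   case ADJ_TRIANGLE_STRIP:
      /* First triangle needs 6 vertices, every 2 more add one triangle. */
      return count >= 6 ? (count - 4) / 2 : 0;
   }
   assert(!"unexpected adjacency mode");
   return 0;
}
```

The guards on the strip cases matter because `count` is unsigned: without them, `count - 3` would wrap around for short vertex lists.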


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Vadim Girlin

On 08/28/2013 01:15 PM, Christian König wrote:

Well, for this discussion let's just assume that we fixed the delay in
the upper layers of the stack and the driver sees the shader code as
soon as the application (if I understood it correctly Vadim has just
volunteered for the job).


No, I'm not really volunteering to implement that. :)
I'm not even sure if it's possible in reasonable time. In fact it was 
more like a theoretical discussion about what would be required for the 
early compilation in the driver to make sense.


Perhaps I failed to explain it, but actually my point is that while the 
compilation is deferred in upper layers and nobody is going to change 
this (if it's possible at all), it doesn't make sense to try compiling 
early in the driver. I think we might prefer to defer the compilation in 
the driver as well - it doesn't make overall situation any worse, but 
can make it better by not compiling unused variants at least.


Vadim


Also let's assume that shaders are small and having a lot of shader
variants around after they are compiled isn't bad.

In this case probably the best solution is to compile early and try to
make the shaders as state invariant as possible, e.g. don't do
optimizations like getting rid of extra exports for the case where we don't
need the alpha test, or if it's just a dependency on a boolean then have
both variants covered by the bytecode and use a bit constant to choose
between the two etc...

As a second step the driver should create an optimized version of the
shader in a background thread when we know all the state that is/was
active when the shader is used.

Of course you need a bit of a heuristic for this, because sometimes it is
better to switch between shader variants and other times it is better to
have one variant covering all the different states and just use bit
constants to choose between them.

Just some thoughts on this topic,
Christian.

PS: My mail server is once more driving me nuts, please ignore the extra
copy if you get this mail twice.

Am 28.08.2013 02:07, schrieb Vadim Girlin:

On 08/28/2013 02:59 AM, Marek Olšák wrote:

First, you won't really see any significant continual difference in
frame rate no matter how many shader variants you have unless you are
very CPU-bound. The problem is shader compilation on the first use,
that's where you get a big hiccup. Try Skyrim for example: You have to
first look around and see every object that's around you and get
unpleasant stuttering before you can actually go on and play the game.
Yes, this is also Wine's fault, as it compiles shaders on the first use
too, but we don't have to be as bad as Wine, do we? Valve also
reported shader recompilations on the first use being a serious issue
with open source drivers.


I perfectly understand that deferred compilation is exactly the
problem that makes the games freeze due to shader compilation on first
use when something new appears on the screen, but I don't think we can
solve this problem in the *driver* by trying to compile early, because
AFAICS currently the shaders are passed to the driver too late anyway,
and this happens not only with wine. E.g. when I run Heaven in a
window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see
Heaven's window and console output at the same time, what I see is
that most of GL dumps happen while Heaven shows splash screen with
loading progress, but most of the driver's dumps appear on the first
frame and few more times during benchmark. It looks like compilation
is deferred somewhere in the stack before the driver, or am I missing
something?

Vadim




Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin
vadimgir...@gmail.com wrote:

On 08/28/2013 12:43 AM, Marek Olšák wrote:


Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
with a Mesa driver that likes to compile shader variants on first use?
It's HORRIBLE.



I don't think that shader variants are bad, but it's definitely bad when we
are compiling variants that are never used. Currently glxgears compiles 18
ps/vs shaders. In my branch with initial GS support [1] I switched handling
of the shaders to deferred compilation, that is, shaders are compiled only
before the actual draw. I found later that it's not really required for GS,
but IIRC this change results in only 5 shaders being compiled for glxgears
instead of 18. It seems most of the useless variants are results of state
changes between creation of the shader state (initial compilation) and
actual draw call.

I had some concerns about increased overhead with those changes, and it's
actually noticeable with drawoverhead demo, but I didn't see any regressions
with a few real apps that I tested, e.g. glxgears even showed slightly
better performance with these changes. Probably I also implemented it in a
not very optimal way (I was mostly concentrated on GS support) and the
overhead can be reduced.

One more thing is duplicate shaders, I've analyzed shader dumps from Unigine
Heaven 3.0 some time 

[Mesa-dev] [PATCH 3/6] r600g: move streamout state to drivers/radeon

2013-08-28 Thread Marek Olšák
It looks like this patch got stuck in the moderation queue. You can
also find it here:

http://cgit.freedesktop.org/~mareko/mesa/commit/?h=radeonsi-stuff&id=13bb26b24e738da6a8c51ee33876dc541fcde9da

Marek


Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

2013-08-28 Thread Marek Olšák
On Wed, Aug 28, 2013 at 7:56 PM, Vadim Girlin vadimgir...@gmail.com wrote:
 On 08/28/2013 01:15 PM, Christian König wrote:

 Well, for this discussion let's just assume that we fixed the delay in
 the upper layers of the stack and the driver sees the shader code as
 soon as the application (if I understood it correctly Vadim has just
 volunteered for the job).


 No, I'm not really volunteering to implement that. :)
 I'm not even sure if it's possible in reasonable time. In fact it was more
 like a theoretical discussion about what would be required for the early
 compilation in the driver to make sense.

 Perhaps I failed to explain it, but actually my point is that while the
 compilation is deferred in upper layers and nobody is going to change this
 (if it's possible at all), it doesn't make sense to try compiling early in
 the driver. I think we might prefer to defer the compilation in the driver
 as well - it doesn't make overall situation any worse, but can make it
 better by not compiling unused variants at least.

Sounds good to me.

Marek
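
To make the "defer the compilation in the driver as well" idea concrete, here is a minimal sketch of compiling a shader variant only when a draw actually needs it. Everything in it is hypothetical: the struct names, the 32-bit `state_key`, and the fixed-size variant table are assumptions for illustration and are not taken from r600g. The "compile" step is just a counter increment standing in for the real work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VARIANTS 8   /* arbitrary limit for this sketch */

struct variant {
   uint32_t state_key;   /* hypothetical hash of the state the variant depends on */
   bool compiled;
};

struct shader {
   struct variant variants[MAX_VARIANTS];
   unsigned num_variants;
   unsigned compile_count;   /* how many compiles were actually performed */
};

/* Creating the shader state compiles nothing; work is deferred to draw time. */
static void
shader_create(struct shader *s)
{
   s->num_variants = 0;
   s->compile_count = 0;
}

/* Called right before a draw: reuse a matching variant or compile a new one. */
static struct variant *
shader_get_variant_for_draw(struct shader *s, uint32_t state_key)
{
   for (unsigned i = 0; i < s->num_variants; i++) {
      if (s->variants[i].state_key == state_key)
         return &s->variants[i];
   }

   assert(s->num_variants < MAX_VARIANTS);
   struct variant *v = &s->variants[s->num_variants++];
   v->state_key = state_key;
   v->compiled = true;       /* stand-in for the real compilation */
   s->compile_count++;
   return v;
}
```

With this shape, state changes between shader creation and the first draw never cost a compile, which is the effect Vadim describes for glxgears. The first-use hitch discussed earlier in the thread is still there, since the compile now happens at draw time.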


[Mesa-dev] [PATCH 1/2] i965/vs: Detect GRF sources in split_virtual_grfs send-from-GRF code.

2013-08-28 Thread Kenneth Graunke
It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the GRF.
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores the
GRF as src[1].

To be safe, loop over all the source registers and mark any GRFs.  We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.

Fixes assertion failures in Unigine Sanctuary since we started making
register allocation rely on split_virtual_grfs working.  (The register
classes were actually sufficient, we were just interpreting an IMM as
a virtual GRF number.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

The assertion failures mentioned in the bug don't exist on 9.2, but the
underlying bug that caused them to fail still does, so I think it makes
sense to backport.  Not sure if these SEND-from-GRFs existed in 9.1.

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index ae836d3..55fa7c8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1034,10 +1034,14 @@ vec4_visitor::split_virtual_grfs()
   vec4_instruction *inst = (vec4_instruction *)node;
 
   /* If there's a SEND message loading from a GRF on gen7+, it needs to be
-   * contiguous.  Assume that the GRF for the SEND is always in src[0].
+   * contiguous.
*/
   if (inst->is_send_from_grf()) {
- split_grf[inst->src[0].reg] = false;
+ for (int i = 0; i < 3; i++) {
+if (inst->src[i].file == GRF) {
+   split_grf[inst->src[i].reg] = false;
+}
+ }
   }
}
 
-- 
1.8.3.4
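
A standalone sketch of the idea in this patch, for readers without the i965 tree at hand. The types below are simplified stand-ins invented for illustration (the real code uses C++ `vec4_instruction` and `src_reg`); only the loop mirrors the patch. The point is that an immediate in src[0] must not be read as a virtual GRF number.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified register files. */
enum reg_file { FILE_BAD, FILE_GRF, FILE_IMM };

struct src_reg {
   enum reg_file file;
   int reg;               /* virtual GRF number, meaningful only for FILE_GRF */
};

struct instruction {
   bool is_send_from_grf;
   struct src_reg src[3];
};

/* Mark every GRF source of a SEND-from-GRF as "must stay contiguous". */
static void
mark_unsplittable(const struct instruction *inst, bool *split_grf, int num_grfs)
{
   if (!inst->is_send_from_grf)
      return;

   for (int i = 0; i < 3; i++) {
      if (inst->src[i].file == FILE_GRF) {
         assert(inst->src[i].reg >= 0 && inst->src[i].reg < num_grfs);
         split_grf[inst->src[i].reg] = false;
      }
   }
}
```

Checking the file of each source is what prevents the old failure mode: with the src[0]-only version, an IMM operand's payload would have been used as an index into `split_grf`.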



[Mesa-dev] [PATCH 2/2] i965/fs: Detect GRF sources in split_virtual_grfs send-from-GRF code.

2013-08-28 Thread Kenneth Graunke
It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the
GRF.  For example, FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD uses src[1] for
the GRF.

To be safe, loop over all the source registers and mark any GRFs.  We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.

Not observed to fix anything yet, but likely to.  Parallels the bug fix
in the previous commit, which actually does fix known failures.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b770c0e..96cb2ee 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1359,7 +1359,11 @@ fs_visitor::split_virtual_grfs()
* the send is reading the whole thing.
*/
   if (inst->is_send_from_grf()) {
- split_grf[inst->src[0].reg] = false;
+ for (int i = 0; i < 3; i++) {
+if (inst->src[i].file == GRF) {
+   split_grf[inst->src[i].reg] = false;
+}
+ }
   }
}
 
-- 
1.8.3.4



Re: [Mesa-dev] [PATCH] r600g/compute: Fix bug in compute memory pool

2013-08-28 Thread Aaron Watry
The changes look good to me... That seems to be a much more sane way
to add the item to the beginning of the linked list.

I've tested this on CEDAR (Radeon 5400) without any OpenCL
regressions, and the only piglit change was that the new piglit test
created for this bug now passes.

--Aaron


On Tue, Aug 27, 2013 at 10:17 AM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 When adding a new buffer to the beginning of the memory pool, we were
 accidentally deleting the buffer that was first in the buffer list.
 This was caused by a bug in the memory pool's linked list
 implementation.
 ---
  src/gallium/drivers/r600/compute_memory_pool.c | 9 ++---
  1 file changed, 2 insertions(+), 7 deletions(-)

 diff --git a/src/gallium/drivers/r600/compute_memory_pool.c 
 b/src/gallium/drivers/r600/compute_memory_pool.c
 index 454af90..4846bfe 100644
 --- a/src/gallium/drivers/r600/compute_memory_pool.c
 +++ b/src/gallium/drivers/r600/compute_memory_pool.c
 @@ -337,14 +337,9 @@ void compute_memory_finalize_pending(struct 
 compute_memory_pool* pool,
 }
 } else {
 /* Add item to the front of the list */
 -   item->next = pool->item_list->next;
 -   if (pool->item_list->next) {
 -   pool->item_list->next->prev = item;
 -   }
 +   item->next = pool->item_list;
 item->prev = pool->item_list->prev;
 -   if (pool->item_list->prev) {
 -   pool->item_list->prev->next = item;
 -   }
 +   pool->item_list->prev = item;
 pool->item_list = item;
 }
 }
 --
 1.7.11.4
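
For reference, the corrected head insertion can be sketched on its own as below. The struct and function names are invented for this example and are not the ones in compute_memory_pool.c. Unlike the patch, the sketch also handles an empty list and sets the new head's `prev` to NULL directly instead of copying it from the old head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly-linked item and pool types. */
struct item {
   struct item *prev;
   struct item *next;
   int id;
};

struct pool {
   struct item *item_list;   /* head of the list, NULL when empty */
};

/* Insert `item` in front of the current head without dropping the old head. */
static void
pool_push_front(struct pool *pool, struct item *item)
{
   item->prev = NULL;
   item->next = pool->item_list;      /* the old head becomes the second item */
   if (pool->item_list)
      pool->item_list->prev = item;
   pool->item_list = item;
}
```

The buggy version linked the new item to `item_list->next`, skipping the old head entirely, which is how the first buffer in the list got lost.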



[Mesa-dev] [PATCH] glsl: propagate max_array_access through function calls

2013-08-28 Thread Dominik Behr
Fixes a bug where, if a uniform array is passed to a function, the accesses
to the array are not propagated, so later all but the first vector of the
uniform array are removed in parcel_out_uniform_storage, resulting in
broken shaders and out-of-bounds access to arrays in
brw::vec4_visitor::pack_uniform_registers.

Signed-off-by: Dominik Behr db...@chromium.org
---
 src/glsl/link_functions.cpp | 29 +
 1 file changed, 29 insertions(+)

diff --git a/src/glsl/link_functions.cpp b/src/glsl/link_functions.cpp
index 6b3e154..d935546 100644
--- a/src/glsl/link_functions.cpp
+++ b/src/glsl/link_functions.cpp
@@ -173,6 +173,35 @@ public:
   return visit_continue;
}
 
+   virtual ir_visitor_status visit_leave(ir_call *ir)
+   {
+  /* Traverse list of function parameters, and for array parameters
+ propagate max_array_access. Otherwise arrays that are only referenced
+ from inside functions via function parameters will be incorrectly
+ optimized. This will lead to incorrect code being generated (or worse).
+ Do it when leaving the node so the children would propagate their
+ array accesses first */
+
+  const exec_node *formal_param_node = ir->callee->parameters.get_head();
+  const exec_node *actual_param_node = ir->actual_parameters.get_head();
+  while (!actual_param_node->is_tail_sentinel()) {
+ ir_variable *formal_param = (ir_variable *) formal_param_node;
+ ir_rvalue *actual_param = (ir_rvalue *) actual_param_node;
+
+ formal_param_node = formal_param_node->get_next();
+ actual_param_node = actual_param_node->get_next();
+
+ if (formal_param->type->is_array()) {
+ir_dereference_variable *deref = actual_param->as_dereference_variable();
+if (deref && deref->var && deref->var->type->is_array()) {
+   deref->var->max_array_access =
+  MAX2(formal_param->max_array_access, deref->var->max_array_access);
+}
+ }
+  }
+  return visit_continue;
+   }
+
virtual ir_visitor_status visit(ir_dereference_variable *ir)
{
   if (hash_table_find(locals, ir->var) == NULL) {
-- 
1.8.3.1
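
The propagation step in this patch can be illustrated outside the GLSL IR with plain arrays. This is only a sketch under simplifying assumptions: `struct variable` and the parallel `formal`/`actual` arrays are invented here, and a NULL entry in `actual` stands for an argument that is not a plain variable dereference. The real patch walks two `exec_list`s in lockstep instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down view of a variable. */
struct variable {
   bool is_array;
   unsigned max_array_access;   /* highest index known to be accessed */
};

/*
 * For each call argument, raise the caller-side array's max_array_access to
 * at least what the callee's formal parameter records.
 */
static void
propagate_max_array_access(struct variable *const *formal,
                           struct variable *const *actual,
                           size_t num_params)
{
   for (size_t i = 0; i < num_params; i++) {
      const struct variable *f = formal[i];
      struct variable *a = actual[i];

      assert(f != NULL);
      if (f->is_array && a != NULL && a->is_array) {
         if (f->max_array_access > a->max_array_access)
            a->max_array_access = f->max_array_access;
      }
   }
}
```

Taking the maximum rather than overwriting matters when the same uniform array is both indexed directly in the caller and passed to a function: neither access record should shrink the other.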



Re: [Mesa-dev] [PATCH 1/2] gallivm: refactor num_lods handling

2013-08-28 Thread Jose Fonseca
LGTM.

Jose

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 This is just preparation for per-pixel (or per-quad in case of multiple
 quads) min/mag filter since some assumptions about number of miplevels being
 equal to number of lods no longer hold true.
 This change does not change behavior yet (though theoretically when forcing
 per-element path it might be slower with different min/mag filter since the
 code will respect this setting even when there are no mip maps now in this
 case, so some lod calcs will be done per-element just ultimately still the
 same filter used for all pixels).
 ---
  src/gallium/auxiliary/gallivm/lp_bld_sample.c |  126 +-
  src/gallium/auxiliary/gallivm/lp_bld_sample.h |   13 +-
  src/gallium/auxiliary/gallivm/lp_bld_sample_aos.c |   20 +--
  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  141
  -
  4 files changed, 169 insertions(+), 131 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
 b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
 index 89d7249..e1cfd78 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
 @@ -217,7 +217,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
 struct lp_build_context *float_size_bld = &bld->float_size_in_bld;
 struct lp_build_context *float_bld = &bld->float_bld;
 struct lp_build_context *coord_bld = &bld->coord_bld;
 -   struct lp_build_context *levelf_bld = &bld->levelf_bld;
 +   struct lp_build_context *rho_bld = &bld->lodf_bld;
 const unsigned dims = bld->dims;
 LLVMValueRef ddx_ddy[2];
 LLVMBuilderRef builder = bld->gallivm->builder;
 @@ -231,7 +231,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
 LLVMValueRef first_level, first_level_vec;
 unsigned length = coord_bld->type.length;
 unsigned num_quads = length / 4;
 -   boolean rho_per_quad = levelf_bld->type.length != length;
 +   boolean rho_per_quad = rho_bld->type.length != length;
 unsigned i;
 LLVMValueRef i32undef =
 LLVMGetUndef(LLVMInt32TypeInContext(gallivm->context));
 LLVMValueRef rho_xvec, rho_yvec;
 @@ -259,18 +259,18 @@ lp_build_rho(struct lp_build_sample_context *bld,
 */
if (rho_per_quad) {
   rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
 - levelf_bld->type, cube_rho, 0);
 + rho_bld->type, cube_rho, 0);
}
else {
   rho = lp_build_swizzle_scalar_aos(coord_bld, cube_rho, 0, 4);
}
if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
 - rho = lp_build_sqrt(levelf_bld, rho);
 + rho = lp_build_sqrt(rho_bld, rho);
}
/* Could optimize this for single quad just skip the broadcast */
cubesize = lp_build_extract_broadcast(gallivm,
bld->float_size_in_type,
 -levelf_bld->type, float_size,
 index0);
 -  rho = lp_build_mul(levelf_bld, cubesize, rho);
 +rho_bld->type, float_size,
 index0);
 +  rho = lp_build_mul(rho_bld, cubesize, rho);
 }
 else if (derivs && !(bld->static_texture_state->target ==
 PIPE_TEXTURE_CUBE)) {
LLVMValueRef ddmax[3], ddx[3], ddy[3];
 @@ -311,9 +311,9 @@ lp_build_rho(struct lp_build_sample_context *bld,
   * otherwise would also need different code to per-pixel lod
   case.
   */
  rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
 -levelf_bld->type, rho, 0);
 +rho_bld->type, rho, 0);
   }
 - rho = lp_build_sqrt(levelf_bld, rho);
 + rho = lp_build_sqrt(rho_bld, rho);
  
}
else {
 @@ -329,7 +329,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
   * rho_vec contains per-pixel rho, convert to scalar per quad.
   */
  rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
 -levelf_bld->type, rho, 0);
 +rho_bld->type, rho, 0);
   }
}
 }
 @@ -404,7 +404,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
  
   if (rho_per_quad) {
  rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
 -levelf_bld->type, rho, 0);
 +rho_bld->type, rho, 0);
   }
   else {
  /*
 @@ -416,7 +416,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
   */
  rho = lp_build_swizzle_scalar_aos(coord_bld, rho, 0, 4);
   }
 - rho = lp_build_sqrt(levelf_bld, rho);
 + rho = lp_build_sqrt(rho_bld, rho);
}
else {
   ddx_ddy[0] = 

[Mesa-dev] [PATCH 1/2] i965/gen7: Use the base_level field of the sampler to handle GL's BASE_LEVEL.

2013-08-28 Thread Eric Anholt
This avoids the need to get the inter- and intra-tile offset and adjust
our miptree info based on them.
---
 src/mesa/drivers/dri/i965/gen7_sampler_state.c| 19 +--
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 14 +++---
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_sampler_state.c 
b/src/mesa/drivers/dri/i965/gen7_sampler_state.c
index 193b5b1..6162502 100644
--- a/src/mesa/drivers/dri/i965/gen7_sampler_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sampler_state.c
@@ -25,6 +25,7 @@
 #include "brw_state.h"
 #include "brw_defines.h"
 #include "intel_batchbuffer.h"
+#include "intel_mipmap_tree.h"
 
 #include "main/macros.h"
 #include "main/samplerobj.h"
@@ -40,6 +41,8 @@ gen7_update_sampler_state(struct brw_context *brw, int unit, 
int ss_index,
struct gl_context *ctx = &brw->ctx;
struct gl_texture_unit *texUnit = &ctx->Texture.Unit[unit];
struct gl_texture_object *texObj = texUnit->_Current;
+   struct intel_texture_image *intel_image =
+  intel_texture_image(texObj->Image[0][texObj->BaseLevel]);
bool using_nearest = false;
 
@@ -150,17 +153,13 @@ gen7_update_sampler_state(struct brw_context *brw, int 
unit, int ss_index,
sampler->ss0.lod_preclamp = 1; /* OpenGL mode */
sampler->ss0.default_color_mode = 0; /* OpenGL/DX10 mode */
 
-   /* Set BaseMipLevel, MaxLOD, MinLOD:
-*
-* XXX: I don't think that using firstLevel, lastLevel works,
-* because we always setup the surface state as if firstLevel ==
-* level zero.  Probably have to subtract firstLevel from each of
-* these:
-*/
-   sampler->ss0.base_level = U_FIXED(0, 1);
+   int baselevel = texObj->BaseLevel - intel_image->mt->first_level;
+   sampler->ss0.base_level = U_FIXED(baselevel, 1);
 
-   sampler->ss1.max_lod = U_FIXED(CLAMP(gl_sampler->MaxLod, 0, 13), 8);
-   sampler->ss1.min_lod = U_FIXED(CLAMP(gl_sampler->MinLod, 0, 13), 8);
+   sampler->ss1.max_lod = U_FIXED(CLAMP(baselevel +
+gl_sampler->MaxLod, 0, 13), 8);
+   sampler->ss1.min_lod = U_FIXED(CLAMP(baselevel +
+gl_sampler->MinLod, 0, 13), 8);
 
/* The sampler can handle non-normalized texture rectangle coordinates
 * natively
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index 91f854b..b68e2c2 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -284,8 +284,8 @@ gen7_update_texture_surface(struct gl_context *ctx,
struct intel_texture_object *intelObj = intel_texture_object(tObj);
struct intel_mipmap_tree *mt = intelObj->mt;
struct gl_texture_image *firstImage = tObj->Image[0][tObj->BaseLevel];
+   struct intel_texture_image *intel_image = intel_texture_image(firstImage);
struct gl_sampler_object *sampler = _mesa_get_samplerobj(ctx, unit);
-   uint32_t tile_x, tile_y;
 
if (tObj->Target == GL_TEXTURE_BUFFER) {
   gen7_update_buffer_texture_surface(ctx, unit, binding_table, surf_index);
@@ -318,8 +318,6 @@ gen7_update_texture_surface(struct gl_context *ctx,
   surf[0] |= GEN7_SURFACE_ARYSPC_LOD0;
 
surf[1] = mt->region->bo->offset + mt->offset; /* reloc */
-   surf[1] += intel_miptree_get_tile_offsets(intelObj->mt, firstImage->Level, 
0,
- &tile_x, &tile_y);
 
surf[2] = SET_FIELD(mt->logical_width0 - 1, GEN7_SURFACE_WIDTH) |
  SET_FIELD(mt->logical_height0 - 1, GEN7_SURFACE_HEIGHT);
@@ -328,15 +326,9 @@ gen7_update_texture_surface(struct gl_context *ctx,
 
surf[4] = gen7_surface_msaa_bits(mt->num_samples, mt->msaa_layout);
 
-   assert(brw->has_surface_tile_offset || (tile_x == 0 && tile_y == 0));
-   /* Note that the low bits of these fields are missing, so
-* there's the possibility of getting in trouble.
-*/
-   surf[5] = ((tile_x / 4) << BRW_SURFACE_X_OFFSET_SHIFT |
-  (tile_y / 2) << BRW_SURFACE_Y_OFFSET_SHIFT |
-  SET_FIELD(GEN7_MOCS_L3, GEN7_SURFACE_MOCS) |
+   surf[5] = (SET_FIELD(GEN7_MOCS_L3, GEN7_SURFACE_MOCS) |
   /* mip count */
-  (intelObj->_MaxLevel - tObj->BaseLevel));
+  (intelObj->_MaxLevel - intel_image->mt->first_level));
 
if (brw->is_haswell) {
   /* Handling GL_ALPHA as a surface format override breaks 1.30+ style
-- 
1.8.4.rc3
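
The sampler-side arithmetic in this patch can be tried out in isolation with the sketch below. It is an illustration under assumptions, not driver code: `to_ufixed()` is a guess at what an unsigned fixed-point helper like U_FIXED does (scale by 2^frac_bits, clamp negatives to zero), and `struct lod_fields` and `compute_lod_fields()` are names invented here. The base level is expressed relative to the miptree's first level, and min/max LOD are shifted by the same amount before being clamped to [0, 13].

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of an unsigned fixed-point conversion helper. */
static uint32_t
to_ufixed(float value, unsigned frac_bits)
{
   if (value < 0.0f)
      return 0;
   return (uint32_t)(value * (float)(1u << frac_bits));
}

static float
clampf(float x, float lo, float hi)
{
   return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical container for the three sampler fields touched by the patch. */
struct lod_fields {
   uint32_t base_level;   /* 1 fractional bit */
   uint32_t min_lod;      /* lod_frac_bits fractional bits */
   uint32_t max_lod;
};

static struct lod_fields
compute_lod_fields(int gl_base_level, int mt_first_level,
                   float min_lod, float max_lod, unsigned lod_frac_bits)
{
   /* The miptree may start above GL level 0, so rebase onto it. */
   assert(gl_base_level >= mt_first_level);
   int baselevel = gl_base_level - mt_first_level;

   struct lod_fields f;
   f.base_level = to_ufixed((float)baselevel, 1);
   f.min_lod = to_ufixed(clampf(baselevel + min_lod, 0.0f, 13.0f), lod_frac_bits);
   f.max_lod = to_ufixed(clampf(baselevel + max_lod, 0.0f, 13.0f), lod_frac_bits);
   return f;
}
```

The patch passes 8 as the fractional-bit count for the LOD fields on gen7, and the gen4-6 patch in the same series passes 6, so the last argument is the only thing that should differ between the two callers in this sketch.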



[Mesa-dev] [PATCH 2/2] i965: Switch gen4-6 to using the sampler's base level for GL BASE_LEVEL.

2013-08-28 Thread Eric Anholt
Thanks to Ken for trawling through my neglected public branches and
finding the bug in this change (inside a megacommit) that made me abandon
this work.
---
 src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 19 +--
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 16 +++-
 2 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
index f2117a4..1f46f91 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
@@ -33,6 +33,7 @@
 #include "brw_context.h"
 #include "brw_state.h"
 #include "brw_defines.h"
+#include "intel_mipmap_tree.h"
 
 #include "main/macros.h"
 #include "main/samplerobj.h"
@@ -201,6 +202,8 @@ static void brw_update_sampler_state(struct brw_context 
*brw,
struct gl_context *ctx = &brw->ctx;
struct gl_texture_unit *texUnit = &ctx->Texture.Unit[unit];
struct gl_texture_object *texObj = texUnit->_Current;
+   struct intel_texture_image *intel_image =
+  intel_texture_image(texObj->Image[0][texObj->BaseLevel]);
bool using_nearest = false;
 
@@ -319,17 +322,13 @@ static void brw_update_sampler_state(struct brw_context 
*brw,
sampler->ss0.lod_preclamp = 1; /* OpenGL mode */
sampler->ss0.default_color_mode = 0; /* OpenGL/DX10 mode */
 
-   /* Set BaseMipLevel, MaxLOD, MinLOD: 
-*
-* XXX: I don't think that using firstLevel, lastLevel works,
-* because we always setup the surface state as if firstLevel ==
-* level zero.  Probably have to subtract firstLevel from each of
-* these:
-*/
-   sampler->ss0.base_level = U_FIXED(0, 1);
+   int baselevel = texObj->BaseLevel - intel_image->mt->first_level;
+   sampler->ss0.base_level = U_FIXED(baselevel, 1);
 
-   sampler->ss1.max_lod = U_FIXED(CLAMP(gl_sampler->MaxLod, 0, 13), 6);
-   sampler->ss1.min_lod = U_FIXED(CLAMP(gl_sampler->MinLod, 0, 13), 6);
+   sampler->ss1.max_lod = U_FIXED(CLAMP(baselevel +
+gl_sampler->MaxLod, 0, 13), 6);
+   sampler->ss1.min_lod = U_FIXED(CLAMP(baselevel +
+gl_sampler->MinLod, 0, 13), 6);
 
/* On Gen6+, the sampler can handle non-normalized texture
 * rectangle coordinates natively
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index e2c7b77..8bc3938 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -255,9 +255,9 @@ brw_update_texture_surface(struct gl_context *ctx,
struct intel_texture_object *intelObj = intel_texture_object(tObj);
struct intel_mipmap_tree *mt = intelObj-mt;
struct gl_texture_image *firstImage = tObj-Image[0][tObj-BaseLevel];
+   struct intel_texture_image *intel_image = intel_texture_image(firstImage);
struct gl_sampler_object *sampler = _mesa_get_samplerobj(ctx, unit);
uint32_t *surf;
-   uint32_t tile_x, tile_y;
 
if (tObj->Target == GL_TEXTURE_BUFFER) {
   brw_update_buffer_texture_surface(ctx, unit, binding_table, surf_index);
@@ -277,10 +277,8 @@ brw_update_texture_surface(struct gl_context *ctx,
   BRW_SURFACE_FORMAT_SHIFT));
 
surf[1] = intelObj->mt->region->bo->offset + intelObj->mt->offset; /* reloc */
-   surf[1] += intel_miptree_get_tile_offsets(intelObj->mt, firstImage->Level, 0,
- &tile_x, &tile_y);
 
-   surf[2] = ((intelObj->_MaxLevel - tObj->BaseLevel) << BRW_SURFACE_LOD_SHIFT |
+   surf[2] = ((intelObj->_MaxLevel - intel_image->mt->first_level) << BRW_SURFACE_LOD_SHIFT |
  (mt->logical_width0 - 1) << BRW_SURFACE_WIDTH_SHIFT |
  (mt->logical_height0 - 1) << BRW_SURFACE_HEIGHT_SHIFT);
 
@@ -291,15 +289,7 @@ brw_update_texture_surface(struct gl_context *ctx,
 
surf[4] = brw_get_surface_num_multisamples(intelObj->mt->num_samples);
 
-   assert(brw->has_surface_tile_offset || (tile_x == 0 && tile_y == 0));
-   /* Note that the low bits of these fields are missing, so
-* there's the possibility of getting in trouble.
-*/
-   assert(tile_x % 4 == 0);
-   assert(tile_y % 2 == 0);
-   surf[5] = ((tile_x / 4) << BRW_SURFACE_X_OFFSET_SHIFT |
- (tile_y / 2) << BRW_SURFACE_Y_OFFSET_SHIFT |
- (mt->align_h == 4 ? BRW_SURFACE_VERTICAL_ALIGN_ENABLE : 0));
+   surf[5] = mt->align_h == 4 ? BRW_SURFACE_VERTICAL_ALIGN_ENABLE : 0;
 
/* Emit relocation to surface contents */
drm_intel_bo_emit_reloc(brw->batch.bo,
-- 
1.8.4.rc3



Re: [Mesa-dev] [PATCH 8/8] i965: Avoid flushing the batch for every blorp op.

2013-08-28 Thread Eric Anholt
Paul Berry stereotype...@gmail.com writes:

 On 27 August 2013 15:21, Eric Anholt e...@anholt.net wrote:

 This brings over the batch-wrap-prevention and aperture space checking
 code from the normal brw_draw.c path, so that we don't need to flush the
 batch every time.

 There's a risk here if the intel_emit_post_sync_nonzero_flush() call isn't
 high enough up in the state emit sequences -- before, we implicitly had
 one at the batch flush before any state was emitted, so Mesa's workaround
 emits didn't really matter.

 Improves cairo-gl performance by 13.7733% +/- 1.74876% (n=30/32)
 Improves minecraft apitrace performance by 1.03183% +/- 0.482297% (n=90).
 Reduces low-resolution GLB 2.7 performance by 1.17553% +/- 0.432263% (n=88)
 Reduces Lightsmark performance by 3.70246% +/- 0.322432% (n=126)
 No statistically significant performance difference on unigine tropics
 (n=10)
 No statistically significant performance difference on openarena (n=755)

 The two apps that are hurt happen to include stalls on busy buffer
 objects, so I think this is an effect of missing out on an opportune
 flush.
 ---
  src/mesa/drivers/dri/i965/brw_blorp.cpp  | 50
 
  src/mesa/drivers/dri/i965/brw_blorp.h|  4 ---
  src/mesa/drivers/dri/i965/gen6_blorp.cpp | 12 
  src/mesa/drivers/dri/i965/gen7_blorp.cpp |  1 -
  4 files changed, 50 insertions(+), 17 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp
 b/src/mesa/drivers/dri/i965/brw_blorp.cpp
 index 1576ff2..c566d1d 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
 @@ -21,6 +21,7 @@
   * IN THE SOFTWARE.
   */

 +#include <errno.h>
  #include "intel_batchbuffer.h"
  #include "intel_fbo.h"

 @@ -191,6 +192,26 @@ intel_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt,
  void
  brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params)
  {
 +   struct gl_context *ctx = &brw->ctx;
 +   uint32_t estimated_max_batch_usage = 1500;
 +   bool check_aperture_failed_once = false;
 +
 +   /* Flush the sampler and render caches.  We definitely need to flush the
 +* sampler cache so that we get updated contents from the render cache for
 +* the glBlitFramebuffer() source.  Also, we are sometimes warned in the
 +* docs to flush the cache between reinterpretations of the same surface
 +* data with different formats, which blorp does for stencil and depth
 +* data.
 +*/
 +   intel_batchbuffer_emit_mi_flush(brw);
 +
 +retry:
 +   intel_batchbuffer_require_space(brw, estimated_max_batch_usage, false);
 +   intel_batchbuffer_save_state(brw);
 +   drm_intel_bo *saved_bo = brw->batch.bo;
 +   uint32_t saved_used = brw->batch.used;
 +   uint32_t saved_state_batch_offset = brw->batch.state_batch_offset;
 +
 switch (brw->gen) {
 case 6:
gen6_blorp_exec(brw, params);
 @@ -204,6 +225,35 @@ brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params)
break;
 }


 Would it be feasible to add an assertion here to verify that the amount of
 batch space actually used by this blorp call is less than or equal to
 estimated_max_batch_usage?  That would give me a lot of increased
 confidence that the magic number 1500 is correct.

 With the added assertion, the series is:

 Reviewed-by: Paul Berry stereotype...@gmail.com

That's this code:

+   /* Make sure we didn't wrap the batch unintentionally, and make sure we
+* reserved enough space that a wrap will never happen.
+*/
+   assert(brw->batch.bo == saved_bo);
+   assert((brw->batch.used - saved_used) * 4 +
+  (saved_state_batch_offset - brw->batch.state_batch_offset) <
+  estimated_max_batch_usage);
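
A minimal standalone sketch of what that assertion measures, for readers following along. The struct is a hypothetical stand-in for the two brw->batch fields involved, and the sketch assumes (as the subtraction order in the assert suggests) that command dwords grow upward from the start of the buffer while state is allocated downward from the end. None of the sk_* names exist in Mesa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of brw->batch. */
struct sk_batch {
   uint32_t used;               /* command dwords emitted so far */
   uint32_t state_batch_offset; /* byte offset of the lowest state allocation */
};

/* Bytes consumed since the snapshot: command dwords (4 bytes each)
 * plus the state bytes carved off the end of the buffer. */
static uint32_t
sk_bytes_used_since(const struct sk_batch *now,
                    uint32_t saved_used,
                    uint32_t saved_state_batch_offset)
{
   return (now->used - saved_used) * 4 +
          (saved_state_batch_offset - now->state_batch_offset);
}

/* True when the operation stayed strictly below the reserved estimate. */
static bool
sk_estimate_was_enough(const struct sk_batch *now,
                       uint32_t saved_used,
                       uint32_t saved_state_batch_offset,
                       uint32_t estimated_max_batch_usage)
{
   return sk_bytes_used_since(now, saved_used, saved_state_batch_offset) <
          estimated_max_batch_usage;
}
```

With a snapshot of used = 100 and state_batch_offset = 4000, a batch that ends at used = 110 and state_batch_offset = 3900 consumed 10 * 4 + 100 = 140 bytes, comfortably under 1500.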




Re: [Mesa-dev] [PATCH 1/2] i965/gen7: Use the base_level field of the sampler to handle GL's BASE_LEVEL.

2013-08-28 Thread Kenneth Graunke

On 08/28/2013 03:27 PM, Eric Anholt wrote:

This avoids the need to get the inter- and intra-tile offset and adjust
our miptree info based on them.
---
  src/mesa/drivers/dri/i965/gen7_sampler_state.c| 19 +--
  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 14 +++---
  2 files changed, 12 insertions(+), 21 deletions(-)


This miniseries is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org



[Mesa-dev] [PATCH] draw: fix point/line/triangle determination in draw_need_pipeline()

2013-08-28 Thread Brian Paul
The previous point/line/triangle() functions didn't handle GS primitives.
---
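
A rough sketch of why the old helpers misfire on GS primitives, assuming a hypothetical enum whose ordering mimics the PIPE_PRIM_* list (basic types first, adjacency types appended at the end). The sk_* names are made up for illustration, and sk_reduced_prim() is not the real u_reduced_prim(), only a stand-in for the idea of collapsing every primitive to points, lines or triangles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical primitive enum; the ordering is an assumption of this
 * sketch and is not copied from p_defines.h. */
enum sk_prim {
   SK_POINTS, SK_LINES, SK_LINE_LOOP, SK_LINE_STRIP,
   SK_TRIANGLES, SK_TRIANGLE_STRIP, SK_TRIANGLE_FAN,
   SK_QUADS, SK_QUAD_STRIP, SK_POLYGON,
   SK_LINES_ADJACENCY, SK_LINE_STRIP_ADJACENCY,
   SK_TRIANGLES_ADJACENCY, SK_TRIANGLE_STRIP_ADJACENCY
};

/* Old-style predicate: only knows the three basic line types. */
static bool sk_old_lines(unsigned prim)
{
   return prim == SK_LINES || prim == SK_LINE_STRIP || prim == SK_LINE_LOOP;
}

/* Old-style predicate: treats everything from TRIANGLES upward as a
 * triangle, which wrongly includes the line adjacency types. */
static bool sk_old_triangles(unsigned prim)
{
   return prim >= SK_TRIANGLES;
}

/* Collapse any primitive type to points, lines or triangles. */
static enum sk_prim sk_reduced_prim(enum sk_prim prim)
{
   switch (prim) {
   case SK_POINTS:
      return SK_POINTS;
   case SK_LINES:
   case SK_LINE_LOOP:
   case SK_LINE_STRIP:
   case SK_LINES_ADJACENCY:
   case SK_LINE_STRIP_ADJACENCY:
      return SK_LINES;
   default:
      return SK_TRIANGLES;
   }
}
```

Under that ordering, a line-adjacency primitive is rejected by the old lines() test and accepted by the old triangles() test, while the reduced form classifies it as a line.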
 src/gallium/auxiliary/draw/draw_pipe_validate.c |   31 +--
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_validate.c 
b/src/gallium/auxiliary/draw/draw_pipe_validate.c
index 3562acd..356f4d6 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_validate.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_validate.c
@@ -30,28 +30,13 @@
 
 #include "util/u_memory.h"
 #include "util/u_math.h"
+#include "util/u_prim.h"
 #include "pipe/p_defines.h"
 #include "draw_private.h"
 #include "draw_pipe.h"
 #include "draw_context.h"
 #include "draw_vbuf.h"
 
-static boolean points( unsigned prim )
-{
-   return (prim == PIPE_PRIM_POINTS);
-}
-
-static boolean lines( unsigned prim )
-{
-   return (prim == PIPE_PRIM_LINES ||
-   prim == PIPE_PRIM_LINE_STRIP ||
-   prim == PIPE_PRIM_LINE_LOOP);
-}
-
-static boolean triangles( unsigned prim )
-{
-   return prim >= PIPE_PRIM_TRIANGLES;
-}
 
 /**
  * Default version of a function to check if we need any special
@@ -66,6 +51,8 @@ draw_need_pipeline(const struct draw_context *draw,
const struct pipe_rasterizer_state *rasterizer,
unsigned int prim )
 {
+   unsigned reduced_prim = u_reduced_prim(prim);
+
/* If the driver has overridden this, use that version: 
 */
if (draw->render &&
@@ -80,8 +67,7 @@ draw_need_pipeline(const struct draw_context *draw,
 * and triggering the pipeline, because we have to trigger the
 * pipeline *anyway* if unfilled mode is active.
 */
-   if (lines(prim)) 
-   {
+   if (reduced_prim == PIPE_PRIM_LINES) {
   /* line stipple */
   if (rasterizer->line_stipple_enable && draw->pipeline.line_stipple)
  return TRUE;
@@ -97,9 +83,7 @@ draw_need_pipeline(const struct draw_context *draw,
   if (draw_current_shader_num_written_culldistances(draw))
  return TRUE;
}
-
-   if (points(prim))
-   {
+   else if (reduced_prim == PIPE_PRIM_POINTS) {
   /* large points */
   if (rasterizer->point_size > draw->pipeline.wide_point_threshold)
  return TRUE;
@@ -117,10 +101,7 @@ draw_need_pipeline(const struct draw_context *draw,
   if (rasterizer->sprite_coord_enable && draw->pipeline.point_sprite)
  return TRUE;
}
-
-
-   if (triangles(prim)) 
-   {
+   else if (reduced_prim == PIPE_PRIM_TRIANGLES) {
   /* polygon stipple */
   if (rasterizer->poly_stipple_enable && draw->pipeline.pstipple)
  return TRUE;
-- 
1.7.10.4



Re: [Mesa-dev] [PATCH 1/2] i965/vs: Detect GRF sources in split_virtual_grfs send-from-GRF code.

2013-08-28 Thread Eric Anholt
Kenneth Graunke kenn...@whitecape.org writes:

 It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the GRF.
 VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores the
 GRF as src[1].

 To be safe, loop over all the source registers and mark any GRFs.  We
 probably won't ever have more than one, but it's simpler to just check
 all three rather than attempting to bail early.

 Fixes assertion failures in Unigine Sanctuary since we started making
 register allocation rely on split_virtual_grfs working.  (The register
 classes were actually sufficient, we were just interpreting an IMM as
 a virtual GRF number.)

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: mesa-sta...@lists.freedesktop.org

These are:

Reviewed-by: Eric Anholt e...@anholt.net
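
A minimal sketch of the "check every source" approach described in the quoted commit message. The types and names below (sk_file, sk_src, sk_inst, sk_mark_send_sources) are hypothetical stand-ins, not the real vec4 IR classes; the point is only that an immediate in src[0] must not be read as a virtual GRF number.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register files for the sketch. */
enum sk_file { SK_BAD_FILE, SK_GRF, SK_IMM };

struct sk_src {
   enum sk_file file;
   unsigned reg;        /* virtual GRF number, or raw bits for an IMM */
};

struct sk_inst {
   struct sk_src src[3];
};

/* For a SEND-from-GRF style instruction, mark every GRF source as
 * unsplittable instead of assuming the GRF lives in src[0]. */
static void
sk_mark_send_sources(const struct sk_inst *inst,
                     bool *split_grf, unsigned num_grfs)
{
   for (int i = 0; i < 3; i++) {
      if (inst->src[i].file == SK_GRF && inst->src[i].reg < num_grfs)
         split_grf[inst->src[i].reg] = false;
   }
}
```

With an instruction shaped like the pull-constant load (IMM in src[0], GRF in src[1]), only the register named by src[1] is marked; the immediate's value is ignored even when it happens to look like a valid register number.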




Re: [Mesa-dev] [PATCH] draw: fix point/line/triangle determination in draw_need_pipeline()

2013-08-28 Thread Roland Scheidegger
On 29.08.2013 01:14, Brian Paul wrote:
 The previous point/line/triangle() functions didn't handle GS primitives.

Reviewed-by: Roland Scheidegger srol...@vmware.com


[Mesa-dev] [PATCH] gallivm: support per-pixel min/mag filter in SoA path

2013-08-28 Thread sroland
From: Roland Scheidegger srol...@vmware.com

Since we can have per-pixel lod we should also honor the filter per-pixel
(in fact we didn't honor it per quad either in the multiple quad case).
Do this by running the linear path and simply beating the weights into shape
(the sample with the higher weight is the one which should have been chosen
with nearest filtering, hence adjust the filter weight to 1.0/0.0 based on that).
If all pixels use the nearest filter (whether min or mag) then still run just
a nearest filter, as this is way cheaper (probably around 4 times faster for 2d,
more for the 3d case) and it should be relatively rare that pixels really need
different filtering. OTOH if all pixels require linear, don't do anything
special, since the linear path with filter adjustments shouldn't really be all
that much more expensive than ordinary linear, and we think it's rare that
min/mag filters are configured differently, so there doesn't seem to be much
value in trying to optimize this further.
This does not yet fix the AoS path (though currently AoS is only used for
single quads, hence it could be considered less broken: it just never honors
the per-pixel filter decision but makes it per quad).
---
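
A scalar sketch of the weight adjustment described above, assuming one pixel and one axis. The real code operates on LLVM vectors via lp_build_cmp/lp_build_select; the sk_* helpers here are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* If the pixel wants linear filtering, keep the fractional weight.
 * Otherwise snap it to 1.0 or 0.0 so the lerp below returns exactly
 * the texel that nearest filtering would have picked. */
static float sk_adjust_weight(float fpart, bool wants_linear)
{
   float nearest_weight = (fpart > 0.5f) ? 1.0f : 0.0f;
   return wants_linear ? fpart : nearest_weight;
}

/* Plain linear interpolation between two neighbouring texels. */
static float sk_lerp(float w, float texel0, float texel1)
{
   return texel0 + w * (texel1 - texel0);
}
```

For texels 10 and 20 with a fractional part of 0.7, a "nearest" pixel gets weight 1.0 and reads back 20, while a fractional part of 0.3 gives weight 0.0 and reads back 10. Running every pixel through the same lerp is what lets linear and nearest pixels share one code path.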
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  320 ++---
 1 file changed, 276 insertions(+), 44 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index c686d82..5c5ab87 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -827,11 +827,14 @@ lp_build_masklerp2d(struct lp_build_context *bld,
 /**
  * Generate code to sample a mipmap level with linear filtering.
  * If sampling a cube texture, r = cube face in [0,5].
+ * If linear_mask is present, only pixels having their mask set
+ * will receive linear filtering, the rest will use nearest.
  */
 static void
 lp_build_sample_image_linear(struct lp_build_sample_context *bld,
  unsigned sampler_unit,
  LLVMValueRef size,
+ LLVMValueRef linear_mask,
  LLVMValueRef row_stride_vec,
  LLVMValueRef img_stride_vec,
  LLVMValueRef data_ptr,
@@ -905,6 +908,31 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld,
   lp_build_name(z1, "tex.z1.layer");
}
 
+   if (linear_mask) {
+  /*
+   * Whack filter weights into place. Whatever pixel had more weight is
+   * the one which should have been selected by nearest filtering hence
+   * just use 100% weight for it.
+   */
+  struct lp_build_context *c_bld = &bld->coord_bld;
+  LLVMValueRef w1_mask, w1_weight;
+  LLVMValueRef half = lp_build_const_vec(bld->gallivm, c_bld->type, 0.5f);
+
+  w1_mask = lp_build_cmp(c_bld, PIPE_FUNC_GREATER, s_fpart, half);
+  /* this select is really just a and */
+  w1_weight = lp_build_select(c_bld, w1_mask, c_bld->one, c_bld->zero);
+  s_fpart = lp_build_select(c_bld, linear_mask, s_fpart, w1_weight);
+  if (dims >= 2) {
+ w1_mask = lp_build_cmp(c_bld, PIPE_FUNC_GREATER, t_fpart, half);
+ w1_weight = lp_build_select(c_bld, w1_mask, c_bld->one, c_bld->zero);
+ t_fpart = lp_build_select(c_bld, linear_mask, t_fpart, w1_weight);
+ if (dims == 3) {
+w1_mask = lp_build_cmp(c_bld, PIPE_FUNC_GREATER, r_fpart, half);
+w1_weight = lp_build_select(c_bld, w1_mask, c_bld->one, c_bld->zero);
+r_fpart = lp_build_select(c_bld, linear_mask, r_fpart, w1_weight);
+ }
+  }
+   }
 
/*
 * Get texture colors.
@@ -1053,8 +1081,8 @@ lp_build_sample_image_linear(struct 
lp_build_sample_context *bld,
 
 /**
  * Sample the texture/mipmap using given image filter and mip filter.
- * data0_ptr and data1_ptr point to the two mipmap levels to sample
- * from.  width0/1_vec, height0/1_vec, depth0/1_vec indicate their sizes.
+ * ilevel0 and ilevel1 indicate the two mipmap levels to sample
+ * from (vectors or scalars).
  * If we're using nearest miplevel sampling the '1' values will be null/unused.
  */
 static void
@@ -1105,7 +1133,7 @@ lp_build_sample_mipmap(struct lp_build_sample_context 
*bld,
else {
   assert(img_filter == PIPE_TEX_FILTER_LINEAR);
   lp_build_sample_image_linear(bld, sampler_unit,
-   size0,
+   size0, NULL,
row_stride0_vec, img_stride0_vec,
data_ptr0, mipoff0, coords, offsets,
colors0);
@@ -1131,15 +1159,8 @@ lp_build_sample_mipmap(struct lp_build_sample_context 
*bld,
   * We'll do mip filtering if any of the quads (or individual
   * pixel in case of per-pixel lod) need it.
   * It might be better to split the vectors here and only 

[Mesa-dev] [PATCH 1/6] i965: Remove unused ATTRIB_BIT_DWORDS define.

2013-08-28 Thread Kenneth Graunke
---
 src/mesa/drivers/dri/i965/brw_context.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index c456e61..3cb6dc6 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -780,13 +780,6 @@ struct brw_cached_batch_item {
struct brw_cached_batch_item *next;
 };

-
-
-/* Protect against a future where VERT_ATTRIB_MAX > 32.  Wouldn't life
- * be easier if C allowed arrays of packed elements?
- */
-#define ATTRIB_BIT_DWORDS  ((VERT_ATTRIB_MAX+31)/32)
-
 struct brw_vertex_buffer {
/** Buffer object containing the uploaded vertex data */
drm_intel_bo *bo;
-- 
1.8.3.4



[Mesa-dev] [PATCH 2/6] i965: Combine brw_emit_prim and gen7_emit_prim.

2013-08-28 Thread Kenneth Graunke
These functions have almost identical code; the only difference is that
a few of the bits moved around.  Adding a few trivial conditionals
allows the same function to work on all generations, and the resulting
code is still quite readable.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
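
A small sketch of the "bits moved around" point, assuming placeholder values for the opcode and shift. SK_CMD_3D_PRIM and SK_GEN4_TOPOLOGY_SHIFT are stand-ins for the real defines and should not be read as hardware documentation. It shows only the header part that differs: gen7+ spends a separate dword on topology and access type, while older generations pack both into the first dword.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CMD_3D_PRIM          0x7b00u /* placeholder opcode */
#define SK_GEN4_TOPOLOGY_SHIFT  10      /* placeholder shift */

/* Writes the generation-specific header dwords into out[] and returns
 * how many were written. The remaining five dwords (vertex count,
 * start vertex, instance count, base instance, base vertex) are the
 * same on every generation and are omitted here. */
static unsigned
sk_emit_prim_header(int gen, uint32_t hw_prim, uint32_t vertex_access_type,
                    uint32_t out[2])
{
   if (gen >= 7) {
      out[0] = SK_CMD_3D_PRIM << 16 | (7 - 2);
      out[1] = hw_prim | vertex_access_type;
      return 2;
   }

   out[0] = SK_CMD_3D_PRIM << 16 | (6 - 2) |
            hw_prim << SK_GEN4_TOPOLOGY_SHIFT |
            vertex_access_type;
   return 1;
}
```

The total packet length follows from the return value plus the five shared dwords: 2 + 5 = 7 on gen7+, 1 + 5 = 6 before that, matching the BEGIN_BATCH() counts in the patch.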
 src/mesa/drivers/dri/i965/brw_draw.c | 80 
 1 file changed, 17 insertions(+), 63 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index c7164ac..df9b750 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -171,11 +171,15 @@ static void brw_emit_prim(struct brw_context *brw,
start_vertex_location = prim->start;
base_vertex_location = prim->basevertex;
if (prim->indexed) {
-  vertex_access_type = GEN4_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM;
+  vertex_access_type = brw->gen >= 7 ?
+ GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM :
+ GEN4_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM;
   start_vertex_location += brw->ib.start_vertex_offset;
   base_vertex_location += brw->vb.start_vertex_bias;
} else {
-  vertex_access_type = GEN4_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL;
+  vertex_access_type = brw->gen >= 7 ?
+ GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL :
+ GEN4_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL;
   start_vertex_location += brw->vb.start_vertex_bias;
}
 
@@ -198,65 +202,16 @@ static void brw_emit_prim(struct brw_context *brw,
   intel_batchbuffer_emit_mi_flush(brw);
}
 
-   BEGIN_BATCH(6);
-   OUT_BATCH(CMD_3D_PRIM << 16 | (6 - 2) |
-hw_prim << GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT |
-vertex_access_type);
-   OUT_BATCH(verts_per_instance);
-   OUT_BATCH(start_vertex_location);
-   OUT_BATCH(prim->num_instances);
-   OUT_BATCH(prim->base_instance);
-   OUT_BATCH(base_vertex_location);
-   ADVANCE_BATCH();
-
-   brw->batch.need_workaround_flush = true;
-
-   if (brw->always_flush_cache) {
-  intel_batchbuffer_emit_mi_flush(brw);
-   }
-}
-
-static void gen7_emit_prim(struct brw_context *brw,
-  const struct _mesa_prim *prim,
-  uint32_t hw_prim)
-{
-   int verts_per_instance;
-   int vertex_access_type;
-   int start_vertex_location;
-   int base_vertex_location;
-
-   DBG("PRIM: %s %d %d\n", _mesa_lookup_enum_by_nr(prim->mode),
-   prim->start, prim->count);
-
-   start_vertex_location = prim->start;
-   base_vertex_location = prim->basevertex;
-   if (prim->indexed) {
-  vertex_access_type = GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM;
-  start_vertex_location += brw->ib.start_vertex_offset;
-  base_vertex_location += brw->vb.start_vertex_bias;
+   if (brw->gen >= 7) {
+  BEGIN_BATCH(7);
+  OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2));
+  OUT_BATCH(hw_prim | vertex_access_type);
} else {
-  vertex_access_type = GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL;
-  start_vertex_location += brw->vb.start_vertex_bias;
+  BEGIN_BATCH(6);
+  OUT_BATCH(CMD_3D_PRIM << 16 | (6 - 2) |
+hw_prim << GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT |
+vertex_access_type);
}
-
-   verts_per_instance = prim->count;
-
-   /* If nothing to emit, just return. */
-   if (verts_per_instance == 0)
-  return;
-
-   /* If we're set to always flush, do it before and after the primitive emit.
-* We want to catch both missed flushes that hurt instruction/state cache
-* and missed flushes of the render cache as it heads to other parts of
-* the besides the draw code.
-*/
-   if (brw->always_flush_cache) {
-  intel_batchbuffer_emit_mi_flush(brw);
-   }
-
-   BEGIN_BATCH(7);
-   OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2));
-   OUT_BATCH(hw_prim | vertex_access_type);
OUT_BATCH(verts_per_instance);
OUT_BATCH(start_vertex_location);
OUT_BATCH(prim->num_instances);
@@ -264,6 +219,8 @@ static void gen7_emit_prim(struct brw_context *brw,
OUT_BATCH(base_vertex_location);
ADVANCE_BATCH();
 
+   brw->batch.need_workaround_flush = true;
+
if (brw->always_flush_cache) {
   intel_batchbuffer_emit_mi_flush(brw);
}
@@ -453,10 +410,7 @@ retry:
 brw_upload_state(brw);
   }
 
-  if (brw->gen >= 7)
-gen7_emit_prim(brw, &prim[i], brw->primitive);
-  else
-brw_emit_prim(brw, &prim[i], brw->primitive);
+  brw_emit_prim(brw, &prim[i], brw->primitive);
 
   brw->no_batch_wrap = false;
 
-- 
1.8.3.4



[Mesa-dev] [PATCH 3/6] i965: Use the proper element of the prim array in brw_try_draw_prims.

2013-08-28 Thread Kenneth Graunke
The VBO module actually calls us with an array of _mesa_prim objects.
For example, it may break up a DrawArrays() call into multiple
primitives when primitive restart is enabled.

Previously, we treated prim like a pointer, always accessing element 0.
This worked because all of the primitive objects in a single draw call
have the same value for num_instances and basevertex.

However, accessing an array as a pointer and using the wrong object's
fields is misleading.  For stylistic reasons alone, we should use the
right object.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
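
A tiny illustration of the pointer-versus-array point, using a hypothetical struct with just the two fields mentioned above. In C, prim->field is the same as prim[0].field, so a loop that writes prim->field reads element 0 on every iteration.

```c
#include <assert.h>

/* Hypothetical, cut-down version of a primitive record. */
struct sk_prim {
   unsigned num_instances;
   int basevertex;
};

/* Old pattern: dereferences the array as a pointer, so every
 * iteration looks at element 0. */
static unsigned
sk_total_instances_first_only(const struct sk_prim *prim, unsigned nr_prims)
{
   unsigned total = 0;
   for (unsigned i = 0; i < nr_prims; i++)
      total += prim->num_instances;
   return total;
}

/* New pattern: indexes the element that belongs to this iteration. */
static unsigned
sk_total_instances_each(const struct sk_prim *prims, unsigned nr_prims)
{
   unsigned total = 0;
   for (unsigned i = 0; i < nr_prims; i++)
      total += prims[i].num_instances;
   return total;
}
```

The two only agree when every element carries the same value, which is the situation the commit message describes for num_instances and basevertex within a single draw call.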
 src/mesa/drivers/dri/i965/brw_draw.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index df9b750..2583a6f 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -386,12 +386,12 @@ static bool brw_try_draw_prims( struct gl_context *ctx,
   intel_batchbuffer_require_space(brw, estimated_max_prim_size, false);
   intel_batchbuffer_save_state(brw);
 
-  if (brw->num_instances != prim->num_instances) {
- brw->num_instances = prim->num_instances;
+  if (brw->num_instances != prim[i].num_instances) {
+ brw->num_instances = prim[i].num_instances;
  brw->state.dirty.brw |= BRW_NEW_VERTICES;
   }
-  if (brw->basevertex != prim->basevertex) {
- brw->basevertex = prim->basevertex;
+  if (brw->basevertex != prim[i].basevertex) {
+ brw->basevertex = prim[i].basevertex;
  brw->state.dirty.brw |= BRW_NEW_VERTICES;
   }
   if (brw->gen < 6)
-- 
1.8.3.4



[Mesa-dev] [PATCH 4/6] i965: Clarify that we only check one prim's type for cut index support.

2013-08-28 Thread Kenneth Graunke
can_cut_index_handle_prims() was passed an array of _mesa_prim objects
and a count, and runs a loop for that many iterations.  However, it
treats the array like a pointer, repeatedly checking the first element.

This is wasteful and bizarre.

The VBO module will never call us with multiple primitives of different
topologies, so it's actually reasonable to just check the first element.

Once.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
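
A sketch of the single check, assuming a hypothetical mode enum in place of the GL_* constants. The supported and unsupported sets are taken from the switch in the patch; the sk_* names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GL primitive modes. */
enum sk_mode {
   SK_POINTS, SK_LINES, SK_LINE_LOOP, SK_LINE_STRIP,
   SK_TRIANGLES, SK_TRIANGLE_STRIP, SK_TRIANGLE_FAN,
   SK_QUADS, SK_QUAD_STRIP, SK_POLYGON
};

/* One check of one mode: no loop needed when every primitive in the
 * call is known to share the same topology. */
static bool sk_cut_index_supports(enum sk_mode mode)
{
   switch (mode) {
   case SK_POINTS:
   case SK_LINES:
   case SK_LINE_STRIP:
   case SK_TRIANGLES:
   case SK_TRIANGLE_STRIP:
      return true;
   default:
      /* line loops, triangle fans, quads, quad strips, polygons */
      return false;
   }
}
```

Passing only the first element's mode gives the same answer as the old loop did, since that loop never advanced past element 0 anyway.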
 src/mesa/drivers/dri/i965/brw_primitive_restart.c | 37 +++
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_primitive_restart.c 
b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
index 0dbc48f..ca2e6b7 100644
--- a/src/mesa/drivers/dri/i965/brw_primitive_restart.c
+++ b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
@@ -76,7 +76,6 @@ can_cut_index_handle_restart_index(struct gl_context *ctx,
 static bool
 can_cut_index_handle_prims(struct gl_context *ctx,
const struct _mesa_prim *prim,
-   GLuint nr_prims,
const struct _mesa_index_buffer *ib)
 {
struct brw_context *brw = brw_context(ctx);
@@ -92,24 +91,22 @@ can_cut_index_handle_prims(struct gl_context *ctx,
   return false;
}
 
-   for ( ; nr_prims > 0; nr_prims--) {
-  switch(prim->mode) {
-  case GL_POINTS:
-  case GL_LINES:
-  case GL_LINE_STRIP:
-  case GL_TRIANGLES:
-  case GL_TRIANGLE_STRIP:
- /* Cut index supports these primitive types */
- break;
-  default:
- /* Cut index does not support these primitive types */
-  //case GL_LINE_LOOP:
-  //case GL_TRIANGLE_FAN:
-  //case GL_QUADS:
-  //case GL_QUAD_STRIP:
-  //case GL_POLYGON:
- return false;
-  }
+   switch (prim->mode) {
+   case GL_POINTS:
+   case GL_LINES:
+   case GL_LINE_STRIP:
+   case GL_TRIANGLES:
+   case GL_TRIANGLE_STRIP:
+  /* Cut index supports these primitive types */
+  break;
+   default:
+  /* Cut index does not support these primitive types */
+   //case GL_LINE_LOOP:
+   //case GL_TRIANGLE_FAN:
+   //case GL_QUADS:
+   //case GL_QUAD_STRIP:
+   //case GL_POLYGON:
+  return false;
}
 
return true;
@@ -161,7 +158,7 @@ brw_handle_primitive_restart(struct gl_context *ctx,
 */
brw->prim_restart.in_progress = true;
 
-   if (can_cut_index_handle_prims(ctx, prim, nr_prims, ib)) {
+   if (can_cut_index_handle_prims(ctx, &prim[0], ib)) {
   /* Cut index should work for primitive restart, so use it
*/
   brw->prim_restart.enable_cut_index = true;
-- 
1.8.3.4



[Mesa-dev] [PATCH 5/6] i965: Rename prim parameter to prims where it's an array.

2013-08-28 Thread Kenneth Graunke
Some drawing functions take a single _mesa_prim object, while others
take an array of primitives.  Both kinds of functions used a parameter
called "prim" (the singular form), which was confusing.

Using the plural form, "prims", clearly communicates that the parameter
is an array of primitives.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_draw.c  | 26 +++
 src/mesa/drivers/dri/i965/brw_draw.h  |  2 +-
 src/mesa/drivers/dri/i965/brw_primitive_restart.c |  8 +++
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 2583a6f..d14f7f0 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -314,7 +314,7 @@ static void brw_postdraw_set_buffers_need_resolve(struct 
brw_context *brw)
  */
 static bool brw_try_draw_prims( struct gl_context *ctx,
 const struct gl_client_array *arrays[],
-const struct _mesa_prim *prim,
+const struct _mesa_prim *prims,
 GLuint nr_prims,
 const struct _mesa_index_buffer *ib,
 GLuint min_index,
@@ -386,18 +386,18 @@ static bool brw_try_draw_prims( struct gl_context *ctx,
   intel_batchbuffer_require_space(brw, estimated_max_prim_size, false);
   intel_batchbuffer_save_state(brw);
 
-  if (brw->num_instances != prim[i].num_instances) {
- brw->num_instances = prim[i].num_instances;
+  if (brw->num_instances != prims[i].num_instances) {
+ brw->num_instances = prims[i].num_instances;
  brw->state.dirty.brw |= BRW_NEW_VERTICES;
   }
-  if (brw->basevertex != prim[i].basevertex) {
- brw->basevertex = prim[i].basevertex;
+  if (brw->basevertex != prims[i].basevertex) {
+ brw->basevertex = prims[i].basevertex;
  brw->state.dirty.brw |= BRW_NEW_VERTICES;
   }
   if (brw->gen < 6)
-brw_set_prim(brw, &prim[i]);
+brw_set_prim(brw, &prims[i]);
   else
-gen6_set_prim(brw, &prim[i]);
+gen6_set_prim(brw, &prims[i]);
 
 retry:
   /* Note that before the loop, brw->state.dirty.brw was set to != 0, and
@@ -410,7 +410,7 @@ retry:
 brw_upload_state(brw);
   }
 
-  brw_emit_prim(brw, &prim[i], brw->primitive);
+  brw_emit_prim(brw, &prims[i], brw->primitive);
 
   brw->no_batch_wrap = false;
 
@@ -446,7 +446,7 @@ retry:
 }
 
 void brw_draw_prims( struct gl_context *ctx,
-const struct _mesa_prim *prim,
+const struct _mesa_prim *prims,
 GLuint nr_prims,
 const struct _mesa_index_buffer *ib,
 GLboolean index_bounds_valid,
@@ -461,7 +461,7 @@ void brw_draw_prims( struct gl_context *ctx,
   return;
 
/* Handle primitive restart if needed */
-   if (brw_handle_primitive_restart(ctx, prim, nr_prims, ib)) {
+   if (brw_handle_primitive_restart(ctx, prims, nr_prims, ib)) {
   /* The draw was handled, so we can exit now */
   return;
}
@@ -471,7 +471,7 @@ void brw_draw_prims( struct gl_context *ctx,
 * to upload.
 */
if (!vbo_all_varyings_in_vbos(arrays) && !index_bounds_valid)
-  vbo_get_minmax_indices(ctx, prim, ib, &min_index, &max_index, nr_prims);
+  vbo_get_minmax_indices(ctx, prims, ib, &min_index, &max_index, nr_prims);
 
/* Do GL_SELECT and GL_FEEDBACK rendering using swrast, even though it
 * won't support all the extensions we support.
@@ -481,7 +481,7 @@ void brw_draw_prims( struct gl_context *ctx,
  _mesa_lookup_enum_by_nr(ctx->RenderMode));
   _swsetup_Wakeup(ctx);
   _tnl_wakeup(ctx);
-  _tnl_draw_prims(ctx, arrays, prim, nr_prims, ib, min_index, max_index);
+  _tnl_draw_prims(ctx, arrays, prims, nr_prims, ib, min_index, max_index);
   return;
}
 
@@ -489,7 +489,7 @@ void brw_draw_prims( struct gl_context *ctx,
 * manage it.  swrast doesn't support our featureset, so we can't fall back
 * to it.
 */
-   brw_try_draw_prims(ctx, arrays, prim, nr_prims, ib, min_index, max_index);
+   brw_try_draw_prims(ctx, arrays, prims, nr_prims, ib, min_index, max_index);
 }
 
 void brw_draw_init( struct brw_context *brw )
diff --git a/src/mesa/drivers/dri/i965/brw_draw.h 
b/src/mesa/drivers/dri/i965/brw_draw.h
index c915bc3..aac375f 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.h
+++ b/src/mesa/drivers/dri/i965/brw_draw.h
@@ -49,7 +49,7 @@ void brw_draw_destroy( struct brw_context *brw );
 /* brw_primitive_restart.c */
 GLboolean
 brw_handle_primitive_restart(struct gl_context *ctx,
- const struct _mesa_prim *prim,
+ const struct _mesa_prim *prims,
  GLuint nr_prims,

[Mesa-dev] [PATCH 6/6] mesa: Rename gl_context::swtnl_im to vbo_context; use proper type.

2013-08-28 Thread Kenneth Graunke
The main GL context's swtnl_im field is the VBO module's vbo_context
structure.  Using "swtnl" in the name is confusing since some drivers
use hardware transform and lighting, but still rely on the VBO module
for drawing.

v2: Forward declare the type and use that instead of void *
(suggested by Eric Anholt)

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
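
A minimal sketch of the v2 approach, with hypothetical sk_* names standing in for gl_context and vbo_context. A forward declaration is enough for the owning struct to hold a typed pointer, so the accessor needs no cast and the compiler can catch a wrong assignment that a void * would silently accept.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: the full definition can live in another header. */
struct sk_vbo_context;

/* The owning context only needs the incomplete type to store a pointer. */
struct sk_gl_context {
   struct sk_vbo_context *vbo_context;
};

/* Full definition, normally visible only to the module that owns it. */
struct sk_vbo_context {
   int marker;
};

/* Typed accessor: no cast from void * required. */
static inline struct sk_vbo_context *
sk_get_vbo_context(struct sk_gl_context *ctx)
{
   return ctx->vbo_context;
}
```

Code that only includes the forward declaration can pass the pointer around but cannot look inside it, which keeps the module boundary intact while still giving type checking.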
 src/mesa/main/mtypes.h | 3 ++-
 src/mesa/vbo/vbo_context.c | 4 ++--
 src/mesa/vbo/vbo_context.h | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 22bb58c..7d56322 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -80,6 +80,7 @@ struct prog_instruction;
 struct gl_program_parameter_list;
 struct set;
 struct set_entry;
+struct vbo_context;
 /*@}*/
 
 
@@ -3669,7 +3670,7 @@ struct gl_context
void *swrast_context;
void *swsetup_context;
void *swtnl_context;
-   void *swtnl_im;
+   struct vbo_context *vbo_context;
struct st_context *st;
void *aelt_context;
/*@}*/
diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
index b97313d..2aa5bbc 100644
--- a/src/mesa/vbo/vbo_context.c
+++ b/src/mesa/vbo/vbo_context.c
@@ -152,7 +152,7 @@ GLboolean _vbo_CreateContext( struct gl_context *ctx )
 {
struct vbo_context *vbo = CALLOC_STRUCT(vbo_context);
 
-   ctx->swtnl_im = (void *)vbo;
+   ctx->vbo_context = (void *) vbo;
 
/* Initialize the arrayelt helper
 */
@@ -224,7 +224,7 @@ void _vbo_DestroyContext( struct gl_context *ctx )
   if (ctx->API == API_OPENGL_COMPAT)
  vbo_save_destroy(ctx);
   free(vbo);
-  ctx->swtnl_im = NULL;
+  ctx->vbo_context = NULL;
}
 }
 
diff --git a/src/mesa/vbo/vbo_context.h b/src/mesa/vbo/vbo_context.h
index 27fae83..db47a8b 100644
--- a/src/mesa/vbo/vbo_context.h
+++ b/src/mesa/vbo/vbo_context.h
@@ -90,7 +90,7 @@ struct vbo_context {
 
 static inline struct vbo_context *vbo_context(struct gl_context *ctx) 
 {
-   return (struct vbo_context *)(ctx->swtnl_im);
+   return (struct vbo_context *) ctx->vbo_context;
 }
 
 
-- 
1.8.3.4
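
For readers skimming the archive, the v2 change (a forward-declared struct
type in place of void *) can be illustrated with a minimal sketch.  The
cut-down gl_context_sketch structure, the placeholder member, and
vbo_sketch_roundtrip() are hypothetical and exist only for illustration;
they are not Mesa code.

```c
#include <stdlib.h>

/* Forward declaration: enough to declare pointers to the type
 * without exposing its layout to every user of the header. */
struct vbo_context;

/* Hypothetical, cut-down stand-in for struct gl_context. */
struct gl_context_sketch {
   struct vbo_context *vbo_context;   /* typed pointer instead of void * */
};

/* Full definition, which would normally live in the VBO module. */
struct vbo_context {
   int placeholder;                   /* illustrative member only */
};

/* Accessor in the style of the patch: no cast is needed any more. */
static inline struct vbo_context *
vbo_context(struct gl_context_sketch *ctx)
{
   return ctx->vbo_context;
}

/* Creates, reads back, and destroys the module state. */
static int
vbo_sketch_roundtrip(void)
{
   struct gl_context_sketch ctx = { NULL };
   struct vbo_context *vbo = calloc(1, sizeof(*vbo));
   int value;

   if (!vbo)
      return 0;

   vbo->placeholder = 42;
   ctx.vbo_context = vbo;
   value = vbo_context(&ctx)->placeholder;

   free(vbo);
   ctx.vbo_context = NULL;
   return value;
}
```

With a typed pointer, assigning an unrelated pointer type to the field
draws a compiler diagnostic, which void * would silently accept.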

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Allow precision qualifiers for sampler types

2013-08-28 Thread Ian Romanick
On 08/27/2013 12:52 PM, Anuj Phogat wrote:
 On Tue, Aug 27, 2013 at 11:53 AM, Ian Romanick i...@freedesktop.org wrote:
 On 08/27/2013 10:45 AM, Anuj Phogat wrote:

 GLSL 1.30 doesn't allow precision qualifiers on sampler types,
 but in GLSL ES, sampler types are also allowed. This seems like
 an oversight (since the intention of including these in GLSL 1.30
 is to allow compatibility with ES shaders).

 Currently, Mesa allows default precision qualifiers to be set for
 sampler types in GLSL (commit d5948f2). This patch makes it follow
 GLSL ES rules and also allow declaring sampler variables with a
 precision qualifier in GLSL.


 I think our current behavior is incorrect even in the ES case.  GLSL ES 3.30
 You mean to say GLSL ES 3.00?

Yes.  That's about the fifth time I've made that typo in the last week...

 and desktop GLSL 4.40 say the following in section 4.5.3 (Precision
 Qualifiers):


 Any floating point or any integer declaration can have the type
 preceded by one of these precision qualifiers...

 Yes, samplers are now allowed in GLSL 4.4. They were not in GLSL 4.3.
 
 They also both say the following in section 4.5.4 (Default Precision
 Qualifiers):

 The precision statement...can be used to establish a default
 precision qualifier. The type field can be either int or float
 or any of the sampler types...

 So I believe

 precision mediump sampler2D;

 should be legal in all versions, but

 uniform mediump sampler2D s;

 should not.

 Yes, there is no clear statement in GLSL spec which allows:
 uniform mediump sampler2D s;
 
 Which syntax is the test using?

 test uses:
 uniform mediump sampler2D s;
 
 I haven't yet tested if it is accepted by NVIDIA.

There is an example in section 8 (Built-in Functions) that uses this syntax:

uniform lowp sampler2D sampler;
highp vec2 coord;
...
lowp vec4 col = texture (sampler, coord); // texture() returns lowp

It seems that this syntax should be legal.  I've submitted a spec bug to
clarify the language in section 4.5.

I have also attached a patch to fix up the comment in that piece of
code.  Go ahead and combine my patch (with my Signed-off-by) with your
code changes.

With the one other change suggested below,

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

 This fixes a shader compilation error in Khronos OpenGL conformance
 test depth_texture_mipmap.

 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
   src/glsl/ast_to_hir.cpp | 14 +-
   1 file changed, 9 insertions(+), 5 deletions(-)

 diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
 index 192130a..b3d6d8c 100644
 --- a/src/glsl/ast_to_hir.cpp
 +++ b/src/glsl/ast_to_hir.cpp
 @@ -3131,8 +3131,8 @@ ast_declarator_list::hir(exec_list *instructions,
state->check_precision_qualifiers_allowed(&loc);
 }

 -
 -  /* Precision qualifiers only apply to floating point and integer
 types.
 +  /* Precision qualifiers apply to floating point, integer and
 sampler
 +   * types.
  *
  * From section 4.5.2 of the GLSL 1.30 spec:
  *Any floating point or any integer declaration can have the
 type
 @@ -3144,20 +3144,24 @@ ast_declarator_list::hir(exec_list *instructions,
  *
  * From page 87 of the GLSL ES spec:
  *RESOLUTION: Allow sampler types to take a precision
 qualifier.
 +   *
 +   * GLSL 1.30 doesn't allow precision qualifiers on sampler types,
 but
 +   * this seems like an oversight (since the intention of including
 these
 +   * in GLSL 1.30 is to allow compatibility with ES shaders). So we
 allow
 +   * int, float, and all sampler types regardless of GLSL version.
  */
 if (this->type->qualifier.precision != ast_precision_none
  && !var->type->is_float()
  && !var->type->is_integer()
  && !var->type->is_record()
 -   && !(var->type->is_sampler() && state->es_shader)
 +   && !(var->type->is_sampler())

You can delete the extra ( and ).

  && !(var->type->is_array()
   && (var->type->fields.array->is_float()
  || var->type->fields.array->is_integer()))) {

_mesa_glsl_error(&loc, state,
 "precision qualifiers apply only to floating point"
 -  "%s types", state->es_shader ? ", integer, and sampler"
 -  : " and integer");
 +  ", integer and sampler types");
 }

 /* From page 17 (page 23 of the PDF) of the GLSL 1.20 spec:



comment-fix-up.patch
Description: application/pgp-keys
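
As a summary of the check discussed in this thread, here is a minimal
sketch of the post-patch rule for which declared types may carry a
precision qualifier.  The sketch_* type tags and the
precision_qualifier_allowed() helper are hypothetical; Mesa's real check
works on glsl_type objects inside ast_declarator_list::hir().

```c
#include <stdbool.h>

/* Hypothetical base-type tags; not Mesa's glsl_type. */
enum sketch_base_type {
   SKETCH_FLOAT,
   SKETCH_INT,
   SKETCH_BOOL,
   SKETCH_SAMPLER,
   SKETCH_STRUCT,
   SKETCH_ARRAY
};

struct sketch_type {
   enum sketch_base_type base;
   enum sketch_base_type element;   /* only meaningful for arrays */
};

/* Mirrors the patched condition: float, integer, struct and sampler
 * types are accepted regardless of GLSL version, and so are arrays of
 * float or integer.  Everything else is rejected. */
static bool
precision_qualifier_allowed(const struct sketch_type *t)
{
   switch (t->base) {
   case SKETCH_FLOAT:
   case SKETCH_INT:
   case SKETCH_STRUCT:
   case SKETCH_SAMPLER:
      return true;
   case SKETCH_ARRAY:
      return t->element == SKETCH_FLOAT || t->element == SKETCH_INT;
   default:
      return false;
   }
}
```

Under this rule a declaration such as "uniform mediump sampler2D s;" is
accepted, while a precision qualifier on a bool is still an error.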


Re: [Mesa-dev] [PATCH 05/22] i965: Move data from brw-vs into a base class if gs will also need it.

2013-08-28 Thread Kenneth Graunke

On 08/26/2013 03:12 PM, Paul Berry wrote:

This paves the way for sharing the code that will set up the vertex
and geometry shader pipeline state.
---
  src/mesa/drivers/dri/i965/brw_context.h  | 47 ++--
  src/mesa/drivers/dri/i965/brw_draw.c |  3 +-
  src/mesa/drivers/dri/i965/brw_misc_state.c   |  6 +--
  src/mesa/drivers/dri/i965/brw_vs.c   | 12 +++---
  src/mesa/drivers/dri/i965/brw_vs_state.c | 24 ++--
  src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 43 --
  src/mesa/drivers/dri/i965/brw_vtbl.c |  2 +-
  src/mesa/drivers/dri/i965/brw_wm_sampler_state.c |  8 ++--
  src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  4 +-
  src/mesa/drivers/dri/i965/gen6_sampler_state.c   |  2 +-
  src/mesa/drivers/dri/i965/gen6_vs_state.c| 23 +++-
  src/mesa/drivers/dri/i965/gen7_vs_state.c| 18 +
  12 files changed, 107 insertions(+), 85 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index dcd4c9a..9784956 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -818,6 +818,32 @@ struct brw_query_object {


  /**
+ * Data shared between brw_context::vs and brw_context::gs
+ */
+struct brw_vec4_context_base
+{
+   drm_intel_bo *scratch_bo;
+   drm_intel_bo *const_bo;
+   /** Offset in the program cache to the program */
+   uint32_t prog_offset;
+   uint32_t state_offset;
+
+   uint32_t push_const_offset; /* Offset in the batchbuffer */
+   int push_const_size; /* in 256-bit register increments */
+
+   uint32_t bind_bo_offset;
+   uint32_t surf_offset[BRW_MAX_VEC4_SURFACES];
+
+   /** SAMPLER_STATE count and table offset */
+   uint32_t sampler_count;
+   uint32_t sampler_offset;
+
+   /** Offsets in the batch to sampler default colors (texture border color) */
+   uint32_t sdc_offset[BRW_MAX_TEX_UNIT];
+};


I like what this patch is doing, but I really don't like the names.

With the exception of ralloc, context/ctx generally always mean the 
global GL context: gl_context or a subclass like brw_context.  (For 
ralloc, we inherited the context terminology from talloc, so it kind 
of stuck.)  vec4_ctx/brw_vec4_context_base are something totally different.


This is a structure that represents the shader program state for a 
particular pipeline stage.  Also, other than BRW_MAX_VEC4_SURFACES, 
there's nothing vec4 specific about this at all.  The pixel shader could 
use every one of these fields (and should eventually).  So I dislike 
vec4 in the name - we're just going to have to change it.


I had suggested names like brw_shader_state or brw_pipeline_state...I'm 
open to other ideas.


--Ken
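
To make the naming point concrete, here is a minimal sketch of per-stage
shader state embedded as a "base" member and shared by two stages.  All
sk_* names and the fields chosen are hypothetical; they only loosely
echo the struct quoted above.

```c
#include <stdint.h>

/* Per-stage state that nothing ties to vec4 or to a GL context. */
struct sk_stage_state {
   uint32_t prog_offset;
   uint32_t sampler_count;
};

/* Each stage embeds the shared part and adds its own fields. */
struct sk_vs_state {
   struct sk_stage_state base;
   int vs_only_field;
};

struct sk_gs_state {
   struct sk_stage_state base;
   int gs_only_field;
};

/* Shared setup code only ever sees the common part. */
static void
sk_set_sampler_count(struct sk_stage_state *stage, uint32_t count)
{
   stage->sampler_count = count;
}

/* Runs the same helper on both stages and sums the results. */
static uint32_t
sk_shared_setup_demo(void)
{
   struct sk_vs_state vs = { { 0, 0 }, 0 };
   struct sk_gs_state gs = { { 0, 0 }, 0 };

   sk_set_sampler_count(&vs.base, 2);
   sk_set_sampler_count(&gs.base, 3);
   return vs.base.sampler_count + gs.base.sampler_count;
}
```

A pixel shader stage could embed the same struct, which is the argument
for keeping "vec4" and "context" out of its name.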


[Mesa-dev] [PATCH 1/2] i965: Remove never used RSR and RSL opcodes.

2013-08-28 Thread Matt Turner
Existed since the initial import, but appear to never have actually
existed.
---
Rotate?

 src/mesa/drivers/dri/i965/brw_defines.h  | 2 --
 src/mesa/drivers/dri/i965/brw_eu.h   | 2 --
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 2 --
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 --
 4 files changed, 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 832ff55..7e5be2a 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -691,8 +691,6 @@ enum opcode {
BRW_OPCODE_XOR =7,
BRW_OPCODE_SHR =8,
BRW_OPCODE_SHL =9,
-   BRW_OPCODE_RSR =10,
-   BRW_OPCODE_RSL =11,
BRW_OPCODE_ASR =12,
BRW_OPCODE_CMP =16,
BRW_OPCODE_CMPN =   17,
diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 387450b..6ac1c68 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -154,8 +154,6 @@ ALU2(OR)
 ALU2(XOR)
 ALU2(SHR)
 ALU2(SHL)
-ALU2(RSR)
-ALU2(RSL)
 ALU2(ASR)
 ALU1(F32TO16)
 ALU1(F16TO32)
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index ecf8597..f26c913 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -936,8 +936,6 @@ ALU2(OR)
 ALU2(XOR)
 ALU2(SHR)
 ALU2(SHL)
-ALU2(RSR)
-ALU2(RSL)
 ALU2(ASR)
 ALU1(F32TO16)
 ALU1(F16TO32)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index e715c37..ccd4e5e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -53,8 +53,6 @@ is_expression(const fs_inst *const inst)
case BRW_OPCODE_XOR:
case BRW_OPCODE_SHR:
case BRW_OPCODE_SHL:
-   case BRW_OPCODE_RSR:
-   case BRW_OPCODE_RSL:
case BRW_OPCODE_ASR:
case BRW_OPCODE_ADD:
case BRW_OPCODE_MUL:
-- 
1.8.3.2



[Mesa-dev] [PATCH 2/2] i965: Remove never used DPA2 opcode.

2013-08-28 Thread Matt Turner
Existed since the initial import, but appears to never have actually
existed.
---
 src/mesa/drivers/dri/i965/brw_defines.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 7e5be2a..21c8baa 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -738,7 +738,6 @@ enum opcode {
BRW_OPCODE_DPH =85,
BRW_OPCODE_DP3 =86,
BRW_OPCODE_DP2 =87,
-   BRW_OPCODE_DPA2 =   88,
BRW_OPCODE_LINE =   89,
BRW_OPCODE_PLN =90,
BRW_OPCODE_MAD =91,
-- 
1.8.3.2



Re: [Mesa-dev] [PATCH 2/4] nv50: implement new float comparison instructions

2013-08-28 Thread Lucas Stach
Am Dienstag, den 13.08.2013, 20:14 +0200 schrieb Christoph Bumiller:
 On 13.08.2013 19:04, srol...@vmware.com wrote:
  From: Roland Scheidegger srol...@vmware.com
 
  untested.
 
 Looks like it should work though, thanks.
 nv50 only supported u32 result all along and on nvc0 both cases are
 already handled by the rest of the code, too.
 
This commit breaks Xonotic on NV92 for me. Dmesg has a lot of those:
TRAP_MP_EXEC - TP 0 MP 0: TIMEOUT at 07fed0 warp 20, opcode 90001204 82051008

  ---
   .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp |   17 +
   1 file changed, 17 insertions(+)
 
  diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp 
  b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
  index 56eccac..a2ad9f4 100644
  --- a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
  +++ b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
  @@ -440,6 +440,11 @@ nv50_ir::DataType Instruction::inferDstType() const
  switch (getOpcode()) {
  case TGSI_OPCODE_F2U: return nv50_ir::TYPE_U32;
  case TGSI_OPCODE_F2I: return nv50_ir::TYPE_S32;
  +   case TGSI_OPCODE_FSEQ:
  +   case TGSI_OPCODE_FSGE:
  +   case TGSI_OPCODE_FSLT:
  +   case TGSI_OPCODE_FSNE:
  +  return nv50_ir::TYPE_U32;
  case TGSI_OPCODE_I2F:
  case TGSI_OPCODE_U2F:
 return nv50_ir::TYPE_F32;
  @@ -456,19 +461,23 @@ nv50_ir::CondCode Instruction::getSetCond() const
  case TGSI_OPCODE_SLT:
  case TGSI_OPCODE_ISLT:
  case TGSI_OPCODE_USLT:
  +   case TGSI_OPCODE_FSLT:
 return CC_LT;
  case TGSI_OPCODE_SLE:
 return CC_LE;
  case TGSI_OPCODE_SGE:
  case TGSI_OPCODE_ISGE:
  case TGSI_OPCODE_USGE:
  +   case TGSI_OPCODE_FSGE:
 return CC_GE;
  case TGSI_OPCODE_SGT:
 return CC_GT;
  case TGSI_OPCODE_SEQ:
  case TGSI_OPCODE_USEQ:
  +   case TGSI_OPCODE_FSEQ:
 return CC_EQ;
  case TGSI_OPCODE_SNE:
  +   case TGSI_OPCODE_FSNE:
 return CC_NEU;
  case TGSI_OPCODE_USNE:
 return CC_NE;
  @@ -556,6 +565,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
  NV50_IR_OPCODE_CASE(KILL_IF, DISCARD);
   
  NV50_IR_OPCODE_CASE(F2I, CVT);
  +   NV50_IR_OPCODE_CASE(FSEQ, SET);
  +   NV50_IR_OPCODE_CASE(FSGE, SET);
  +   NV50_IR_OPCODE_CASE(FSLT, SET);
  +   NV50_IR_OPCODE_CASE(FSNE, SET);
  NV50_IR_OPCODE_CASE(IDIV, DIV);
  NV50_IR_OPCODE_CASE(IMAX, MAX);
  NV50_IR_OPCODE_CASE(IMIN, MIN);
  @@ -2354,6 +2367,10 @@ Converter::handleInstruction(const struct 
  tgsi_full_instruction *insn)
  case TGSI_OPCODE_SLE:
  case TGSI_OPCODE_SNE:
  case TGSI_OPCODE_STR:
  +   case TGSI_OPCODE_FSEQ:
  +   case TGSI_OPCODE_FSGE:
  +   case TGSI_OPCODE_FSLT:
  +   case TGSI_OPCODE_FSNE:
  case TGSI_OPCODE_ISGE:
  case TGSI_OPCODE_ISLT:
  case TGSI_OPCODE_USEQ:
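
For context on why the patch gives the new opcodes a U32 destination
type: the F-prefixed comparisons take float sources but produce an
integer boolean (all bits set or zero), whereas the older SLT-style
opcodes produce 1.0 or 0.0.  The sketch below models that in plain C.
The sk_* helpers are hypothetical, and mapping FSNE to C's != (which is
true when either operand is NaN) is an assumption based on the CC_NEU
condition code used in the patch.

```c
#include <math.h>
#include <stdint.h>

/* Float compare, integer boolean result: ~0 for true, 0 for false. */
static uint32_t sk_fseq(float a, float b) { return a == b ? ~0u : 0u; }
static uint32_t sk_fsge(float a, float b) { return a >= b ? ~0u : 0u; }
static uint32_t sk_fslt(float a, float b) { return a <  b ? ~0u : 0u; }

/* Not-equal-or-unordered: also true when an operand is NaN. */
static uint32_t sk_fsne(float a, float b) { return a != b ? ~0u : 0u; }

/* Older style for comparison: float compare, float 1.0/0.0 result. */
static float sk_slt(float a, float b) { return a < b ? 1.0f : 0.0f; }

/* NaN compared with itself: not equal under the unordered rule. */
static uint32_t sk_fsne_nan_demo(void) { return sk_fsne(NAN, NAN); }
```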
 


Re: [Mesa-dev] [PATCH 10/22] i965: Make sure constants re-sent after constant buffer reallocation.

2013-08-28 Thread Kenneth Graunke

On 08/26/2013 03:12 PM, Paul Berry wrote:

The hardware requires that after constant buffers for a stage are
allocated using a 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS}
command, and prior to execution of a 3DPRIMITIVE, the corresponding
stage's constant buffers must be reprogrammed using a
3DSTATE_CONSTANT_{VS,HS,DS,GS,PS} command.

Previously we didn't need to worry about this, because we only
programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} once on
startup.  But now that we reallocate the constant buffers whenever
geometry shaders are switched on and off, we need to make sure the
constant buffers are reprogrammed.


Not exactly.  The change to do PUSH_CONSTANT_ALLOC once at startup is 
very recent - I only committed it on June 10th (fc800f0c60a2).  
Previously, we had a state atom which did PUSH_CONSTANT_ALLOC whenever 
BRW_NEW_CONTEXT was flagged.


That's still vaguely "once at startup", but keep in mind that before 
hardware contexts were mandatory, BRW_NEW_CONTEXT got flagged on every 
batch.


The atoms list looked like this:

   gen7_push_constant_alloc,
   ...
   gen7_vs_state,
   ...
   gen7_ps_state,

Both VS and PS state listen to BRW_NEW_BATCH, so on every batch, we'd 
end up doing:


3DSTATE_PUSH_CONSTANT_ALLOC_VS (if hw_ctx == NULL)
3DSTATE_PUSH_CONSTANT_ALLOC_PS (if hw_ctx == NULL)
3DSTATE_CONSTANT_VS
3DSTATE_CONSTANT_PS

which meant that we always obeyed this rule, even when we didn't do the 
allocation once at startup and never again.


But this only worked because we always allocated push constant space at 
the start of a batch.  Your previous patch causes reallocations to happen 
mid-batch whenever the geometry program changes.  This makes the old 
tricks quit working, and we do need a new flag.


So, I was pretty skeptical of this patch, but on further review, it does 
appear to be necessary and looks fine as is.
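
The reasoning above can be sketched with a toy model of dirty-flag state
atoms.  The SK_* flags, sk_atom, and sk_upload() are hypothetical
stand-ins, not the driver's brw_tracked_state machinery; the point is
only that an atom listening solely to a per-batch flag is not re-emitted
after a mid-batch reallocation.

```c
#include <stdint.h>

#define SK_NEW_BATCH                    (1u << 0)
#define SK_NEW_PUSH_CONSTANT_ALLOCATION (1u << 1)

struct sk_atom {
   uint32_t dirty;     /* flags this atom listens to */
   int emit_count;     /* how often its state was (re)emitted */
};

/* Emits every atom whose mask intersects the pending flags, then
 * clears the pending flags. */
static void
sk_upload(struct sk_atom *atoms, int n, uint32_t *pending)
{
   for (int i = 0; i < n; i++) {
      if (atoms[i].dirty & *pending)
         atoms[i].emit_count++;
   }
   *pending = 0;
}

/* One batch with a reallocation in the middle.  Returns how many times
 * the constant-programming atom was emitted. */
static int
sk_constants_emitted(uint32_t listens_to)
{
   struct sk_atom constants = { listens_to, 0 };
   uint32_t pending = SK_NEW_BATCH;

   sk_upload(&constants, 1, &pending);          /* start of batch */

   pending |= SK_NEW_PUSH_CONSTANT_ALLOCATION;  /* mid-batch realloc */
   sk_upload(&constants, 1, &pending);

   return constants.emit_count;
}
```

An atom listening only to SK_NEW_BATCH is emitted once, so the constants
are stale after the reallocation; adding the new flag yields the second
emit the hardware rule requires.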



We do this by adding a new bit, BRW_NEW_PUSH_CONSTANT_ALLOCATION, to
brw->state.dirty.brw.
---
  src/mesa/drivers/dri/i965/brw_context.h   |  2 ++
  src/mesa/drivers/dri/i965/gen6_gs_state.c |  2 +-
  src/mesa/drivers/dri/i965/gen6_vs_state.c |  3 ++-
  src/mesa/drivers/dri/i965/gen6_wm_state.c |  3 ++-
  src/mesa/drivers/dri/i965/gen7_urb.c  | 13 +
  src/mesa/drivers/dri/i965/gen7_vs_state.c |  3 ++-
  src/mesa/drivers/dri/i965/gen7_wm_state.c |  3 ++-
  7 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 95f9bb2..35193a6 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -158,6 +158,7 @@ enum brw_state_id {
 BRW_STATE_UNIFORM_BUFFER,
 BRW_STATE_META_IN_PROGRESS,
 BRW_STATE_INTERPOLATION_MAP,
+   BRW_STATE_PUSH_CONSTANT_ALLOCATION,
 BRW_NUM_STATE_BITS
  };

@@ -194,6 +195,7 @@ enum brw_state_id {
  #define BRW_NEW_UNIFORM_BUFFER  (1 << BRW_STATE_UNIFORM_BUFFER)
  #define BRW_NEW_META_IN_PROGRESS(1 << BRW_STATE_META_IN_PROGRESS)
  #define BRW_NEW_INTERPOLATION_MAP   (1 << BRW_STATE_INTERPOLATION_MAP)
+#define BRW_NEW_PUSH_CONSTANT_ALLOCATION (1 << BRW_STATE_PUSH_CONSTANT_ALLOCATION)

  struct brw_state_flags {
 /** State update flags signalled by mesa internals */
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index ac78286..9648fb7 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -81,7 +81,7 @@ upload_gs_state(struct brw_context *brw)
  const struct brw_tracked_state gen6_gs_state = {
 .dirty = {
.mesa  = _NEW_TRANSFORM,
-  .brw   = BRW_NEW_CONTEXT,
+  .brw   = BRW_NEW_CONTEXT | BRW_NEW_PUSH_CONSTANT_ALLOCATION,
.cache = CACHE_NEW_FF_GS_PROG
 },
 .emit = upload_gs_state,
diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c 
b/src/mesa/drivers/dri/i965/gen6_vs_state.c
index c099342..9f99db8 100644
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -206,7 +206,8 @@ const struct brw_tracked_state gen6_vs_state = {
.mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
.brw   = (BRW_NEW_CONTEXT |
BRW_NEW_VERTEX_PROGRAM |
-   BRW_NEW_BATCH),
+   BRW_NEW_BATCH |
+BRW_NEW_PUSH_CONSTANT_ALLOCATION),
.cache = CACHE_NEW_VS_PROG | CACHE_NEW_SAMPLER
 },
 .emit = upload_vs_state,
diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index e286785..6725805 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -229,7 +229,8 @@ const struct brw_tracked_state gen6_wm_state = {
_NEW_POLYGON |
  _NEW_MULTISAMPLE),
.brw   = (BRW_NEW_FRAGMENT_PROGRAM |
-   BRW_NEW_BATCH),
+   BRW_NEW_BATCH |
+BRW_NEW_PUSH_CONSTANT_ALLOCATION),
.cache 

Re: [Mesa-dev] [PATCH 09/22] i965/gs: Allocate push constant space for use by GS.

2013-08-28 Thread Kenneth Graunke

On 08/26/2013 03:12 PM, Paul Berry wrote:

Previously, we would always use the same push constant allocation
regardless of what shader programs were being run: the available push
constant space was split into 2 equal size partitions, one for the
vertex shader, and one for the fragment shader.

Now that we are adding geometry shader support, we need to do
something smarter.  This patch adjusts things so that when a geometry
shader is in use, we split the available push constant space into 3
nearly-equal size partitions instead of 2.

Since the push constant allocation is now affected by GL state, it can
no longer be set up by brw_upload_initial_gpu_state(); instead it must
be set up by a state atom.
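
A minimal sketch of the partitioning described above, assuming the
available space is an integer count in whatever unit the hardware
command expects (16 or 32 in the blorp hunk further down).  The names
are hypothetical, and giving the division remainder to the fragment
shader is an assumption, not something the commit message states.

```c
#include <stdbool.h>

struct sk_push_split {
   unsigned vs_size;
   unsigned gs_size;
   unsigned fs_size;
};

/* Splits the available push constant space into 2 partitions, or into
 * 3 nearly-equal partitions when a geometry shader is in use. */
static struct sk_push_split
sk_split_push_constants(unsigned avail_size, bool gs_present)
{
   unsigned stages = gs_present ? 3 : 2;
   unsigned per_stage = avail_size / stages;
   struct sk_push_split split;

   split.vs_size = per_stage;
   split.gs_size = gs_present ? per_stage : 0;
   /* The last stage takes whatever is left, so nothing is wasted. */
   split.fs_size = avail_size - per_stage * (stages - 1);
   return split;
}
```

With 16 units this gives 8/0/8 without a geometry shader and 5/5/6 with
one.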
---
  src/mesa/drivers/dri/i965/brw_context.h  |   3 +-
  src/mesa/drivers/dri/i965/brw_defines.h  |   1 +
  src/mesa/drivers/dri/i965/brw_state.h|   4 +-
  src/mesa/drivers/dri/i965/brw_state_upload.c |   5 +-
  src/mesa/drivers/dri/i965/gen7_blorp.cpp |   6 ++
  src/mesa/drivers/dri/i965/gen7_urb.c | 101 +++
  6 files changed, 98 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 77f2a6b..95f9bb2 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1508,7 +1508,8 @@ gen6_get_sample_position(struct gl_context *ctx,

  /* gen7_urb.c */
  void
-gen7_allocate_push_constants(struct brw_context *brw);
+gen7_emit_push_constant_state(struct brw_context *brw, unsigned vs_size,
+  unsigned gs_size, unsigned fs_size);

  void
  gen7_emit_urb_state(struct brw_context *brw,
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 832ff55..8d9a824 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1284,6 +1284,7 @@ enum brw_message_target {
  # define GEN7_URB_STARTING_ADDRESS_SHIFT25

  #define _3DSTATE_PUSH_CONSTANT_ALLOC_VS 0x7912 /* GEN7+ */
+#define _3DSTATE_PUSH_CONSTANT_ALLOC_GS 0x7915 /* GEN7+ */
  #define _3DSTATE_PUSH_CONSTANT_ALLOC_PS 0x7916 /* GEN7+ */
  # define GEN7_PUSH_CONSTANT_BUFFER_OFFSET_SHIFT 16

diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 85f82fe..4814639 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -112,6 +112,7 @@ extern const struct brw_tracked_state 
gen7_cc_viewport_state_pointer;
  extern const struct brw_tracked_state gen7_clip_state;
  extern const struct brw_tracked_state gen7_disable_stages;
  extern const struct brw_tracked_state gen7_ps_state;
+extern const struct brw_tracked_state gen7_push_constant_space;
  extern const struct brw_tracked_state gen7_sbe_state;
  extern const struct brw_tracked_state gen7_sf_clip_viewport;
  extern const struct brw_tracked_state gen7_sf_state;
@@ -220,9 +221,6 @@ uint32_t
  get_attr_override(const struct brw_vue_map *vue_map, int 
urb_entry_read_offset,
int fs_attr, bool two_side_color, uint32_t 
*max_source_attr);

-/* gen7_urb.c */
-void gen7_allocate_push_constants(struct brw_context *brw);
-
  #ifdef __cplusplus
  }
  #endif
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index b883002..9638c69 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -188,6 +188,7 @@ static const struct brw_tracked_state *gen7_atoms[] =
 gen7_cc_viewport_state_pointer, /* must do after brw_cc_vp */
 gen7_sf_clip_viewport,

+   gen7_push_constant_space,
 gen7_urb,
 gen6_blend_state, /* must do before cc unit */
 gen6_color_calc_state,/* must do before cc unit */
@@ -251,10 +252,6 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
return;

 brw_upload_invariant_state(brw);
-
-   if (brw->gen >= 7) {
-  gen7_allocate_push_constants(brw);
-   }
  }

  void brw_init_state( struct brw_context *brw )
diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp 
b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index 6c798b1..9df3d92 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -51,6 +51,12 @@ static void
  gen7_blorp_emit_urb_config(struct brw_context *brw,
 const brw_blorp_params *params)
  {
+   unsigned urb_size = (brw->is_haswell && brw->gt == 3) ? 32 : 16;
+   gen7_emit_push_constant_state(brw,
+ urb_size / 2 /* vs_size */,
+ 0 /* gs_size */,
+ urb_size / 2 /* fs_size */);
+
 /* The minimum valid number of VS entries is 32. See 3DSTATE_URB_VS, Dword
  * 1.15:0 VS Number of URB Entries.
  */
diff --git a/src/mesa/drivers/dri/i965/gen7_urb.c 

Re: [Mesa-dev] [PATCH 11/22] i965: generalize brw_vs_pull_constants in preparation for GS.

2013-08-28 Thread Kenneth Graunke

On 08/26/2013 03:12 PM, Paul Berry wrote:

---
  src/mesa/drivers/dri/i965/brw_state.h|  8 +++
  src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 66 +++-
  2 files changed, 50 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 4814639..e7a1b40 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -221,6 +221,14 @@ uint32_t
  get_attr_override(const struct brw_vue_map *vue_map, int 
urb_entry_read_offset,
int fs_attr, bool two_side_color, uint32_t 
*max_source_attr);

+/* brw_vs_surface_state.c */
+void
+brw_upload_vec4_pull_constants(struct brw_context *brw,
+   GLbitfield64 brw_new_constbuf,


FWIW, brw->state.dirty.brw is only 32-bits currently.  That said, it's 
probably going to change in the not-too-distant future, so using 
GLbitfield64 preemptively isn't crazy.



+   const struct gl_program *prog,
+   struct brw_vec4_context_base *vec4_ctx,
+   const struct brw_vec4_prog_data *prog_data);
+
  #ifdef __cplusplus
  }
  #endif
diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
index 629eb96..48124bf 100644
--- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
@@ -35,56 +35,50 @@
  #include "brw_context.h"
  #include "brw_state.h"

-/* Creates a new VS constant buffer reflecting the current VS program's
- * constants, if needed by the VS program.
- *
- * Otherwise, constants go through the CURBEs using the brw_constant_buffer
- * state atom.
- */
-static void
-brw_upload_vs_pull_constants(struct brw_context *brw)
-{
-   struct brw_vec4_context_base *vec4_ctx = &brw->vs.base;

-   /* BRW_NEW_VERTEX_PROGRAM */
-   struct brw_vertex_program *vp =
-  (struct brw_vertex_program *) brw->vertex_program;
+void
+brw_upload_vec4_pull_constants(struct brw_context *brw,
+   GLbitfield64 brw_new_constbuf,
+   const struct gl_program *prog,
+   struct brw_vec4_context_base *vec4_ctx,
+   const struct brw_vec4_prog_data *prog_data)
+{
 int i;

 /* Updates the ParamaterValues[i] pointers for all parameters of the
  * basic type of PROGRAM_STATE_VAR.
  */
-   _mesa_load_state_parameters(&brw->ctx, vp->program.Base.Parameters);
+   _mesa_load_state_parameters(&brw->ctx, prog->Parameters);

-   /* CACHE_NEW_VS_PROG */
-   if (!brw->vs.prog_data->base.nr_pull_params) {
+   if (!prog_data->nr_pull_params) {
if (vec4_ctx->const_bo) {
 drm_intel_bo_unreference(vec4_ctx->const_bo);
 vec4_ctx->const_bo = NULL;
 vec4_ctx->surf_offset[SURF_INDEX_VEC4_CONST_BUFFER] = 0;
-brw->state.dirty.brw |= BRW_NEW_VS_CONSTBUF;
+brw->state.dirty.brw |= brw_new_constbuf;
}
return;
 }

 /* _NEW_PROGRAM_CONSTANTS */
 drm_intel_bo_unreference(vec4_ctx->const_bo);
-   uint32_t size = brw->vs.prog_data->base.nr_pull_params * 4;
-   vec4_ctx->const_bo = drm_intel_bo_alloc(brw->bufmgr, "vp_const_buffer",
+   uint32_t size = prog_data->nr_pull_params * 4;
+   vec4_ctx->const_bo = drm_intel_bo_alloc(brw->bufmgr, "vec4_const_buffer",
 size, 64);

 drm_intel_gem_bo_map_gtt(vec4_ctx->const_bo);
-   for (i = 0; i < brw->vs.prog_data->base.nr_pull_params; i++) {
+
+   for (i = 0; i < prog_data->nr_pull_params; i++) {
memcpy(vec4_ctx->const_bo->virtual + i * 4,
-brw->vs.prog_data->base.pull_param[i],
+prog_data->pull_param[i],
 4);
 }

 if (0) {
-  for (i = 0; i < ALIGN(brw->vs.prog_data->base.nr_pull_params, 4) / 4;
+  for (i = 0; i < ALIGN(prog_data->nr_pull_params, 4) / 4;
 i++) {


You could probably move the i++ up a line since it's shorter now.

This patch is great.


 float *row = (float *)vec4_ctx->const_bo->virtual + i * 4;
-printf("vs const surface %3d: %4.3f %4.3f %4.3f %4.3f\n",
+printf("const surface %3d: %4.3f %4.3f %4.3f %4.3f\n",
i, row[0], row[1], row[2], row[3]);
}
 }
@@ -95,7 +89,31 @@ brw_upload_vs_pull_constants(struct brw_context *brw)
 brw->vtbl.create_constant_surface(brw, vec4_ctx->const_bo, 0, size,
   &vec4_ctx->surf_offset[surf], false);

-   brw->state.dirty.brw |= BRW_NEW_VS_CONSTBUF;
+   brw->state.dirty.brw |= brw_new_constbuf;
+}
+
+
+/* Creates a new VS constant buffer reflecting the current VS program's
+ * constants, if needed by the VS program.
+ *
+ * Otherwise, constants go through the CURBEs using the brw_constant_buffer
+ * state atom.
+ */
+static void
+brw_upload_vs_pull_constants(struct brw_context *brw)
+{
+   struct brw_vec4_context_base *vec4_ctx = 

Re: [Mesa-dev] [PATCH 13/22] i965/gs: Implement support for geometry shader surfaces.

2013-08-28 Thread Kenneth Graunke

On 08/26/2013 03:12 PM, Paul Berry wrote:

This patch implements pull constant upload, binding table upload, and
surface setup for geometry shaders, by re-using vertex shader code
that was generalized in previous patches.

Based on work by Eric Anholt e...@anholt.net.


This looks a lot better than the previous version.

I've never really been crazy about having binding table code split 
across brw_vs_surface_state.c and brw_wm_surface_state.c, with the bulk 
of the code in the WM file for some reason.  This adds a third file, and 
I'm not crazy about that either.


Still, I vote that we land brw_gs_surface_state.c as is now; we can 
always do more tidying and code motion after the fact.


--Ken


Re: [Mesa-dev] [PATCH 20/22] i965/gen7: merge defines for 3DSTATE{VS, GS, WM} dword 2

2013-08-28 Thread Kenneth Graunke

On 08/26/2013 03:12 PM, Paul Berry wrote:

Dword 2 of all 3DSTATE commands is the same, so there's no need to have


Well, not -all- 3DSTATE commands...just these :)

It's weird that you decided to share the bits for 3DSTATE_VS, 
3DSTATE_GS, and 3DSTATE_WM on SNB, but not GEN7_PS_* for 3DSTATE_PS on 
IVB.  If you're going to do WM, you might as well do PS too...



separate defines for it.  This will allow us to unify some of the
state setup code between VS and GS.
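
As an illustration of what sharing the DW2 defines buys, here is a
minimal sketch of one helper packing dword 2 for any of these stages.
The shift values (27, 18, 16) are the ones in the patch below; the SK_*
names, the sk_pack_dw2() helper, and the absence of field-width masking
are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_SAMPLER_COUNT_SHIFT              27
#define SK_BINDING_TABLE_ENTRY_COUNT_SHIFT  18
#define SK_FLOATING_POINT_MODE_ALT          (1u << 16)

/* Packs the stage-independent DW2 fields.  sampler_field is the
 * already-encoded sampler count (1 means "up to 4 samplers" in the
 * blorp hunk below), not a raw number of samplers. */
static uint32_t
sk_pack_dw2(uint32_t sampler_field, uint32_t binding_table_entries,
            bool alt_float_mode)
{
   uint32_t dw2 = 0;

   dw2 |= sampler_field << SK_SAMPLER_COUNT_SHIFT;
   dw2 |= binding_table_entries << SK_BINDING_TABLE_ENTRY_COUNT_SHIFT;
   if (alt_float_mode)
      dw2 |= SK_FLOATING_POINT_MODE_ALT;

   return dw2;
}
```

The same helper could then serve VS, GS, and WM setup code instead of
three copies that differ only in the macro prefix.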
---
  src/mesa/drivers/dri/i965/brw_defines.h   | 30 +-
  src/mesa/drivers/dri/i965/gen6_blorp.cpp  |  2 +-
  src/mesa/drivers/dri/i965/gen6_gs_state.c |  6 +++---
  src/mesa/drivers/dri/i965/gen6_vs_state.c |  4 ++--
  src/mesa/drivers/dri/i965/gen6_wm_state.c |  4 ++--
  src/mesa/drivers/dri/i965/gen7_disable.c  |  4 ++--
  src/mesa/drivers/dri/i965/gen7_vs_state.c |  4 ++--
  7 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index ec6c854..d698757 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1303,14 +1303,16 @@ enum brw_message_target {

  #define _3DSTATE_SCISSOR_STATE_POINTERS   0x780f /* GEN6+ */

-#define _3DSTATE_VS0x7810 /* GEN6+ */
+/* Common to _3DSTATE_{VS,GS} */


No mention of WM here.  Maybe:
/* Common to 3DSTATE_{VS,GS,PS|WM} */


  /* DW2 */
-# define GEN6_VS_SPF_MODE  (1 << 31)
-# define GEN6_VS_VECTOR_MASK_ENABLE(1 << 30)
-# define GEN6_VS_SAMPLER_COUNT_SHIFT   27
-# define GEN6_VS_BINDING_TABLE_ENTRY_COUNT_SHIFT   18
-# define GEN6_VS_FLOATING_POINT_MODE_IEEE_754  (0 << 16)
-# define GEN6_VS_FLOATING_POINT_MODE_ALT   (1 << 16)
+# define GEN6_SPF_MODE (1 << 31)
+# define GEN6_VECTOR_MASK_ENABLE   (1 << 30)
+# define GEN6_SAMPLER_COUNT_SHIFT  27
+# define GEN6_BINDING_TABLE_ENTRY_COUNT_SHIFT  18
+# define GEN6_FLOATING_POINT_MODE_IEEE_754 (0 << 16)
+# define GEN6_FLOATING_POINT_MODE_ALT  (1 << 16)
+
+#define _3DSTATE_VS0x7810 /* GEN6+ */
  /* DW4 */
  # define GEN6_VS_DISPATCH_START_GRF_SHIFT 20
  # define GEN6_VS_URB_READ_LENGTH_SHIFT 11
@@ -1323,13 +1325,6 @@ enum brw_message_target {
  # define GEN6_VS_ENABLE   (1 << 0)

  #define _3DSTATE_GS   0x7811 /* GEN6+ */
-/* DW2 */
-# define GEN6_GS_SPF_MODE  (1 << 31)
-# define GEN6_GS_VECTOR_MASK_ENABLE (1 << 30)
-# define GEN6_GS_SAMPLER_COUNT_SHIFT   27
-# define GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT   18
-# define GEN6_GS_FLOATING_POINT_MODE_IEEE_754  (0 << 16)
-# define GEN6_GS_FLOATING_POINT_MODE_ALT   (1 << 16)
  /* DW4 */
  # define GEN6_GS_URB_READ_LENGTH_SHIFT 11
  # define GEN7_GS_INCLUDE_VERTEX_HANDLES   (1 << 10)
@@ -1518,13 +1513,6 @@ enum brw_wm_barycentric_interp_mode {

  #define _3DSTATE_WM   0x7814 /* GEN6+ */
  /* DW1: kernel pointer */
-/* DW2 */
-# define GEN6_WM_SPF_MODE  (1 << 31)
-# define GEN6_WM_VECTOR_MASK_ENABLE (1 << 30)
-# define GEN6_WM_SAMPLER_COUNT_SHIFT   27
-# define GEN6_WM_BINDING_TABLE_ENTRY_COUNT_SHIFT   18
-# define GEN6_WM_FLOATING_POINT_MODE_IEEE_754  (0 << 16)
-# define GEN6_WM_FLOATING_POINT_MODE_ALT   (1 << 16)
  /* DW3: scratch space */
  /* DW4 */
  # define GEN6_WM_STATISTICS_ENABLE (1 << 31)
diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp 
b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
index 1c85921..4b11d72 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
@@ -727,7 +727,7 @@ gen6_blorp_emit_wm_config(struct brw_context *brw,
 dw6 |= 0 << GEN6_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT; /* No interp */
 dw6 |= 0 << GEN6_WM_NUM_SF_OUTPUTS_SHIFT; /* No inputs from SF */
 if (params->use_wm_prog) {
-  dw2 |= 1 << GEN6_WM_SAMPLER_COUNT_SHIFT; /* Up to 4 samplers */
+  dw2 |= 1 << GEN6_SAMPLER_COUNT_SHIFT; /* Up to 4 samplers */
dw4 |= prog_data->first_curbe_grf << GEN6_WM_DISPATCH_START_GRF_SHIFT_0;
dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
dw5 |= GEN6_WM_KILL_ENABLE; /* TODO: temporarily smash on */
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index 9648fb7..29f9042 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -46,7 +46,7 @@ upload_gs_state(struct brw_context *brw)
BEGIN_BATCH(7);
OUT_BATCH(_3DSTATE_GS << 16 | (7 - 2));
OUT_BATCH(brw->ff_gs.prog_offset);
-  

Re: [Mesa-dev] [PATCH 21/22] i965/gen7: Generalize gen7_vs_state in preparation for GS.

2013-08-28 Thread Kenneth Graunke

On 08/26/2013 03:12 PM, Paul Berry wrote:

---
  src/mesa/drivers/dri/i965/brw_state.h |  41 ++
  src/mesa/drivers/dri/i965/gen7_vs_state.c | 123 --
  2 files changed, 122 insertions(+), 42 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index b54338a..efef994 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -128,6 +128,38 @@ extern const struct brw_tracked_state gen7_wm_state;
  extern const struct brw_tracked_state haswell_cut_index;


+/**
+ * Parameters that differ between Gen7 VS and GS state upload commands.
+ */
+struct gen7_vec4_upload_params
+{
+   /**
+* Command used to set the binding table pointers for this stage.
+*/
+   unsigned binding_table_pointers_cmd;
+
+   /**
+* Command used to set the sampler state pointers for this stage.
+*/
+   unsigned sampler_state_pointers_cmd;
+
+   /**
+* Command used to send constants for this stage.
+*/
+   unsigned constant_cmd;
+
+   /**
+* Command used to send state for this stage.
+*/
+   unsigned state_cmd;
+
+   /**
+* Size of the state command for this stage.
+*/
+   unsigned state_cmd_size;
+};
+
+
  /* brw_misc_state.c */
  void brw_upload_invariant_state(struct brw_context *brw);
  uint32_t
@@ -240,6 +272,15 @@ brw_vec4_upload_binding_table(struct brw_context *brw,
struct brw_vec4_context_base *vec4_ctx,
const struct brw_vec4_prog_data *prog_data);

+/* gen7_vs_state.c */
+void
+gen7_upload_vec4_state(struct brw_context *brw,
+   const struct gen7_vec4_upload_params *upload_params,
+   const struct brw_vec4_context_base *vec4_ctx,
+   bool active, bool alt_floating_point_mode,
+   const struct brw_vec4_prog_data *prog_data,
+   const unsigned *stage_specific_cmd_data);
+
  #ifdef __cplusplus
  }
  #endif
diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c 
b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 30fe802..fd81112 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -29,33 +29,31 @@
  #include "program/prog_statevars.h"
  #include "intel_batchbuffer.h"

-static void
-upload_vs_state(struct brw_context *brw)
-{
-   struct gl_context *ctx = &brw->ctx;
-   const struct brw_vec4_context_base *vec4_ctx = &brw->vs.base;
-   uint32_t floating_point_mode = 0;
-   const int max_threads_shift = brw->is_haswell ?
-  HSW_VS_MAX_THREADS_SHIFT : GEN6_VS_MAX_THREADS_SHIFT;

-   gen7_emit_vs_workaround_flush(brw);
-
-   /* BRW_NEW_VS_BINDING_TABLE */
+void
+gen7_upload_vec4_state(struct brw_context *brw,
+   const struct gen7_vec4_upload_params *upload_params,
+   const struct brw_vec4_context_base *vec4_ctx,
+   bool active, bool alt_floating_point_mode,
+   const struct brw_vec4_prog_data *prog_data,
+   const unsigned *stage_specific_cmd_data)
+{
+   /* BRW_NEW_*_BINDING_TABLE */
 BEGIN_BATCH(2);
-   OUT_BATCH(_3DSTATE_BINDING_TABLE_POINTERS_VS << 16 | (2 - 2));
+   OUT_BATCH(upload_params->binding_table_pointers_cmd << 16 | (2 - 2));
 OUT_BATCH(vec4_ctx->bind_bo_offset);
 ADVANCE_BATCH();

 /* CACHE_NEW_SAMPLER */
 BEGIN_BATCH(2);
-   OUT_BATCH(_3DSTATE_SAMPLER_STATE_POINTERS_VS << 16 | (2 - 2));
+   OUT_BATCH(upload_params->sampler_state_pointers_cmd << 16 | (2 - 2));
 OUT_BATCH(vec4_ctx->sampler_offset);
 ADVANCE_BATCH();

-   if (vec4_ctx->push_const_size == 0) {
+   if (!active || vec4_ctx->push_const_size == 0) {
/* Disable the push constant buffers. */
BEGIN_BATCH(7);
-  OUT_BATCH(_3DSTATE_CONSTANT_VS << 16 | (7 - 2));
+  OUT_BATCH(upload_params->constant_cmd << 16 | (7 - 2));
OUT_BATCH(0);
OUT_BATCH(0);
OUT_BATCH(0);
@@ -65,10 +63,10 @@ upload_vs_state(struct brw_context *brw)
ADVANCE_BATCH();
 } else {
BEGIN_BATCH(7);
-  OUT_BATCH(_3DSTATE_CONSTANT_VS << 16 | (7 - 2));
+  OUT_BATCH(upload_params->constant_cmd << 16 | (7 - 2));
OUT_BATCH(vec4_ctx->push_const_size);
OUT_BATCH(0);
-  /* Pointer to the VS constant buffer.  Covered by the set of
+  /* Pointer to the stage's constant buffer.  Covered by the set of
 * state flags from gen6_prepare_wm_contants
 */
OUT_BATCH(vec4_ctx->push_const_offset | GEN7_MOCS_L3);
@@ -78,36 +76,77 @@ upload_vs_state(struct brw_context *brw)
ADVANCE_BATCH();
 }

+   BEGIN_BATCH(upload_params->state_cmd_size);
+   OUT_BATCH(upload_params->state_cmd << 16 |
+ (upload_params->state_cmd_size - 2));
+   if (active) {
+  OUT_BATCH(vec4_ctx->prog_offset);
+  OUT_BATCH((alt_floating_point_mode ? GEN6_FLOATING_POINT_MODE_ALT
+ : GEN6_FLOATING_POINT_MODE_IEEE_754) |

Re: [Mesa-dev] [PATCH 08/22] i965/gs: Allocate URB space for use by GS.

2013-08-28 Thread Kenneth Graunke

On 08/26/2013 03:12 PM, Paul Berry wrote:

Previously, we gave all of the URB space (other than the small amount
that is used for push constants) to the vertex shader.  However, when
a geometry shader is active, we need to divide it up between the
vertex and geometry shaders.

The size of the URB entries for the vertex and geometry shaders can
vary dramatically from one shader to the next.  So it doesn't make
sense to simply split the available space in two.  In particular:

- On Ivy Bridge GT1, this would not leave enough space for the worst
   case geometry shader, which requires 64k of URB space.

- Due to hardware-imposed limits on the maximum number of URB entries,
   sometimes a given shader stage will only be capable of using a small
   amount of URB space.  When this happens, it may make sense to
   allocate substantially less than half of the available space to that
   stage.

Our algorithm for dividing space between the two stages is to first
compute (a) the minimum amount of URB space that each stage needs in
order to function properly, and (b) the amount of additional URB space
that each stage wants (i.e. that it would be capable of making use
of).  If the total amount of space available is not enough to satisfy
needs + wants, then each stage's wants amount is scaled back by the
same factor in order to fit.

When only a vertex shader is active, this algorithm produces
equivalent results to the old algorithm (if the vertex shader stage
can make use of all the available URB space, we assign all the space
to it; if it can't, we let it use as much as it can).

In the future, when we need to support tessellation control and
tessellation evaluation pipeline stages, it should be straightforward
to expand this algorithm to cover them.
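
The needs/wants division described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: the function split_urb_space, the struct urb_split, and the use of abstract "units" of URB space are all hypothetical, and rounding to hardware granularity and per-stage entry limits are left out.

```c
#include <stdint.h>

/* Hypothetical result type: URB space granted to each stage, in the same
 * arbitrary units as the inputs. */
struct urb_split {
   unsigned vs;
   unsigned gs;
};

/* Hypothetical sketch: every stage first receives its minimum ("needs").
 * Whatever space remains is handed out toward each stage's "wants"; if
 * the remainder cannot cover all wants, both are scaled back by the same
 * factor (integer division, rounding down). */
static struct urb_split
split_urb_space(unsigned total,
                unsigned vs_needs, unsigned vs_wants,
                unsigned gs_needs, unsigned gs_wants)
{
   struct urb_split out = { vs_needs, gs_needs };
   unsigned total_needs = vs_needs + gs_needs;
   unsigned total_wants = vs_wants + gs_wants;

   /* If even the minimums do not fit, or nobody wants more, stop here. */
   if (total_needs >= total || total_wants == 0)
      return out;

   unsigned remaining = total - total_needs;
   if (remaining >= total_wants) {
      /* Enough room: satisfy every stage's wants in full. */
      out.vs += vs_wants;
      out.gs += gs_wants;
   } else {
      /* Not enough room: scale both wants by remaining / total_wants. */
      out.vs += (unsigned)((uint64_t)vs_wants * remaining / total_wants);
      out.gs += (unsigned)((uint64_t)gs_wants * remaining / total_wants);
   }
   return out;
}
```

With only a vertex shader active (gs_needs and gs_wants both zero), the sketch gives the VS either everything it wants or all of the remaining space, which matches the equivalence claim made for the VS-only case.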

v2: Use unsigned rather than GLuint.
---
  src/mesa/drivers/dri/i965/brw_context.h  |   6 +-
  src/mesa/drivers/dri/i965/gen7_blorp.cpp |  16 ++--
  src/mesa/drivers/dri/i965/gen7_urb.c | 155 +--
  3 files changed, 142 insertions(+), 35 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index be5175f..77f2a6b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1511,8 +1511,10 @@ void
  gen7_allocate_push_constants(struct brw_context *brw);

  void
-gen7_emit_urb_state(struct brw_context *brw, GLuint nr_vs_entries,
-GLuint vs_size, GLuint vs_start);
+gen7_emit_urb_state(struct brw_context *brw,
+unsigned nr_vs_entries, unsigned vs_size,
+unsigned vs_start, unsigned nr_gs_entries,
+unsigned gs_size, unsigned gs_start);



diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp 
b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index a387836..6c798b1 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -51,14 +51,16 @@ static void
  gen7_blorp_emit_urb_config(struct brw_context *brw,
 const brw_blorp_params *params)
  {
-   /* The minimum valid value is 32. See 3DSTATE_URB_VS,
-* Dword 1.15:0 VS Number of URB Entries.
+   /* The minimum valid number of VS entries is 32. See 3DSTATE_URB_VS, Dword
+* 1.15:0 VS Number of URB Entries.
  */
-   int num_vs_entries = 32;
-   int vs_size = 2;
-   int vs_start = 2; /* skip over push constants */
-
-   gen7_emit_urb_state(brw, num_vs_entries, vs_size, vs_start);
+   gen7_emit_urb_state(brw,
+   32 /* num_vs_entries */,
+   2 /* vs_size */,
+   2 /* vs_start */,
+   0 /* num_gs_entries */,
+   1 /* gs_size */,
+   2 /* gs_start */);
  }


diff --git a/src/mesa/drivers/dri/i965/gen7_urb.c 
b/src/mesa/drivers/dri/i965/gen7_urb.c
index 927af37..2d10cc12 100644
--- a/src/mesa/drivers/dri/i965/gen7_urb.c
+++ b/src/mesa/drivers/dri/i965/gen7_urb.c
@@ -74,34 +74,136 @@ gen7_upload_urb(struct brw_context *brw)
  {
 const int push_size_kB = brw->is_haswell && brw->gt == 3 ? 32 : 16;

-   /* Total space for entries is URB size - 16kB for push constants */
-   int handle_region_size = (brw->urb.size - push_size_kB) * 1024; /* bytes */
-
 /* CACHE_NEW_VS_PROG */
 unsigned vs_size = MAX2(brw->vs.prog_data->base.urb_entry_size, 1);
-
-   int nr_vs_entries = handle_region_size / (vs_size * 64);
-   if (nr_vs_entries > brw->urb.max_vs_entries)
-  nr_vs_entries = brw->urb.max_vs_entries;
-
-   /* According to volume 2a, nr_vs_entries must be a multiple of 8. */
-   brw->urb.nr_vs_entries = ROUND_DOWN_TO(nr_vs_entries, 8);
-
-   /* URB Starting Addresses are specified in multiples of 8kB. */
-   brw->urb.vs_start = push_size_kB / 8; /* skip over push constants */
-
-   assert(brw->urb.nr_vs_entries % 8 == 0);
-   assert(brw->urb.nr_gs_entries % 8 == 0);
-   /* GS requirement */
-   assert(!brw->ff_gs.prog_active);
+   unsigned