Re: [Mesa-dev] [PATCH] glsl: propagate max_array_access through function calls
Ah it is by design. Sentinels are special nodes with no payload. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
Yeah, st/mesa also compiles shaders on the first use, so we've got 3 places to fix: Wine, st/mesa, the driver. Marek On Wed, Aug 28, 2013 at 2:07 AM, Vadim Girlin vadimgir...@gmail.com wrote: On 08/28/2013 02:59 AM, Marek Olšák wrote: First, you won't really see any significant continual difference in frame rate no matter how many shader variants you have unless you are very CPU-bound. The problem is shader compilation on the first use, that's where you get a big hiccup. Try Skyrim for example: You have to first look around and see every object that's around you and get unpleasant stuttering before you can actually go on and play the game. Yes, this also Wine's fault that it compiles shaders on the first use too, but we don't have to be as bad as Wine, do we? Valve also reported shader recompilations on the first use being a serious issue with open source drivers. I perfectly understand that deferred compilation is exactly the problem that makes the games freeze due to shader compilation on first use when something new appears on the screen, but I don't think we can solve this problem in the *driver* by trying to compile early, because AFAICS currently the shaders are passed to the driver too late anyway, and this happens not only with wine. E.g. when I run Heaven in a window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see Heaven's window and console output at the same time, what I see is that most of GL dumps happen while Heaven shows splash screen with loading progress, but most of the driver's dumps appear on the first frame and few more times during benchmark. It looks like compilation is deferred somewhere in the stack before the driver, or am I missing something? Vadim Marek On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 08/28/2013 12:43 AM, Marek Olšák wrote: Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE. 
I don't think that shader variants are bad, but it's definitely bad when we are compiling variants that are never used. Currently glxgears compiles 18 ps/vs shaders. In my branch with initial GS support [1] I switched handling of the shaders to deferred compilation, that is, shaders are compiled only before the actual draw. I found later that it's not really required for GS, but IIRC this change results in only 5 shaders being compiled for glxgears instead of 18. It seems most of the useless variants are results of state changes between creation of the shader state (initial compilation) and actual draw call. I had some concerns about increased overhead with those changes, and it's actually noticeable with drawoverhead demo, but I didn't see any regressions with a few real apps that I tested, e.g. glxgears even showed slightly better performance with these changes. Probably I also implemented it in a not very optimal way (I was mostly concentrated on GS support) and the overhead can be reduced. One more thing is duplicate shaders, I've analyzed shader dumps from Unigine Heaven 3.0 some time ago and found that from about 320 compiled shaders, only about 180 (50%) were unique, others were duplicates (detected by comparing the bytecode dumps for them in an automated way), maybe they had different shader keys (which still resulted in the same bytecode), but I suspect duplicate pipe shaders were also involved. Unfortunately I didn't have a time to investigate it more thoroughly since then. So my point is that we don't really need to eliminate shader variants, first we need to eliminate compilation of unused variants and duplicate shaders. Also we might want to consider offloading of the compilation to separate thread(s) and caching of shader binaries between runs. Vadim [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders What the patch does is probably the right solution. 
At least alpha-test state changes don't cause shader recompilation and re-binding, which also negatively affects performance. Ideally we shouldn't depend on the framebuffer state at all, but we need to emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we should always be fine with key.nr_cbufs forced to 8 for any shader without that property. I expect app developers to do the right thing and not write outputs they don't need. Marek On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger srol...@vmware.com wrote: Not that I'm qualified to review r600 code, but couldn't you create different shader variants depending on whether you need alpha test? At least I would assume shader exports aren't free. Roland Am 27.08.2013 19:56, schrieb Vadim Girlin: We need to export at least one color if the shader writes it, even when nr_cbufs==0. Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- Tested on evergreen with multiple combinations of backends
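The deferred-compilation idea discussed in this thread (compile a variant only when a draw actually needs it, keyed by the state it depends on) can be sketched roughly as follows. This is a toy model, not r600g code: `variant_key`, `shader_get_variant` and the fixed-size table are hypothetical names chosen for illustration, and "compiling" is reduced to bumping a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the state a compiled variant depends on. */
struct variant_key {
   unsigned nr_cbufs;
   bool alpha_test;
};

struct variant {
   struct variant_key key;
   unsigned compiled_id;     /* stand-in for the compiled bytecode */
};

#define MAX_VARIANTS 16

struct shader {
   struct variant variants[MAX_VARIANTS];
   unsigned num_variants;
   unsigned num_compiles;    /* how many times we "compiled" */
};

static bool key_equal(const struct variant_key *a, const struct variant_key *b)
{
   return a->nr_cbufs == b->nr_cbufs && a->alpha_test == b->alpha_test;
}

/* Called at draw time: reuse a matching variant, compile only on a miss.
 * Returns NULL when the (toy) table is full. */
static const struct variant *
shader_get_variant(struct shader *sh, const struct variant_key *key)
{
   for (unsigned i = 0; i < sh->num_variants; i++) {
      if (key_equal(&sh->variants[i].key, key))
         return &sh->variants[i];
   }
   if (sh->num_variants == MAX_VARIANTS)
      return NULL;

   struct variant *v = &sh->variants[sh->num_variants++];
   v->key = *key;
   v->compiled_id = sh->num_compiles++;   /* the expensive step in real life */
   return v;
}
```

With this shape, state changes between shader creation and the draw call cost nothing, because only keys that reach a draw are ever compiled. The trade-off raised in the thread still applies: the first draw with a new key pays the full compile time.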
Re: [Mesa-dev] [PATCH] glx: make the interval of LIBGL_SHOW_FPS adjustable
Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Wed, Aug 28, 2013 at 6:14 AM, Chia-I Wu olva...@gmail.com wrote:
> LIBGL_SHOW_FPS=1 makes GLX print FPS every second while other values do
> nothing. Extend it so that LIBGL_SHOW_FPS=N will print the FPS every N
> seconds.
> ---
>  src/glx/dri2_glx.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
> index c54edac..54fc21c 100644
> --- a/src/glx/dri2_glx.c
> +++ b/src/glx/dri2_glx.c
> @@ -95,7 +95,7 @@ struct dri2_screen {
>     void *driver;
>     int fd;
>
> -   Bool show_fps;
> +   int show_fps_interval;
>  };
>
>  struct dri2_context
> @@ -764,6 +764,8 @@ unsigned dri2GetSwapEventType(Display* dpy, XID drawable)
>
>  static void show_fps(struct dri2_drawable *draw)
>  {
> +   const int interval =
> +      ((struct dri2_screen *) draw->base.psc)->show_fps_interval;
>     struct timeval tv;
>     uint64_t current_time;
>
> @@ -772,7 +774,7 @@ static void show_fps(struct dri2_drawable *draw)
>
>     draw->frames++;
>
> -   if (draw->previous_time + 100 <= current_time) {
> +   if (draw->previous_time + interval * 100 <= current_time) {
>        if (draw->previous_time) {
>           fprintf(stderr, "libGL: FPS = %.1f\n",
>                   ((uint64_t)draw->frames * 100) /
> @@ -859,7 +861,7 @@ dri2SwapBuffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor,
>                                     target_msc, divisor, remainder);
>      }
>
> -    if (psc->show_fps) {
> +    if (psc->show_fps_interval) {
>         show_fps(priv);
>      }
>
> @@ -1283,7 +1285,9 @@ dri2CreateScreen(int screen, struct glx_display * priv)
>     free(deviceName);
>
>     tmp = getenv("LIBGL_SHOW_FPS");
> -   psc->show_fps = tmp && strcmp(tmp, "1") == 0;
> +   psc->show_fps_interval = (tmp) ? atoi(tmp) : 0;
> +   if (psc->show_fps_interval < 0)
> +      psc->show_fps_interval = 0;
>
>     return &psc->base;
> --
> 1.8.4.rc3
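The behaviour the patch describes (parse LIBGL_SHOW_FPS as an interval in seconds, clamp negative values to 0, report once per interval) can be tried in isolation with a small sketch like the one below. The names `parse_show_fps_interval`, `fps_counter` and `fps_counter_frame` are hypothetical, the microsecond time base is an assumption, and so is the reset of the frame counter after each report, which the quoted hunks do not show.

```c
#include <stdint.h>
#include <stdlib.h>

/* Mirrors the parsing in the patch: unset or non-numeric -> 0 (disabled),
 * negative -> clamped to 0, otherwise the interval in seconds. */
static int parse_show_fps_interval(const char *value)
{
   int interval = value ? atoi(value) : 0;
   return interval < 0 ? 0 : interval;
}

struct fps_counter {
   uint64_t previous_time_us;   /* 0 means "no report window started yet" */
   unsigned frames;
   int interval_s;
};

/* Call once per frame with the current time in microseconds (assumed
 * unit). Returns the FPS over the elapsed window when a report is due,
 * or a negative value otherwise. */
static double fps_counter_frame(struct fps_counter *c, uint64_t now_us)
{
   double fps = -1.0;

   c->frames++;
   if (c->previous_time_us + (uint64_t)c->interval_s * 1000000u <= now_us) {
      if (c->previous_time_us) {
         fps = (double)c->frames * 1000000.0 /
               (double)(now_us - c->previous_time_us);
      }
      /* Assumed: start a new window after every report. */
      c->frames = 0;
      c->previous_time_us = now_us;
   }
   return fps;
}
```

A caller would typically do `parse_show_fps_interval(getenv("LIBGL_SHOW_FPS"))` once at screen creation and skip `fps_counter_frame()` entirely when the result is 0.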
[Mesa-dev] glsl: memory leak in parsing extension statements?
Hi,

Looking at the code, is there a potential memory leak in the GLSL parser with respect to extension statements?

glsl_lexer.ll has:

    <PP>[_a-zA-Z][_a-zA-Z0-9]* { yylval->identifier = strdup(yytext); return IDENTIFIER; }

i.e. it calls strdup on the token (there's one other place that calls strdup), whereas most regular identifiers use ralloc_strdup for easier memory management.

glsl_parser.yy has this:

    extension_statement:
       EXTENSION any_identifier COLON any_identifier EOL
       {
          if (!_mesa_glsl_process_extension($2, & @2, $4, & @4, state)) {
             YYERROR;
          }
       }
       ;

which looks like it processes the extension identifiers, but never frees the memory.

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info
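To make the ownership question concrete, here is a minimal sketch of the pattern being described: the lexer hands the parser heap copies of two tokens, the processing function only reads them, so whoever runs the parser action has to free them. All names here (`dup_token`, `process_extension`, `extension_statement`) are hypothetical stand-ins, not the Mesa functions, and this is one possible fix rather than the one Mesa adopted.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the lexer's strdup(yytext): returns a heap copy that the
 * caller owns and must free. */
static char *dup_token(const char *text)
{
   size_t len = strlen(text) + 1;
   char *copy = malloc(len);
   if (copy)
      memcpy(copy, text, len);
   return copy;
}

/* Stand-in for the processing function: it only reads its arguments and
 * takes no ownership of them. */
static bool process_extension(const char *name, const char *behavior)
{
   (void) name;
   return strcmp(behavior, "require") == 0 ||
          strcmp(behavior, "enable") == 0 ||
          strcmp(behavior, "warn") == 0 ||
          strcmp(behavior, "disable") == 0;
}

/* Stand-in for the parser action: both strings were allocated by the
 * lexer, so they are released here on success and on failure alike. */
static bool extension_statement(char *name, char *behavior)
{
   bool ok = name && behavior && process_extension(name, behavior);
   free(name);
   free(behavior);
   return ok;
}
```

The alternative hinted at in the mail is to allocate the tokens from the parser's memory context (as ralloc_strdup does for regular identifiers), so that they are released together with everything else the parse owns.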
[Mesa-dev] obtain def-use chain in glsl s-expression
Hi Mesa community,

I am not familiar with S-expressions or other forms of Lisp languages. I am working on a GLSL IR transformation. For example, I want to change a variable into an array of the same type. So far, I can find the definition of a variable. How can I update all uses of this variable in the S-expression? I think all uses of this variable are in the form of ir_dereference_variable. The difficulty is how to collect the def-use chain using the hierarchical visitor. I think some optimizer authors must have had the same problem. Could you point me to a pass that solves this problem, so I can use it as a reference?

thanks,
--lx
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
Well, for this discussion let's just assume that we fixed the delay in the upper layers of the stack and the driver sees the shader code as soon as the application provides it (if I understood it correctly Vadim has just volunteered for the job). Also let's assume that shaders are small and having a lot of shader variants around after they are compiled isn't bad.

In this case probably the best solution is to compile early and try to make the shaders as state-invariant as possible, e.g. don't do optimizations like getting rid of extra exports for the case where we don't need the alpha test, or, if it's just a dependency on a boolean, have both variants covered by the bytecode and use a bit constant to choose between the two, etc.

As a second step the driver should create an optimized version of the shader in a background thread once we know all the state that is/was active when the shader is used. Of course you need a bit of heuristics for this, because sometimes it is better to switch between shader variants and other times it is better to have one variant covering all the different states and just use bit constants to choose between them.

Just some thoughts on this topic,
Christian.

PS: My mail server is once more driving me nuts, please ignore the extra copy if you get this mail twice.

On 28.08.2013 02:07, Vadim Girlin wrote: On 08/28/2013 02:59 AM, Marek Olšák wrote: First, you won't really see any significant continual difference in frame rate no matter how many shader variants you have unless you are very CPU-bound. The problem is shader compilation on the first use, that's where you get a big hiccup. Try Skyrim for example: You have to first look around and see every object that's around you and get unpleasant stuttering before you can actually go on and play the game. Yes, this is also Wine's fault that it compiles shaders on the first use too, but we don't have to be as bad as Wine, do we? 
Valve also reported shader recompilations on the first use being a serious issue with open source drivers. I perfectly understand that deferred compilation is exactly the problem that makes the games freeze due to shader compilation on first use when something new appears on the screen, but I don't think we can solve this problem in the *driver* by trying to compile early, because AFAICS currently the shaders are passed to the driver too late anyway, and this happens not only with wine. E.g. when I run Heaven in a window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see Heaven's window and console output at the same time, what I see is that most of GL dumps happen while Heaven shows splash screen with loading progress, but most of the driver's dumps appear on the first frame and few more times during benchmark. It looks like compilation is deferred somewhere in the stack before the driver, or am I missing something? Vadim Marek On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 08/28/2013 12:43 AM, Marek Olšák wrote: Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE. I don't think that shader variants are bad, but it's definitely bad when we are compiling variants that are never used. Currently glxgears compiles 18 ps/vs shaders. In my branch with initial GS support [1] I switched handling of the shaders to deferred compilation, that is, shaders are compiled only before the actual draw. I found later that it's not really required for GS, but IIRC this change results in only 5 shaders being compiled for glxgears instead of 18. It seems most of the useless variants are results of state changes between creation of the shader state (initial compilation) and actual draw call. 
I had some concerns about increased overhead with those changes, and it's actually noticeable with drawoverhead demo, but I didn't see any regressions with a few real apps that I tested, e.g. glxgears even showed slightly better performance with these changes. Probably I also implemented it in a not very optimal way (I was mostly concentrated on GS support) and the overhead can be reduced. One more thing is duplicate shaders, I've analyzed shader dumps from Unigine Heaven 3.0 some time ago and found that from about 320 compiled shaders, only about 180 (50%) were unique, others were duplicates (detected by comparing the bytecode dumps for them in an automated way), maybe they had different shader keys (which still resulted in the same bytecode), but I suspect duplicate pipe shaders were also involved. Unfortunately I didn't have a time to investigate it more thoroughly since then. So my point is that we don't really need to eliminate shader variants, first we need to eliminate compilation of unused variants and duplicate shaders. Also we might want to consider offloading of the compilation to separate thread(s) and caching of shader binaries
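Christian's suggestion at the top of this message, one state-invariant shader that picks its behaviour from a bit constant instead of one specialized variant per state, can be pictured with a CPU-side analogy. This is only a sketch: the function names, the `STATE_ALPHA_TEST` bit and the "pass if alpha >= ref" rule are assumptions made for illustration, and real hardware would evaluate this in shader bytecode, not in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit in a constant register that the driver would update
 * on state changes. */
#define STATE_ALPHA_TEST (1u << 0)

/* Specialized variant 1: alpha test disabled, every fragment passes. */
static bool fragment_pass_always(float alpha)
{
   (void) alpha;
   return true;
}

/* Specialized variant 2: alpha test enabled (assumed rule: alpha >= ref). */
static bool fragment_pass_alpha_test(float alpha, float ref)
{
   return alpha >= ref;
}

/* State-invariant version: covers both cases and selects at run time, so
 * toggling the alpha test needs a constant update, not a recompile. */
static bool fragment_pass_generic(float alpha, float ref, uint32_t bool_consts)
{
   if (bool_consts & STATE_ALPHA_TEST)
      return alpha >= ref;
   return true;
}
```

The generic version costs a branch on every fragment, which is why the mail proposes building the specialized variants later in a background thread once the commonly used state is known.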
Re: [Mesa-dev] tgsi dump and parsing
- Original Message - On Wed, Aug 28, 2013 at 3:32 PM, Dave Airlie airl...@gmail.com wrote: IMM[0] FLT32 { 0x, 0x, 0x, 0x } # 1.0, 3.0, 2.0, 4.0 If you use %.9g instead of %.4f then floating point numbers will be preserved without loss of precision. I see a -nan in my tests that doesn't get reparsed so I expect hex is still better. oops to list as well this time, sorry. Just in case you are wondering, it's tests/shaders/glsl-const-builtin-inversesqrt.shader_test and tests/shaders/glsl-const-builtin-normalize.shader_test that throw up the -nan in the dumps.

We could teach tgsi_parse to understand `nan` too. We could also have a new tgsi_compare() function that, instead of doing a bare memcmp, would scan the tokens and account for the ambiguity of NaNs in IMM FLT32.

I just feel a bit awkward that we have `IMM[x] INT32 {...}` and `IMM[x] FLT32 {...}` but end up dumping floats as integers. The whole point of INT32/FLT32 is to allow humans to read the numbers, because it is just syntactic sugar: by definition a shader must behave precisely the same way regardless of whether the IMMs are INT32 or FLT32, as in TGSI the type is not defined by the arguments but rather by the opcodes.

Also, editing IMM FLT32 by hand will be much harder -- you'll need to convert floats to their integer representation, as the floats in the comment will likely be ignored.

To me, it seems that would be trading off a concrete advantage -- the usability of the TGSI textual representation -- for the much more dubious advantage of perfect bit-by-bit reversibility of TGSI binary-text shaders.

That said, I don't feel strongly either way.

Jose
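The %.9g versus %.4f point can be checked outside of TGSI with plain C. The sketch below uses `snprintf` and `strtof` as a stand-in for the dump and parse steps (the real TGSI text parser is separate code, so this only demonstrates the property of the format strings), and `float_bits` shows the raw 32-bit pattern that a hex dump would print. The helper names are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Raw IEEE-754 bit pattern of a float, i.e. what a hex dump would show. */
static uint32_t float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return u;
}

/* Print with 9 significant digits and parse back. Nine digits are enough
 * to round-trip any finite single-precision value. */
static float roundtrip_9g(float f)
{
   char buf[64];
   snprintf(buf, sizeof buf, "%.9g", f);
   return strtof(buf, NULL);
}

/* Print with 4 decimals and parse back: lossy for most values. */
static float roundtrip_4f(float f)
{
   char buf[64];
   snprintf(buf, sizeof buf, "%.4f", f);
   return strtof(buf, NULL);
}
```

Note that this says nothing about NaNs: a decimal dump prints them as `nan` or `-nan` and the payload bits are not guaranteed to survive, which is the case raised in the thread in favour of hex.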
Re: [Mesa-dev] [PATCH V2 03/15] mesa: Add a clone function to mesa hash
On 08/27/2013 08:39 PM, Timothy Arceri wrote:
> V2: const qualify table parameter
>
> Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
> ---
>  src/mesa/main/hash.c | 28 ++++++++++++++++++++++++++++
>  src/mesa/main/hash.h |  3 +++
>  2 files changed, 31 insertions(+)

Reviewed-by: Brian Paul bri...@vmware.com

Do you need someone to commit/push your patches for you?

-Brian
Re: [Mesa-dev] [PATCH V2 03/15] mesa: Add a clone function to mesa hash
- Original Message -
> From: Brian Paul bri...@vmware.com
>
> On 08/27/2013 08:39 PM, Timothy Arceri wrote:
> > V2: const qualify table parameter
> >
> > Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
> > ---
> >  src/mesa/main/hash.c | 28 ++++++++++++++++++++++++++++
> >  src/mesa/main/hash.h |  3 +++
> >  2 files changed, 31 insertions(+)
>
> Reviewed-by: Brian Paul bri...@vmware.com
>
> Do you need someone to commit/push your patches for you?
>
> -Brian

Hi Brian,

Yes, I need someone to commit for me.

Tim
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 28 August 2013 12:17, Marek Olšák mar...@gmail.com wrote:
> Yeah, st/mesa also compiles shaders on the first use, so we've got 3
> places to fix: Wine, st/mesa, the driver.

For what it's worth, while Wine definitely has some room for improvement in this regard, in some cases we don't get the shaders any earlier from the application either.
[Mesa-dev] [PATCH 1/2] gallivm: refactor num_lods handling
From: Roland Scheidegger srol...@vmware.com This is just preparation for per-pixel (or per-quad in case of multiple quads) min/mag filter since some assumptions about number of miplevels being equal to number of lods no longer holds true. This change does not change behavior yet (though theoretically when forcing per-element path it might be slower with different min/mag filter since the code will respect this setting even when there's no mip maps now in this case, so some lod calcs will be done per-element just ultimately still the same filter used for all pixels). --- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 126 +- src/gallium/auxiliary/gallivm/lp_bld_sample.h | 13 +- src/gallium/auxiliary/gallivm/lp_bld_sample_aos.c | 20 +-- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 141 - 4 files changed, 169 insertions(+), 131 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c index 89d7249..e1cfd78 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c @@ -217,7 +217,7 @@ lp_build_rho(struct lp_build_sample_context *bld, struct lp_build_context *float_size_bld = bld-float_size_in_bld; struct lp_build_context *float_bld = bld-float_bld; struct lp_build_context *coord_bld = bld-coord_bld; - struct lp_build_context *levelf_bld = bld-levelf_bld; + struct lp_build_context *rho_bld = bld-lodf_bld; const unsigned dims = bld-dims; LLVMValueRef ddx_ddy[2]; LLVMBuilderRef builder = bld-gallivm-builder; @@ -231,7 +231,7 @@ lp_build_rho(struct lp_build_sample_context *bld, LLVMValueRef first_level, first_level_vec; unsigned length = coord_bld-type.length; unsigned num_quads = length / 4; - boolean rho_per_quad = levelf_bld-type.length != length; + boolean rho_per_quad = rho_bld-type.length != length; unsigned i; LLVMValueRef i32undef = LLVMGetUndef(LLVMInt32TypeInContext(gallivm-context)); LLVMValueRef rho_xvec, rho_yvec; @@ -259,18 +259,18 @@ 
lp_build_rho(struct lp_build_sample_context *bld, */ if (rho_per_quad) { rho = lp_build_pack_aos_scalars(bld-gallivm, coord_bld-type, - levelf_bld-type, cube_rho, 0); + rho_bld-type, cube_rho, 0); } else { rho = lp_build_swizzle_scalar_aos(coord_bld, cube_rho, 0, 4); } if (gallivm_debug GALLIVM_DEBUG_NO_RHO_APPROX) { - rho = lp_build_sqrt(levelf_bld, rho); + rho = lp_build_sqrt(rho_bld, rho); } /* Could optimize this for single quad just skip the broadcast */ cubesize = lp_build_extract_broadcast(gallivm, bld-float_size_in_type, -levelf_bld-type, float_size, index0); - rho = lp_build_mul(levelf_bld, cubesize, rho); +rho_bld-type, float_size, index0); + rho = lp_build_mul(rho_bld, cubesize, rho); } else if (derivs !(bld-static_texture_state-target == PIPE_TEXTURE_CUBE)) { LLVMValueRef ddmax[3], ddx[3], ddy[3]; @@ -311,9 +311,9 @@ lp_build_rho(struct lp_build_sample_context *bld, * otherwise would also need different code to per-pixel lod case. */ rho = lp_build_pack_aos_scalars(bld-gallivm, coord_bld-type, -levelf_bld-type, rho, 0); +rho_bld-type, rho, 0); } - rho = lp_build_sqrt(levelf_bld, rho); + rho = lp_build_sqrt(rho_bld, rho); } else { @@ -329,7 +329,7 @@ lp_build_rho(struct lp_build_sample_context *bld, * rho_vec contains per-pixel rho, convert to scalar per quad. */ rho = lp_build_pack_aos_scalars(bld-gallivm, coord_bld-type, -levelf_bld-type, rho, 0); +rho_bld-type, rho, 0); } } } @@ -404,7 +404,7 @@ lp_build_rho(struct lp_build_sample_context *bld, if (rho_per_quad) { rho = lp_build_pack_aos_scalars(bld-gallivm, coord_bld-type, -levelf_bld-type, rho, 0); +rho_bld-type, rho, 0); } else { /* @@ -416,7 +416,7 @@ lp_build_rho(struct lp_build_sample_context *bld, */ rho = lp_build_swizzle_scalar_aos(coord_bld, rho, 0, 4); } - rho = lp_build_sqrt(levelf_bld, rho); + rho = lp_build_sqrt(rho_bld, rho); } else { ddx_ddy[0] = lp_build_abs(coord_bld, ddx_ddy[0]); @@ -497,7 +497,7 @@ lp_build_rho(struct lp_build_sample_context *bld, } if (rho_per_quad) { rho =
[Mesa-dev] [PATCH 2/2] gallivm: don't calculate square root of rho if we use accurate rho method
From: Roland Scheidegger srol...@vmware.com While a sqrt here and there shouldn't hurt much (depending on the cpu) it is possible to completely omit it since rho is only used for calculating lod and there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for calculating lod this means we get a simple mul instead of sqrt (in case of nearest mip filter in fact we don't need to replace the sqrt with something else at all), only in some not very useful path this doesn't work (combined brilinear calculation of int level and fractional lod, accurate rho calc but brilinear filtering seems odd). Apart from being faster as an added bonus this should increase our crappy fractional accuracy of lod, since fast_log2 is only good for ~3bits and this should increase accuracy by one bit (though not used if dimension is just one as we'd need an extra mul there as we never had the squared rho in the first place). --- src/gallium/auxiliary/gallivm/lp_bld_arit.c | 20 +-- src/gallium/auxiliary/gallivm/lp_bld_arit.h |3 +- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 76 + 3 files changed, 56 insertions(+), 43 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c index 09107ff..c295e22 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c @@ -3381,7 +3381,8 @@ lp_build_fast_log2(struct lp_build_context *bld, */ LLVMValueRef lp_build_ilog2(struct lp_build_context *bld, - LLVMValueRef x) + LLVMValueRef x, + boolean x_is_squared) { LLVMBuilderRef builder = bld-gallivm-builder; LLVMValueRef sqrt2 = lp_build_const_vec(bld-gallivm, bld-type, M_SQRT2); @@ -3391,11 +3392,20 @@ lp_build_ilog2(struct lp_build_context *bld, assert(lp_check_value(bld-type, x)); - /* x * 2^(0.5) i.e., add 0.5 to the log2(x) */ - x = LLVMBuildFMul(builder, x, sqrt2, ); + if (x_is_squared) { + struct lp_type i_type = lp_int_type(bld-type); + LLVMValueRef one = 
lp_build_const_int_vec(bld-gallivm, i_type, 1); + /* ipart = log2(x) + 0.5 = 0.5*(log2(x^2) + 1.0) */ + ipart = lp_build_extract_exponent(bld, x, 1); + ipart = LLVMBuildAShr(builder, ipart, one, ); + } - /* ipart = floor(log2(x) + 0.5) */ - ipart = lp_build_extract_exponent(bld, x, 0); + else { + /* x * 2^(0.5) i.e., add 0.5 to the log2(x) */ + x = LLVMBuildFMul(builder, x, sqrt2, ); + /* ipart = floor(log2(x) + 0.5) */ + ipart = lp_build_extract_exponent(bld, x, 0); + } return ipart; } diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.h b/src/gallium/auxiliary/gallivm/lp_bld_arit.h index d98025e..931175c 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.h +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.h @@ -323,7 +323,8 @@ lp_build_fast_log2(struct lp_build_context *bld, LLVMValueRef lp_build_ilog2(struct lp_build_context *bld, - LLVMValueRef x); + LLVMValueRef x, + boolean x_is_squared); void lp_build_exp2_approx(struct lp_build_context *bld, diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c index e1cfd78..c34833a 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c @@ -232,6 +232,7 @@ lp_build_rho(struct lp_build_sample_context *bld, unsigned length = coord_bld-type.length; unsigned num_quads = length / 4; boolean rho_per_quad = rho_bld-type.length != length; + boolean no_rho_opt = (gallivm_debug GALLIVM_DEBUG_NO_RHO_APPROX) (dims 1); unsigned i; LLVMValueRef i32undef = LLVMGetUndef(LLVMInt32TypeInContext(gallivm-context)); LLVMValueRef rho_xvec, rho_yvec; @@ -264,12 +265,13 @@ lp_build_rho(struct lp_build_sample_context *bld, else { rho = lp_build_swizzle_scalar_aos(coord_bld, cube_rho, 0, 4); } - if (gallivm_debug GALLIVM_DEBUG_NO_RHO_APPROX) { - rho = lp_build_sqrt(rho_bld, rho); - } /* Could optimize this for single quad just skip the broadcast */ cubesize = lp_build_extract_broadcast(gallivm, bld-float_size_in_type, rho_bld-type, 
float_size, index0); + if (no_rho_opt) { + /* skipping sqrt hence returning rho squared */ + cubesize = lp_build_mul(rho_bld, cubesize, cubesize); + } rho = lp_build_mul(rho_bld, cubesize, rho); } else if (derivs !(bld-static_texture_state-target == PIPE_TEXTURE_CUBE)) { @@ -281,7 +283,11 @@ lp_build_rho(struct lp_build_sample_context *bld, floatdim = lp_build_extract_broadcast(gallivm, bld-float_size_in_type, coord_bld-type, float_size, indexi); - if ((gallivm_debug GALLIVM_DEBUG_NO_RHO_APPROX) (dims 1)) { + /* + * note that
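The identity the 2/2 commit message relies on is log2(x) = 0.5 * log2(x^2), so the rounded integer level floor(log2(x) + 0.5) equals floor(0.5 * (log2(x^2) + 1)), and for positive normal floats floor(log2(.)) is just the exponent field. The scalar sketch below compares the two paths; it is not the gallivm code, the helper names are hypothetical, and it assumes positive, normal, finite inputs.

```c
#include <stdint.h>
#include <string.h>

/* Unbiased IEEE-754 exponent, i.e. floor(log2(x)) for positive normal x. */
static int float_exponent(float x)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof bits);
   return (int)((bits >> 23) & 0xff) - 127;
}

/* floor(e / 2) that is also correct for negative e; C's '/' truncates
 * toward zero and '>>' on negative ints is implementation-defined. */
static int floor_div2(int e)
{
   return e >= 0 ? e / 2 : -((1 - e) / 2);
}

/* Original path: multiply by sqrt(2), which adds 0.5 to log2(x), then
 * take the exponent: floor(log2(x) + 0.5). */
static int ilog2_rounded(float x)
{
   return float_exponent(x * 1.41421356f);
}

/* Squared path: given x^2, compute floor(0.5 * (log2(x^2) + 1)) using
 * only the exponent, so no sqrt of rho is needed beforehand. */
static int ilog2_rounded_from_square(float x_squared)
{
   return floor_div2(float_exponent(x_squared) + 1);
}
```

For example, x = 0.3 gives x^2 = 0.09 with exponent -4, and floor((-4 + 1) / 2) = -2, which matches floor(log2(0.3) + 0.5) = floor(-1.24) = -2. Inputs sitting almost exactly on a sqrt(2) * 2^k boundary may round differently between the two paths because of floating-point error.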
Re: [Mesa-dev] obtain def-use chain in glsl s-expression
On 08/27/2013 11:34 PM, Liu Xin wrote:
> Hi, Mesa community,
>
> I am not familiar with S-expression or other forms of lisp languages.

That's OK - the IR has no resemblance to actual Scheme or Lisp programming. We just print and read the () syntax because it's simple.

> I am working on GLSL IR transformation. for example, i want to change a
> variable to a array of same type. By now , i can find the definition of
> a variable. How can i update all uses of this variable in S-expression?
> I think all uses of this variable are in the form of
> ir_dereference_variable.

That's right. ir_dereference_variable is an actual use of a variable.

> The difficulty is how to collect d-u chain using hierarchical visitor.

Yeah...sadly, the compiler doesn't have UD chains. Ian was working on those a few years back, but the code never landed.

> I think some optimizer authors must have the same problem. Could you
> give me a pass which solved my problem? so I can take reference.

You might look at opt_array_splitting. It uses two visitors:

First, ir_array_reference_visitor walks over the IR and finds variables it might want to transform, storing those in a hash table.

As a second pass, ir_array_splitting_visitor walks over the IR and actually transforms things.

ir_array_splitting_visitor is also an ir_rvalue visitor, which is useful for transforming expression trees. You get passed an ir_rvalue ** pointer, and can replace a whole subexpression tree with something else.

In your case, you'll probably find ir_dereference_variables and replace them with ir_dereference_arrays. (In the printed IR, replace (var_ref foo) with (array_ref (var_ref new_foo_array) ...subscript...).)

Good luck!
--Ken
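The two-pass approach Ken describes (first find the uses, then rewrite them through a pointer to the slot that holds each subexpression) can be illustrated on a toy expression tree. Mesa's IR is C++ and uses visitor classes; the C sketch below only mimics the idea, and every type and function in it (`struct node`, `count_uses`, `rewrite_uses`) is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

enum node_kind { NODE_VAR_REF, NODE_ARRAY_REF, NODE_CONST, NODE_ADD };

struct node {
   enum node_kind kind;
   const char *name;      /* NODE_VAR_REF: variable name */
   int value;             /* NODE_CONST: integer value */
   struct node *a, *b;    /* NODE_ADD: operands; NODE_ARRAY_REF: array, index */
};

static struct node *new_node(enum node_kind kind, const char *name, int value,
                             struct node *a, struct node *b)
{
   struct node *n = malloc(sizeof *n);
   if (!n)
      abort();
   n->kind = kind;
   n->name = name;
   n->value = value;
   n->a = a;
   n->b = b;
   return n;
}

/* Pass 1: walk the tree and count the uses of one variable. */
static unsigned count_uses(const struct node *n, const char *name)
{
   if (!n)
      return 0;
   unsigned self = n->kind == NODE_VAR_REF && strcmp(n->name, name) == 0;
   return self + count_uses(n->a, name) + count_uses(n->b, name);
}

/* Pass 2: the callee receives the address of the slot holding the
 * subexpression (the analogue of ir_rvalue **), so it can swap in a
 * whole new subtree: (var_ref old) -> (array_ref (var_ref new) index). */
static void rewrite_uses(struct node **slot, const char *old_name,
                         const char *new_array, int index)
{
   struct node *n = *slot;
   if (!n)
      return;
   if (n->kind == NODE_VAR_REF && strcmp(n->name, old_name) == 0) {
      n->name = new_array;   /* reuse the node as the array's var_ref */
      *slot = new_node(NODE_ARRAY_REF, NULL, 0, n,
                       new_node(NODE_CONST, NULL, index, NULL, NULL));
      return;
   }
   rewrite_uses(&n->a, old_name, new_array, index);
   rewrite_uses(&n->b, old_name, new_array, index);
}
```

Passing the slot rather than the node is what lets the rewrite replace the root of a subexpression without the parent having to know about it.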
Re: [Mesa-dev] [PATCH] draw: fix segfaults with aaline and aapoint stages disabled
- Original Message -
> There are drivers not using these optional stages.
>
> Broken by a3ae5dc7dd5c2f8893f86a920247e690e550ebd4.
>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/gallium/auxiliary/draw/draw_context.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
> index d1fac0c..641dd82 100644
> --- a/src/gallium/auxiliary/draw/draw_context.c
> +++ b/src/gallium/auxiliary/draw/draw_context.c
> @@ -564,8 +564,10 @@ draw_prepare_shader_outputs(struct draw_context *draw)
>     draw_remove_extra_vertex_attribs(draw);
>     draw_prim_assembler_prepare_outputs(draw->ia);
>     draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
> -   draw_aapoint_prepare_outputs(draw, draw->pipeline.aapoint);
> -   draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);
> +   if (draw->pipeline.aapoint)
> +      draw_aapoint_prepare_outputs(draw, draw->pipeline.aapoint);
> +   if (draw->pipeline.aaline)
> +      draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);
>  }
>
>  /**
> --
> 1.8.1.2

Reviewed-by: Jose Fonseca jfons...@vmware.com
[Mesa-dev] Vendor-neutral OpenGL dispatching library
Last September, Andy Ritger proposed updating the Linux OpenGL ABI to allow for multiple vendors to co-exist within a single process and OpenGL applications to dispatch commands to different vendors with per-context granularity. The current proposal [1] calls for a vendor-neutral API library which acts as an intermediate layer between the application and OpenGL vendor implementations that manages this dispatching.

I have written a work-in-progress library based on this proposal which implements this API library for GLX. This library leverages some code from Mesa's glapi module to handle TLS and core OpenGL dispatching, as well as the BSD-licensed uthash library [2] and the X.org Xserver's list.h [3]. The library source can be found at this location:

http://github.com/NVIDIA/libglvnd

In this repository, the file README.md describes the library's code organization and architecture as well as remaining open issues and implementation TODOs.

What do people think about this? We are hoping to gather feedback to help facilitate discussion of the implementation of the new ABI during XDC 2013. Any concerns, suggestions, or other comments would be much appreciated.

Thanks,
Brian

[1] https://github.com/aritger/linux-opengl-abi-proposal/blob/master/linux-opengl-abi-proposal.txt
[2] http://troydhanson.github.io/uthash/
[3] http://cgit.freedesktop.org/xorg/xserver/tree/include/list.h?id=74469895e39fa38337f59edd64c4031ab9bb51d8
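As a rough picture of what "dispatch with per-context granularity" means, the sketch below keeps a thread-local pointer to the current context, and each API entry point forwards through that context's vendor dispatch table. This is not libglvnd's actual design or API; every name here (`vendor_dispatch`, `api_context`, `api_make_current`, `api_get_vendor_string`) is made up for illustration, and a real library has one table entry per GL function rather than a single one.

```c
#include <stddef.h>

/* Per-vendor table of entry points (a single entry in this toy). */
struct vendor_dispatch {
   const char *(*get_vendor_string)(void);
};

/* Each context remembers which vendor created it. */
struct api_context {
   const struct vendor_dispatch *dispatch;
};

/* The current context is tracked per thread (C11 thread-local storage). */
static _Thread_local const struct api_context *current_context;

static void api_make_current(const struct api_context *ctx)
{
   current_context = ctx;
}

/* Vendor-neutral entry point: forwards to whichever vendor owns the
 * context that is current on the calling thread. */
static const char *api_get_vendor_string(void)
{
   if (!current_context)
      return NULL;
   return current_context->dispatch->get_vendor_string();
}

/* Two pretend vendor implementations living in the same process. */
static const char *vendor_a_string(void) { return "vendor A"; }
static const char *vendor_b_string(void) { return "vendor B"; }

static const struct vendor_dispatch vendor_a = { vendor_a_string };
static const struct vendor_dispatch vendor_b = { vendor_b_string };
```

Switching the current context switches the vendor that serves subsequent calls on that thread, without the application linking against either vendor library directly.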
Re: [Mesa-dev] tgsi dump and parsing
Yes, if we change the representation, we should keep backwards compatibility in TGSI text parsing.

Jose

- Original Message - There are some TGSI shaders parsed by tgsi_text_translate which declare floating-point immediates. Any incompatible change to the parser would break them. Marek On Wed, Aug 28, 2013 at 3:57 PM, Jose Fonseca jfons...@vmware.com wrote: - Original Message - On Wed, Aug 28, 2013 at 3:32 PM, Dave Airlie airl...@gmail.com wrote: IMM[0] FLT32 { 0x, 0x, 0x, 0x } # 1.0, 3.0, 2.0, 4.0 If you use %.9g instead of %.4f then floating point numbers will be preserved without loss of precision. I see a -nan in my tests that doesn't get reparsed so I expect hex is still better. oops to list as well this time, sorry. Just in case you are wondering, it's tests/shaders/glsl-const-builtin-inversesqrt.shader_test and tests/shaders/glsl-const-builtin-normalize.shader_test that throw up the -nan in the dumps. We could teach tgsi_parse to understand `nan` too. We could also have a new tgsi_compare() function that, instead of doing a bare memcmp, would scan the tokens and account for the ambiguity of NaNs in IMM FLT32. I just feel a bit awkward that we have `IMM[x] INT32 {...}` and `IMM[x] FLT32 {...}` but end up dumping floats as integers. The whole point of INT32/FLT32 is to allow humans to read the numbers, because it is just syntactic sugar: by definition a shader must behave precisely the same way regardless of whether the IMMs are INT32 or FLT32, as in TGSI the type is not defined by the arguments but rather by the opcodes. Also, editing IMM FLT32 by hand will be much harder -- you'll need to convert floats to their integer representation, as the floats in the comment will likely be ignored. To me, it seems that would be trading off a concrete advantage -- the usability of the TGSI textual representation -- for the much more dubious advantage of perfect bit-by-bit reversibility of TGSI binary-text shaders. That said, I don't feel strongly either way. 
Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
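The trade-off discussed in this thread (decimal text with enough digits versus a raw hex dump) can be illustrated with a small standalone sketch. It does not use any TGSI code: the helper names below are hypothetical, and it only relies on the C library's snprintf, strtof and strtoul. It assumes IEEE 754 single precision and a correctly rounding strtof.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy a float's bit pattern into a 32-bit integer (no aliasing tricks). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Print with %.9g, parse back, and compare bit patterns.
 * Nine significant digits are enough to identify any finite float. */
static int roundtrips_g9(float f)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%.9g", f);
    assert(n > 0 && (size_t)n < sizeof buf);
    (void)n;
    return float_bits(strtof(buf, NULL)) == float_bits(f);
}

/* Same check with %.4f, which drops digits for most values. */
static int roundtrips_f4(float f)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%.4f", f);
    assert(n > 0 && (size_t)n < sizeof buf);
    (void)n;
    return float_bits(strtof(buf, NULL)) == float_bits(f);
}

/* Dump the raw bits as hex and parse them back: exact for every bit
 * pattern, including NaNs, at the cost of human readability. */
static int roundtrips_hex(float f)
{
    char buf[16];
    uint32_t u;
    float g;
    int n = snprintf(buf, sizeof buf, "0x%08x", (unsigned)float_bits(f));
    assert(n > 0 && (size_t)n < sizeof buf);
    (void)n;
    u = (uint32_t)strtoul(buf, NULL, 16);
    memcpy(&g, &u, sizeof g);
    return float_bits(g) == float_bits(f);
}
```

Under those assumptions, roundtrips_g9(1.0f / 3.0f) holds while roundtrips_f4(1.0f / 3.0f) does not, and roundtrips_hex() holds for a NaN as well. Whether a text parser accepts "nan" or "-nan" at all is a separate question, which is the case the thread raises for the TGSI parser.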
Re: [Mesa-dev] [PATCH 8/8] i965: Avoid flushing the batch for every blorp op.
On 27 August 2013 15:21, Eric Anholt e...@anholt.net wrote: This brings over the batch-wrap-prevention and aperture space checking code from the normal brw_draw.c path, so that we don't need to flush the batch every time. There's a risk here if the intel_emit_post_sync_nonzero_flush() call isn't high enough up in the state emit sequences -- before, we implicitly had one at the batch flush before any state was emitted, so Mesa's workaround emits didn't really matter. Improves cairo-gl performance by 13.7733% +/- 1.74876% (n=30/32) Improves minecraft apitrace performance by 1.03183% +/- 0.482297% (n=90). Reduces low-resolution GLB 2.7 performance by 1.17553% +/- 0.432263% (n=88) Reduces Lightsmark performance by 3.70246% +/- 0.322432% (n=126) No statistically significant performance difference on unigine tropics (n=10) No statistically significant performance difference on openarena (n=755) The two apps that are hurt happen to include stalls on busy buffer objects, so I think this is an effect of missing out on an opportune flush. --- src/mesa/drivers/dri/i965/brw_blorp.cpp | 50 src/mesa/drivers/dri/i965/brw_blorp.h| 4 --- src/mesa/drivers/dri/i965/gen6_blorp.cpp | 12 src/mesa/drivers/dri/i965/gen7_blorp.cpp | 1 - 4 files changed, 50 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp b/src/mesa/drivers/dri/i965/brw_blorp.cpp index 1576ff2..c566d1d 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp @@ -21,6 +21,7 @@ * IN THE SOFTWARE. */ +#include errno.h #include intel_batchbuffer.h #include intel_fbo.h @@ -191,6 +192,26 @@ intel_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt, void brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params) { + struct gl_context *ctx = brw-ctx; + uint32_t estimated_max_batch_usage = 1500; + bool check_aperture_failed_once = false; + + /* Flush the sampler and render caches. 
We definitely need to flush the +* sampler cache so that we get updated contents from the render cache for +* the glBlitFramebuffer() source. Also, we are sometimes warned in the +* docs to flush the cache between reinterpretations of the same surface +* data with different formats, which blorp does for stencil and depth +* data. +*/ + intel_batchbuffer_emit_mi_flush(brw); + +retry: + intel_batchbuffer_require_space(brw, estimated_max_batch_usage, false); + intel_batchbuffer_save_state(brw); + drm_intel_bo *saved_bo = brw-batch.bo; + uint32_t saved_used = brw-batch.used; + uint32_t saved_state_batch_offset = brw-batch.state_batch_offset; + switch (brw-gen) { case 6: gen6_blorp_exec(brw, params); @@ -204,6 +225,35 @@ brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params) break; } Would it be feasible to add an assertion here to verify that the amount of batch space actually used by this blorp call is less than or equal to estimated_max_batch_usage? That would give me a lot of increased confidence that the magic number 1500 is correct. With the added assertion, the series is: Reviewed-by: Paul Berry stereotype...@gmail.com + /* Make sure we didn't wrap the batch unintentionally, and make sure we +* reserved enough space that a wrap will never happen. +*/ + assert(brw-batch.bo == saved_bo); + assert((brw-batch.used - saved_used) * 4 + + (saved_state_batch_offset - brw-batch.state_batch_offset) + estimated_max_batch_usage); + /* Shut up compiler warnings on release build */ + (void)saved_bo; + (void)saved_used; + (void)saved_state_batch_offset; + + /* Check if the blorp op we just did would make our batch likely to fail to +* map all the BOs into the GPU at batch exec time later. If so, flush the +* batch and try again with nothing else in the batch. 
+*/ + if (dri_bufmgr_check_aperture_space(brw-batch.bo, 1)) { + if (!check_aperture_failed_once) { + check_aperture_failed_once = true; + intel_batchbuffer_reset_to_saved(brw); + intel_batchbuffer_flush(brw); + goto retry; + } else { + int ret = intel_batchbuffer_flush(brw); + WARN_ONCE(ret == -ENOSPC, + i965: blorp emit exceeded available aperture space\n); + } + } + if (unlikely(brw-always_flush_batch)) intel_batchbuffer_flush(brw); diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h index dceb4fc..e03e27f 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.h +++ b/src/mesa/drivers/dri/i965/brw_blorp.h @@ -370,10 +370,6 @@ void gen6_blorp_init(struct brw_context *brw); void -gen6_blorp_emit_batch_head(struct brw_context *brw, - const brw_blorp_params *params); -
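The control flow in the patch above (save the batch state, emit, check whether the result still fits, and if not reset, flush and retry once) can be sketched in isolation. This is a toy model, not the i965 code: struct toy_batch, toy_flush and toy_emit_with_retry are hypothetical names, and "capacity" stands in for the aperture check. The assertion inside mirrors the reviewer's request to verify that actual usage stays within the estimate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a batch buffer; not the i965 structures. */
struct toy_batch {
    uint32_t used;      /* units emitted into the current batch */
    uint32_t capacity;  /* how much fits before submission would fail */
    unsigned flushes;   /* number of flushes performed so far */
};

static void toy_flush(struct toy_batch *b)
{
    b->used = 0;
    b->flushes++;
}

/* Emit an operation of size `cost`. If the batch no longer fits, roll back,
 * flush, and retry once with an empty batch. Returns false if the operation
 * does not fit even on its own. */
static bool toy_emit_with_retry(struct toy_batch *b, uint32_t cost,
                                uint32_t estimated_max)
{
    bool failed_once = false;
    uint32_t saved_used;

retry:
    saved_used = b->used;          /* save state before emitting */
    b->used += cost;               /* "emit" the operation */

    /* Check that the reserved estimate really bounds what was used. */
    assert(b->used - saved_used <= estimated_max);

    if (b->used > b->capacity) {
        if (!failed_once) {
            failed_once = true;
            b->used = saved_used;  /* reset to the saved state */
            toy_flush(b);          /* submit what was there before */
            goto retry;            /* try again in an empty batch */
        }
        toy_flush(b);              /* second failure: flush and report */
        return false;
    }
    return true;
}
```

The single retry is the point of the check_aperture_failed_once flag in the patch: a second failure with an empty batch means flushing again cannot help, so the code reports the problem instead of looping.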
[Mesa-dev] [PATCH] radeonsi: Do not suspend timer queries
Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de --- src/gallium/drivers/radeonsi/r600.h| 1 + src/gallium/drivers/radeonsi/r600_hw_context.c | 28 ++ src/gallium/drivers/radeonsi/r600_query.c | 7 +-- src/gallium/drivers/radeonsi/radeonsi_pipe.c | 2 +- src/gallium/drivers/radeonsi/radeonsi_pipe.h | 4 ++-- src/gallium/drivers/radeonsi/si_state_draw.c | 2 +- 6 files changed, 30 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/radeonsi/r600.h b/src/gallium/drivers/radeonsi/r600.h index ce0468d..ac3b2f1 100644 --- a/src/gallium/drivers/radeonsi/r600.h +++ b/src/gallium/drivers/radeonsi/r600.h @@ -102,6 +102,7 @@ void si_context_emit_fence(struct r600_context *ctx, struct si_resource *fence, unsigned offset, unsigned value); void r600_context_draw_opaque_count(struct r600_context *ctx, struct r600_so_target *t); +bool si_is_timer_query(unsigned type); bool si_query_needs_begin(unsigned type); void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean count_draw_in); diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c index 59b2d70..f050b3b 100644 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c @@ -110,6 +110,13 @@ err: return; } +bool si_is_timer_query(unsigned type) +{ + return type == PIPE_QUERY_TIME_ELAPSED || + type == PIPE_QUERY_TIMESTAMP || + type == PIPE_QUERY_TIMESTAMP_DISJOINT; +} + bool si_query_needs_begin(unsigned type) { return type != PIPE_QUERY_TIMESTAMP; @@ -139,7 +146,7 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, } /* Count in queries_suspend. */ - num_dw += ctx-num_cs_dw_queries_suspend; + num_dw += ctx-num_cs_dw_nontimer_queries_suspend; /* Count in streamout_end at the end of CS. 
*/ num_dw += ctx-num_cs_dw_streamout_end; @@ -211,7 +218,7 @@ void si_context_flush(struct r600_context *ctx, unsigned flags) return; /* suspend queries */ - if (ctx-num_cs_dw_queries_suspend) { + if (ctx-num_cs_dw_nontimer_queries_suspend) { r600_context_queries_suspend(ctx); queries_suspended = true; } @@ -506,7 +513,9 @@ void r600_query_begin(struct r600_context *ctx, struct r600_query *query) cs-buf[cs-cdw++] = PKT3(PKT3_NOP, 0, 0); cs-buf[cs-cdw++] = r600_context_bo_reloc(ctx, query-buffer, RADEON_USAGE_WRITE); - ctx-num_cs_dw_queries_suspend += query-num_cs_dw; + if (!si_is_timer_query(query-type)) { + ctx-num_cs_dw_nontimer_queries_suspend += query-num_cs_dw; + } } void r600_query_end(struct r600_context *ctx, struct r600_query *query) @@ -565,7 +574,10 @@ void r600_query_end(struct r600_context *ctx, struct r600_query *query) cs-buf[cs-cdw++] = r600_context_bo_reloc(ctx, query-buffer, RADEON_USAGE_WRITE); query-results_end = (query-results_end + query-result_size) % query-buffer-b.b.width0; - ctx-num_cs_dw_queries_suspend -= query-num_cs_dw; + + if (si_query_needs_begin(query-type) !si_is_timer_query(query-type)) { + ctx-num_cs_dw_nontimer_queries_suspend -= query-num_cs_dw; + } } void r600_query_predication(struct r600_context *ctx, struct r600_query *query, int operation, @@ -712,19 +724,19 @@ void r600_context_queries_suspend(struct r600_context *ctx) { struct r600_query *query; - LIST_FOR_EACH_ENTRY(query, ctx-active_query_list, list) { + LIST_FOR_EACH_ENTRY(query, ctx-active_nontimer_query_list, list) { r600_query_end(ctx, query); } - assert(ctx-num_cs_dw_queries_suspend == 0); + assert(ctx-num_cs_dw_nontimer_queries_suspend == 0); } void r600_context_queries_resume(struct r600_context *ctx) { struct r600_query *query; - assert(ctx-num_cs_dw_queries_suspend == 0); + assert(ctx-num_cs_dw_nontimer_queries_suspend == 0); - LIST_FOR_EACH_ENTRY(query, ctx-active_query_list, list) { + LIST_FOR_EACH_ENTRY(query, ctx-active_nontimer_query_list, list) { 
r600_query_begin(ctx, query); } } diff --git a/src/gallium/drivers/radeonsi/r600_query.c b/src/gallium/drivers/radeonsi/r600_query.c index 927577c..aa51e74 100644 --- a/src/gallium/drivers/radeonsi/r600_query.c +++ b/src/gallium/drivers/radeonsi/r600_query.c @@ -50,7 +50,10 @@ static void r600_begin_query(struct pipe_context *ctx, struct pipe_query *query) memset(rquery-result, 0, sizeof(rquery-result)); rquery-results_start = rquery-results_end; r600_query_begin(rctx, (struct r600_query *)query); - LIST_ADDTAIL(rquery-list, rctx-active_query_list); + + if (!si_is_timer_query(rquery-type)) { +
[Mesa-dev] [PATCH] radeon/uvd: fix MPEG2/4 ref frame index limit
From: Christian König christian.koe...@amd.com

Otherwise the first few frames have an incorrect reference index.

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/drivers/radeon/radeon_uvd.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
index f3652a6..3e00977 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -493,8 +493,8 @@ uint8_t pquant
 /* extract the frame number from a referenced video buffer */
 static uint32_t get_ref_pic_idx(struct ruvd_decoder *dec, struct pipe_video_buffer *ref)
 {
-	uint32_t min = dec->frame_number - NUM_MPEG2_REFS;
-	uint32_t max = dec->frame_number - 1;
+	uint32_t min = MAX2(dec->frame_number, NUM_MPEG2_REFS) - NUM_MPEG2_REFS;
+	uint32_t max = MAX2(dec->frame_number, 1) - 1;
 	uintptr_t frame;

 	/* seems to be the most sane fallback */
--
1.7.9.5
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
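The problem this patch addresses is unsigned wrap-around: subtracting the window size from a small frame number yields a huge value instead of a negative one. A minimal standalone sketch follows. NUM_REFS and the function names are assumptions made for illustration (the value 6 is not taken from the patch), and MAX2 is redefined locally so the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))  /* local copy for the example */
#define NUM_REFS 6u                         /* assumed window size */

/* Unclamped lower bound: wraps around when frame_number < NUM_REFS. */
static uint32_t ref_min_unclamped(uint32_t frame_number)
{
    return frame_number - NUM_REFS;
}

/* Clamped lower bound, in the style of the patch: never wraps. */
static uint32_t ref_min_clamped(uint32_t frame_number)
{
    return MAX2(frame_number, NUM_REFS) - NUM_REFS;
}

/* Clamped upper bound: frame 0 has no previous frame, so stay at 0. */
static uint32_t ref_max_clamped(uint32_t frame_number)
{
    return MAX2(frame_number, 1u) - 1u;
}

/* With clamping, the window [min, max] is always well ordered. */
static void check_window(uint32_t frame_number)
{
    assert(ref_min_clamped(frame_number) <= ref_max_clamped(frame_number));
}
```

For frame number 2, the unclamped bound evaluates to UINT32_MAX - 3, so any range check against it misbehaves for the first few frames, while the clamped version gives 0.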
[Mesa-dev] [PATCH 1/6] r600g,radeonsi: remove unused variables
--- src/gallium/drivers/r600/r600_pipe.h | 3 --- src/gallium/drivers/radeonsi/radeonsi_pipe.h | 5 - 2 files changed, 8 deletions(-) diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 21d68c9..1564cc3 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -417,9 +417,6 @@ struct r600_fence_block { struct list_headhead; }; -#define R600_CONSTANT_ARRAY_SIZE 256 -#define R600_RESOURCE_ARRAY_SIZE 160 - struct r600_constbuf_state { struct r600_atomatom; diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h b/src/gallium/drivers/radeonsi/radeonsi_pipe.h index f9e4999..cd5a4f7 100644 --- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h @@ -102,8 +102,6 @@ struct r600_textures_info { uint32_tdepth_texture_mask; /* which textures are depth */ uint32_tcompressed_colortex_mask; unsignedn_samplers; - boolsamplers_dirty; - boolis_array_sampler[NUM_TEX_UNITS]; }; struct r600_fence { @@ -120,9 +118,6 @@ struct r600_fence_block { struct list_headhead; }; -#define R600_CONSTANT_ARRAY_SIZE 256 -#define R600_RESOURCE_ARRAY_SIZE 160 - struct r600_constbuf_state { struct pipe_constant_buffer cb[2]; -- 1.8.1.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] radeonsi: cleanup initialization of SGPR shader parameters
--- src/gallium/drivers/radeonsi/radeonsi_shader.c | 32 +++--- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c index 2b1928a..13bc92c 100644 --- a/src/gallium/drivers/radeonsi/radeonsi_shader.c +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c @@ -1349,7 +1349,7 @@ static void create_function(struct si_shader_context *si_shader_ctx) struct lp_build_tgsi_context *bld_base = si_shader_ctx-radeon_bld.soa.bld_base; struct gallivm_state *gallivm = bld_base-base.gallivm; LLVMTypeRef params[20], f32, i8, i32, v2i32, v3i32; - unsigned i; + unsigned i, last_sgpr, num_params; i8 = LLVMInt8TypeInContext(gallivm-context); i32 = LLVMInt32TypeInContext(gallivm-context); @@ -1361,17 +1361,21 @@ static void create_function(struct si_shader_context *si_shader_ctx) params[SI_PARAM_SAMPLER] = params[SI_PARAM_CONST]; params[SI_PARAM_RESOURCE] = LLVMPointerType(LLVMVectorType(i8, 32), CONST_ADDR_SPACE); - if (si_shader_ctx-type == TGSI_PROCESSOR_VERTEX) { - params[SI_PARAM_VERTEX_BUFFER] = params[SI_PARAM_SAMPLER]; + switch (si_shader_ctx-type) { + case TGSI_PROCESSOR_VERTEX: + params[SI_PARAM_VERTEX_BUFFER] = params[SI_PARAM_CONST]; params[SI_PARAM_START_INSTANCE] = i32; + last_sgpr = SI_PARAM_START_INSTANCE; params[SI_PARAM_VERTEX_ID] = i32; params[SI_PARAM_DUMMY_0] = i32; params[SI_PARAM_DUMMY_1] = i32; params[SI_PARAM_INSTANCE_ID] = i32; - radeon_llvm_create_func(si_shader_ctx-radeon_bld, params, 9); + num_params = SI_PARAM_INSTANCE_ID+1; + break; - } else { + case TGSI_PROCESSOR_FRAGMENT: params[SI_PARAM_PRIM_MASK] = i32; + last_sgpr = SI_PARAM_PRIM_MASK; params[SI_PARAM_PERSP_SAMPLE] = v2i32; params[SI_PARAM_PERSP_CENTER] = v2i32; params[SI_PARAM_PERSP_CENTROID] = v2i32; @@ -1388,18 +1392,20 @@ static void create_function(struct si_shader_context *si_shader_ctx) params[SI_PARAM_ANCILLARY] = f32; params[SI_PARAM_SAMPLE_COVERAGE] = f32; params[SI_PARAM_POS_FIXED_PT] = 
f32; - radeon_llvm_create_func(si_shader_ctx-radeon_bld, params, 20); + num_params = SI_PARAM_POS_FIXED_PT+1; + break; + + default: + assert(0 unimplemented shader); + return; } + assert(num_params = Elements(params)); + radeon_llvm_create_func(si_shader_ctx-radeon_bld, params, num_params); radeon_llvm_shader_type(si_shader_ctx-radeon_bld.main_fn, si_shader_ctx-type); - for (i = SI_PARAM_CONST; i = SI_PARAM_VERTEX_BUFFER; ++i) { - LLVMValueRef P = LLVMGetParam(si_shader_ctx-radeon_bld.main_fn, i); - LLVMAddAttribute(P, LLVMInRegAttribute); - } - if (si_shader_ctx-type == TGSI_PROCESSOR_VERTEX) { - LLVMValueRef P = LLVMGetParam(si_shader_ctx-radeon_bld.main_fn, - SI_PARAM_START_INSTANCE); + for (i = 0; i = last_sgpr; ++i) { + LLVMValueRef P = LLVMGetParam(si_shader_ctx-radeon_bld.main_fn, i); LLVMAddAttribute(P, LLVMInRegAttribute); } -- 1.8.1.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] radeonsi: simplify and improve flushing
This mimics r600g. The R600_CONTEXT_xxx flags are added to rctx->b.flags and si_emit_cache_flush emits the packets. That's it. The shared radeon code tells us when the streamout cache should be flushed, so we have to check the flags anyway. There is a new atom cache_flush, because caches must be flushed *after* resource descriptors are changed in memory.

Functional changes:

* Write caches are flushed at the end of CS and read caches are flushed at its beginning.

* Sampler view states are removed from si_state, they only held the flush flags.

* Every time a shader is changed, the I cache is flushed. Is this needed? Due to a hw bug, this also flushes the K cache.

* The WRITE_DATA packet is changed to use TC, which fixes a rendering issue in openarena. I'm not sure how TC interacts with CP DMA, but for now it seems to work better than any other solution I tried. (BTW CIK allows us to use TC for CP DMA.)

* Flush the K cache instead of the texture cache when updating resource descriptors (due to a hw bug, this also flushes the I cache). I think the K cache flush is correct here, but I'm not sure if the texture cache should be flushed too (probably not considering we use TC for WRITE_DATA, but we don't use TC for CP DMA).

* The number of resource contexts is decreased to 16. With all of these cache changes, 4 doesn't work, but 8 works, which suggests I'm actually doing the right thing here and the pipeline isn't drained during flushes.
--- src/gallium/drivers/radeon/r600_pipe_common.h | 1 + src/gallium/drivers/radeonsi/r600.h| 3 - src/gallium/drivers/radeonsi/r600_hw_context.c | 45 +++--- src/gallium/drivers/radeonsi/radeonsi_pipe.c | 4 + src/gallium/drivers/radeonsi/radeonsi_pipe.h | 8 +- src/gallium/drivers/radeonsi/radeonsi_pm4.c| 11 --- src/gallium/drivers/radeonsi/radeonsi_pm4.h| 2 - src/gallium/drivers/radeonsi/si_commands.c | 9 -- src/gallium/drivers/radeonsi/si_descriptors.c | 16 ++-- src/gallium/drivers/radeonsi/si_state.c| 46 +- src/gallium/drivers/radeonsi/si_state.h| 9 +- src/gallium/drivers/radeonsi/si_state_draw.c | 111 - 12 files changed, 125 insertions(+), 140 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index 4b993ee..bd13488 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -42,6 +42,7 @@ #define R600_CONTEXT_INV_VERTEX_CACHE (1 0) #define R600_CONTEXT_INV_TEX_CACHE (1 1) #define R600_CONTEXT_INV_CONST_CACHE (1 2) +#define R600_CONTEXT_INV_SHADER_CACHE (1 3) /* read-write caches */ #define R600_CONTEXT_STREAMOUT_FLUSH (1 8) #define R600_CONTEXT_FLUSH_AND_INV (1 9) diff --git a/src/gallium/drivers/radeonsi/r600.h b/src/gallium/drivers/radeonsi/r600.h index ebadd97..46cfb14 100644 --- a/src/gallium/drivers/radeonsi/r600.h +++ b/src/gallium/drivers/radeonsi/r600.h @@ -69,9 +69,6 @@ struct r600_query { struct list_headlist; }; -#define R600_CONTEXT_DST_CACHES_DIRTY (1 1) -#define R600_CONTEXT_CHECK_EVENT_FLUSH (1 2) - struct r600_context; struct r600_screen; diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c index 5631bdb..5826349 100644 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c @@ -150,7 +150,7 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, } /* Count in framebuffer cache flushes at the end of CS. 
*/ - num_dw += 7; /* one SURFACE_SYNC and CACHE_FLUSH_AND_INV (r6xx-only) */ + num_dw += ctx-atoms.cache_flush-num_dw; /* Save 16 dwords for the fence mechanism. */ num_dw += 16; @@ -167,37 +167,6 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, } } -static void r600_flush_framebuffer(struct r600_context *ctx) -{ - struct si_pm4_state *pm4; - - if (!(ctx-flags R600_CONTEXT_DST_CACHES_DIRTY)) - return; - - pm4 = si_pm4_alloc_state(ctx); - - if (pm4 == NULL) - return; - - si_cmd_surface_sync(pm4, S_0085F0_CB0_DEST_BASE_ENA(1) | - S_0085F0_CB1_DEST_BASE_ENA(1) | - S_0085F0_CB2_DEST_BASE_ENA(1) | - S_0085F0_CB3_DEST_BASE_ENA(1) | - S_0085F0_CB4_DEST_BASE_ENA(1) | - S_0085F0_CB5_DEST_BASE_ENA(1) | - S_0085F0_CB6_DEST_BASE_ENA(1) | - S_0085F0_CB7_DEST_BASE_ENA(1) | - S_0085F0_DB_ACTION_ENA(1) | - S_0085F0_DB_DEST_BASE_ENA(1)); - si_cmd_flush_and_inv_cb_meta(pm4); -
[Mesa-dev] [PATCH 0/6] radeonsi: Minor cleanups and improvements
This series contains the changes my transform feedback work depends on, but there are some useful fixes too, making it worth committing earlier.

The last patch is the most important one, because it fixes the issues we had with the emission of resource descriptors, for which we had to use 256 resource contexts as a workaround. Further testing has shown that even 256 wasn't enough. With that patch, we only need 8 or 16 contexts as originally expected.

I also made the first step towards sharing code between r600g and radeonsi and it's what made this series so big: 54 files changed, 2448 insertions(+), 2532 deletions(-)

Please review.

Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] radeonsi: convert constant buffers to si_descriptors
There is a new class si_buffer_resources, which should be good enough for implementing any kind of buffer bindings (constant buffers, vertex buffers, streamout buffers, shader storage buffers, etc.). I don't even keep a copy of pipe_constant_buffer - we don't need it.

The main motivation behind this is to have a well-tested infrastructure for setting up streamout buffers.
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.h  |  10 +-
 src/gallium/drivers/radeonsi/si_descriptors.c | 143 +-
 src/gallium/drivers/radeonsi/si_state.c       |  42
 src/gallium/drivers/radeonsi/si_state.h       |  15 ++-
 src/gallium/drivers/radeonsi/si_state_draw.c  |  80 ++
 5 files changed, 162 insertions(+), 128 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index ef531fb..e6e99c7 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -115,13 +115,6 @@ struct r600_fence_block {
 	struct list_head		head;
 };

-struct r600_constbuf_state
-{
-	struct pipe_constant_buffer	cb[2];
-	uint32_t			enabled_mask;
-	uint32_t			dirty_mask;
-};
-
 #define SI_NUM_ATOMS(rctx) (sizeof((rctx)->atoms)/sizeof((rctx)->atoms.array[0]))
 #define SI_NUM_SHADERS (PIPE_SHADER_FRAGMENT+1)

@@ -138,6 +131,7 @@ struct r600_context {

 	union {
 		struct {
+			struct r600_atom *const_buffers[SI_NUM_SHADERS];
 			struct r600_atom *sampler_views[SI_NUM_SHADERS];
 		};
 		struct r600_atom *array[0];
@@ -164,7 +158,7 @@ struct r600_context {
 	/* shader information */
 	unsigned			sprite_coord_enable;
 	unsigned			export_16bpc;
-	struct r600_constbuf_state	constbuf_state[PIPE_SHADER_TYPES];
+	struct si_buffer_resources	const_buffers[SI_NUM_SHADERS];
 	struct r600_textures_info	samplers[SI_NUM_SHADERS];
 	struct r600_resource		*border_color_table;
 	unsigned			border_color_offset;
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index db0da75..2983d75 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++
b/src/gallium/drivers/radeonsi/si_descriptors.c @@ -32,7 +32,7 @@ #define SI_NUM_CONTEXTS 256 -static const uint32_t null_desc[8]; /* zeros */ +static uint32_t null_desc[8]; /* zeros */ /* Set this if you want the 3D engine to wait until CP DMA is done. * It should be set on the last CP DMA packet. */ @@ -170,7 +170,7 @@ static void si_emit_shader_pointer(struct r600_context *rctx, static void si_emit_descriptors(struct r600_context *rctx, struct si_descriptors *desc, - const uint32_t **descriptors) + uint32_t **descriptors) { struct radeon_winsys_cs *cs = rctx-b.rings.gfx.cs; uint64_t va_base; @@ -325,6 +325,135 @@ void si_set_sampler_view(struct r600_context *rctx, unsigned shader, si_update_descriptors(views-desc); } +/* BUFFER RESOURCES */ + +static void si_emit_buffer_resources(struct r600_context *rctx, struct r600_atom *atom) +{ + struct si_buffer_resources *buffers = (struct si_buffer_resources*)atom; + + si_emit_descriptors(rctx, buffers-desc, buffers-desc_data); +} + +static void si_init_buffer_resources(struct r600_context *rctx, +struct si_buffer_resources *buffers, +unsigned num_buffers, unsigned shader, +unsigned shader_userdata_index, +enum radeon_bo_usage shader_usage) +{ + int i; + + buffers-num_buffers = num_buffers; + buffers-shader_usage = shader_usage; + buffers-buffers = CALLOC(num_buffers, sizeof(struct pipe_resource*)); + buffers-desc_storage = CALLOC(num_buffers, sizeof(uint32_t) * 4); + + /* si_emit_descriptors only accepts an array of arrays. +* This adds such an array. 
*/ + buffers-desc_data = CALLOC(num_buffers, sizeof(uint32_t*)); + for (i = 0; i num_buffers; i++) { + buffers-desc_data[i] = buffers-desc_storage[i*4]; + } + + si_init_descriptors(rctx, buffers-desc, + si_get_shader_user_data_base(shader) + + shader_userdata_index*4, 4, num_buffers, + si_emit_buffer_resources); +} + +static void si_release_buffer_resources(struct si_buffer_resources *buffers) +{ + int i; + + for (i = 0; i Elements(buffers-buffers); i++) { + pipe_resource_reference(buffers-buffers[i], NULL); + } + + FREE(buffers-buffers); +
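The comment in the patch ("si_emit_descriptors only accepts an array of arrays") describes a common layout: one contiguous block of descriptor dwords plus a separate table of pointers into it. A minimal sketch of that layout, with hypothetical names (toy_buffers and friends are not radeonsi types) and plain calloc/free in place of the Mesa wrappers, could look like this. It also loops over a stored element count on release, since a sizeof-based array-size macro applied to a heap pointer would not give the number of elements.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DWORDS_PER_DESC 4   /* four dwords per buffer descriptor */

/* Hypothetical stand-in for the buffer-resources bookkeeping. */
struct toy_buffers {
    unsigned   num;      /* number of descriptor slots */
    uint32_t  *storage;  /* num * DWORDS_PER_DESC dwords, contiguous */
    uint32_t **data;     /* data[i] points at slot i inside storage */
};

/* Allocate flat storage and build the pointer table over it.
 * Returns 0 on success, -1 on allocation failure. */
static int toy_buffers_init(struct toy_buffers *b, unsigned num)
{
    unsigned i;

    assert(num > 0);
    b->num = num;
    b->storage = calloc(num, sizeof(uint32_t) * DWORDS_PER_DESC);
    b->data = calloc(num, sizeof(uint32_t *));
    if (!b->storage || !b->data) {
        free(b->storage);
        free(b->data);
        b->storage = NULL;
        b->data = NULL;
        b->num = 0;
        return -1;
    }
    for (i = 0; i < num; i++)
        b->data[i] = &b->storage[i * DWORDS_PER_DESC];
    return 0;
}

/* Zero every slot using the stored count, then free both allocations. */
static void toy_buffers_release(struct toy_buffers *b)
{
    unsigned i, j;

    for (i = 0; i < b->num; i++)
        for (j = 0; j < DWORDS_PER_DESC; j++)
            b->data[i][j] = 0;
    free(b->data);
    free(b->storage);
    b->data = NULL;
    b->storage = NULL;
    b->num = 0;
}
```

Writing through data[i] updates the flat storage directly, so a consumer that wants an array of per-slot pointers and a consumer that wants one contiguous upload can share the same memory.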
Re: [Mesa-dev] [PATCH] radeonsi: Do not suspend timer queries
Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Wed, Aug 28, 2013 at 6:42 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote: Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de --- src/gallium/drivers/radeonsi/r600.h| 1 + src/gallium/drivers/radeonsi/r600_hw_context.c | 28 ++ src/gallium/drivers/radeonsi/r600_query.c | 7 +-- src/gallium/drivers/radeonsi/radeonsi_pipe.c | 2 +- src/gallium/drivers/radeonsi/radeonsi_pipe.h | 4 ++-- src/gallium/drivers/radeonsi/si_state_draw.c | 2 +- 6 files changed, 30 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/radeonsi/r600.h b/src/gallium/drivers/radeonsi/r600.h index ce0468d..ac3b2f1 100644 --- a/src/gallium/drivers/radeonsi/r600.h +++ b/src/gallium/drivers/radeonsi/r600.h @@ -102,6 +102,7 @@ void si_context_emit_fence(struct r600_context *ctx, struct si_resource *fence, unsigned offset, unsigned value); void r600_context_draw_opaque_count(struct r600_context *ctx, struct r600_so_target *t); +bool si_is_timer_query(unsigned type); bool si_query_needs_begin(unsigned type); void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean count_draw_in); diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c index 59b2d70..f050b3b 100644 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c @@ -110,6 +110,13 @@ err: return; } +bool si_is_timer_query(unsigned type) +{ + return type == PIPE_QUERY_TIME_ELAPSED || + type == PIPE_QUERY_TIMESTAMP || + type == PIPE_QUERY_TIMESTAMP_DISJOINT; +} + bool si_query_needs_begin(unsigned type) { return type != PIPE_QUERY_TIMESTAMP; @@ -139,7 +146,7 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, } /* Count in queries_suspend. */ - num_dw += ctx-num_cs_dw_queries_suspend; + num_dw += ctx-num_cs_dw_nontimer_queries_suspend; /* Count in streamout_end at the end of CS. 
*/ num_dw += ctx-num_cs_dw_streamout_end; @@ -211,7 +218,7 @@ void si_context_flush(struct r600_context *ctx, unsigned flags) return; /* suspend queries */ - if (ctx-num_cs_dw_queries_suspend) { + if (ctx-num_cs_dw_nontimer_queries_suspend) { r600_context_queries_suspend(ctx); queries_suspended = true; } @@ -506,7 +513,9 @@ void r600_query_begin(struct r600_context *ctx, struct r600_query *query) cs-buf[cs-cdw++] = PKT3(PKT3_NOP, 0, 0); cs-buf[cs-cdw++] = r600_context_bo_reloc(ctx, query-buffer, RADEON_USAGE_WRITE); - ctx-num_cs_dw_queries_suspend += query-num_cs_dw; + if (!si_is_timer_query(query-type)) { + ctx-num_cs_dw_nontimer_queries_suspend += query-num_cs_dw; + } } void r600_query_end(struct r600_context *ctx, struct r600_query *query) @@ -565,7 +574,10 @@ void r600_query_end(struct r600_context *ctx, struct r600_query *query) cs-buf[cs-cdw++] = r600_context_bo_reloc(ctx, query-buffer, RADEON_USAGE_WRITE); query-results_end = (query-results_end + query-result_size) % query-buffer-b.b.width0; - ctx-num_cs_dw_queries_suspend -= query-num_cs_dw; + + if (si_query_needs_begin(query-type) !si_is_timer_query(query-type)) { + ctx-num_cs_dw_nontimer_queries_suspend -= query-num_cs_dw; + } } void r600_query_predication(struct r600_context *ctx, struct r600_query *query, int operation, @@ -712,19 +724,19 @@ void r600_context_queries_suspend(struct r600_context *ctx) { struct r600_query *query; - LIST_FOR_EACH_ENTRY(query, ctx-active_query_list, list) { + LIST_FOR_EACH_ENTRY(query, ctx-active_nontimer_query_list, list) { r600_query_end(ctx, query); } - assert(ctx-num_cs_dw_queries_suspend == 0); + assert(ctx-num_cs_dw_nontimer_queries_suspend == 0); } void r600_context_queries_resume(struct r600_context *ctx) { struct r600_query *query; - assert(ctx-num_cs_dw_queries_suspend == 0); + assert(ctx-num_cs_dw_nontimer_queries_suspend == 0); - LIST_FOR_EACH_ENTRY(query, ctx-active_query_list, list) { + LIST_FOR_EACH_ENTRY(query, ctx-active_nontimer_query_list, list) { 
r600_query_begin(ctx, query); } } diff --git a/src/gallium/drivers/radeonsi/r600_query.c b/src/gallium/drivers/radeonsi/r600_query.c index 927577c..aa51e74 100644 --- a/src/gallium/drivers/radeonsi/r600_query.c +++ b/src/gallium/drivers/radeonsi/r600_query.c @@ -50,7 +50,10 @@ static void r600_begin_query(struct pipe_context *ctx, struct pipe_query *query) memset(rquery-result, 0, sizeof(rquery-result));
Re: [Mesa-dev] [PATCH 13/22] i965/gs: Implement support for geometry shader surfaces.
On 26 August 2013 15:12, Paul Berry stereotype...@gmail.com wrote:

This patch implements pull constant upload, binding table upload, and
surface setup for geometry shaders, by re-using vertex shader code
that was generalized in previous patches.

Based on work by Eric Anholt e...@anholt.net.
---
 src/mesa/drivers/dri/i965/Makefile.sources       |   1 +
 src/mesa/drivers/dri/i965/brw_context.h          |   2 +
 src/mesa/drivers/dri/i965/brw_gs_surface_state.c | 123 +++
 src/mesa/drivers/dri/i965/brw_state.h            |   3 +
 src/mesa/drivers/dri/i965/brw_state_upload.c     |   3 +
 5 files changed, 132 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_gs_surface_state.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
index 290cd93..81a16ff 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -63,6 +63,7 @@ i965_FILES = \
 	brw_gs.c \
 	brw_gs_emit.c \
 	brw_gs_state.c \
+	brw_gs_surface_state.c \
 	brw_interpolation_map.c \
 	brw_lower_texture_gradients.cpp \
 	brw_misc_state.c \
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 35193a6..622b5c8 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -148,6 +148,7 @@ enum brw_state_id {
    BRW_STATE_BATCH,
    BRW_STATE_INDEX_BUFFER,
    BRW_STATE_VS_CONSTBUF,
+   BRW_STATE_GS_CONSTBUF,
    BRW_STATE_PROGRAM_CACHE,
    BRW_STATE_STATE_BASE_ADDRESS,
    BRW_STATE_VUE_MAP_VS,
@@ -185,6 +186,7 @@ enum brw_state_id {
 /** \see brw.state.depth_region */
 #define BRW_NEW_INDEX_BUFFER           (1 << BRW_STATE_INDEX_BUFFER)
 #define BRW_NEW_VS_CONSTBUF            (1 << BRW_STATE_VS_CONSTBUF)
+#define BRW_NEW_GS_CONSTBUF            (1 << BRW_STATE_GS_CONSTBUF)
 #define BRW_NEW_PROGRAM_CACHE          (1 << BRW_STATE_PROGRAM_CACHE)
 #define BRW_NEW_STATE_BASE_ADDRESS     (1 << BRW_STATE_STATE_BASE_ADDRESS)
 #define BRW_NEW_VUE_MAP_VS             (1 << BRW_STATE_VUE_MAP_VS)
diff --git a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
new file mode 100644
index 0000000..d3d48ff
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
@@ -0,0 +1,123 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "main/mtypes.h"
+#include "program/prog_parameter.h"
+
+#include "brw_context.h"
+#include "brw_state.h"
+
+
+/* Creates a new GS constant buffer reflecting the current GS program's
+ * constants, if needed by the GS program.
+ *
+ * Otherwise, constants go through the CURBEs using the brw_constant_buffer
+ * state atom.
+ */
+static void
+brw_upload_gs_pull_constants(struct brw_context *brw)
+{
+   struct brw_vec4_context_base *vec4_ctx = &brw->gs.base;
+
+   /* BRW_NEW_GEOMETRY_PROGRAM */
+   struct brw_geometry_program *gp =
+      (struct brw_geometry_program *) brw->geometry_program;
+
+   if (!gp)
+      return;
+
+   /* CACHE_NEW_GS_PROG */
+   const struct brw_vec4_prog_data *prog_data = &brw->gs.prog_data->base;
+
+   /* _NEW_PROGRAM_CONSTANTS */
+   brw_upload_vec4_pull_constants(brw, BRW_NEW_GS_CONSTBUF, &gp->program.Base,
+                                  vec4_ctx, prog_data);
+}
+
+const struct brw_tracked_state brw_gs_pull_constants = {
+   .dirty = {
+      .mesa = (_NEW_PROGRAM_CONSTANTS),
+      .brw = (BRW_NEW_BATCH | BRW_NEW_GEOMETRY_PROGRAM),
+      .cache = CACHE_NEW_GS_PROG,
+   },
+   .emit = brw_upload_gs_pull_constants,
+};
+
+static void
+brw_upload_gs_ubo_surfaces(struct brw_context *brw)
+{
+   struct gl_context *ctx = &brw->ctx;
+   struct brw_vec4_context_base *vec4_ctx =
[Mesa-dev] [PATCH] vbo: Implement new gs prim types in vbo_count_tessellated_primitives.
---
 src/mesa/vbo/vbo_exec.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/vbo/vbo_exec.c b/src/mesa/vbo/vbo_exec.c
index 9c20bde..aa2c7b0 100644
--- a/src/mesa/vbo/vbo_exec.c
+++ b/src/mesa/vbo/vbo_exec.c
@@ -149,6 +149,18 @@ vbo_count_tessellated_primitives(GLenum mode, GLuint count,
    case GL_QUADS:
       num_primitives = (count / 4) * 2;
       break;
+   case GL_LINES_ADJACENCY:
+      num_primitives = count / 4;
+      break;
+   case GL_LINE_STRIP_ADJACENCY:
+      num_primitives = count >= 4 ? count - 3 : 0;
+      break;
+   case GL_TRIANGLES_ADJACENCY:
+      num_primitives = count / 6;
+      break;
+   case GL_TRIANGLE_STRIP_ADJACENCY:
+      num_primitives = count >= 6 ? (count - 4) / 2 : 0;
+      break;
    default:
       assert(!"Unexpected primitive type in count_tessellated_primitives");
       num_primitives = 0;
--
1.8.4
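For anyone checking the arithmetic in the patch above, the four new cases can be exercised in a small standalone C sketch. This is an illustration only, not the Mesa source: the enum and the name count_adjacency_prims are made up for the example (the real code switches on the GL enums inside vbo_count_tessellated_primitives). Only the formulas are taken from the diff.

```c
#include <assert.h>

/* Hypothetical stand-ins for the GL adjacency primitive enums. */
enum prim_mode {
   PRIM_LINES_ADJACENCY,
   PRIM_LINE_STRIP_ADJACENCY,
   PRIM_TRIANGLES_ADJACENCY,
   PRIM_TRIANGLE_STRIP_ADJACENCY
};

/* Number of primitives produced from `count` vertices, using the same
 * formulas as the patch.
 */
static unsigned
count_adjacency_prims(enum prim_mode mode, unsigned count)
{
   switch (mode) {
   case PRIM_LINES_ADJACENCY:
      /* 4 vertices per line: 2 endpoints plus 2 adjacent vertices. */
      return count / 4;
   case PRIM_LINE_STRIP_ADJACENCY:
      /* The first line needs 4 vertices, each further vertex adds one. */
      return count >= 4 ? count - 3 : 0;
   case PRIM_TRIANGLES_ADJACENCY:
      /* 6 vertices per triangle: 3 corners plus 3 adjacent vertices. */
      return count / 6;
   case PRIM_TRIANGLE_STRIP_ADJACENCY:
      /* The first triangle needs 6 vertices, each further pair adds one. */
      return count >= 6 ? (count - 4) / 2 : 0;
   }
   assert(!"unexpected primitive mode");
   return 0;
}
```

With these formulas a line strip with adjacency of 6 vertices gives 3 lines, and a triangle strip with adjacency of 8 vertices gives 2 triangles. Counts below the minimum (3 and 5 vertices respectively) give 0 rather than wrapping around, which is what the guards in the patch are for.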
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On 08/28/2013 01:15 PM, Christian König wrote:

Well, for this discussion let's just assume that we fixed the delay in the upper layers of the stack and the driver sees the shader code as soon as the application (if I understood it correctly Vadim has just volunteered for the job).

No, I'm not really volunteering to implement that. :) I'm not even sure if it's possible in reasonable time. In fact it was more like a theoretical discussion about what would be required for the early compilation in the driver to make sense. Perhaps I failed to explain it, but actually my point is that while the compilation is deferred in upper layers and nobody is going to change this (if it's possible at all), it doesn't make sense to try compiling early in the driver. I think we might prefer to defer the compilation in the driver as well - it doesn't make the overall situation any worse, but can make it better by not compiling unused variants at least.

Vadim

Also let's assume that shaders are small and having a lot of shader variants around after they are compiled isn't bad. In this case probably the best solution is to compile early and try to make the shaders as state invariant as possible, e.g. don't do optimizations like getting rid of extra exports for the case where we don't need the alpha test, or if it's just a dependency on a boolean then have both variants covered by the bytecode and use a bit constant to choose between the two etc...

As a second step the driver should create an optimized version of the shader in a background thread when we know all the state that is/was active when the shader is used. Of course you need a bit of heuristic for this, because sometimes it is better to switch between shader variants and other times it is better to have one variant covering all the different states and just use bit constants to choose between them.

Just some thoughts on this topic,
Christian.

PS: My mail server is once more driving me nuts, please ignore the extra copy if you get this mail twice.

On 28.08.2013 02:07, Vadim Girlin wrote:

On 08/28/2013 02:59 AM, Marek Olšák wrote:

First, you won't really see any significant continual difference in frame rate no matter how many shader variants you have unless you are very CPU-bound. The problem is shader compilation on the first use, that's where you get a big hiccup. Try Skyrim for example: You have to first look around and see every object that's around you and get unpleasant stuttering before you can actually go on and play the game. Yes, this is also Wine's fault that it compiles shaders on the first use too, but we don't have to be as bad as Wine, do we? Valve also reported shader recompilations on the first use being a serious issue with open source drivers.

I perfectly understand that deferred compilation is exactly the problem that makes the games freeze due to shader compilation on first use when something new appears on the screen, but I don't think we can solve this problem in the *driver* by trying to compile early, because AFAICS currently the shaders are passed to the driver too late anyway, and this happens not only with wine. E.g. when I run Heaven in a window with MESA_GLSL=dump R600_DEBUG=ps,vs, so that I can see Heaven's window and console output at the same time, what I see is that most of the GL dumps happen while Heaven shows the splash screen with loading progress, but most of the driver's dumps appear on the first frame and a few more times during the benchmark. It looks like compilation is deferred somewhere in the stack before the driver, or am I missing something?

Vadim

Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:

On 08/28/2013 12:43 AM, Marek Olšák wrote:

Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE.
I don't think that shader variants are bad, but it's definitely bad when we are compiling variants that are never used. Currently glxgears compiles 18 ps/vs shaders. In my branch with initial GS support [1] I switched handling of the shaders to deferred compilation, that is, shaders are compiled only before the actual draw. I found later that it's not really required for GS, but IIRC this change results in only 5 shaders being compiled for glxgears instead of 18. It seems most of the useless variants are results of state changes between creation of the shader state (initial compilation) and actual draw call. I had some concerns about increased overhead with those changes, and it's actually noticeable with drawoverhead demo, but I didn't see any regressions with a few real apps that I tested, e.g. glxgears even showed slightly better performance with these changes. Probably I also implemented it in a not very optimal way (I was mostly concentrated on GS support) and the overhead can be reduced. One more thing is duplicate shaders, I've analyzed shader dumps from Unigine Heaven 3.0 some time
[Mesa-dev] [PATCH 3/6] r600g: move streamout state to drivers/radeon
It looks like this patch got stuck in the moderation queue. You can also find it here:

http://cgit.freedesktop.org/~mareko/mesa/commit/?h=radeonsi-stuff&id=13bb26b24e738da6a8c51ee33876dc541fcde9da

Marek
Re: [Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs
On Wed, Aug 28, 2013 at 7:56 PM, Vadim Girlin vadimgir...@gmail.com wrote: On 08/28/2013 01:15 PM, Christian König wrote: Well, for this discussion let's just assume that we fixed the delay in the upper layers of the stack and the driver sees the shader code as soon as the application (if I understood it correctly Vadim has just volunteered for the job). No, I'm not really volunteering to implement that. :) I'm not even sure if it's possible in reasonable time. In fact it was more like a theoretical discussion about what would be required for the early compilation in the driver to make sense. Perhaps I failed to explain it, but actually my point is that while the compilation is deferred in upper layers and nobody is going to change this (if it's possible at all), it doesn't make sense to try compiling early in the driver. I think we might prefer to defer the compilation in the driver as well - it doesn't make overall situation any worse, but can make it better by not compiling unused variants at least. Sounds good to me. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965/vs: Detect GRF sources in split_virtual_grfs send-from-GRF code.
It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the
GRF.  VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores
the GRF as src[1].

To be safe, loop over all the source registers and mark any GRFs.  We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.

Fixes assertion failures in Unigine Sanctuary since we started making
register allocation rely on split_virtual_grfs working.  (The register
classes were actually sufficient, we were just interpreting an IMM as
a virtual GRF number.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

The assertion failures mentioned in the bug don't exist on 9.2, but the
underlying bug that caused them to fail still does, so I think it makes
sense to backport.  Not sure if these SEND-from-GRFs existed in 9.1.

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index ae836d3..55fa7c8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1034,10 +1034,14 @@ vec4_visitor::split_virtual_grfs()
       vec4_instruction *inst = (vec4_instruction *)node;
 
       /* If there's a SEND message loading from a GRF on gen7+, it needs to be
-       * contiguous.  Assume that the GRF for the SEND is always in src[0].
+       * contiguous.
        */
       if (inst->is_send_from_grf()) {
-         split_grf[inst->src[0].reg] = false;
+         for (int i = 0; i < 3; i++) {
+            if (inst->src[i].file == GRF) {
+               split_grf[inst->src[i].reg] = false;
+            }
+         }
       }
    }
--
1.8.3.4
[Mesa-dev] [PATCH 2/2] i965/fs: Detect GRF sources in split_virtual_grfs send-from-GRF code.
It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the
GRF.  For example, FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD uses src[1] for
the GRF.

To be safe, loop over all the source registers and mark any GRFs.  We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.

Not observed to fix anything yet, but likely to.  Parallels the bug fix
in the previous commit, which actually does fix known failures.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b770c0e..96cb2ee 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1359,7 +1359,11 @@ fs_visitor::split_virtual_grfs()
        * the send is reading the whole thing.
        */
       if (inst->is_send_from_grf()) {
-         split_grf[inst->src[0].reg] = false;
+         for (int i = 0; i < 3; i++) {
+            if (inst->src[i].file == GRF) {
+               split_grf[inst->src[i].reg] = false;
+            }
+         }
       }
    }
--
1.8.3.4
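The idea shared by both split_virtual_grfs patches can be shown outside the compiler. The sketch below is plain C with made-up types (struct inst, enum reg_file, mark_send_sources and the num_grfs bound are all hypothetical); it is not the i965 code, just the "check every source, not only src[0]" loop, plus a bounds assertion of the kind that would have caught an immediate being used as a register number.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified register files. */
enum reg_file { FILE_NONE, FILE_GRF, FILE_IMM };

struct src_reg {
   enum reg_file file;
   int reg;               /* virtual GRF number, or raw value for an IMM */
};

struct inst {
   bool send_from_grf;    /* true for SEND-from-GRF style opcodes */
   struct src_reg src[3];
};

/* Mark every virtual GRF read by a send-from-GRF instruction as
 * "must not be split".  Non-GRF sources (such as an IMM in src[0])
 * are skipped instead of being misread as a register number.
 */
static void
mark_send_sources(const struct inst *inst, bool *split_grf, int num_grfs)
{
   if (!inst->send_from_grf)
      return;

   for (int i = 0; i < 3; i++) {
      if (inst->src[i].file == FILE_GRF) {
         assert(inst->src[i].reg >= 0 && inst->src[i].reg < num_grfs);
         split_grf[inst->src[i].reg] = false;
      }
   }
}
```

For an instruction laid out like the pull constant load described above (IMM in src[0], GRF in src[1]), only the entry for the GRF in src[1] is cleared; the immediate's value never indexes the array.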
Re: [Mesa-dev] [PATCH] r600g/compute: Fix bug in compute memory pool
The changes look good to me... That seems to be a much more sane way to add the item to the beginning of the linked list.

I've tested this on CEDAR (Radeon 5400) without any OpenCL regressions, and the only piglit change was that the new piglit test created for this bug now passes.

--Aaron

On Tue, Aug 27, 2013 at 10:17 AM, Tom Stellard t...@stellard.net wrote:

From: Tom Stellard thomas.stell...@amd.com

When adding a new buffer to the beginning of the memory pool, we were
accidentally deleting the buffer that was first in the buffer list.
This was caused by a bug in the memory pool's linked list
implementation.
---
 src/gallium/drivers/r600/compute_memory_pool.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c
index 454af90..4846bfe 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.c
+++ b/src/gallium/drivers/r600/compute_memory_pool.c
@@ -337,14 +337,9 @@ void compute_memory_finalize_pending(struct compute_memory_pool* pool,
 				}
 			} else {
 				/* Add item to the front of the list */
-				item->next = pool->item_list->next;
-				if (pool->item_list->next) {
-					pool->item_list->next->prev = item;
-				}
+				item->next = pool->item_list;
 				item->prev = pool->item_list->prev;
-				if (pool->item_list->prev) {
-					pool->item_list->prev->next = item;
-				}
+				pool->item_list->prev = item;
 				pool->item_list = item;
 			}
 		}
--
1.7.11.4
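To make the list fix easier to follow, here is a minimal standalone sketch of inserting at the head of a doubly linked list. The struct and function names are invented for the example and the sketch also handles an empty list, which the hunk above does not need to (it sits in an else branch where the list is known to be non-empty). The old code linked the new item to the head's successor, which dropped the old head from the chain; the new code links it to the old head itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified versions of the pool and its items. */
struct item {
   struct item *prev;
   struct item *next;
};

struct pool {
   struct item *item_list;   /* head of the list, NULL when empty */
};

/* Insert `item` in front of the current head without losing the old head. */
static void
pool_push_front(struct pool *pool, struct item *item)
{
   assert(pool != NULL && item != NULL);

   item->prev = NULL;
   item->next = pool->item_list;      /* old head becomes the second item */
   if (pool->item_list)
      pool->item_list->prev = item;   /* back-link from the old head */
   pool->item_list = item;
}
```

After pushing a and then b onto an empty pool, the list reads b, a: b.next points at a and a.prev points back at b, so the buffer that used to be first is still reachable.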
[Mesa-dev] [PATCH] glsl: propagate max_array_access through function calls
Fixes a bug where, if a uniform array is passed to a function, the
accesses to the array are not propagated, so later all but the first
vector of the uniform array are removed in parcel_out_uniform_storage,
resulting in broken shaders and out of bounds access to arrays in
brw::vec4_visitor::pack_uniform_registers.

Signed-off-by: Dominik Behr db...@chromium.org
---
 src/glsl/link_functions.cpp | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/src/glsl/link_functions.cpp b/src/glsl/link_functions.cpp
index 6b3e154..d935546 100644
--- a/src/glsl/link_functions.cpp
+++ b/src/glsl/link_functions.cpp
@@ -173,6 +173,35 @@ public:
       return visit_continue;
    }
 
+   virtual ir_visitor_status visit_leave(ir_call *ir)
+   {
+      /* Traverse list of function parameters, and for array parameters
+         propagate max_array_access. Otherwise arrays that are only referenced
+         from inside functions via function parameters will be incorrectly
+         optimized. This will lead to incorrect code being generated (or worse).
+         Do it when leaving the node so the children would propagate their
+         array accesses first */
+
+      const exec_node *formal_param_node = ir->callee->parameters.get_head();
+      const exec_node *actual_param_node = ir->actual_parameters.get_head();
+      while (!actual_param_node->is_tail_sentinel()) {
+         ir_variable *formal_param = (ir_variable *) formal_param_node;
+         ir_rvalue *actual_param = (ir_rvalue *) actual_param_node;
+
+         formal_param_node = formal_param_node->get_next();
+         actual_param_node = actual_param_node->get_next();
+
+         if (formal_param->type->is_array()) {
+            ir_dereference_variable *deref = actual_param->as_dereference_variable();
+            if (deref && deref->var && deref->var->type->is_array()) {
+               deref->var->max_array_access =
+                  MAX2(formal_param->max_array_access, deref->var->max_array_access);
+            }
+         }
+      }
+      return visit_continue;
+   }
+
    virtual ir_visitor_status visit(ir_dereference_variable *ir)
    {
       if (hash_table_find(locals, ir->var) == NULL) {
--
1.8.3.1
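The propagation step in the patch above boils down to "for every array argument, raise the caller variable's recorded maximum index to at least what the callee's parameter sees". A rough C model of just that step, with hypothetical types (struct var, struct call_arg) standing in for the GLSL IR classes, might look like this. It is a sketch of the idea, not the linker code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a shader variable. */
struct var {
   int is_array;
   unsigned max_array_access;   /* highest index known to be accessed */
};

/* One argument of a call: the callee's formal parameter and the caller's
 * variable.  `actual` is NULL when the argument is not a plain variable
 * dereference (for example an expression).
 */
struct call_arg {
   const struct var *formal;
   struct var *actual;
};

/* Widen each caller-side array's access range so that indices used only
 * inside the callee still count as accesses to the caller's array.
 */
static void
propagate_max_array_access(const struct call_arg *args, size_t num_args)
{
   for (size_t i = 0; i < num_args; i++) {
      const struct var *formal = args[i].formal;
      struct var *actual = args[i].actual;

      assert(formal != NULL);
      if (!formal->is_array || actual == NULL || !actual->is_array)
         continue;

      if (formal->max_array_access > actual->max_array_access)
         actual->max_array_access = formal->max_array_access;
   }
}
```

If the callee indexes its parameter up to element 3 and the caller never indexes the uniform array directly, the caller's variable ends up with max_array_access of 3 instead of 0, so later passes no longer treat elements 1 to 3 as unused.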
Re: [Mesa-dev] [PATCH 1/2] gallivm: refactor num_lods handling
LGTM.

Jose

----- Original Message -----

From: Roland Scheidegger srol...@vmware.com

This is just preparation for per-pixel (or per-quad in case of multiple
quads) min/mag filter, since some assumptions about the number of miplevels
being equal to the number of lods no longer hold true.
This change does not change behavior yet (though theoretically when forcing
the per-element path it might be slower with different min/mag filters, since
the code will respect this setting even when there are no mip maps now in this
case, so some lod calcs will be done per-element even though ultimately the
same filter is still used for all pixels).
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c     | 126 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample.h     |  13 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_aos.c |  20 +--
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 141 -
 4 files changed, 169 insertions(+), 131 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index 89d7249..e1cfd78 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -217,7 +217,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
    struct lp_build_context *float_size_bld = &bld->float_size_in_bld;
    struct lp_build_context *float_bld = &bld->float_bld;
    struct lp_build_context *coord_bld = &bld->coord_bld;
-   struct lp_build_context *levelf_bld = &bld->levelf_bld;
+   struct lp_build_context *rho_bld = &bld->lodf_bld;
    const unsigned dims = bld->dims;
    LLVMValueRef ddx_ddy[2];
    LLVMBuilderRef builder = bld->gallivm->builder;
@@ -231,7 +231,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
    LLVMValueRef first_level, first_level_vec;
    unsigned length = coord_bld->type.length;
    unsigned num_quads = length / 4;
-   boolean rho_per_quad = levelf_bld->type.length != length;
+   boolean rho_per_quad = rho_bld->type.length != length;
    unsigned i;
    LLVMValueRef i32undef = LLVMGetUndef(LLVMInt32TypeInContext(gallivm->context));
    LLVMValueRef rho_xvec, rho_yvec;
@@ -259,18 +259,18 @@ lp_build_rho(struct lp_build_sample_context *bld,
        */
       if (rho_per_quad) {
          rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
-                                         levelf_bld->type, cube_rho, 0);
+                                         rho_bld->type, cube_rho, 0);
       }
       else {
          rho = lp_build_swizzle_scalar_aos(coord_bld, cube_rho, 0, 4);
       }
       if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
-         rho = lp_build_sqrt(levelf_bld, rho);
+         rho = lp_build_sqrt(rho_bld, rho);
       }
       /* Could optimize this for single quad just skip the broadcast */
       cubesize = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
-                                            levelf_bld->type, float_size, index0);
-      rho = lp_build_mul(levelf_bld, cubesize, rho);
+                                            rho_bld->type, float_size, index0);
+      rho = lp_build_mul(rho_bld, cubesize, rho);
    }
    else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
       LLVMValueRef ddmax[3], ddx[3], ddy[3];
@@ -311,9 +311,9 @@ lp_build_rho(struct lp_build_sample_context *bld,
              * otherwise would also need different code to per-pixel lod case.
              */
             rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
-                                            levelf_bld->type, rho, 0);
+                                            rho_bld->type, rho, 0);
          }
-         rho = lp_build_sqrt(levelf_bld, rho);
+         rho = lp_build_sqrt(rho_bld, rho);
       }
       else {
@@ -329,7 +329,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
              * rho_vec contains per-pixel rho, convert to scalar per quad.
              */
             rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
-                                            levelf_bld->type, rho, 0);
+                                            rho_bld->type, rho, 0);
          }
       }
    }
@@ -404,7 +404,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
          if (rho_per_quad) {
             rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
-                                            levelf_bld->type, rho, 0);
+                                            rho_bld->type, rho, 0);
          }
          else {
             /*
@@ -416,7 +416,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
              */
             rho = lp_build_swizzle_scalar_aos(coord_bld, rho, 0, 4);
          }
-         rho = lp_build_sqrt(levelf_bld, rho);
+         rho = lp_build_sqrt(rho_bld, rho);
       }
       else {
          ddx_ddy[0] =
[Mesa-dev] [PATCH 1/2] i965/gen7: Use the base_level field of the sampler to handle GL's BASE_LEVEL.
This avoids the need to get the inter- and intra-tile offset and adjust
our miptree info based on them.
---
 src/mesa/drivers/dri/i965/gen7_sampler_state.c    | 19 +++++++++----------
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 14 +++-----------
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_sampler_state.c b/src/mesa/drivers/dri/i965/gen7_sampler_state.c
index 193b5b1..6162502 100644
--- a/src/mesa/drivers/dri/i965/gen7_sampler_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sampler_state.c
@@ -25,6 +25,7 @@
 #include "brw_state.h"
 #include "brw_defines.h"
 #include "intel_batchbuffer.h"
+#include "intel_mipmap_tree.h"
 
 #include "main/macros.h"
 #include "main/samplerobj.h"
@@ -40,6 +41,8 @@ gen7_update_sampler_state(struct brw_context *brw, int unit, int ss_index,
    struct gl_context *ctx = &brw->ctx;
    struct gl_texture_unit *texUnit = &ctx->Texture.Unit[unit];
    struct gl_texture_object *texObj = texUnit->_Current;
+   struct intel_texture_image *intel_image =
+      intel_texture_image(texObj->Image[0][texObj->BaseLevel]);
    struct gl_sampler_object *gl_sampler = _mesa_get_samplerobj(ctx, unit);
    bool using_nearest = false;
 
@@ -150,17 +153,13 @@ gen7_update_sampler_state(struct brw_context *brw, int unit, int ss_index,
    sampler->ss0.lod_preclamp = 1; /* OpenGL mode */
    sampler->ss0.default_color_mode = 0; /* OpenGL/DX10 mode */
 
-   /* Set BaseMipLevel, MaxLOD, MinLOD:
-    *
-    * XXX: I don't think that using firstLevel, lastLevel works,
-    * because we always setup the surface state as if firstLevel ==
-    * level zero.  Probably have to subtract firstLevel from each of
-    * these:
-    */
-   sampler->ss0.base_level = U_FIXED(0, 1);
+   int baselevel = texObj->BaseLevel - intel_image->mt->first_level;
+   sampler->ss0.base_level = U_FIXED(baselevel, 1);
 
-   sampler->ss1.max_lod = U_FIXED(CLAMP(gl_sampler->MaxLod, 0, 13), 8);
-   sampler->ss1.min_lod = U_FIXED(CLAMP(gl_sampler->MinLod, 0, 13), 8);
+   sampler->ss1.max_lod = U_FIXED(CLAMP(baselevel +
+                                        gl_sampler->MaxLod, 0, 13), 8);
+   sampler->ss1.min_lod = U_FIXED(CLAMP(baselevel +
+                                        gl_sampler->MinLod, 0, 13), 8);
 
    /* The sampler can handle non-normalized texture rectangle coordinates
     * natively
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index 91f854b..b68e2c2 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -284,8 +284,8 @@ gen7_update_texture_surface(struct gl_context *ctx,
    struct intel_texture_object *intelObj = intel_texture_object(tObj);
    struct intel_mipmap_tree *mt = intelObj->mt;
    struct gl_texture_image *firstImage = tObj->Image[0][tObj->BaseLevel];
+   struct intel_texture_image *intel_image = intel_texture_image(firstImage);
    struct gl_sampler_object *sampler = _mesa_get_samplerobj(ctx, unit);
-   uint32_t tile_x, tile_y;
 
    if (tObj->Target == GL_TEXTURE_BUFFER) {
       gen7_update_buffer_texture_surface(ctx, unit, binding_table, surf_index);
@@ -318,8 +318,6 @@ gen7_update_texture_surface(struct gl_context *ctx,
       surf[0] |= GEN7_SURFACE_ARYSPC_LOD0;
 
    surf[1] = mt->region->bo->offset + mt->offset; /* reloc */
-   surf[1] += intel_miptree_get_tile_offsets(intelObj->mt, firstImage->Level, 0,
-                                             &tile_x, &tile_y);
 
    surf[2] = SET_FIELD(mt->logical_width0 - 1, GEN7_SURFACE_WIDTH) |
              SET_FIELD(mt->logical_height0 - 1, GEN7_SURFACE_HEIGHT);
@@ -328,15 +326,9 @@ gen7_update_texture_surface(struct gl_context *ctx,
 
    surf[4] = gen7_surface_msaa_bits(mt->num_samples, mt->msaa_layout);
 
-   assert(brw->has_surface_tile_offset || (tile_x == 0 && tile_y == 0));
-   /* Note that the low bits of these fields are missing, so
-    * there's the possibility of getting in trouble.
-    */
-   surf[5] = ((tile_x / 4) << BRW_SURFACE_X_OFFSET_SHIFT |
-              (tile_y / 2) << BRW_SURFACE_Y_OFFSET_SHIFT |
-              SET_FIELD(GEN7_MOCS_L3, GEN7_SURFACE_MOCS) |
+   surf[5] = (SET_FIELD(GEN7_MOCS_L3, GEN7_SURFACE_MOCS) |
               /* mip count */
-              (intelObj->_MaxLevel - tObj->BaseLevel));
+              (intelObj->_MaxLevel - intel_image->mt->first_level));
 
    if (brw->is_haswell) {
       /* Handling GL_ALPHA as a surface format override breaks 1.30+ style
--
1.8.4.rc3
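As a worked illustration of the sampler-side arithmetic in the patch above: the base level becomes relative to the miptree's first level, and the LOD clamps are shifted by that same amount before being converted to unsigned fixed point. The helper names below (to_ufixed, clampf, compute_lod_fields) are invented for this sketch and the fixed-point conversion is my own minimal version, assumed to behave like the driver's U_FIXED for non-negative inputs; it is not copied from Mesa.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert to unsigned fixed point with `frac_bits`
 * fractional bits, clamping negative inputs to zero.
 */
static uint32_t
to_ufixed(float value, unsigned frac_bits)
{
   value *= (float)(1u << frac_bits);
   return value < 0.0f ? 0 : (uint32_t)value;
}

static float
clampf(float x, float lo, float hi)
{
   return x < lo ? lo : (x > hi ? hi : x);
}

struct lod_fields {
   uint32_t base_level;   /* 1 fractional bit, as in the patch */
   uint32_t min_lod;
   uint32_t max_lod;
};

/* Compute the sampler fields the way the patch does: GL's BASE_LEVEL is
 * made relative to the first level stored in the miptree, and the same
 * offset is added to MinLod/MaxLod before clamping to [0, 13].
 */
static struct lod_fields
compute_lod_fields(int gl_base_level, int mt_first_level,
                   float gl_min_lod, float gl_max_lod,
                   unsigned lod_frac_bits)
{
   int baselevel = gl_base_level - mt_first_level;
   struct lod_fields f;

   assert(baselevel >= 0);
   f.base_level = to_ufixed((float)baselevel, 1);
   f.min_lod = to_ufixed(clampf(baselevel + gl_min_lod, 0.0f, 13.0f),
                         lod_frac_bits);
   f.max_lod = to_ufixed(clampf(baselevel + gl_max_lod, 0.0f, 13.0f),
                         lod_frac_bits);
   return f;
}
```

With BASE_LEVEL 2, a miptree starting at level 0, GL's default LOD range of -1000 to 1000 and 8 fractional bits (the value used in the gen7 diff), this gives base_level 4 (2.0 with one fractional bit), min_lod 0 and max_lod 3328 (13.0 with 8 fractional bits).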
[Mesa-dev] [PATCH 2/2] i965: Switch gen4-6 to using the sampler's base level for GL BASE_LEVEL.
Thanks to Ken for trawling through my neglected public branches and
finding the bug in this change (inside a megacommit) that made me abandon
this work.
---
 src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 19 +++++++++----------
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 16 +++-------------
 2 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
index f2117a4..1f46f91 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
@@ -33,6 +33,7 @@
 #include "brw_context.h"
 #include "brw_state.h"
 #include "brw_defines.h"
+#include "intel_mipmap_tree.h"
 
 #include "main/macros.h"
 #include "main/samplerobj.h"
@@ -201,6 +202,8 @@ static void brw_update_sampler_state(struct brw_context *brw,
    struct gl_context *ctx = &brw->ctx;
    struct gl_texture_unit *texUnit = &ctx->Texture.Unit[unit];
    struct gl_texture_object *texObj = texUnit->_Current;
+   struct intel_texture_image *intel_image =
+      intel_texture_image(texObj->Image[0][texObj->BaseLevel]);
    struct gl_sampler_object *gl_sampler = _mesa_get_samplerobj(ctx, unit);
    bool using_nearest = false;
 
@@ -319,17 +322,13 @@ static void brw_update_sampler_state(struct brw_context *brw,
    sampler->ss0.lod_preclamp = 1; /* OpenGL mode */
    sampler->ss0.default_color_mode = 0; /* OpenGL/DX10 mode */
 
-   /* Set BaseMipLevel, MaxLOD, MinLOD:
-    *
-    * XXX: I don't think that using firstLevel, lastLevel works,
-    * because we always setup the surface state as if firstLevel ==
-    * level zero.  Probably have to subtract firstLevel from each of
-    * these:
-    */
-   sampler->ss0.base_level = U_FIXED(0, 1);
+   int baselevel = texObj->BaseLevel - intel_image->mt->first_level;
+   sampler->ss0.base_level = U_FIXED(baselevel, 1);
 
-   sampler->ss1.max_lod = U_FIXED(CLAMP(gl_sampler->MaxLod, 0, 13), 6);
-   sampler->ss1.min_lod = U_FIXED(CLAMP(gl_sampler->MinLod, 0, 13), 6);
+   sampler->ss1.max_lod = U_FIXED(CLAMP(baselevel +
+                                        gl_sampler->MaxLod, 0, 13), 6);
+   sampler->ss1.min_lod = U_FIXED(CLAMP(baselevel +
+                                        gl_sampler->MinLod, 0, 13), 6);
 
    /* On Gen6+, the sampler can handle non-normalized texture
     * rectangle coordinates natively
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index e2c7b77..8bc3938 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -255,9 +255,9 @@ brw_update_texture_surface(struct gl_context *ctx,
    struct intel_texture_object *intelObj = intel_texture_object(tObj);
    struct intel_mipmap_tree *mt = intelObj->mt;
    struct gl_texture_image *firstImage = tObj->Image[0][tObj->BaseLevel];
+   struct intel_texture_image *intel_image = intel_texture_image(firstImage);
    struct gl_sampler_object *sampler = _mesa_get_samplerobj(ctx, unit);
    uint32_t *surf;
-   uint32_t tile_x, tile_y;
 
    if (tObj->Target == GL_TEXTURE_BUFFER) {
       brw_update_buffer_texture_surface(ctx, unit, binding_table, surf_index);
@@ -277,10 +277,8 @@ brw_update_texture_surface(struct gl_context *ctx,
                BRW_SURFACE_FORMAT_SHIFT));
 
    surf[1] = intelObj->mt->region->bo->offset + intelObj->mt->offset; /* reloc */
-   surf[1] += intel_miptree_get_tile_offsets(intelObj->mt, firstImage->Level, 0,
-                                             &tile_x, &tile_y);
 
-   surf[2] = ((intelObj->_MaxLevel - tObj->BaseLevel) << BRW_SURFACE_LOD_SHIFT |
+   surf[2] = ((intelObj->_MaxLevel - intel_image->mt->first_level) << BRW_SURFACE_LOD_SHIFT |
               (mt->logical_width0 - 1) << BRW_SURFACE_WIDTH_SHIFT |
               (mt->logical_height0 - 1) << BRW_SURFACE_HEIGHT_SHIFT);
 
@@ -291,15 +289,7 @@ brw_update_texture_surface(struct gl_context *ctx,
 
    surf[4] = brw_get_surface_num_multisamples(intelObj->mt->num_samples);
 
-   assert(brw->has_surface_tile_offset || (tile_x == 0 && tile_y == 0));
-   /* Note that the low bits of these fields are missing, so
-    * there's the possibility of getting in trouble.
-    */
-   assert(tile_x % 4 == 0);
-   assert(tile_y % 2 == 0);
-   surf[5] = ((tile_x / 4) << BRW_SURFACE_X_OFFSET_SHIFT |
-              (tile_y / 2) << BRW_SURFACE_Y_OFFSET_SHIFT |
-              (mt->align_h == 4 ? BRW_SURFACE_VERTICAL_ALIGN_ENABLE : 0));
+   surf[5] = mt->align_h == 4 ? BRW_SURFACE_VERTICAL_ALIGN_ENABLE : 0;
 
    /* Emit relocation to surface contents */
    drm_intel_bo_emit_reloc(brw->batch.bo,
--
1.8.4.rc3
Re: [Mesa-dev] [PATCH 8/8] i965: Avoid flushing the batch for every blorp op.
Paul Berry stereotype...@gmail.com writes:

On 27 August 2013 15:21, Eric Anholt e...@anholt.net wrote:

This brings over the batch-wrap-prevention and aperture space checking
code from the normal brw_draw.c path, so that we don't need to flush the
batch every time.  There's a risk here if the
intel_emit_post_sync_nonzero_flush() call isn't high enough up in the
state emit sequences -- before, we implicitly had one at the batch flush
before any state was emitted, so Mesa's workaround emits didn't really
matter.

Improves cairo-gl performance by 13.7733% +/- 1.74876% (n=30/32)
Improves minecraft apitrace performance by 1.03183% +/- 0.482297% (n=90).
Reduces low-resolution GLB 2.7 performance by 1.17553% +/- 0.432263% (n=88)
Reduces Lightsmark performance by 3.70246% +/- 0.322432% (n=126)
No statistically significant performance difference on unigine tropics (n=10)
No statistically significant performance difference on openarena (n=755)

The two apps that are hurt happen to include stalls on busy buffer
objects, so I think this is an effect of missing out on an opportune
flush.
---
 src/mesa/drivers/dri/i965/brw_blorp.cpp  | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 src/mesa/drivers/dri/i965/brw_blorp.h    |  4 ----
 src/mesa/drivers/dri/i965/gen6_blorp.cpp | 12 ------------
 src/mesa/drivers/dri/i965/gen7_blorp.cpp |  1 -
 4 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp b/src/mesa/drivers/dri/i965/brw_blorp.cpp
index 1576ff2..c566d1d 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
@@ -21,6 +21,7 @@
  * IN THE SOFTWARE.
  */
 
+#include <errno.h>
 #include "intel_batchbuffer.h"
 #include "intel_fbo.h"
 
@@ -191,6 +192,26 @@ intel_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt,
 void
 brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params)
 {
+   struct gl_context *ctx = &brw->ctx;
+   uint32_t estimated_max_batch_usage = 1500;
+   bool check_aperture_failed_once = false;
+
+   /* Flush the sampler and render caches.  We definitely need to flush the
+    * sampler cache so that we get updated contents from the render cache for
+    * the glBlitFramebuffer() source.  Also, we are sometimes warned in the
+    * docs to flush the cache between reinterpretations of the same surface
+    * data with different formats, which blorp does for stencil and depth
+    * data.
+    */
+   intel_batchbuffer_emit_mi_flush(brw);
+
+retry:
+   intel_batchbuffer_require_space(brw, estimated_max_batch_usage, false);
+   intel_batchbuffer_save_state(brw);
+   drm_intel_bo *saved_bo = brw->batch.bo;
+   uint32_t saved_used = brw->batch.used;
+   uint32_t saved_state_batch_offset = brw->batch.state_batch_offset;
+
    switch (brw->gen) {
    case 6:
       gen6_blorp_exec(brw, params);
@@ -204,6 +225,35 @@ brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params)
       break;
    }

Would it be feasible to add an assertion here to verify that the amount of batch space actually used by this blorp call is less than or equal to estimated_max_batch_usage? That would give me a lot of increased confidence that the magic number 1500 is correct.

With the added assertion, the series is:

Reviewed-by: Paul Berry stereotype...@gmail.com

That's this code:

+   /* Make sure we didn't wrap the batch unintentionally, and make sure we
+    * reserved enough space that a wrap will never happen.
+    */
+   assert(brw->batch.bo == saved_bo);
+   assert((brw->batch.used - saved_used) * 4 +
+          (saved_state_batch_offset - brw->batch.state_batch_offset) <
+          estimated_max_batch_usage);
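The space check quoted above adds up two things: commands, which grow upward from the start of the batch and are counted in dwords (hence the multiplication by 4), and indirect state, which grows downward from the end and is tracked as a byte offset. A standalone sketch of that accounting follows; struct batch and batch_bytes_consumed are hypothetical names for this example and carry only the two fields the assertion reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced view of the batch bookkeeping. */
struct batch {
   uint32_t used;                 /* dwords of commands emitted so far */
   uint32_t state_batch_offset;   /* byte offset where state begins;
                                   * decreases as state is allocated */
};

/* Bytes consumed between two snapshots of the same batch buffer:
 * command dwords converted to bytes, plus state bytes carved off the end.
 */
static uint32_t
batch_bytes_consumed(const struct batch *before, const struct batch *after)
{
   assert(after->used >= before->used);
   assert(before->state_batch_offset >= after->state_batch_offset);

   return (after->used - before->used) * 4 +
          (before->state_batch_offset - after->state_batch_offset);
}
```

For instance, 150 dwords of commands plus 400 bytes of state come to 1000 bytes, which stays under the 1500-byte estimate used in the patch. If an operation ever exceeded the estimate, the comparison would fail in a debug build instead of silently risking a batch wrap.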
Re: [Mesa-dev] [PATCH 1/2] i965/gen7: Use the base_level field of the sampler to handle GL's BASE_LEVEL.
On 08/28/2013 03:27 PM, Eric Anholt wrote:

This avoids the need to get the inter- and intra-tile offset and adjust our
miptree info based on them.
---
 src/mesa/drivers/dri/i965/gen7_sampler_state.c    | 19 +--
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 14 +++---
 2 files changed, 12 insertions(+), 21 deletions(-)

This miniseries is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org
[Mesa-dev] [PATCH] draw: fix point/line/triangle determination in draw_need_pipeline()
The previous point/line/triangle() functions didn't handle GS primitives. --- src/gallium/auxiliary/draw/draw_pipe_validate.c | 31 +-- 1 file changed, 6 insertions(+), 25 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_validate.c b/src/gallium/auxiliary/draw/draw_pipe_validate.c index 3562acd..356f4d6 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_validate.c +++ b/src/gallium/auxiliary/draw/draw_pipe_validate.c @@ -30,28 +30,13 @@ #include util/u_memory.h #include util/u_math.h +#include util/u_prim.h #include pipe/p_defines.h #include draw_private.h #include draw_pipe.h #include draw_context.h #include draw_vbuf.h -static boolean points( unsigned prim ) -{ - return (prim == PIPE_PRIM_POINTS); -} - -static boolean lines( unsigned prim ) -{ - return (prim == PIPE_PRIM_LINES || - prim == PIPE_PRIM_LINE_STRIP || - prim == PIPE_PRIM_LINE_LOOP); -} - -static boolean triangles( unsigned prim ) -{ - return prim = PIPE_PRIM_TRIANGLES; -} /** * Default version of a function to check if we need any special @@ -66,6 +51,8 @@ draw_need_pipeline(const struct draw_context *draw, const struct pipe_rasterizer_state *rasterizer, unsigned int prim ) { + unsigned reduced_prim = u_reduced_prim(prim); + /* If the driver has overridden this, use that version: */ if (draw-render @@ -80,8 +67,7 @@ draw_need_pipeline(const struct draw_context *draw, * and triggering the pipeline, because we have to trigger the * pipeline *anyway* if unfilled mode is active. 
*/ - if (lines(prim)) - { + if (reduced_prim == PIPE_PRIM_LINES) { /* line stipple */ if (rasterizer-line_stipple_enable draw-pipeline.line_stipple) return TRUE; @@ -97,9 +83,7 @@ draw_need_pipeline(const struct draw_context *draw, if (draw_current_shader_num_written_culldistances(draw)) return TRUE; } - - if (points(prim)) - { + else if (reduced_prim == PIPE_PRIM_POINTS) { /* large points */ if (rasterizer-point_size draw-pipeline.wide_point_threshold) return TRUE; @@ -117,10 +101,7 @@ draw_need_pipeline(const struct draw_context *draw, if (rasterizer-sprite_coord_enable draw-pipeline.point_sprite) return TRUE; } - - - if (triangles(prim)) - { + else if (reduced_prim == PIPE_PRIM_TRIANGLES) { /* polygon stipple */ if (rasterizer-poly_stipple_enable draw-pipeline.pstipple) return TRUE; -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
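To illustrate why reducing the primitive first helps, here is a small self-contained sketch. The enum is a local toy whose names only echo Gallium's PIPE_PRIM_* values, and toy_reduced_prim() is an assumption about what a reduction helper such as u_reduced_prim() does, not a copy of it. The adjacency entries stand in for the geometry-shader input primitives the patch mentions.

```c
#include <stdbool.h>

/* Local, illustrative primitive list; not the real Gallium header. */
enum toy_prim {
   TOY_PRIM_POINTS,
   TOY_PRIM_LINES,
   TOY_PRIM_LINE_LOOP,
   TOY_PRIM_LINE_STRIP,
   TOY_PRIM_TRIANGLES,
   TOY_PRIM_TRIANGLE_STRIP,
   TOY_PRIM_TRIANGLE_FAN,
   TOY_PRIM_QUADS,
   TOY_PRIM_QUAD_STRIP,
   TOY_PRIM_POLYGON,
   TOY_PRIM_LINES_ADJACENCY,
   TOY_PRIM_LINE_STRIP_ADJACENCY,
   TOY_PRIM_TRIANGLES_ADJACENCY,
   TOY_PRIM_TRIANGLE_STRIP_ADJACENCY
};

/* Collapse every primitive type to points, lines or triangles. */
static enum toy_prim
toy_reduced_prim(enum toy_prim prim)
{
   switch (prim) {
   case TOY_PRIM_POINTS:
      return TOY_PRIM_POINTS;
   case TOY_PRIM_LINES:
   case TOY_PRIM_LINE_LOOP:
   case TOY_PRIM_LINE_STRIP:
   case TOY_PRIM_LINES_ADJACENCY:
   case TOY_PRIM_LINE_STRIP_ADJACENCY:
      return TOY_PRIM_LINES;
   default:
      return TOY_PRIM_TRIANGLES;
   }
}

/* The removed lines() helper, transcribed onto the toy enum. */
static bool
toy_old_lines(enum toy_prim prim)
{
   return prim == TOY_PRIM_LINES ||
          prim == TOY_PRIM_LINE_STRIP ||
          prim == TOY_PRIM_LINE_LOOP;
}
```

With this model, a line strip with adjacency reduces to lines, whereas the old helper would not have recognised it as a line primitive at all.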
Re: [Mesa-dev] [PATCH 1/2] i965/vs: Detect GRF sources in split_virtual_grfs send-from-GRF code.
Kenneth Graunke kenn...@whitecape.org writes:

It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the
GRF. VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores
the GRF as src[1].

To be safe, loop over all the source registers and mark any GRFs. We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.

Fixes assertion failures in Unigine Sanctuary since we started making
register allocation rely on split_virtual_grfs working. (The register
classes were actually sufficient, we were just interpreting an IMM as a
virtual GRF number.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: mesa-sta...@lists.freedesktop.org

These are:
Reviewed-by: Eric Anholt e...@anholt.net
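A minimal sketch of the "check every source slot" idea, using invented toy types rather than the real vec4 IR. It assumes that "marking" a GRF means flagging it as not splittable; toy_inst, toy_src and toy_mark_unsplittable() are all hypothetical names.

```c
#include <stdbool.h>

enum toy_file { TOY_FILE_BAD, TOY_FILE_GRF, TOY_FILE_IMM };

struct toy_src {
   enum toy_file file;
   int reg; /* virtual GRF number for GRFs; meaningless for immediates */
};

struct toy_inst {
   bool is_send_from_grf;
   struct toy_src src[3];
};

/* Flag every GRF read by a send-from-GRF instruction as unsplittable.
 * All three source slots are inspected, so an immediate in src[0] is
 * never misread as a register number. */
static void
toy_mark_unsplittable(const struct toy_inst *inst,
                      bool *split_grf, int num_grfs)
{
   if (!inst->is_send_from_grf)
      return;

   for (int i = 0; i < 3; i++) {
      const struct toy_src *src = &inst->src[i];

      if (src->file == TOY_FILE_GRF && src->reg >= 0 && src->reg < num_grfs)
         split_grf[src->reg] = false;
   }
}
```

Testing the register file before using the register number is the important part; the loop over all slots simply avoids hard-coding which slot holds the GRF.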
Re: [Mesa-dev] [PATCH] draw: fix point/line/triangle determination in draw_need_pipeline()
Am 29.08.2013 01:14, schrieb Brian Paul: The previous point/line/triangle() functions didn't handle GS primitives. --- src/gallium/auxiliary/draw/draw_pipe_validate.c | 31 +-- 1 file changed, 6 insertions(+), 25 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_validate.c b/src/gallium/auxiliary/draw/draw_pipe_validate.c index 3562acd..356f4d6 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_validate.c +++ b/src/gallium/auxiliary/draw/draw_pipe_validate.c @@ -30,28 +30,13 @@ #include util/u_memory.h #include util/u_math.h +#include util/u_prim.h #include pipe/p_defines.h #include draw_private.h #include draw_pipe.h #include draw_context.h #include draw_vbuf.h -static boolean points( unsigned prim ) -{ - return (prim == PIPE_PRIM_POINTS); -} - -static boolean lines( unsigned prim ) -{ - return (prim == PIPE_PRIM_LINES || - prim == PIPE_PRIM_LINE_STRIP || - prim == PIPE_PRIM_LINE_LOOP); -} - -static boolean triangles( unsigned prim ) -{ - return prim = PIPE_PRIM_TRIANGLES; -} /** * Default version of a function to check if we need any special @@ -66,6 +51,8 @@ draw_need_pipeline(const struct draw_context *draw, const struct pipe_rasterizer_state *rasterizer, unsigned int prim ) { + unsigned reduced_prim = u_reduced_prim(prim); + /* If the driver has overridden this, use that version: */ if (draw-render @@ -80,8 +67,7 @@ draw_need_pipeline(const struct draw_context *draw, * and triggering the pipeline, because we have to trigger the * pipeline *anyway* if unfilled mode is active. 
*/ - if (lines(prim)) - { + if (reduced_prim == PIPE_PRIM_LINES) { /* line stipple */ if (rasterizer-line_stipple_enable draw-pipeline.line_stipple) return TRUE; @@ -97,9 +83,7 @@ draw_need_pipeline(const struct draw_context *draw, if (draw_current_shader_num_written_culldistances(draw)) return TRUE; } - - if (points(prim)) - { + else if (reduced_prim == PIPE_PRIM_POINTS) { /* large points */ if (rasterizer-point_size draw-pipeline.wide_point_threshold) return TRUE; @@ -117,10 +101,7 @@ draw_need_pipeline(const struct draw_context *draw, if (rasterizer-sprite_coord_enable draw-pipeline.point_sprite) return TRUE; } - - - if (triangles(prim)) - { + else if (reduced_prim == PIPE_PRIM_TRIANGLES) { /* polygon stipple */ if (rasterizer-poly_stipple_enable draw-pipeline.pstipple) return TRUE; Reviewed-by: Roland Scheidegger srol...@vmware.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallivm: support per-pixel min/mag filter in SoA path
From: Roland Scheidegger srol...@vmware.com Since we can have per-pixel lod we should also honor the filter per-pixel (in fact we didn't honor it per quad neither in the multiple quad case). Do this by running the linear path and simply beating the weights into shape (the sample with the higher weight is the one which should have been chosen with nearest filtering hence adjust filter weight to 1.0/0.0 based on that). If all pixels use nearest filter (either min and mag) then still run just a nearest filter as this is way cheaper (probably around 4 times faster for 2d, more for 3d case) and it should be relatively rare that pixels really need different filtering. OTOH if all pixels would require linear don't do anything special since the linear path with filter adjustments shouldn't really be all that much more expensive than ordinary linear, and we think it's rare that min/mag filters are configured differently so there doesn't seem much value in trying to optimize this further. This does not yet fix the AoS path (though currently AoS is only used for single quads hence it could be considered less broken, just never honoring per-pixel filter decision but doing it per quad). --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 320 ++--- 1 file changed, 276 insertions(+), 44 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c index c686d82..5c5ab87 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c @@ -827,11 +827,14 @@ lp_build_masklerp2d(struct lp_build_context *bld, /** * Generate code to sample a mipmap level with linear filtering. * If sampling a cube texture, r = cube face in [0,5]. + * If linear_mask is present, only pixels having their mask set + * will receive linear filtering, the rest will use nearest. 
*/ static void lp_build_sample_image_linear(struct lp_build_sample_context *bld, unsigned sampler_unit, LLVMValueRef size, + LLVMValueRef linear_mask, LLVMValueRef row_stride_vec, LLVMValueRef img_stride_vec, LLVMValueRef data_ptr, @@ -905,6 +908,31 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, lp_build_name(z1, tex.z1.layer); } + if (linear_mask) { + /* + * Whack filter weights into place. Whatever pixel had more weight is + * the one which should have been selected by nearest filtering hence + * just use 100% weight for it. + */ + struct lp_build_context *c_bld = bld-coord_bld; + LLVMValueRef w1_mask, w1_weight; + LLVMValueRef half = lp_build_const_vec(bld-gallivm, c_bld-type, 0.5f); + + w1_mask = lp_build_cmp(c_bld, PIPE_FUNC_GREATER, s_fpart, half); + /* this select is really just a and */ + w1_weight = lp_build_select(c_bld, w1_mask, c_bld-one, c_bld-zero); + s_fpart = lp_build_select(c_bld, linear_mask, s_fpart, w1_weight); + if (dims = 2) { + w1_mask = lp_build_cmp(c_bld, PIPE_FUNC_GREATER, t_fpart, half); + w1_weight = lp_build_select(c_bld, w1_mask, c_bld-one, c_bld-zero); + t_fpart = lp_build_select(c_bld, linear_mask, t_fpart, w1_weight); + if (dims == 3) { +w1_mask = lp_build_cmp(c_bld, PIPE_FUNC_GREATER, r_fpart, half); +w1_weight = lp_build_select(c_bld, w1_mask, c_bld-one, c_bld-zero); +r_fpart = lp_build_select(c_bld, linear_mask, r_fpart, w1_weight); + } + } + } /* * Get texture colors. @@ -1053,8 +1081,8 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, /** * Sample the texture/mipmap using given image filter and mip filter. - * data0_ptr and data1_ptr point to the two mipmap levels to sample - * from. width0/1_vec, height0/1_vec, depth0/1_vec indicate their sizes. + * ilevel0 and ilevel1 indicate the two mipmap levels to sample + * from (vectors or scalars). * If we're using nearest miplevel sampling the '1' values will be null/unused. 
*/ static void @@ -1105,7 +1133,7 @@ lp_build_sample_mipmap(struct lp_build_sample_context *bld, else { assert(img_filter == PIPE_TEX_FILTER_LINEAR); lp_build_sample_image_linear(bld, sampler_unit, - size0, + size0, NULL, row_stride0_vec, img_stride0_vec, data_ptr0, mipoff0, coords, offsets, colors0); @@ -1131,15 +1159,8 @@ lp_build_sample_mipmap(struct lp_build_sample_context *bld, * We'll do mip filtering if any of the quads (or individual * pixel in case of per-pixel lod) need it. * It might be better to split the vectors here and only
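The weight adjustment described in the commit message can be shown on a single scalar lane. The patch itself operates on LLVM vector values; the functions below are an invented scalar model (toy_adjust_weight() and toy_lerp() do not exist in gallivm) and only assume that the linear path blends two neighbouring texels with the fractional coordinate as the weight.

```c
#include <stdbool.h>

/* Weight used by the linear path for one pixel and one axis.  Pixels
 * that want linear filtering keep the fractional part; pixels that want
 * nearest filtering get the weight snapped to 0 or 1, which selects
 * whichever texel had the larger weight. */
static float
toy_adjust_weight(float fpart, bool use_linear)
{
   float nearest_weight = (fpart > 0.5f) ? 1.0f : 0.0f;

   return use_linear ? fpart : nearest_weight;
}

/* Ordinary linear interpolation between two texel values. */
static float
toy_lerp(float weight, float texel0, float texel1)
{
   return texel0 + weight * (texel1 - texel0);
}
```

With a weight of exactly 0 or 1 the interpolation returns one of the two texels unchanged, which is how the linear code path can stand in for nearest filtering on a per-pixel basis.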
[Mesa-dev] [PATCH 1/6] i965: Remove unused ATTRIB_BIT_DWORDS define.
---
 src/mesa/drivers/dri/i965/brw_context.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index c456e61..3cb6dc6 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -780,13 +780,6 @@ struct brw_cached_batch_item {
    struct brw_cached_batch_item *next;
 };

-
-
-/* Protect against a future where VERT_ATTRIB_MAX > 32.  Wouldn't life
- * be easier if C allowed arrays of packed elements?
- */
-#define ATTRIB_BIT_DWORDS  ((VERT_ATTRIB_MAX+31)/32)
-
 struct brw_vertex_buffer {
    /** Buffer object containing the uploaded vertex data */
    drm_intel_bo *bo;
--
1.8.3.4
[Mesa-dev] [PATCH 2/6] i965: Combine brw_emit_prim and gen7_emit_prim.
These functions have almost identical code; the only difference is that a few of the bits moved around. Adding a few trivial conditionals allows the same function to work on all generations, and the resulting code is still quite readable. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_draw.c | 80 1 file changed, 17 insertions(+), 63 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index c7164ac..df9b750 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -171,11 +171,15 @@ static void brw_emit_prim(struct brw_context *brw, start_vertex_location = prim-start; base_vertex_location = prim-basevertex; if (prim-indexed) { - vertex_access_type = GEN4_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM; + vertex_access_type = brw-gen = 7 ? + GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM : + GEN4_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM; start_vertex_location += brw-ib.start_vertex_offset; base_vertex_location += brw-vb.start_vertex_bias; } else { - vertex_access_type = GEN4_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL; + vertex_access_type = brw-gen = 7 ? 
+ GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL : + GEN4_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL; start_vertex_location += brw-vb.start_vertex_bias; } @@ -198,65 +202,16 @@ static void brw_emit_prim(struct brw_context *brw, intel_batchbuffer_emit_mi_flush(brw); } - BEGIN_BATCH(6); - OUT_BATCH(CMD_3D_PRIM 16 | (6 - 2) | -hw_prim GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT | -vertex_access_type); - OUT_BATCH(verts_per_instance); - OUT_BATCH(start_vertex_location); - OUT_BATCH(prim-num_instances); - OUT_BATCH(prim-base_instance); - OUT_BATCH(base_vertex_location); - ADVANCE_BATCH(); - - brw-batch.need_workaround_flush = true; - - if (brw-always_flush_cache) { - intel_batchbuffer_emit_mi_flush(brw); - } -} - -static void gen7_emit_prim(struct brw_context *brw, - const struct _mesa_prim *prim, - uint32_t hw_prim) -{ - int verts_per_instance; - int vertex_access_type; - int start_vertex_location; - int base_vertex_location; - - DBG(PRIM: %s %d %d\n, _mesa_lookup_enum_by_nr(prim-mode), - prim-start, prim-count); - - start_vertex_location = prim-start; - base_vertex_location = prim-basevertex; - if (prim-indexed) { - vertex_access_type = GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM; - start_vertex_location += brw-ib.start_vertex_offset; - base_vertex_location += brw-vb.start_vertex_bias; + if (brw-gen = 7) { + BEGIN_BATCH(7); + OUT_BATCH(CMD_3D_PRIM 16 | (7 - 2)); + OUT_BATCH(hw_prim | vertex_access_type); } else { - vertex_access_type = GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL; - start_vertex_location += brw-vb.start_vertex_bias; + BEGIN_BATCH(6); + OUT_BATCH(CMD_3D_PRIM 16 | (6 - 2) | +hw_prim GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT | +vertex_access_type); } - - verts_per_instance = prim-count; - - /* If nothing to emit, just return. */ - if (verts_per_instance == 0) - return; - - /* If we're set to always flush, do it before and after the primitive emit. 
-* We want to catch both missed flushes that hurt instruction/state cache -* and missed flushes of the render cache as it heads to other parts of -* the besides the draw code. -*/ - if (brw-always_flush_cache) { - intel_batchbuffer_emit_mi_flush(brw); - } - - BEGIN_BATCH(7); - OUT_BATCH(CMD_3D_PRIM 16 | (7 - 2)); - OUT_BATCH(hw_prim | vertex_access_type); OUT_BATCH(verts_per_instance); OUT_BATCH(start_vertex_location); OUT_BATCH(prim-num_instances); @@ -264,6 +219,8 @@ static void gen7_emit_prim(struct brw_context *brw, OUT_BATCH(base_vertex_location); ADVANCE_BATCH(); + brw-batch.need_workaround_flush = true; + if (brw-always_flush_cache) { intel_batchbuffer_emit_mi_flush(brw); } @@ -453,10 +410,7 @@ retry: brw_upload_state(brw); } - if (brw-gen = 7) -gen7_emit_prim(brw, prim[i], brw-primitive); - else -brw_emit_prim(brw, prim[i], brw-primitive); + brw_emit_prim(brw, prim[i], brw-primitive); brw-no_batch_wrap = false; -- 1.8.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
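The shape of the merged function can be sketched without the batch macros. Everything here is a toy: the command value and shift are placeholders, and toy_emit_prim() writes into a plain array instead of a batch buffer. It assumes only what the diff shows, namely that the newer layout moves the topology and access type into a second dword while the remaining five dwords are shared.

```c
#include <stdint.h>

#define TOY_CMD_3D_PRIM    0x7b00u /* placeholder opcode value */
#define TOY_TOPOLOGY_SHIFT 10      /* placeholder bit position */

/* Write a primitive packet into out[] and return its length in dwords.
 * body[] holds the five dwords common to both layouts. */
static int
toy_emit_prim(int gen, uint32_t hw_prim, uint32_t vertex_access_type,
              const uint32_t body[5], uint32_t *out)
{
   int n = 0;

   if (gen >= 7) {
      /* Newer layout: 7 dwords, topology in its own dword. */
      out[n++] = (TOY_CMD_3D_PRIM << 16) | (7 - 2);
      out[n++] = hw_prim | vertex_access_type;
   } else {
      /* Older layout: 6 dwords, topology packed into the header. */
      out[n++] = (TOY_CMD_3D_PRIM << 16) | (6 - 2) |
                 (hw_prim << TOY_TOPOLOGY_SHIFT) |
                 vertex_access_type;
   }

   for (int i = 0; i < 5; i++)
      out[n++] = body[i];

   return n;
}
```

Only the header differs between the two branches, which is the argument the commit message makes for keeping a single function.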
[Mesa-dev] [PATCH 3/6] i965: Use the proper element of the prim array in brw_try_draw_prims.
The VBO module actually calls us with an array of _mesa_prim objects.
For example, it may break up a DrawArrays() call into multiple
primitives when primitive restart is enabled.

Previously, we treated prim like a pointer, always accessing element 0.
This worked because all of the primitive objects in a single draw call
have the same value for num_instances and basevertex.

However, accessing an array as a pointer and using the wrong object's
fields is misleading. For stylistic reasons alone, we should use the
right object.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_draw.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index df9b750..2583a6f 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -386,12 +386,12 @@ static bool brw_try_draw_prims( struct gl_context *ctx,
       intel_batchbuffer_require_space(brw, estimated_max_prim_size, false);
       intel_batchbuffer_save_state(brw);

-      if (brw->num_instances != prim->num_instances) {
-         brw->num_instances = prim->num_instances;
+      if (brw->num_instances != prim[i].num_instances) {
+         brw->num_instances = prim[i].num_instances;
          brw->state.dirty.brw |= BRW_NEW_VERTICES;
       }
-      if (brw->basevertex != prim->basevertex) {
-         brw->basevertex = prim->basevertex;
+      if (brw->basevertex != prim[i].basevertex) {
+         brw->basevertex = prim[i].basevertex;
          brw->state.dirty.brw |= BRW_NEW_VERTICES;
       }
       if (brw->gen < 6)
--
1.8.3.4
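The difference between the two spellings is easy to demonstrate with a toy structure. struct toy_prim and both helpers are invented for illustration and are not the Mesa types.

```c
struct toy_prim {
   int num_instances;
   int basevertex;
};

/* Uses prim->field inside the loop, so it reads element 0 every time. */
static int
toy_sum_instances_first_only(const struct toy_prim *prim, int nr_prims)
{
   int sum = 0;

   for (int i = 0; i < nr_prims; i++)
      sum += prim->num_instances;

   return sum;
}

/* Uses prim[i].field, so each iteration reads its own element. */
static int
toy_sum_instances_indexed(const struct toy_prim *prim, int nr_prims)
{
   int sum = 0;

   for (int i = 0; i < nr_prims; i++)
      sum += prim[i].num_instances;

   return sum;
}
```

The two functions agree only when every element carries the same value, which is the situation the commit message describes for num_instances and basevertex.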
[Mesa-dev] [PATCH 4/6] i965: Clarify that we only check one prim's type for cut index support.
can_cut_index_handle_prims() was passed an array of _mesa_prim objects
and a count, and runs a loop for that many iterations. However, it
treats the array like a pointer, repeatedly checking the first element.
This is wasteful and bizarre.

The VBO module will never call us with multiple primitives of different
topologies, so it's actually reasonable to just check the first element.
Once.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_primitive_restart.c | 37 +++
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_primitive_restart.c b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
index 0dbc48f..ca2e6b7 100644
--- a/src/mesa/drivers/dri/i965/brw_primitive_restart.c
+++ b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
@@ -76,7 +76,6 @@ can_cut_index_handle_restart_index(struct gl_context *ctx,
 static bool
 can_cut_index_handle_prims(struct gl_context *ctx,
                            const struct _mesa_prim *prim,
-                           GLuint nr_prims,
                            const struct _mesa_index_buffer *ib)
 {
    struct brw_context *brw = brw_context(ctx);
@@ -92,24 +91,22 @@ can_cut_index_handle_prims(struct gl_context *ctx,
       return false;
    }

-   for ( ; nr_prims > 0; nr_prims--) {
-      switch(prim->mode) {
-      case GL_POINTS:
-      case GL_LINES:
-      case GL_LINE_STRIP:
-      case GL_TRIANGLES:
-      case GL_TRIANGLE_STRIP:
-         /* Cut index supports these primitive types */
-         break;
-      default:
-         /* Cut index does not support these primitive types */
-      //case GL_LINE_LOOP:
-      //case GL_TRIANGLE_FAN:
-      //case GL_QUADS:
-      //case GL_QUAD_STRIP:
-      //case GL_POLYGON:
-         return false;
-      }
+   switch (prim->mode) {
+   case GL_POINTS:
+   case GL_LINES:
+   case GL_LINE_STRIP:
+   case GL_TRIANGLES:
+   case GL_TRIANGLE_STRIP:
+      /* Cut index supports these primitive types */
+      break;
+   default:
+      /* Cut index does not support these primitive types */
+   //case GL_LINE_LOOP:
+   //case GL_TRIANGLE_FAN:
+   //case GL_QUADS:
+   //case GL_QUAD_STRIP:
+   //case GL_POLYGON:
+      return false;
    }

    return true;
@@ -161,7 +158,7 @@ brw_handle_primitive_restart(struct gl_context *ctx,
     */
    brw->prim_restart.in_progress = true;

-   if (can_cut_index_handle_prims(ctx, prim, nr_prims, ib)) {
+   if (can_cut_index_handle_prims(ctx, &prim[0], ib)) {
       /* Cut index should work for primitive restart, so use it */
       brw->prim_restart.enable_cut_index = true;

--
1.8.3.4
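A stripped-down version of the resulting check, with an invented enum in place of the GL constants. It relies on the commit message's stated assumption that all primitives in one call share a topology, so the caller passes only the first element; toy_can_cut_index_handle() is a hypothetical name.

```c
#include <stdbool.h>

/* Toy topology list; the names only echo the GL enums. */
enum toy_mode {
   TOY_POINTS,
   TOY_LINES,
   TOY_LINE_LOOP,
   TOY_LINE_STRIP,
   TOY_TRIANGLES,
   TOY_TRIANGLE_STRIP,
   TOY_TRIANGLE_FAN,
   TOY_QUADS,
   TOY_QUAD_STRIP,
   TOY_POLYGON
};

struct toy_draw_prim {
   enum toy_mode mode;
};

/* Decide from a single primitive whether the cut-index path applies. */
static bool
toy_can_cut_index_handle(const struct toy_draw_prim *prim)
{
   switch (prim->mode) {
   case TOY_POINTS:
   case TOY_LINES:
   case TOY_LINE_STRIP:
   case TOY_TRIANGLES:
   case TOY_TRIANGLE_STRIP:
      return true;
   default:
      return false;
   }
}
```

A caller holding an array would pass the address of element 0, which makes the "we only look at one primitive" decision visible at the call site instead of hiding it inside a loop.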
[Mesa-dev] [PATCH 5/6] i965: Rename prim parameter to prims where it's an array.
Some drawing functions take a single _mesa_prim object, while others take an array of primitives. Both kinds of functions used a parameter called prim (the singular form), which was confusing. Using the plural form, prims, clearly communicates that the parameter is an array of primitives. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_draw.c | 26 +++ src/mesa/drivers/dri/i965/brw_draw.h | 2 +- src/mesa/drivers/dri/i965/brw_primitive_restart.c | 8 +++ 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index 2583a6f..d14f7f0 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -314,7 +314,7 @@ static void brw_postdraw_set_buffers_need_resolve(struct brw_context *brw) */ static bool brw_try_draw_prims( struct gl_context *ctx, const struct gl_client_array *arrays[], -const struct _mesa_prim *prim, +const struct _mesa_prim *prims, GLuint nr_prims, const struct _mesa_index_buffer *ib, GLuint min_index, @@ -386,18 +386,18 @@ static bool brw_try_draw_prims( struct gl_context *ctx, intel_batchbuffer_require_space(brw, estimated_max_prim_size, false); intel_batchbuffer_save_state(brw); - if (brw-num_instances != prim[i].num_instances) { - brw-num_instances = prim[i].num_instances; + if (brw-num_instances != prims[i].num_instances) { + brw-num_instances = prims[i].num_instances; brw-state.dirty.brw |= BRW_NEW_VERTICES; } - if (brw-basevertex != prim[i].basevertex) { - brw-basevertex = prim[i].basevertex; + if (brw-basevertex != prims[i].basevertex) { + brw-basevertex = prims[i].basevertex; brw-state.dirty.brw |= BRW_NEW_VERTICES; } if (brw-gen 6) -brw_set_prim(brw, prim[i]); +brw_set_prim(brw, prims[i]); else -gen6_set_prim(brw, prim[i]); +gen6_set_prim(brw, prims[i]); retry: /* Note that before the loop, brw-state.dirty.brw was set to != 0, and @@ -410,7 +410,7 @@ retry: brw_upload_state(brw); } - 
brw_emit_prim(brw, prim[i], brw-primitive); + brw_emit_prim(brw, prims[i], brw-primitive); brw-no_batch_wrap = false; @@ -446,7 +446,7 @@ retry: } void brw_draw_prims( struct gl_context *ctx, -const struct _mesa_prim *prim, +const struct _mesa_prim *prims, GLuint nr_prims, const struct _mesa_index_buffer *ib, GLboolean index_bounds_valid, @@ -461,7 +461,7 @@ void brw_draw_prims( struct gl_context *ctx, return; /* Handle primitive restart if needed */ - if (brw_handle_primitive_restart(ctx, prim, nr_prims, ib)) { + if (brw_handle_primitive_restart(ctx, prims, nr_prims, ib)) { /* The draw was handled, so we can exit now */ return; } @@ -471,7 +471,7 @@ void brw_draw_prims( struct gl_context *ctx, * to upload. */ if (!vbo_all_varyings_in_vbos(arrays) !index_bounds_valid) - vbo_get_minmax_indices(ctx, prim, ib, min_index, max_index, nr_prims); + vbo_get_minmax_indices(ctx, prims, ib, min_index, max_index, nr_prims); /* Do GL_SELECT and GL_FEEDBACK rendering using swrast, even though it * won't support all the extensions we support. @@ -481,7 +481,7 @@ void brw_draw_prims( struct gl_context *ctx, _mesa_lookup_enum_by_nr(ctx-RenderMode)); _swsetup_Wakeup(ctx); _tnl_wakeup(ctx); - _tnl_draw_prims(ctx, arrays, prim, nr_prims, ib, min_index, max_index); + _tnl_draw_prims(ctx, arrays, prims, nr_prims, ib, min_index, max_index); return; } @@ -489,7 +489,7 @@ void brw_draw_prims( struct gl_context *ctx, * manage it. swrast doesn't support our featureset, so we can't fall back * to it. 
*/ - brw_try_draw_prims(ctx, arrays, prim, nr_prims, ib, min_index, max_index); + brw_try_draw_prims(ctx, arrays, prims, nr_prims, ib, min_index, max_index); } void brw_draw_init( struct brw_context *brw ) diff --git a/src/mesa/drivers/dri/i965/brw_draw.h b/src/mesa/drivers/dri/i965/brw_draw.h index c915bc3..aac375f 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.h +++ b/src/mesa/drivers/dri/i965/brw_draw.h @@ -49,7 +49,7 @@ void brw_draw_destroy( struct brw_context *brw ); /* brw_primitive_restart.c */ GLboolean brw_handle_primitive_restart(struct gl_context *ctx, - const struct _mesa_prim *prim, + const struct _mesa_prim *prims, GLuint nr_prims,
[Mesa-dev] [PATCH 6/6] mesa: Rename gl_context::swtnl_im to vbo_context; use proper type.
The main GL context's swtnl_im field is the VBO module's vbo_context
structure. Having swtnl in the name is confusing, since some drivers use
hardware transform and lighting but still rely on the VBO module for
drawing.

v2: Forward declare the type and use that instead of void *
    (suggested by Eric Anholt)

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/main/mtypes.h     | 3 ++-
 src/mesa/vbo/vbo_context.c | 4 ++--
 src/mesa/vbo/vbo_context.h | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 22bb58c..7d56322 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -80,6 +80,7 @@ struct prog_instruction;
 struct gl_program_parameter_list;
 struct set;
 struct set_entry;
+struct vbo_context;
 /*@}*/


@@ -3669,7 +3670,7 @@ struct gl_context
    void *swrast_context;
    void *swsetup_context;
    void *swtnl_context;
-   void *swtnl_im;
+   struct vbo_context *vbo_context;
    struct st_context *st;
    void *aelt_context;
    /*@}*/
diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
index b97313d..2aa5bbc 100644
--- a/src/mesa/vbo/vbo_context.c
+++ b/src/mesa/vbo/vbo_context.c
@@ -152,7 +152,7 @@ GLboolean _vbo_CreateContext( struct gl_context *ctx )
 {
    struct vbo_context *vbo = CALLOC_STRUCT(vbo_context);

-   ctx->swtnl_im = (void *)vbo;
+   ctx->vbo_context = (void *) vbo;

    /* Initialize the arrayelt helper
     */
@@ -224,7 +224,7 @@ void _vbo_DestroyContext( struct gl_context *ctx )
       if (ctx->API == API_OPENGL_COMPAT)
          vbo_save_destroy(ctx);
       free(vbo);
-      ctx->swtnl_im = NULL;
+      ctx->vbo_context = NULL;
    }
 }

diff --git a/src/mesa/vbo/vbo_context.h b/src/mesa/vbo/vbo_context.h
index 27fae83..db47a8b 100644
--- a/src/mesa/vbo/vbo_context.h
+++ b/src/mesa/vbo/vbo_context.h
@@ -90,7 +90,7 @@ struct vbo_context {

 static inline struct vbo_context *vbo_context(struct gl_context *ctx)
 {
-   return (struct vbo_context *)(ctx->swtnl_im);
+   return (struct vbo_context *) ctx->vbo_context;
 }

--
1.8.3.4
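The v2 approach, a forward declaration plus a typed pointer, can be sketched in a few lines. The toy_ names are invented; the only assumption is the usual C rule that a pointer to an incomplete struct type may be declared and stored before the struct is defined.

```c
/* "Core header": only the tag is known here, not the layout. */
struct toy_vbo_context;

struct toy_gl_context {
   struct toy_vbo_context *vbo_context; /* typed, yet still opaque */
};

/* "Module header": the full definition lives with the module. */
struct toy_vbo_context {
   int draw_count;
};

/* Accessor: no cast is needed because the field already has the right
 * pointer type. */
static struct toy_vbo_context *
toy_get_vbo(struct toy_gl_context *ctx)
{
   return ctx->vbo_context;
}
```

Compared with a void * field, the compiler can now reject an assignment from an unrelated pointer type, while code that only sees the forward declaration still cannot reach into the structure.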
Re: [Mesa-dev] [PATCH] glsl: Allow precision qualifiers for sampler types
On 08/27/2013 12:52 PM, Anuj Phogat wrote: On Tue, Aug 27, 2013 at 11:53 AM, Ian Romanick i...@freedesktop.org wrote: On 08/27/2013 10:45 AM, Anuj Phogat wrote: GLSL 1.30 doesn't allow precision qualifiers on sampler types, but in GLSL ES, sampler types are also allowed. This seems like an oversight (since the intention of including these in GLSL 1.30 is to allow compatibility with ES shaders). Currently, Mesa allows default precision qualifiers to be set for sampler types in GLSL (commit d5948f2). This patch makes it follow GLSL ES rules and also allow declaring sampler variables with a precision qualifier in GLSL. I think our current behavior is incorrect even in the ES case. GLSL ES 3.30 You mean to say GLSL ES 3.00? Yes. That's about the fifth time I've made that typo in the last week... and desktop GLSL 4.40 say the following in section 4.5.3 (Precision Qualifiers): Any floating point or any integer declaration can have the type preceded by one of these precision qualifiers... Yes, samplers are now allowed in GLSL 4.4. They were not in GLSL 4.3. The also both say the following in section 4.5.4 (Default Precision Qualifiers): The precision statement...can be used to establish a default precision qualifier. The type field can be either int or float or any of the sampler types... So I believe precision mediump sampler2D; should be legal in all versions, but uniform mediump sampler2D s; should not. Yes, there is no clear statement in GLSL spec which allows: uniform mediump sampler2D s; Which syntax is the test using? test uses: uniform mediump sampler2D s; I haven't yet tested if it is accepted by NVIDIA. There is an example in section 8 (Built-in Functions) that uses this syntax: uniform lowp sampler2D sampler; highp vec2 coord; ... lowp vec4 col = texture (sampler, coord); // texture() returns lowp It seems that this syntax should be legal. I've submitted a spec bug to clarify the language in section 4.5. 
I have also attached a patch to fix up the comment in that piece of code. Go ahead and combine my patch (with my Signed-off-by) with your code changes. With the one other change suggested below, Reviewed-by: Ian Romanick ian.d.roman...@intel.com This fixes a shader compilation error in Khronos OpenGL conformance test depth_texture_mipmap. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/glsl/ast_to_hir.cpp | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 192130a..b3d6d8c 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -3131,8 +3131,8 @@ ast_declarator_list::hir(exec_list *instructions, state-check_precision_qualifiers_allowed(loc); } - - /* Precision qualifiers only apply to floating point and integer types. + /* Precision qualifiers apply to floating point, integer and sampler + * types. * * From section 4.5.2 of the GLSL 1.30 spec: *Any floating point or any integer declaration can have the type @@ -3144,20 +3144,24 @@ ast_declarator_list::hir(exec_list *instructions, * * From page 87 of the GLSL ES spec: *RESOLUTION: Allow sampler types to take a precision qualifier. + * + * GLSL 1.30 doesn't allow precision qualifiers on sampler types, but + * this seems like an oversight (since the intention of including these + * in GLSL 1.30 is to allow compatibility with ES shaders). So we allow + * int, float, and all sampler types regardless of GLSL version. */ if (this-type-qualifier.precision != ast_precision_none !var-type-is_float() !var-type-is_integer() !var-type-is_record() - !(var-type-is_sampler() state-es_shader) + !(var-type-is_sampler()) You can delete the extra ( and ). !(var-type-is_array() (var-type-fields.array-is_float() || var-type-fields.array-is_integer( { _mesa_glsl_error(loc, state, precision qualifiers apply only to floating point - %s types, state-es_shader ? 
, integer, and sampler - : and integer); + , integer and sampler types); } /* From page 17 (page 23 of the PDF) of the GLSL 1.20 spec: comment-fix-up.patch Description: application/pgp-keys ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
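The condition in the quoted hunk is easier to read as a positive predicate. This is a sketch over an invented type descriptor (struct toy_glsl_type is not the compiler's glsl_type), transcribing the patched condition as quoted: float, integer, record and sampler types qualify, as do arrays whose element type is float or integer.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_base_type {
   TOY_TYPE_FLOAT,
   TOY_TYPE_INT,
   TOY_TYPE_BOOL,
   TOY_TYPE_SAMPLER,
   TOY_TYPE_STRUCT,
   TOY_TYPE_ARRAY
};

struct toy_glsl_type {
   enum toy_base_type base;
   const struct toy_glsl_type *element; /* used only for arrays */
};

/* True if a declaration of this type may carry a precision qualifier
 * under the rule in the patched condition. */
static bool
toy_precision_allowed(const struct toy_glsl_type *type)
{
   switch (type->base) {
   case TOY_TYPE_FLOAT:
   case TOY_TYPE_INT:
   case TOY_TYPE_STRUCT:
   case TOY_TYPE_SAMPLER:
      return true;
   case TOY_TYPE_ARRAY:
      return type->element != NULL &&
             (type->element->base == TOY_TYPE_FLOAT ||
              type->element->base == TOY_TYPE_INT);
   default:
      return false;
   }
}
```

Under this model a declaration such as "uniform mediump sampler2D s;" passes the check regardless of whether the shader is an ES shader, which is the behaviour change the patch is after.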
Re: [Mesa-dev] [PATCH 05/22] i965: Move data from brw-vs into a base class if gs will also need it.
On 08/26/2013 03:12 PM, Paul Berry wrote: This paves the way for sharing the code that will set up the vertex and geometry shader pipeline state. --- src/mesa/drivers/dri/i965/brw_context.h | 47 ++-- src/mesa/drivers/dri/i965/brw_draw.c | 3 +- src/mesa/drivers/dri/i965/brw_misc_state.c | 6 +-- src/mesa/drivers/dri/i965/brw_vs.c | 12 +++--- src/mesa/drivers/dri/i965/brw_vs_state.c | 24 ++-- src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 43 -- src/mesa/drivers/dri/i965/brw_vtbl.c | 2 +- src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 8 ++-- src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 4 +- src/mesa/drivers/dri/i965/gen6_sampler_state.c | 2 +- src/mesa/drivers/dri/i965/gen6_vs_state.c| 23 +++- src/mesa/drivers/dri/i965/gen7_vs_state.c| 18 + 12 files changed, 107 insertions(+), 85 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index dcd4c9a..9784956 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -818,6 +818,32 @@ struct brw_query_object { /** + * Data shared between brw_context::vs and brw_context::gs + */ +struct brw_vec4_context_base +{ + drm_intel_bo *scratch_bo; + drm_intel_bo *const_bo; + /** Offset in the program cache to the program */ + uint32_t prog_offset; + uint32_t state_offset; + + uint32_t push_const_offset; /* Offset in the batchbuffer */ + int push_const_size; /* in 256-bit register increments */ + + uint32_t bind_bo_offset; + uint32_t surf_offset[BRW_MAX_VEC4_SURFACES]; + + /** SAMPLER_STATE count and table offset */ + uint32_t sampler_count; + uint32_t sampler_offset; + + /** Offsets in the batch to sampler default colors (texture border color) */ + uint32_t sdc_offset[BRW_MAX_TEX_UNIT]; +}; I like what this patch is doing, but I really don't like the names. With the exception of ralloc, context/ctx generally always mean the global GL context: gl_context or a subclass like brw_context. 
(For ralloc, we inherited the context terminology from talloc, so it kind of stuck.) vec4_ctx/brw_vec4_context_base are something totally different. This is a structure that represents the shader program state for a particular pipeline stage. Also, other than BRW_MAX_VEC4_SURFACES, there's nothing vec4 specific about this at all. The pixel shader could use every one of these fields (and should eventually). So I dislike vec4 in the name - we're just going to have to change it. I had suggested names like brw_shader_state or brw_pipeline_state...I'm open to other ideas. --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965: Remove never used RSR and RSL opcodes.
These opcodes have existed in the code since the initial import, but the opcodes themselves appear never to have actually existed.
---
Rotate?

 src/mesa/drivers/dri/i965/brw_defines.h  | 2 --
 src/mesa/drivers/dri/i965/brw_eu.h       | 2 --
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 2 --
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 --
 4 files changed, 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 832ff55..7e5be2a 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -691,8 +691,6 @@ enum opcode {
    BRW_OPCODE_XOR = 7,
    BRW_OPCODE_SHR = 8,
    BRW_OPCODE_SHL = 9,
-   BRW_OPCODE_RSR = 10,
-   BRW_OPCODE_RSL = 11,
    BRW_OPCODE_ASR = 12,
    BRW_OPCODE_CMP = 16,
    BRW_OPCODE_CMPN = 17,
diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
index 387450b..6ac1c68 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -154,8 +154,6 @@ ALU2(OR)
 ALU2(XOR)
 ALU2(SHR)
 ALU2(SHL)
-ALU2(RSR)
-ALU2(RSL)
 ALU2(ASR)
 ALU1(F32TO16)
 ALU1(F16TO32)
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index ecf8597..f26c913 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -936,8 +936,6 @@ ALU2(OR)
 ALU2(XOR)
 ALU2(SHR)
 ALU2(SHL)
-ALU2(RSR)
-ALU2(RSL)
 ALU2(ASR)
 ALU1(F32TO16)
 ALU1(F16TO32)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index e715c37..ccd4e5e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -53,8 +53,6 @@ is_expression(const fs_inst *const inst)
    case BRW_OPCODE_XOR:
    case BRW_OPCODE_SHR:
    case BRW_OPCODE_SHL:
-   case BRW_OPCODE_RSR:
-   case BRW_OPCODE_RSL:
    case BRW_OPCODE_ASR:
    case BRW_OPCODE_ADD:
    case BRW_OPCODE_MUL:
-- 
1.8.3.2
[Mesa-dev] [PATCH 2/2] i965: Remove never used DPA2 opcode.
This opcode has existed in the code since the initial import, but the opcode itself appears never to have actually existed.
---
 src/mesa/drivers/dri/i965/brw_defines.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 7e5be2a..21c8baa 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -738,7 +738,6 @@ enum opcode {
    BRW_OPCODE_DPH = 85,
    BRW_OPCODE_DP3 = 86,
    BRW_OPCODE_DP2 = 87,
-   BRW_OPCODE_DPA2 = 88,
    BRW_OPCODE_LINE = 89,
    BRW_OPCODE_PLN = 90,
    BRW_OPCODE_MAD = 91,
-- 
1.8.3.2
Re: [Mesa-dev] [PATCH 2/4] nv50: implement new float comparison instructions
On Tuesday, 13.08.2013 at 20:14 +0200, Christoph Bumiller wrote:
> On 13.08.2013 19:04, srol...@vmware.com wrote:
>> From: Roland Scheidegger srol...@vmware.com
>>
>> untested.
>
> Looks like it should work though, thanks. nv50 only supported u32 result all along and on nvc0 both cases are already handled by the rest of the code, too.

This commit breaks Xonotic on NV92 for me. Dmesg has a lot of those:

TRAP_MP_EXEC - TP 0 MP 0: TIMEOUT at 07fed0 warp 20, opcode 90001204 82051008

>> ---
>>  .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp | 17 +
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
>> index 56eccac..a2ad9f4 100644
>> --- a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
>> +++ b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
>> @@ -440,6 +440,11 @@ nv50_ir::DataType Instruction::inferDstType() const
>>     switch (getOpcode()) {
>>     case TGSI_OPCODE_F2U: return nv50_ir::TYPE_U32;
>>     case TGSI_OPCODE_F2I: return nv50_ir::TYPE_S32;
>> +   case TGSI_OPCODE_FSEQ:
>> +   case TGSI_OPCODE_FSGE:
>> +   case TGSI_OPCODE_FSLT:
>> +   case TGSI_OPCODE_FSNE:
>> +      return nv50_ir::TYPE_U32;
>>     case TGSI_OPCODE_I2F:
>>     case TGSI_OPCODE_U2F:
>>        return nv50_ir::TYPE_F32;
>> @@ -456,19 +461,23 @@ nv50_ir::CondCode Instruction::getSetCond() const
>>     case TGSI_OPCODE_SLT:
>>     case TGSI_OPCODE_ISLT:
>>     case TGSI_OPCODE_USLT:
>> +   case TGSI_OPCODE_FSLT:
>>        return CC_LT;
>>     case TGSI_OPCODE_SLE:
>>        return CC_LE;
>>     case TGSI_OPCODE_SGE:
>>     case TGSI_OPCODE_ISGE:
>>     case TGSI_OPCODE_USGE:
>> +   case TGSI_OPCODE_FSGE:
>>        return CC_GE;
>>     case TGSI_OPCODE_SGT:
>>        return CC_GT;
>>     case TGSI_OPCODE_SEQ:
>>     case TGSI_OPCODE_USEQ:
>> +   case TGSI_OPCODE_FSEQ:
>>        return CC_EQ;
>>     case TGSI_OPCODE_SNE:
>> +   case TGSI_OPCODE_FSNE:
>>        return CC_NEU;
>>     case TGSI_OPCODE_USNE:
>>        return CC_NE;
>> @@ -556,6 +565,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
>>     NV50_IR_OPCODE_CASE(KILL_IF, DISCARD);
>>     NV50_IR_OPCODE_CASE(F2I, CVT);
>> +   NV50_IR_OPCODE_CASE(FSEQ, SET);
>> +   NV50_IR_OPCODE_CASE(FSGE, SET);
>> +   NV50_IR_OPCODE_CASE(FSLT, SET);
>> +   NV50_IR_OPCODE_CASE(FSNE, SET);
>>     NV50_IR_OPCODE_CASE(IDIV, DIV);
>>     NV50_IR_OPCODE_CASE(IMAX, MAX);
>>     NV50_IR_OPCODE_CASE(IMIN, MIN);
>> @@ -2354,6 +2367,10 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
>>     case TGSI_OPCODE_SLE:
>>     case TGSI_OPCODE_SNE:
>>     case TGSI_OPCODE_STR:
>> +   case TGSI_OPCODE_FSEQ:
>> +   case TGSI_OPCODE_FSGE:
>> +   case TGSI_OPCODE_FSLT:
>> +   case TGSI_OPCODE_FSNE:
>>     case TGSI_OPCODE_ISGE:
>>     case TGSI_OPCODE_ISLT:
>>     case TGSI_OPCODE_USEQ:
Re: [Mesa-dev] [PATCH 10/22] i965: Make sure constants re-sent after constant buffer reallocation.
On 08/26/2013 03:12 PM, Paul Berry wrote:
> The hardware requires that after constant buffers for a stage are allocated using a 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} command, and prior to execution of a 3DPRIMITIVE, the corresponding stage's constant buffers must be reprogrammed using a 3DSTATE_CONSTANT_{VS,HS,DS,GS,PS} command.
>
> Previously we didn't need to worry about this, because we only programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} once on startup. But now that we reallocate the constant buffers whenever geometry shaders are switched on and off, we need to make sure the constant buffers are reprogrammed.

Not exactly. The change to do PUSH_CONSTANT_ALLOC once at startup is very recent - I only committed it on June 10th (fc800f0c60a2).

Previously, we had a state atom which did PUSH_CONSTANT_ALLOC whenever BRW_NEW_CONTEXT was flagged. That's still vaguely once at startup, but keep in mind that before hardware contexts were mandatory, BRW_NEW_CONTEXT got flagged on every batch. The atoms list looked like this:

   gen7_push_constant_alloc,
   ...
   gen7_vs_state,
   ...
   gen7_ps_state,

Both VS and PS state listen to BRW_NEW_BATCH, so on every batch, we'd end up doing:

   3DSTATE_PUSH_CONSTANT_ALLOC_VS (if hw_ctx == NULL)
   3DSTATE_PUSH_CONSTANT_ALLOC_PS (if hw_ctx == NULL)
   3DSTATE_CONSTANT_VS
   3DSTATE_CONSTANT_PS

which meant that we always obeyed this rule, even when we didn't do the allocation once at startup and never again.

But this only worked because we always allocated push constant space at the start of a batch. Your previous patch causes reallocations to happen mid-batch whenever the geometry program changes. This makes the old tricks quit working, and we do need a new flag.

So, I was pretty skeptical of this patch, but on further review, it does appear to be necessary and looks fine as is.

> We do this by adding a new bit, BRW_NEW_PUSH_CONSTANT_ALLOCATION, to brw->state.dirty.brw.
> ---
>  src/mesa/drivers/dri/i965/brw_context.h   | 2 ++
>  src/mesa/drivers/dri/i965/gen6_gs_state.c | 2 +-
>  src/mesa/drivers/dri/i965/gen6_vs_state.c | 3 ++-
>  src/mesa/drivers/dri/i965/gen6_wm_state.c | 3 ++-
>  src/mesa/drivers/dri/i965/gen7_urb.c      | 13 +
>  src/mesa/drivers/dri/i965/gen7_vs_state.c | 3 ++-
>  src/mesa/drivers/dri/i965/gen7_wm_state.c | 3 ++-
>  7 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 95f9bb2..35193a6 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -158,6 +158,7 @@ enum brw_state_id {
>     BRW_STATE_UNIFORM_BUFFER,
>     BRW_STATE_META_IN_PROGRESS,
>     BRW_STATE_INTERPOLATION_MAP,
> +   BRW_STATE_PUSH_CONSTANT_ALLOCATION,
>     BRW_NUM_STATE_BITS
>  };
>
> @@ -194,6 +195,7 @@ enum brw_state_id {
>  #define BRW_NEW_UNIFORM_BUFFER (1 << BRW_STATE_UNIFORM_BUFFER)
>  #define BRW_NEW_META_IN_PROGRESS (1 << BRW_STATE_META_IN_PROGRESS)
>  #define BRW_NEW_INTERPOLATION_MAP (1 << BRW_STATE_INTERPOLATION_MAP)
> +#define BRW_NEW_PUSH_CONSTANT_ALLOCATION (1 << BRW_STATE_PUSH_CONSTANT_ALLOCATION)
>
>  struct brw_state_flags {
>     /** State update flags signalled by mesa internals */
> diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c b/src/mesa/drivers/dri/i965/gen6_gs_state.c
> index ac78286..9648fb7 100644
> --- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
> @@ -81,7 +81,7 @@ upload_gs_state(struct brw_context *brw)
>  const struct brw_tracked_state gen6_gs_state = {
>     .dirty = {
>        .mesa = _NEW_TRANSFORM,
> -      .brw = BRW_NEW_CONTEXT,
> +      .brw = BRW_NEW_CONTEXT | BRW_NEW_PUSH_CONSTANT_ALLOCATION,
>        .cache = CACHE_NEW_FF_GS_PROG
>     },
>     .emit = upload_gs_state,
> diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c b/src/mesa/drivers/dri/i965/gen6_vs_state.c
> index c099342..9f99db8 100644
> --- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
> @@ -206,7 +206,8 @@ const struct brw_tracked_state gen6_vs_state = {
>        .mesa = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
>        .brw = (BRW_NEW_CONTEXT |
>                BRW_NEW_VERTEX_PROGRAM |
> -              BRW_NEW_BATCH),
> +              BRW_NEW_BATCH |
> +              BRW_NEW_PUSH_CONSTANT_ALLOCATION),
>        .cache = CACHE_NEW_VS_PROG | CACHE_NEW_SAMPLER
>     },
>     .emit = upload_vs_state,
> diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c b/src/mesa/drivers/dri/i965/gen6_wm_state.c
> index e286785..6725805 100644
> --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
> @@ -229,7 +229,8 @@ const struct brw_tracked_state gen6_wm_state = {
>                _NEW_POLYGON |
>                _NEW_MULTISAMPLE),
>        .brw = (BRW_NEW_FRAGMENT_PROGRAM |
> -              BRW_NEW_BATCH),
> +              BRW_NEW_BATCH |
> +              BRW_NEW_PUSH_CONSTANT_ALLOCATION),
>        .cache
Re: [Mesa-dev] [PATCH 09/22] i965/gs: Allocate push constant space for use by GS.
On 08/26/2013 03:12 PM, Paul Berry wrote: Previously, we would always use the same push constant allocation regardless of what shader programs were being run: the available push constant space was split into 2 equal size partitions, one for the vertex shader, and one for the fragment shader. Now that we are adding geometry shader support, we need to do something smarter. This patch adjusts things so that when a geometry shader is in use, we split the available push constant space into 3 nearly-equal size partitions instead of 2. Since the push constant allocation is now affected by GL state, it can no longer be set up by brw_upload_initial_gpu_state(); instead it must be set up by a state atom. --- src/mesa/drivers/dri/i965/brw_context.h | 3 +- src/mesa/drivers/dri/i965/brw_defines.h | 1 + src/mesa/drivers/dri/i965/brw_state.h| 4 +- src/mesa/drivers/dri/i965/brw_state_upload.c | 5 +- src/mesa/drivers/dri/i965/gen7_blorp.cpp | 6 ++ src/mesa/drivers/dri/i965/gen7_urb.c | 101 +++ 6 files changed, 98 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 77f2a6b..95f9bb2 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1508,7 +1508,8 @@ gen6_get_sample_position(struct gl_context *ctx, /* gen7_urb.c */ void -gen7_allocate_push_constants(struct brw_context *brw); +gen7_emit_push_constant_state(struct brw_context *brw, unsigned vs_size, + unsigned gs_size, unsigned fs_size); void gen7_emit_urb_state(struct brw_context *brw, diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 832ff55..8d9a824 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1284,6 +1284,7 @@ enum brw_message_target { # define GEN7_URB_STARTING_ADDRESS_SHIFT25 #define _3DSTATE_PUSH_CONSTANT_ALLOC_VS 0x7912 /* GEN7+ */ +#define _3DSTATE_PUSH_CONSTANT_ALLOC_GS 0x7915 /* GEN7+ */ 
#define _3DSTATE_PUSH_CONSTANT_ALLOC_PS 0x7916 /* GEN7+ */ # define GEN7_PUSH_CONSTANT_BUFFER_OFFSET_SHIFT 16 diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h index 85f82fe..4814639 100644 --- a/src/mesa/drivers/dri/i965/brw_state.h +++ b/src/mesa/drivers/dri/i965/brw_state.h @@ -112,6 +112,7 @@ extern const struct brw_tracked_state gen7_cc_viewport_state_pointer; extern const struct brw_tracked_state gen7_clip_state; extern const struct brw_tracked_state gen7_disable_stages; extern const struct brw_tracked_state gen7_ps_state; +extern const struct brw_tracked_state gen7_push_constant_space; extern const struct brw_tracked_state gen7_sbe_state; extern const struct brw_tracked_state gen7_sf_clip_viewport; extern const struct brw_tracked_state gen7_sf_state; @@ -220,9 +221,6 @@ uint32_t get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset, int fs_attr, bool two_side_color, uint32_t *max_source_attr); -/* gen7_urb.c */ -void gen7_allocate_push_constants(struct brw_context *brw); - #ifdef __cplusplus } #endif diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c index b883002..9638c69 100644 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c @@ -188,6 +188,7 @@ static const struct brw_tracked_state *gen7_atoms[] = gen7_cc_viewport_state_pointer, /* must do after brw_cc_vp */ gen7_sf_clip_viewport, + gen7_push_constant_space, gen7_urb, gen6_blend_state, /* must do before cc unit */ gen6_color_calc_state,/* must do before cc unit */ @@ -251,10 +252,6 @@ brw_upload_initial_gpu_state(struct brw_context *brw) return; brw_upload_invariant_state(brw); - - if (brw-gen = 7) { - gen7_allocate_push_constants(brw); - } } void brw_init_state( struct brw_context *brw ) diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp index 6c798b1..9df3d92 100644 --- 
a/src/mesa/drivers/dri/i965/gen7_blorp.cpp +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp @@ -51,6 +51,12 @@ static void gen7_blorp_emit_urb_config(struct brw_context *brw, const brw_blorp_params *params) { + unsigned urb_size = (brw-is_haswell brw-gt == 3) ? 32 : 16; + gen7_emit_push_constant_state(brw, + urb_size / 2 /* vs_size */, + 0 /* gs_size */, + urb_size / 2 /* fs_size */); + /* The minimum valid number of VS entries is 32. See 3DSTATE_URB_VS, Dword * 1.15:0 VS Number of URB Entries. */ diff --git a/src/mesa/drivers/dri/i965/gen7_urb.c
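The partitioning described in the commit message (two equal parts without a geometry shader, three nearly-equal parts with one) could look roughly like the sketch below. The gen7_urb.c hunk is cut off above, so this is not the patch's code: `push_split` and `split_push_constants` are hypothetical names, and giving the integer-division remainder to the fragment shader is an assumption made for this illustration. The 16 and 32 totals are the ones used in the gen7_blorp.cpp hunk.

```c
#include <assert.h>

struct push_split {
   unsigned vs_size;
   unsigned gs_size;
   unsigned fs_size;
};

/* Split avail_size units of push constant space between VS, FS and,
 * when present, GS. Integer division rounds down; in this sketch the
 * leftover goes to the fragment shader (an assumption). */
static struct push_split split_push_constants(unsigned avail_size, int gs_present)
{
   unsigned stages = gs_present ? 3 : 2;
   unsigned per_stage = avail_size / stages;
   struct push_split s;

   assert(avail_size >= stages);
   s.vs_size = per_stage;
   s.gs_size = gs_present ? per_stage : 0;
   s.fs_size = avail_size - s.vs_size - s.gs_size;
   return s;
}
```

With a total of 16 this yields 8/0/8 without a geometry shader and 5/5/6 with one, which matches the "nearly-equal" wording; the three sizes always add up to the total.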
Re: [Mesa-dev] [PATCH 11/22] i965: generalize brw_vs_pull_constants in preparation for GS.
On 08/26/2013 03:12 PM, Paul Berry wrote: --- src/mesa/drivers/dri/i965/brw_state.h| 8 +++ src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 66 +++- 2 files changed, 50 insertions(+), 24 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h index 4814639..e7a1b40 100644 --- a/src/mesa/drivers/dri/i965/brw_state.h +++ b/src/mesa/drivers/dri/i965/brw_state.h @@ -221,6 +221,14 @@ uint32_t get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset, int fs_attr, bool two_side_color, uint32_t *max_source_attr); +/* brw_vs_surface_state.c */ +void +brw_upload_vec4_pull_constants(struct brw_context *brw, + GLbitfield64 brw_new_constbuf, FWIW, brw-state.dirty.brw is only 32-bits currently. That said, it's probably going to change in the not-too-distant future, so using GLbitfield64 preemptively isn't crazy. + const struct gl_program *prog, + struct brw_vec4_context_base *vec4_ctx, + const struct brw_vec4_prog_data *prog_data); + #ifdef __cplusplus } #endif diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c index 629eb96..48124bf 100644 --- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c @@ -35,56 +35,50 @@ #include brw_context.h #include brw_state.h -/* Creates a new VS constant buffer reflecting the current VS program's - * constants, if needed by the VS program. - * - * Otherwise, constants go through the CURBEs using the brw_constant_buffer - * state atom. 
- */ -static void -brw_upload_vs_pull_constants(struct brw_context *brw) -{ - struct brw_vec4_context_base *vec4_ctx = brw-vs.base; - /* BRW_NEW_VERTEX_PROGRAM */ - struct brw_vertex_program *vp = - (struct brw_vertex_program *) brw-vertex_program; +void +brw_upload_vec4_pull_constants(struct brw_context *brw, + GLbitfield64 brw_new_constbuf, + const struct gl_program *prog, + struct brw_vec4_context_base *vec4_ctx, + const struct brw_vec4_prog_data *prog_data) +{ int i; /* Updates the ParamaterValues[i] pointers for all parameters of the * basic type of PROGRAM_STATE_VAR. */ - _mesa_load_state_parameters(brw-ctx, vp-program.Base.Parameters); + _mesa_load_state_parameters(brw-ctx, prog-Parameters); - /* CACHE_NEW_VS_PROG */ - if (!brw-vs.prog_data-base.nr_pull_params) { + if (!prog_data-nr_pull_params) { if (vec4_ctx-const_bo) { drm_intel_bo_unreference(vec4_ctx-const_bo); vec4_ctx-const_bo = NULL; vec4_ctx-surf_offset[SURF_INDEX_VEC4_CONST_BUFFER] = 0; -brw-state.dirty.brw |= BRW_NEW_VS_CONSTBUF; +brw-state.dirty.brw |= brw_new_constbuf; } return; } /* _NEW_PROGRAM_CONSTANTS */ drm_intel_bo_unreference(vec4_ctx-const_bo); - uint32_t size = brw-vs.prog_data-base.nr_pull_params * 4; - vec4_ctx-const_bo = drm_intel_bo_alloc(brw-bufmgr, vp_const_buffer, + uint32_t size = prog_data-nr_pull_params * 4; + vec4_ctx-const_bo = drm_intel_bo_alloc(brw-bufmgr, vec4_const_buffer, size, 64); drm_intel_gem_bo_map_gtt(vec4_ctx-const_bo); - for (i = 0; i brw-vs.prog_data-base.nr_pull_params; i++) { + + for (i = 0; i prog_data-nr_pull_params; i++) { memcpy(vec4_ctx-const_bo-virtual + i * 4, -brw-vs.prog_data-base.pull_param[i], +prog_data-pull_param[i], 4); } if (0) { - for (i = 0; i ALIGN(brw-vs.prog_data-base.nr_pull_params, 4) / 4; + for (i = 0; i ALIGN(prog_data-nr_pull_params, 4) / 4; i++) { You could probably move the i++ up a line since it's shorter now. This patch is great. 
float *row = (float *)vec4_ctx-const_bo-virtual + i * 4; -printf(vs const surface %3d: %4.3f %4.3f %4.3f %4.3f\n, +printf(const surface %3d: %4.3f %4.3f %4.3f %4.3f\n, i, row[0], row[1], row[2], row[3]); } } @@ -95,7 +89,31 @@ brw_upload_vs_pull_constants(struct brw_context *brw) brw-vtbl.create_constant_surface(brw, vec4_ctx-const_bo, 0, size, vec4_ctx-surf_offset[surf], false); - brw-state.dirty.brw |= BRW_NEW_VS_CONSTBUF; + brw-state.dirty.brw |= brw_new_constbuf; +} + + +/* Creates a new VS constant buffer reflecting the current VS program's + * constants, if needed by the VS program. + * + * Otherwise, constants go through the CURBEs using the brw_constant_buffer + * state atom. + */ +static void +brw_upload_vs_pull_constants(struct brw_context *brw) +{ + struct brw_vec4_context_base *vec4_ctx =
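The refactoring in this patch follows a common shape: a stage-specific function becomes a generic one by taking the per-stage state and the dirty flag to raise as parameters. A minimal sketch of that shape is below, assuming plain `malloc` memory in place of a real buffer object; `demo_stage` and `upload_pull_constants` are hypothetical names. The dirty word is 64 bits wide here only to echo the `GLbitfield64` parameter discussed in the review.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_stage {
   float *const_buf;    /* stand-in for the stage's constant buffer object */
   unsigned nr_consts;
};

/* Stage-independent upload: the caller says which dirty flag to raise. */
static void upload_pull_constants(uint64_t *dirty, uint64_t new_constbuf_flag,
                                  struct demo_stage *stage,
                                  const float *pull_params, unsigned nr_pull_params)
{
   int had_buf;

   assert(dirty != NULL && stage != NULL);

   /* The old buffer is dropped in every case. */
   had_buf = stage->const_buf != NULL;
   free(stage->const_buf);
   stage->const_buf = NULL;
   stage->nr_consts = 0;

   if (nr_pull_params == 0) {
      /* Nothing to upload; only flag a change if a buffer went away. */
      if (had_buf)
         *dirty |= new_constbuf_flag;
      return;
   }

   stage->const_buf = malloc(nr_pull_params * sizeof(float));
   assert(stage->const_buf != NULL);
   memcpy(stage->const_buf, pull_params, nr_pull_params * sizeof(float));
   stage->nr_consts = nr_pull_params;
   *dirty |= new_constbuf_flag;
}
```

A VS caller and a GS caller would each pass their own `demo_stage` and their own flag bit, which is the role `brw_new_constbuf` plays in the patch.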
Re: [Mesa-dev] [PATCH 13/22] i965/gs: Implement support for geometry shader surfaces.
On 08/26/2013 03:12 PM, Paul Berry wrote:
> This patch implements pull constant upload, binding table upload, and surface setup for geometry shaders, by re-using vertex shader code that was generalized in previous patches.
>
> Based on work by Eric Anholt e...@anholt.net.

This looks a lot better than the previous version.

I've never really been crazy about having binding table code split across brw_vs_surface_state.c and brw_wm_surface_state.c, with the bulk of the code in the WM file for some reason. This adds a third file, and I'm not crazy about that either.

Still, I vote that we land brw_gs_surface_state.c as is now; we can always do more tidying and code motion after the fact.

--Ken
Re: [Mesa-dev] [PATCH 20/22] i965/gen7: merge defines for 3DSTATE{VS, GS, WM} dword 2
On 08/26/2013 03:12 PM, Paul Berry wrote:
> Dword 2 of all 3DSTATE commands is the same, so there's no need to have

Well, not -all- 3DSTATE commands...just these :)

It's weird that you decided to share the bits for 3DSTATE_VS, 3DSTATE_GS, and 3DSTATE_WM on SNB, but not GEN7_PS_* for 3DSTATE_PS on IVB. If you're going to do WM, you might as well do PS too...

> separate defines for it. This will allow us to unify some of the state setup code between VS and GS.
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h   | 30 +-
>  src/mesa/drivers/dri/i965/gen6_blorp.cpp  | 2 +-
>  src/mesa/drivers/dri/i965/gen6_gs_state.c | 6 +++---
>  src/mesa/drivers/dri/i965/gen6_vs_state.c | 4 ++--
>  src/mesa/drivers/dri/i965/gen6_wm_state.c | 4 ++--
>  src/mesa/drivers/dri/i965/gen7_disable.c  | 4 ++--
>  src/mesa/drivers/dri/i965/gen7_vs_state.c | 4 ++--
>  7 files changed, 21 insertions(+), 33 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> index ec6c854..d698757 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1303,14 +1303,16 @@ enum brw_message_target {
>  #define _3DSTATE_SCISSOR_STATE_POINTERS 0x780f /* GEN6+ */
>
> -#define _3DSTATE_VS 0x7810 /* GEN6+ */
> +/* Common to _3DSTATE_{VS,GS} */

No mention of WM here. Maybe:

/* Common to 3DSTATE_{VS,GS,PS|WM} */

>  /* DW2 */
> -# define GEN6_VS_SPF_MODE (1 << 31)
> -# define GEN6_VS_VECTOR_MASK_ENABLE (1 << 30)
> -# define GEN6_VS_SAMPLER_COUNT_SHIFT 27
> -# define GEN6_VS_BINDING_TABLE_ENTRY_COUNT_SHIFT 18
> -# define GEN6_VS_FLOATING_POINT_MODE_IEEE_754 (0 << 16)
> -# define GEN6_VS_FLOATING_POINT_MODE_ALT (1 << 16)
> +# define GEN6_SPF_MODE (1 << 31)
> +# define GEN6_VECTOR_MASK_ENABLE (1 << 30)
> +# define GEN6_SAMPLER_COUNT_SHIFT 27
> +# define GEN6_BINDING_TABLE_ENTRY_COUNT_SHIFT 18
> +# define GEN6_FLOATING_POINT_MODE_IEEE_754 (0 << 16)
> +# define GEN6_FLOATING_POINT_MODE_ALT (1 << 16)
> +
> +#define _3DSTATE_VS 0x7810 /* GEN6+ */
>  /* DW4 */
>  # define GEN6_VS_DISPATCH_START_GRF_SHIFT 20
>  # define GEN6_VS_URB_READ_LENGTH_SHIFT 11
> @@ -1323,13 +1325,6 @@ enum brw_message_target {
>  # define GEN6_VS_ENABLE (1 << 0)
>
>  #define _3DSTATE_GS 0x7811 /* GEN6+ */
> -/* DW2 */
> -# define GEN6_GS_SPF_MODE (1 << 31)
> -# define GEN6_GS_VECTOR_MASK_ENABLE (1 << 30)
> -# define GEN6_GS_SAMPLER_COUNT_SHIFT 27
> -# define GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT 18
> -# define GEN6_GS_FLOATING_POINT_MODE_IEEE_754 (0 << 16)
> -# define GEN6_GS_FLOATING_POINT_MODE_ALT (1 << 16)
>  /* DW4 */
>  # define GEN6_GS_URB_READ_LENGTH_SHIFT 11
>  # define GEN7_GS_INCLUDE_VERTEX_HANDLES (1 << 10)
> @@ -1518,13 +1513,6 @@ enum brw_wm_barycentric_interp_mode {
>
>  #define _3DSTATE_WM 0x7814 /* GEN6+ */
>  /* DW1: kernel pointer */
> -/* DW2 */
> -# define GEN6_WM_SPF_MODE (1 << 31)
> -# define GEN6_WM_VECTOR_MASK_ENABLE (1 << 30)
> -# define GEN6_WM_SAMPLER_COUNT_SHIFT 27
> -# define GEN6_WM_BINDING_TABLE_ENTRY_COUNT_SHIFT 18
> -# define GEN6_WM_FLOATING_POINT_MODE_IEEE_754 (0 << 16)
> -# define GEN6_WM_FLOATING_POINT_MODE_ALT (1 << 16)
>  /* DW3: scratch space */
>  /* DW4 */
>  # define GEN6_WM_STATISTICS_ENABLE (1 << 31)
> diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
> index 1c85921..4b11d72 100644
> --- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
> @@ -727,7 +727,7 @@ gen6_blorp_emit_wm_config(struct brw_context *brw,
>     dw6 |= 0 << GEN6_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT; /* No interp */
>     dw6 |= 0 << GEN6_WM_NUM_SF_OUTPUTS_SHIFT; /* No inputs from SF */
>     if (params->use_wm_prog) {
> -      dw2 |= 1 << GEN6_WM_SAMPLER_COUNT_SHIFT; /* Up to 4 samplers */
> +      dw2 |= 1 << GEN6_SAMPLER_COUNT_SHIFT; /* Up to 4 samplers */
>        dw4 |= prog_data->first_curbe_grf << GEN6_WM_DISPATCH_START_GRF_SHIFT_0;
>        dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
>        dw5 |= GEN6_WM_KILL_ENABLE; /* TODO: temporarily smash on */
> diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c b/src/mesa/drivers/dri/i965/gen6_gs_state.c
> index 9648fb7..29f9042 100644
> --- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
> @@ -46,7 +46,7 @@ upload_gs_state(struct brw_context *brw)
>     BEGIN_BATCH(7);
>     OUT_BATCH(_3DSTATE_GS << 16 | (7 - 2));
>     OUT_BATCH(brw->ff_gs.prog_offset);
> -
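To make the benefit of the merged defines concrete, here is a small sketch of one helper building dword 2 for any of these stages. The macro names and bit positions are taken from the patch, with the shift operators written out; `pack_dw2` and its parameters are hypothetical, and the 8-bit limit on the binding table entry count is an assumption of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout of DW2 as listed in the patch. */
#define GEN6_SPF_MODE                        (1u << 31)
#define GEN6_VECTOR_MASK_ENABLE              (1u << 30)
#define GEN6_SAMPLER_COUNT_SHIFT             27
#define GEN6_BINDING_TABLE_ENTRY_COUNT_SHIFT 18
#define GEN6_FLOATING_POINT_MODE_IEEE_754    (0u << 16)
#define GEN6_FLOATING_POINT_MODE_ALT         (1u << 16)

/* Hypothetical helper: one function can now serve VS, GS and WM. */
static uint32_t pack_dw2(uint32_t sampler_count_field,
                         uint32_t binding_table_entries,
                         int alt_floating_point_mode)
{
   /* The sampler count sits below bit 30, so it has at most 3 bits. */
   assert(sampler_count_field < 8);
   /* Assumed field width for this sketch. */
   assert(binding_table_entries < 256);

   return (sampler_count_field << GEN6_SAMPLER_COUNT_SHIFT) |
          (binding_table_entries << GEN6_BINDING_TABLE_ENTRY_COUNT_SHIFT) |
          (alt_floating_point_mode ? GEN6_FLOATING_POINT_MODE_ALT
                                   : GEN6_FLOATING_POINT_MODE_IEEE_754);
}
```

The value passed as `sampler_count_field` is the raw field value, as in the blorp hunk where a field value of 1 is commented as "Up to 4 samplers".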
Re: [Mesa-dev] [PATCH 21/22] i965/gen7: Generalize gen7_vs_state in preparation for GS.
On 08/26/2013 03:12 PM, Paul Berry wrote: --- src/mesa/drivers/dri/i965/brw_state.h | 41 ++ src/mesa/drivers/dri/i965/gen7_vs_state.c | 123 -- 2 files changed, 122 insertions(+), 42 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h index b54338a..efef994 100644 --- a/src/mesa/drivers/dri/i965/brw_state.h +++ b/src/mesa/drivers/dri/i965/brw_state.h @@ -128,6 +128,38 @@ extern const struct brw_tracked_state gen7_wm_state; extern const struct brw_tracked_state haswell_cut_index; +/** + * Parameters that differ between Gen7 VS and GS state upload commands. + */ +struct gen7_vec4_upload_params +{ + /** +* Command used to set the binding table pointers for this stage. +*/ + unsigned binding_table_pointers_cmd; + + /** +* Command used to set the sampler state pointers for this stage. +*/ + unsigned sampler_state_pointers_cmd; + + /** +* Command used to send constants for this stage. +*/ + unsigned constant_cmd; + + /** +* Command used to send state for this stage. +*/ + unsigned state_cmd; + + /** +* Size of the state command for this stage. 
+*/ + unsigned state_cmd_size; +}; + + /* brw_misc_state.c */ void brw_upload_invariant_state(struct brw_context *brw); uint32_t @@ -240,6 +272,15 @@ brw_vec4_upload_binding_table(struct brw_context *brw, struct brw_vec4_context_base *vec4_ctx, const struct brw_vec4_prog_data *prog_data); +/* gen7_vs_state.c */ +void +gen7_upload_vec4_state(struct brw_context *brw, + const struct gen7_vec4_upload_params *upload_params, + const struct brw_vec4_context_base *vec4_ctx, + bool active, bool alt_floating_point_mode, + const struct brw_vec4_prog_data *prog_data, + const unsigned *stage_specific_cmd_data); + #ifdef __cplusplus } #endif diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c b/src/mesa/drivers/dri/i965/gen7_vs_state.c index 30fe802..fd81112 100644 --- a/src/mesa/drivers/dri/i965/gen7_vs_state.c +++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c @@ -29,33 +29,31 @@ #include program/prog_statevars.h #include intel_batchbuffer.h -static void -upload_vs_state(struct brw_context *brw) -{ - struct gl_context *ctx = brw-ctx; - const struct brw_vec4_context_base *vec4_ctx = brw-vs.base; - uint32_t floating_point_mode = 0; - const int max_threads_shift = brw-is_haswell ? 
- HSW_VS_MAX_THREADS_SHIFT : GEN6_VS_MAX_THREADS_SHIFT; - gen7_emit_vs_workaround_flush(brw); - - /* BRW_NEW_VS_BINDING_TABLE */ +void +gen7_upload_vec4_state(struct brw_context *brw, + const struct gen7_vec4_upload_params *upload_params, + const struct brw_vec4_context_base *vec4_ctx, + bool active, bool alt_floating_point_mode, + const struct brw_vec4_prog_data *prog_data, + const unsigned *stage_specific_cmd_data) +{ + /* BRW_NEW_*_BINDING_TABLE */ BEGIN_BATCH(2); - OUT_BATCH(_3DSTATE_BINDING_TABLE_POINTERS_VS 16 | (2 - 2)); + OUT_BATCH(upload_params-binding_table_pointers_cmd 16 | (2 - 2)); OUT_BATCH(vec4_ctx-bind_bo_offset); ADVANCE_BATCH(); /* CACHE_NEW_SAMPLER */ BEGIN_BATCH(2); - OUT_BATCH(_3DSTATE_SAMPLER_STATE_POINTERS_VS 16 | (2 - 2)); + OUT_BATCH(upload_params-sampler_state_pointers_cmd 16 | (2 - 2)); OUT_BATCH(vec4_ctx-sampler_offset); ADVANCE_BATCH(); - if (vec4_ctx-push_const_size == 0) { + if (!active || vec4_ctx-push_const_size == 0) { /* Disable the push constant buffers. */ BEGIN_BATCH(7); - OUT_BATCH(_3DSTATE_CONSTANT_VS 16 | (7 - 2)); + OUT_BATCH(upload_params-constant_cmd 16 | (7 - 2)); OUT_BATCH(0); OUT_BATCH(0); OUT_BATCH(0); @@ -65,10 +63,10 @@ upload_vs_state(struct brw_context *brw) ADVANCE_BATCH(); } else { BEGIN_BATCH(7); - OUT_BATCH(_3DSTATE_CONSTANT_VS 16 | (7 - 2)); + OUT_BATCH(upload_params-constant_cmd 16 | (7 - 2)); OUT_BATCH(vec4_ctx-push_const_size); OUT_BATCH(0); - /* Pointer to the VS constant buffer. Covered by the set of + /* Pointer to the stage's constant buffer. Covered by the set of * state flags from gen6_prepare_wm_contants */ OUT_BATCH(vec4_ctx-push_const_offset | GEN7_MOCS_L3); @@ -78,36 +76,77 @@ upload_vs_state(struct brw_context *brw) ADVANCE_BATCH(); } + BEGIN_BATCH(upload_params-state_cmd_size); + OUT_BATCH(upload_params-state_cmd 16 | + (upload_params-state_cmd_size - 2)); + if (active) { + OUT_BATCH(vec4_ctx-prog_offset); + OUT_BATCH((alt_floating_point_mode ? 
GEN6_FLOATING_POINT_MODE_ALT + : GEN6_FLOATING_POINT_MODE_IEEE_754) |
Re: [Mesa-dev] [PATCH 08/22] i965/gs: Allocate URB space for use by GS.
On 08/26/2013 03:12 PM, Paul Berry wrote:
> Previously, we gave all of the URB space (other than the small amount that is used for push constants) to the vertex shader. However, when a geometry shader is active, we need to divide it up between the vertex and geometry shaders.
>
> The size of the URB entries for the vertex and geometry shaders can vary dramatically from one shader to the next. So it doesn't make sense to simply split the available space in two. In particular:
>
> - On Ivy Bridge GT1, this would not leave enough space for the worst case geometry shader, which requires 64k of URB space.
>
> - Due to hardware-imposed limits on the maximum number of URB entries, sometimes a given shader stage will only be capable of using a small amount of URB space. When this happens, it may make sense to allocate substantially less than half of the available space to that stage.
>
> Our algorithm for dividing space between the two stages is to first compute (a) the minimum amount of URB space that each stage needs in order to function properly, and (b) the amount of additional URB space that each stage wants (i.e. that it would be capable of making use of). If the total amount of space available is not enough to satisfy needs + wants, then each stage's wants amount is scaled back by the same factor in order to fit.
>
> When only a vertex shader is active, this algorithm produces equivalent results to the old algorithm (if the vertex shader stage can make use of all the available URB space, we assign all the space to it; if it can't, we let it use as much as it can).
>
> In the future, when we need to support tessellation control and tessellation evaluation pipeline stages, it should be straightforward to expand this algorithm to cover them.
>
> v2: Use unsigned rather than GLuint.
--- src/mesa/drivers/dri/i965/brw_context.h | 6 +- src/mesa/drivers/dri/i965/gen7_blorp.cpp | 16 ++-- src/mesa/drivers/dri/i965/gen7_urb.c | 155 +-- 3 files changed, 142 insertions(+), 35 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index be5175f..77f2a6b 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1511,8 +1511,10 @@ void gen7_allocate_push_constants(struct brw_context *brw); void -gen7_emit_urb_state(struct brw_context *brw, GLuint nr_vs_entries, -GLuint vs_size, GLuint vs_start); +gen7_emit_urb_state(struct brw_context *brw, +unsigned nr_vs_entries, unsigned vs_size, +unsigned vs_start, unsigned nr_gs_entries, +unsigned gs_size, unsigned gs_start); diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp index a387836..6c798b1 100644 --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp @@ -51,14 +51,16 @@ static void gen7_blorp_emit_urb_config(struct brw_context *brw, const brw_blorp_params *params) { - /* The minimum valid value is 32. See 3DSTATE_URB_VS, -* Dword 1.15:0 VS Number of URB Entries. + /* The minimum valid number of VS entries is 32. See 3DSTATE_URB_VS, Dword +* 1.15:0 VS Number of URB Entries. */ - int num_vs_entries = 32; - int vs_size = 2; - int vs_start = 2; /* skip over push constants */ - - gen7_emit_urb_state(brw, num_vs_entries, vs_size, vs_start); + gen7_emit_urb_state(brw, + 32 /* num_vs_entries */, + 2 /* vs_size */, + 2 /* vs_start */, + 0 /* num_gs_entries */, + 1 /* gs_size */, + 2 /* gs_start */); } diff --git a/src/mesa/drivers/dri/i965/gen7_urb.c b/src/mesa/drivers/dri/i965/gen7_urb.c index 927af37..2d10cc12 100644 --- a/src/mesa/drivers/dri/i965/gen7_urb.c +++ b/src/mesa/drivers/dri/i965/gen7_urb.c @@ -74,34 +74,136 @@ gen7_upload_urb(struct brw_context *brw) { const int push_size_kB = brw-is_haswell brw-gt == 3 ? 
32 : 16; - /* Total space for entries is URB size - 16kB for push constants */ - int handle_region_size = (brw-urb.size - push_size_kB) * 1024; /* bytes */ - /* CACHE_NEW_VS_PROG */ unsigned vs_size = MAX2(brw-vs.prog_data-base.urb_entry_size, 1); - - int nr_vs_entries = handle_region_size / (vs_size * 64); - if (nr_vs_entries brw-urb.max_vs_entries) - nr_vs_entries = brw-urb.max_vs_entries; - - /* According to volume 2a, nr_vs_entries must be a multiple of 8. */ - brw-urb.nr_vs_entries = ROUND_DOWN_TO(nr_vs_entries, 8); - - /* URB Starting Addresses are specified in multiples of 8kB. */ - brw-urb.vs_start = push_size_kB / 8; /* skip over push constants */ - - assert(brw-urb.nr_vs_entries % 8 == 0); - assert(brw-urb.nr_gs_entries % 8 == 0); - /* GS requirement */ - assert(!brw-ff_gs.prog_active); + unsigned
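The needs/wants algorithm from the commit message can be sketched as follows. Since the gen7_urb.c hunk is truncated above, this is a reconstruction of the described idea rather than the patch's code: `urb_split` and `divide_urb` are hypothetical names, the units are abstract chunks, and the hardware rounding and entry-count limits mentioned in the message are deliberately left out.

```c
#include <assert.h>

struct urb_split {
   unsigned vs;
   unsigned gs;
};

/* Give each stage what it needs, then hand out the rest according to
 * what each stage wants. If the wants do not fit, scale both back by
 * the same factor (remaining / total_wants), rounding down. */
static struct urb_split divide_urb(unsigned total,
                                   unsigned vs_needs, unsigned vs_wants,
                                   unsigned gs_needs, unsigned gs_wants)
{
   struct urb_split s = { vs_needs, gs_needs };
   unsigned total_needs = vs_needs + gs_needs;
   unsigned total_wants = vs_wants + gs_wants;
   unsigned remaining;

   assert(total_needs <= total);
   remaining = total - total_needs;

   if (total_wants <= remaining) {
      /* Everything fits: each stage gets all it asked for. */
      s.vs += vs_wants;
      s.gs += gs_wants;
   } else {
      /* total_wants > remaining >= 0 here, so the division is safe. */
      s.vs += (unsigned)((unsigned long long)vs_wants * remaining / total_wants);
      s.gs += (unsigned)((unsigned long long)gs_wants * remaining / total_wants);
   }
   return s;
}
```

With only a vertex shader active (`gs_needs` and `gs_wants` both zero), the function gives the VS either everything it wants or all remaining space, which is the equivalence with the old behaviour that the commit message points out.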