Re: [Mesa-dev] [PATCH 2/2] vdpau: Handle destination rectangles correctly
On 11.12.2011 07:13, Younes Manton wrote:
> On Tue, Dec 6, 2011 at 4:51 PM, Andy Furniss andy...@ukfsn.org wrote:
> > Maarten Lankhorst wrote:
> > > The brokenness in vlVdpVideoMixerRender was compensating for brokenness in vlVdpPresentationQueueDisplay, so fix both at the same time.
> > These fix the two remaining issues (aspect not maintained when fullscreen, and subtitle position getting changed when toggling back from fullscreen to windowed) I had with R600 + mplayer + -vo vdpau (sw decode).
> Can you provide a Tested-by line for this and any other patches you've tried that you can confirm don't regress? That way I can push the ones that don't require discussion.

It is at least breaking the XvMC sub-picture case, and if I understand Maarten's changes correctly he mixed up the destination rectangle in the source coordinate system (which is only used for XvMC sub-pictures) and the destination rectangle in the destination coordinate system (which should be used for VDPAU and practically everything else). Currently looking into it and trying to figure out how to reproduce the bug that he tried to fix in the first place.

Christian.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24
I saw this email yesterday, but did not have time to reply.

- Original Message -
On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt e...@anholt.net wrote:
> On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák mar...@gmail.com wrote:
> > unpack_float_z_Z24_X8 fails with -O2 in:
> > - fbo-blit-d24s8
> > - fbo-depth-sample-compare
> > - fbo-readpixels-depth-formats
> > - glean/depthStencil
> > With -O0, it works fine. I am removing all the assertions. There's not much point in having them, is there?
> Is -ffast-math involved at all?

Yes, -fno-fast-math makes the problem go away.

> At a guess, replacing * scale with / (float)0xffffff makes the failure happen either way?

Yes. Only using GLdouble fixes it. It makes sense: the mantissa without the sign has 23 bits, but 24 bits are required to exactly represent 0xffffff, if I am not mistaken.

I'm afraid this is wrong: single-precision floating point can describe 24-bit uints (and therefore unorms) without any loss. Although the mantissa has 23 bits, it is only used to represent all but the most significant bit, i.e., there's an implicit 24th bit always set to one.

The fact that -fno-fast-math makes the problem go away is yet another clear proof that the issue is _not_ lack of precision of single-precision floating point, as -fno-fast-math will still use single-precision floating point numbers. What -fno-fast-math actually ensures is that all intermediate steps are done in single precision instead of Intel's higher-precision x87 80-bit floating point on the 32-bit x86 architecture. And I suspect the problem here is that the rounding with x87 80-bit floats yields wrong results.

I also disagree that using double precision is a good solution. Either fast-math serves our purposes, or it doesn't. Using double precision to work around issues with single-precision fast-math is the *worst* thing we could do. Does the assertion failure even happen on 64 bits? I doubt it, as the x64 ABI establishes the use of SSE for all floating point, so x87 80-bit floating point will never be used.

So effectively this is making all archs slower. Honestly, I think the patch should be recalled. And we should consider disabling fast-math. And maybe enabling -mfpmath=sse on 32-bit x86 (i.e., require SSE).

Jose
Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24
On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca jfons...@vmware.com wrote:
[...]
> So effectively this is making all archs slower. Honestly, I think the patch should be recalled. And we should consider disabling fast-math. And maybe enabling -mfpmath=sse on 32-bit x86 (i.e., require SSE).

You're probably right. Feel free to revert & fix the issue in some other way.

Marek
Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24
- Original Message -
On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca jfons...@vmware.com wrote:
[...]
> You're probably right. Feel free to revert & fix the issue in some other way.

OK. Could you please confirm whether the assertion failures you saw were in 32-bit or 64-bit mode?

Jose
Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24
On Mon, Dec 12, 2011 at 2:10 PM, Jose Fonseca jfons...@vmware.com wrote:
[...]
> OK. Could you please confirm whether the assertion failures you saw were in 32-bit or 64-bit mode?

32-bit.

Marek
Re: [Mesa-dev] [PATCH 03/11] gallium: interface changes necessary to implement transform feedback (v5)
- Original Message -
From: Marek Olšák mar...@gmail.com

Namely:
- EXT_transform_feedback
- ARB_transform_feedback2
- ARB_transform_feedback_instanced

The old interface was not useful for OpenGL and had to be reworked. This interface was originally designed for OpenGL, but additional changes have been made in order to make st/d3d1x support easier.

The most notable change is that the stream-out info must be linked with a vertex or geometry shader and cannot be set independently. This is due to limitations of existing hardware (special shader instructions must be used to write into stream-out buffers), and it's also how OpenGL works (stream outputs must be specified prior to linking shaders).

Other than that, each stream output buffer has a view into it that internally maintains the number of bytes which have been written into it. (One buffer can be bound in several different transform feedback objects in OpenGL, so we must be able to have several views around.) The set_stream_output_targets function contains a parameter saying whether new data should be appended or not. Also, the view can optionally be used to provide the vertex count for draw_vbo. Note that the count is supposed to be stored in device memory and the CPU never gets to know its value.

OpenGL way | Gallium way
BeginTF  = set_so_targets(append_bitmask = 0)
PauseTF  = set_so_targets(num_targets = 0)
ResumeTF = set_so_targets(append_bitmask = ~0)
EndTF    = set_so_targets(num_targets = 0)
DrawTF   = use pipe_draw_info::count_from_stream_output

v2: * removed the reset_stream_output_targets function
    * added a parameter append_bitmask to set_stream_output_targets; each bit specifies whether new data should be appended to each buffer or not
v3: * added PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME for ARB_tfb2; note that the draw-auto subset is always required (for d3d10), only the pause/resume functionality is limited if the CAP is not advertised
v4: * update gallium/docs
v5: * compactified struct pipe_stream_output_info, updated dump/trace

[...]

Thanks for the update. Just a minor suggestion:

@@ -267,6 +268,24 @@ void trace_dump_shader_state(const struct pipe_shader_state *state)
    trace_dump_string(str);
    trace_dump_member_end();

+   trace_dump_member_begin(stream_output);
+   trace_dump_struct_begin(pipe_stream_output_info);
+   trace_dump_member(uint, &state->stream_output, num_outputs);
+   trace_dump_member(uint, &state->stream_output, stride);
+   trace_dump_array_begin();
+   for(i = 0; i < Elements(state->stream_output.output); ++i) {

It should be state->stream_output.num_outputs, IIUC. Other than that,

Reviewed-By: Jose Fonseca jfons...@vmware.com

Jose
Re: [Mesa-dev] [PATCH 01/11] mesa: implement DrawTransformFeedback from ARB_transform_feedback2
- Original Message -
From: Marek Olšák mar...@gmail.com

> It's like DrawArrays, but the count is taken from a transform feedback object. This removes DrawTransformFeedback from dd_function_table and adds the same function to GLvertexformat (with the function parameters matching GL). The vbo_draw_func callback has a new parameter struct gl_transform_feedback_object *tfb_vertcount. The rest of the code just validates states and forwards the transform feedback object into vbo_draw_func.

I ventured to review this too, and I have a question: the unfinished code in master had comments saying

/*
 * Get number of vertices in obj's feedback buffer.
 * Call ctx->Exec.DrawArrays(mode, 0, count);
 */

which seems simpler than passing the new tfb_vertcount parameter everywhere, just for the sake of obtaining count. So what's the rationale for doing this differently here?

Jose
Re: [Mesa-dev] R600g LLVM shader backend
- Original Message -
> Hi,
> I have just pushed a branch containing an LLVM shader backend for r600g to my personal git repo:
> http://cgit.freedesktop.org/~tstellar/mesa/ r600g-llvm-shader

Hi Tom,

This is pretty cool stuff. The fact that you have a similar passing rate to the existing compiler makes it quite remarkable. I was aware of several closed/open-source projects to develop GPU backends for LLVM, and LunarG made a middle end, but this is the first working OpenGL stack based on an LLVM GPU backend that I'm aware of.

> There are three main components to this branch:
>
> 1. A TGSI-LLVM converter (commit ec9bb644cf7dde055c6c3ee5b8890a2d367337e5)
> The goal of this converter is to give all gallium drivers a convenient way to convert from TGSI to LLVM. The interface is still evolving, and I'm interested in getting some feedback for it.

The interface looks a good start, but I'd like it to be even more overridable. I don't even think there's anything GPU-specific there -- I also had some plans to do TGSI translation in llvmpipe in two stages: first TGSI -> high-level LLVM IR with custom gallivm/tgsi intrinsics -> low-level LLVM IR with x86 SIMD intrinsics, to allow optimizations and other decisions to happen at a higher level, before starting to emit lower-level code. So I'd like us to have a flexible hierarchy of TGSI translators that can be shared for GPUs/CPUs alike.

BTW, I'd prefer that all reusable Gallium+LLVM code (be it meant for GPU or CPU) lives in src/gallium/auxiliary/gallivm, as it makes code maintenance and build integration simpler. So tgsi_llvm.c should be moved into gallivm. Also, beware that the ability to build core gallium/mesa without LLVM must be preserved. (Having particular drivers carry hard dependencies on LLVM is of course at the discretion of those drivers' developers.)

> 2. Changes to gallivm so that code can be shared between it and the TGSI-LLVM converter. These changes are attached, please review.

I'll review them separately.

> 3. An LLVM backend for r600g.
> This backend is mostly based on AMD's AMDIL LLVM backend for OpenCL with a few changes added for emitting machine code. Currently, it passes about 99% of the piglit tests that pass with the current r600g shader backend. Most of the failures are due to some unimplemented texture instructions. Indirect addressing is also missing from the LLVM backend, and it relies on the current r600g shader code to do this.

There's a 30K-line file, src/gallium/drivers/radeon/macrodb_gen.h. Is this generated code?

Also, maybe it would make sense to have the AMDIL backend distributed separately from Mesa, as it looks like a component that has other consumers beyond mesa/gallium/r600g, right?

> In reality, the LLVM backend does not emit actual machine code, but rather a byte stream that is converted to struct r600_bytecode. The final transformations are done by r600_asm.c, just like in the current shader backend. The LLVM backend is not optimized for VLIW, and it only emits one instruction per group. The optimizations in r600_asm.c are able to do some instruction packing, but the resulting code is not yet as good as the current backend.

Why is the resulting code worse: due to limitations in the assembler, in the AMDIL LLVM backend, or in LLVM itself?

> The main motivation for this LLVM backend is to help bring compute/OpenCL support to r600g by making it easier to support different compiler frontends. I don't have a concrete plan for integrating this into mainline Mesa yet, but I don't expect it to be in the next release. I would really like to make it compatible with LLVM 3.0 before it gets merged (it only works with LLVM 2.9 now), but if compute support evolves quickly, I might be tempted to push the 2.9 version into the master branch.

What are the state trackers that you envision this will use? (E.g., are you targeting clover? Do you hope this will be the default compiler backend for Mesa? Or is Mesa/Gallium just a way to test the AMDIL backend?)

I'm also interested in your general opinion on using LLVM for GPUs.

Jose
Re: [Mesa-dev] [PATCH 13/15] gallivm: Move tgsi_soa helper macros to header file
- Original Message -
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_tgsi.h     | 12
>  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 14
>  2 files changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
> index 554b4cb..1258620 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
> @@ -51,6 +51,18 @@
>  #define LP_MAX_INSTRUCTIONS 256
>
> +#define FOR_EACH_CHANNEL( CHAN )\
> +   for (CHAN = 0; CHAN < NUM_CHANNELS; CHAN++)
> +

Tom,

Symbols that are meant to be used by more than one .c/.cpp file should have a unique name prefix to avoid name clashes. For this particular case, these are not LLVM related, so it would make sense to put these macros somewhere in src/gallium/auxiliary/tgsi/ with a TGSI_ prefix so they can be used by all TGSI translation code.

Jose
Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24
- Original Message -
On Mon, Dec 12, 2011 at 2:10 PM, Jose Fonseca jfons...@vmware.com wrote:
[...]
> > OK. Could you please confirm whether the assertion failures you saw were in 32-bit or 64-bit mode?
> 32-bit.

Thanks. In that case I propose dropping fast-math, at least on 32-bit x86, as it makes little difference (in particular, the unorm24 -> f32 conversions happen in a lot of places in Mesa's source tree) and sometimes produces worse code [1], and simply using the subset of finer-grained options [2] which really meets our needs: -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.

About -mfpmath=sse on 32 bits: although I believe that depending on SSE would be alright, it looks like -mfpmath=sse is not a clear winner on 32 bits, because calls to libc still need to use x87 registers.

Jose

[1] http://programerror.com/2009/09/when-gccs-ffast-math-isnt/
[2] http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html
Re: [Mesa-dev] [PATCH] mesa: fix an out-of-bounds access in prog_print.c
On Sun, Dec 11, 2011 at 10:30 PM, Marek Olšák mar...@gmail.com wrote:
> ---
>  src/mesa/program/prog_print.c | 1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/src/mesa/program/prog_print.c b/src/mesa/program/prog_print.c
> index e9bf3aa..b4d142f 100644
> --- a/src/mesa/program/prog_print.c
> +++ b/src/mesa/program/prog_print.c
> @@ -112,6 +112,7 @@ arb_input_attrib_string(GLint index, GLenum progType)
>        "vertex.texcoord[5]",
>        "vertex.texcoord[6]",
>        "vertex.texcoord[7]",
> +      "vertex.pointsize",
>        "vertex.attrib[0]",
>        "vertex.attrib[1]",
>        "vertex.attrib[2]",
> --
> 1.7.4.1

Reviewed-by: Brian Paul bri...@vmware.com
Re: [Mesa-dev] R600g LLVM shader backend
On Sat, 2011-12-10 at 03:16 -0800, Stéphane Marchesin wrote:
> On Fri, Dec 9, 2011 at 14:15, Tom Stellard tstel...@gmail.com wrote:
> > Hi,
> > I have just pushed a branch containing an LLVM shader backend for r600g to my personal git repo:
> > http://cgit.freedesktop.org/~tstellar/mesa/ r600g-llvm-shader
> > There are three main components to this branch:
> > 1. A TGSI-LLVM converter (commit ec9bb644cf7dde055c6c3ee5b8890a2d367337e5)
> > 2. Changes to gallivm so that code can be shared between it and the TGSI-LLVM converter.
> > 3. An LLVM backend for r600g.
> [...]
>
> I can't help but notice the additional license restrictions in that code -- is this something that could be sorted out?

The AMDIL portion of the code (basically any file with an AMDIL prefix, plus the macrod* files) is currently licensed under the 3-clause BSD license with an additional clause for complying with export laws. I need to get some more information from our legal team about the export-law clause before I discuss it further, but I'm hoping it's something we can work out. The non-AMDIL portion of the code is still licensed under the MIT license and has that license included in the file.

-Tom
Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24
- Original Message -
On Mon, Dec 12, 2011 at 8:20 AM, Jose Fonseca jfons...@vmware.com wrote:
[...]
> > About -mfpmath=sse on 32 bits: although I believe that depending on SSE would be alright, it looks like -mfpmath=sse is not a clear winner on 32 bits, because calls to libc still need to use x87 registers.
>
> Thanks for digging into this, guys. I'm happy to drop -ffast-math (or use -fno-fast-math) but it would be interesting to do at least a few before/after comparisons just to make sure there's no surprises in performance.

Yep, that was my plan.

> In any case, don't we still need to use double-precision for Z32 packing/unpacking? I ran into a failure in _mesa_pack_float_z_row() for Z32 on Saturday (debug build on x64).

Agreed. I don't know an accurate and efficient way of converting unorm32 <-> f32 without doubles.
The trick I use in llvmpipe to convert f32 to unorm32, in src/gallium/auxiliary/gallivm/lp_bld_conv.c, only gets 31 bits right (i.e., drops one bit from values near 0.0):

   /*
    * The destination exceeds what can be represented in the floating point.
    * So multiply by the largest power two we get away with, and when
    * subtract the most significant bit to rescale to normalized values.
    *
    * The largest power of two factor we can get away is
    * (1
Re: [Mesa-dev] shaders for g3dvl compositor (was vdpau: Handle destination rectangles correctly)
Hi Maarten,

On 12.12.2011 13:25, Maarten Lankhorst wrote:
> The problem is the destination rectangle can be different for video and subpictures; this breaks mplayer OSD if you handle it incorrectly. ftp://download.nvidia.com/XFree86/vdpau/doxygen/html/group___vdp_video_mixer.html#ga62bf3bf8c5f01322a03b07065c5ea3db has info on how it works. The destination video rectangle is relative to the target image, not the source.

Yeah, but that is what the destination rectangle in the render call is good for. So you actually don't need to change anything here, you just need another parameter and everything starts to work fine.

> Get mplayer to play any video fullscreen with -vo vdpau; if things work correctly, black borders should appear when the aspect ratio of the fullscreen doesn't match the aspect ratio of the video. The onscreen display should also be centered and scaled to the entire screen.

Ah, well I see the problem now. Let's split the discussion here a bit, and I'm also CCing Andy, Younes and the mailing list.

Please take a look at the attached patches; they should fix the mentioned problems, but without breaking XvMC or anything else (at least I hope so).

@Andy: Could you please confirm that they are also working? (They should apply on top of current master.)

Thanks,
Christian.

From d074aab1602c88ca18bb9472c6ab2afd99f2b53d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= deathsim...@vodafone.de
Date: Mon, 12 Dec 2011 15:27:34 +0100
Subject: [PATCH 1/3] g3dvl/compositor: improve dirty area handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Take viewport and scissors into account and make the dirty area a parameter instead of a member.
Signed-off-by: Christian König deathsim...@vodafone.de
---
 src/gallium/auxiliary/vl/vl_compositor.c         | 104
 src/gallium/auxiliary/vl/vl_compositor.h         |   8 +-
 src/gallium/state_trackers/vdpau/mixer.c         |   2 +-
 src/gallium/state_trackers/vdpau/output.c        |   2 +-
 src/gallium/state_trackers/vdpau/presentation.c  |   4 +-
 src/gallium/state_trackers/vdpau/vdpau_private.h |   2 +
 src/gallium/state_trackers/xorg/xvmc/surface.c   |   4 +-
 .../state_trackers/xorg/xvmc/xvmc_private.h      |   2 +
 8 files changed, 81 insertions(+), 47 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_compositor.c b/src/gallium/auxiliary/vl/vl_compositor.c
index 322ef8e..98cb616 100644
--- a/src/gallium/auxiliary/vl/vl_compositor.c
+++ b/src/gallium/auxiliary/vl/vl_compositor.c
@@ -40,6 +40,9 @@
 #include "vl_types.h"
 #include "vl_compositor.h"

+#define MIN_DIRTY (0)
+#define MAX_DIRTY (1 << 15)
+
 typedef float csc_matrix[16];

 static void *
@@ -470,8 +473,27 @@ gen_rect_verts(struct vertex4f *vb, struct vl_compositor_layer *layer)
    vb[3].w = layer->src.br.y;
 }

+static INLINE struct u_rect
+calc_drawn_area(struct vl_compositor *c, struct vl_compositor_layer *layer)
+{
+   struct u_rect result;
+
+   // scale
+   result.x0 = layer->dst.tl.x * c->viewport.scale[0] + c->viewport.translate[0];
+   result.y0 = layer->dst.tl.y * c->viewport.scale[1] + c->viewport.translate[1];
+   result.x1 = layer->dst.br.x * c->viewport.scale[0] + c->viewport.translate[0];
+   result.y1 = layer->dst.br.y * c->viewport.scale[1] + c->viewport.translate[1];
+
+   // and clip
+   result.x0 = MAX2(result.x0, c->scissor.minx);
+   result.y0 = MAX2(result.y0, c->scissor.miny);
+   result.x1 = MIN2(result.x1, c->scissor.maxx);
+   result.y1 = MIN2(result.y1, c->scissor.maxy);
+   return result;
+}
+
 static void
-gen_vertex_data(struct vl_compositor *c)
+gen_vertex_data(struct vl_compositor *c, struct u_rect *dirty)
 {
    struct vertex4f *vb;
    struct pipe_transfer *buf_transfer;
@@ -497,15 +519,18 @@ gen_vertex_data(struct vl_compositor *c)
          gen_rect_verts(vb, layer);
          vb += 4;

-         if (layer->clearing &&
-             c->dirty_tl.x >= layer->dst.tl.x &&
-             c->dirty_tl.y >= layer->dst.tl.y &&
-             c->dirty_br.x <= layer->dst.br.x &&
-             c->dirty_br.y <= layer->dst.br.y) {
-
-            // We clear the dirty area anyway, no need for clear_render_target
-            c->dirty_tl.x = c->dirty_tl.y = 1.0f;
-            c->dirty_br.x = c->dirty_br.y = 0.0f;
+         if (dirty && layer->clearing) {
+            struct u_rect drawn = calc_drawn_area(c, layer);
+            if (
+               dirty->x0 >= drawn.x0 &&
+               dirty->y0 >= drawn.y0 &&
+               dirty->x1 <= drawn.x1 &&
+               dirty->y1 <= drawn.y1) {
+
+               // We clear the dirty area anyway, no need for clear_render_target
+               dirty->x0 = dirty->y0 = MAX_DIRTY;
+               dirty->x1 = dirty->y1 = MIN_DIRTY;
+            }
          }
       }
    }
@@ -514,7 +539,7 @@ gen_vertex_data(struct vl_compositor *c)
 }

 static void
-draw_layers(struct vl_compositor *c)
+draw_layers(struct vl_compositor *c, struct u_rect *dirty)
 {
    unsigned vb_index, i;

@@ -533,22 +558,25 @@
Re: [Mesa-dev] [PATCH] meta: rework dest image allocation in mipmap generation code
On Sat, 10 Dec 2011 12:00:27 -0700, Brian Paul bri...@vmware.com wrote: This fixes two things: 1. If the texture object was created with glTexStorage2D, the call to _mesa_TexImage2D() would generate INVALID_OPERATION since the texture is marked as immutable. 2. _mesa_TexImage2D() always frees any existing texture image memory before allocating new memory. That's inefficient since the existing image is usually the right size already. Reviewed-by: Eric Anholt e...@anholt.net ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/11] mesa: implement DrawTransformFeedback from ARB_transform_feedback2
- Original Message - On 12/12/2011 02:49 PM, Jose Fonseca wrote: - Original Message - From: Marek Olšák mar...@gmail.com It's like DrawArrays, but the count is taken from a transform feedback object. This removes DrawTransformFeedback from dd_function_table and adds the same function to GLvertexformat (with the function parameters matching GL). The vbo_draw_func callback has a new parameter struct gl_transform_feedback_object *tfb_vertcount. The rest of the code just validates states and forwards the transform feedback object into vbo_draw_func. I ventured reviewing this too, and I have a question: the unfinished code in master had comments saying

/*
 * Get number of vertices in obj's feedback buffer.
 * Call ctx->Exec.DrawArrays(mode, 0, count);
 */

which seems simpler than passing the new tfb_vertcount parameter everywhere, just for the sake of obtaining count, so what's the rationale for doing this differently here? The count is contained in the TFB object (through its association with a pipe_stream_output_target, which tracks the amount of vertices written to it) and is not supposed to be read back by the CPU, because that would mean waiting. The driver will let the GPU write the count into the command buffer, so it only needs to sync command processing with TFB. I see. Makes sense. I have no more objections against the series FWIW. It would be nice to fix softpipe/llvmpipe though, but IIUC they could not possibly have worked with Mesa before, and I don't think anybody is relying on that softpipe/llvmpipe functionality w/ other state trackers at this point, so I suppose it could wait. Jose
Re: [Mesa-dev] [PATCH] glsl_to_tgsi: make sure copied instructions don't lose texture target.
On Sat, Dec 10, 2011 at 11:34 AM, Dave Airlie airl...@gmail.com wrote: From: Dave Airlie airl...@redhat.com The piglit draw-pixel-with-texture was asserting in the glsl-tgsi code due to a 0 texture target; this makes sure the texture target is copied over correctly when we copy instructions around. Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 6cc655d..68eaddc 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -3786,6 +3786,7 @@ get_pixel_transfer_visitor(struct st_fragment_program *fp,
     * new visitor. */
    foreach_iter(exec_list_iterator, iter, original->instructions) {
       glsl_to_tgsi_instruction *inst = (glsl_to_tgsi_instruction *)iter.get();
+      glsl_to_tgsi_instruction *newinst;
       st_src_reg src_regs[3];

       if (inst->dst.file == PROGRAM_OUTPUT)
@@ -3803,7 +3804,8 @@ get_pixel_transfer_visitor(struct st_fragment_program *fp,
             prog->InputsRead |= BITFIELD64_BIT(src_regs[i].index);
       }

-      v->emit(NULL, inst->op, inst->dst, src_regs[0], src_regs[1], src_regs[2]);
+      newinst = v->emit(NULL, inst->op, inst->dst, src_regs[0], src_regs[1], src_regs[2]);
+      newinst->tex_target = inst->tex_target;
    }

    /* Make modifications to fragment program info. */

Looks good to me. I had started to look into the glDrawPixels failures and found tex_target to be zero. I didn't know where it was supposed to come from though. Reviewed-by: Brian Paul bri...@vmware.com
Re: [Mesa-dev] [PATCH] meta: rework dest image allocation in mipmap generation code
On Sat, 10 Dec 2011 12:00:27 -0700, Brian Paul bri...@vmware.com wrote: This fixes two things: 1. If the texture object was created with glTexStorage2D, the call to _mesa_TexImage2D() would generate INVALID_OPERATION since the texture is marked as immutable. 2. _mesa_TexImage2D() always frees any existing texture image memory before allocating new memory. That's inefficient since the existing image is usually the right size already. Actually, scratch that review. I think I see a bug: update_fbo_texture() is no longer called with this, so FBOs attached to this texture may have incorrect FBO completeness after a GenerateMipmap.
Re: [Mesa-dev] [PATCH 2/3] swrast: rewrite color buffer clearing to use Map/UnmapRenderbuffer()
On Sat, 10 Dec 2011 12:00:53 -0700, Brian Paul bri...@vmware.com wrote:

-      /* Note that masking will change the color values, but only the
-       * channels for which the write mask is GL_FALSE.  The channels
-       * which are write-enabled won't get modified.
-       */
-      for (i = 0; i < height; i++) {
-         span.x = x;
-         span.y = y + i;
-         _swrast_mask_rgba_span(ctx, rb, &span, buf);
-         /* write masked row */
-         rb->PutRow(ctx, rb, width, x, y + i, span.array->rgba, NULL);
+   if (doMasking) {
+      /* Convert the boolean mask to a color */

I think this would be more informative as

   /* Convert the boolean mask to a color value that will be packed to
    * produce the bitmask for the renderbuffer's format.
    */

but overall, this looks like a pretty nice (if tricky) way to handle all these formats. Reviewed-by: Eric Anholt e...@anholt.net
Re: [Mesa-dev] shaders for g3dvl compositor (was vdpau: Handle destination rectangles correctly)
Christian König wrote: @Andy: Could you please confirm that they are also working? (They should apply on top of current master.) Yes - they are working OK for me. One thing I noticed while messing with mplayer subs/OSD is that vdpau in full screen positions them relative to the screen rather than the active picture, so they can appear over the black bars. This is the same with the previous patches, and also -vo gl does this. XvMC, xv and x11 all keep them in the active picture area - I don't know which, if any, is considered right or wrong behavior.
Re: [Mesa-dev] [PATCH 1/3] swrast: do depth/stencil clearing with Map/UnmapRenderbuffer()
On Mon, Dec 12, 2011 at 10:28 AM, Eric Anholt e...@anholt.net wrote: On Sat, 10 Dec 2011 12:00:52 -0700, Brian Paul bri...@vmware.com wrote: Another step toward getting rid of the renderbuffer PutRow/etc functions.

+   if (buffers & BUFFER_DS) {
+      struct gl_renderbuffer *rb =
+         ctx->DrawBuffer->Attachment[BUFFER_DEPTH].Renderbuffer;
+
+      if ((buffers & BUFFER_DS) == BUFFER_DS && rb &&
+          _mesa_is_format_packed_depth_stencil(rb->Format)) {
+         /* clear depth and stencil together */
+         _swrast_clear_depth_stencil_buffer(ctx);

I think this would be wrong for the case of

   Attachment[BUFFER_DEPTH] = depth/stencil texture
   Attachment[BUFFER_STENCIL] = separate stencil RB

Granted, that's a pretty crazy setup, but I think this just needs an rb == Attachment[BUFFER_STENCIL].Renderbuffer check.

Will do.

+   mapMode = GL_MAP_WRITE_BIT;
+   if (_mesa_get_format_bits(rb->Format, GL_STENCIL_BITS) > 0) {
+      mapMode |= GL_MAP_READ_BIT;
+   }

You only set the READ bit if there are stencil bits...

+   case MESA_FORMAT_S8_Z24:
+   case MESA_FORMAT_X8_Z24:
+   case MESA_FORMAT_Z24_S8:
+   case MESA_FORMAT_Z24_X8:
       for (i = 0; i < height; i++) {
-         rb->PutMonoRow(ctx, rb, width, x, y + i, clearVal16, NULL);
+         GLuint *row = (GLuint *) map;
+         for (j = 0; j < width; j++) {
+            row[j] = (row[j] & mask) | clearVal;
+         }
+         map += rowStride;
       }
+

but then the S8_Z24 and X8_Z24 paths both do the same set of reads.

+   case MESA_FORMAT_Z32_FLOAT_X24S8:
+      /* XXX untested */
+      {
+         GLfloat clearVal = (GLfloat) ctx->Depth.Clear;
          for (i = 0; i < height; i++) {
-            rb->PutMonoRow(ctx, rb, width, x, y + i, clearValue, NULL);
+            GLfloat *row = (GLfloat *) map;
+            for (j = 0; j < width; j++) {
+               row[j * 2] = clearVal;
+            }
+            map += rowStride;
          }

It looks good, at least. Once I land my outstanding driver bits it should be easy to test.

+/**
+ * Clear both depth and stencil values in a combined depth+stencil buffer.
+ */
+void
+_swrast_clear_depth_stencil_buffer(struct gl_context *ctx)
+{

+   case MESA_FORMAT_Z32_FLOAT_X24S8:
+      /* XXX untested */
+      {
+         GLfloat zClear = (GLfloat) ctx->Depth.Clear;
+         GLuint sClear = ctx->Stencil.Clear;
+         for (i = 0; i < height; i++) {
+            GLfloat *zRow = (GLfloat *) map;
+            GLuint *sRow = (GLuint *) map;
+            for (j = 0; j < width; j++) {
+               zRow[j * 2 + 0] = zClear;
+            }
+            for (j = 0; j < width; j++) {
+               sRow[j * 2 + 1] = sClear;
+            }

Missing stencil mask treatment.

Will fix.

diff --git a/src/mesa/swrast/s_stencil.c b/src/mesa/swrast/s_stencil.c
index 101ee50..5c5ebd4 100644
--- a/src/mesa/swrast/s_stencil.c
+++ b/src/mesa/swrast/s_stencil.c
@@ -1124,121 +1124,108 @@ _swrast_write_stencil_span(struct gl_context *ctx, GLint n, GLint x, GLint y,

 /**
- * Clear the stencil buffer.
+ * Clear the stencil buffer.  If the buffer is a combined
+ * depth+stencil buffer, only the stencil bits will be touched.
  */
 void
-_swrast_clear_stencil_buffer( struct gl_context *ctx, struct gl_renderbuffer *rb )
+_swrast_clear_stencil_buffer(struct gl_context *ctx)
 {
+   struct gl_renderbuffer *rb =
+      ctx->DrawBuffer->Attachment[BUFFER_STENCIL].Renderbuffer;
    const GLubyte stencilBits = ctx->DrawBuffer->Visual.stencilBits;
-   const GLuint mask = ctx->Stencil.WriteMask[0];
-   const GLuint invMask = ~mask;
-   const GLuint clearVal = (ctx->Stencil.Clear & mask);
+   const GLuint writeMask = ctx->Stencil.WriteMask[0];
+   const GLuint clearVal = (ctx->Stencil.Clear & writeMask);
    const GLuint stencilMax = (1 << stencilBits) - 1;
    GLint x, y, width, height;
+   GLubyte *map;
+   GLint rowStride, i, j;
+   GLbitfield mapMode;

+   mapMode = GL_MAP_WRITE_BIT;
+   if ((writeMask & stencilMax) != stencilMax) {
+      /* need to mask stencil values */
+      mapMode |= GL_MAP_READ_BIT;
+   }
+   else if (_mesa_get_format_bits(rb->Format, GL_DEPTH_BITS) > 0) {
+      /* combined depth+stencil, need to mask Z values */
+      mapMode |= GL_MAP_READ_BIT;
+   }

+   switch (rb->Format) {
+   case MESA_FORMAT_S8:
+      {
+         GLubyte clear = clearVal & 0xff;
+         GLubyte mask = (~writeMask) & 0xff;
+         if (mask != 0) {

mask != 0xff, right? 0 was already tested at the top, and there's that else case below.

I think I have that right. mask is the complement of the stencil WriteMask value here. And if mask==0 it means that we're replacing the dest values completely. That's what the else clauses do.

-         _mesa_memset16((short unsigned int*) stencil, clearVal, width);

note, _mesa_memset16 is dead code now.

OK, I'll rm that later.

-Brian
Re: [Mesa-dev] [PATCH 2/3] swrast: rewrite color buffer clearing to use Map/UnmapRenderbuffer()
On Mon, Dec 12, 2011 at 10:38 AM, Eric Anholt e...@anholt.net wrote: On Sat, 10 Dec 2011 12:00:53 -0700, Brian Paul bri...@vmware.com wrote:

-      /* Note that masking will change the color values, but only the
-       * channels for which the write mask is GL_FALSE.  The channels
-       * which are write-enabled won't get modified.
-       */
-      for (i = 0; i < height; i++) {
-         span.x = x;
-         span.y = y + i;
-         _swrast_mask_rgba_span(ctx, rb, &span, buf);
-         /* write masked row */
-         rb->PutRow(ctx, rb, width, x, y + i, span.array->rgba, NULL);
+   if (doMasking) {
+      /* Convert the boolean mask to a color */

I think this would be more informative as

   /* Convert the boolean mask to a color value that will be packed to
    * produce the bitmask for the renderbuffer's format.
    */

but overall, this looks like a pretty nice (if tricky) way to handle all these formats. Reviewed-by: Eric Anholt e...@anholt.net

Actually, I found a couple issues with the code when I was looking at it yesterday. I'll post an updated patch later. I'll update the comments too. -Brian
Re: [Mesa-dev] R600g LLVM shader backend
On Mon, 2011-12-12 at 07:05 -0800, Jose Fonseca wrote: - Original Message - Hi, I have just pushed a branch containing an LLVM shader backend for r600g to my personal git repo: http://cgit.freedesktop.org/~tstellar/mesa/r600g-llvm-shader Hi Tom, This is pretty cool stuff. The fact that you have a similar passing rate to the existing compiler makes it quite remarkable. I was aware of several closed/open-source projects to develop GPU backends for LLVM, and LunarG made a middle end, but this is the first working OpenGL stack based on an LLVM GPU backend that I'm aware of. There are three main components to this branch: 1. A TGSI-LLVM converter (commit ec9bb644cf7dde055c6c3ee5b8890a2d367337e5) The goal of this converter is to give all gallium drivers a convenient way to convert from TGSI to LLVM. The interface is still evolving, and I'm interested in getting some feedback for it. The interface looks a good start, but I'd like it to be even more overridable. I don't even think there's anything GPU specific there -- I also had some plans to do TGSI translation in llvmpipe in two stages: first TGSI -> high-level LLVM IR w/ custom gallivm/tgsi intrinsics -> low-level LLVM IR w/ x86 SIMD intrinsics, to allow optimizations and other decisions to happen at a higher level, before starting to emit lower-level code. What else would you like to see overridable? I think it might be nice to map TGSI opcodes to C functions rather than intrinsic strings. So I'd like us to have a flexible hierarchy of TGSI translators that can be shared for GPUs/CPUs alike. BTW, I'd prefer that all reusable Gallium+LLVM code (be it meant for GPU or CPU) lives in src/gallium/auxiliary/gallivm, as it makes code maintenance and build integration simpler. So tgsi_llvm.c should be moved into gallivm. Also, beware that the ability to build core gallium/mesa without LLVM must be preserved. (Having particular drivers have hard dependencies on LLVM is of course at the discretion of driver developers though.) 2.
Changes to gallivm so that code can be shared between it and the TGSI-LLVM converter. These changes are attached, please review. I'll review them separately. 3. An LLVM backend for r600g. This backend is mostly based on AMD's AMDIL LLVM backend for OpenCL, with a few changes added for emitting machine code. Currently, it passes about 99% of the piglit tests that pass with the current r600g shader backend. Most of the failures are due to some unimplemented texture instructions. Indirect addressing is also missing from the LLVM backend, and it relies on the current r600g shader code to do this. There's a 30K-line file, src/gallium/drivers/radeon/macrodb_gen.h. Is this generated code? I'm pretty sure this can be removed. I think it's only useful for generating AMDIL assembly, but I need to examine it more closely to make sure. Also, maybe it would make sense to have the AMDIL backend distributed separately from mesa, as it looks like a component that has other consumers beyond mesa/gallium/r600g, right? Eventually, the AMDIL backend will be distributed as a part of llvm [1], but we still have a lot of work to do to make that happen. The r600g backend is basically a subclass of the AMDIL backend, so if the AMDIL backend is in LLVM, the r600g backend would probably have to be too. The AMDIL code is most likely to stay in Mesa until it's upstream in LLVM. I think it does make sense to explore having some sort of libAMDIL, but whether or not that happens depends a lot on what other people want to do with the AMDIL backend. In reality, the LLVM backend does not emit actual machine code, but rather a byte stream that is converted to struct r600_bytecode. The final transformations are done by r600_asm.c, just like in the current shader backend. The LLVM backend is not optimized for VLIW, and it only emits one instruction per group. The optimizations in r600_asm.c are able to do some instruction packing, but the resulting code is not yet as good as the current backend.
Why is the resulting code worse: due to limitations in the assembler, in the AMDIL LLVM backend, or in LLVM itself? I guess it's due to limitations in the assembler. When the code is translated from TGSI, the vector instructions fit much better into the VLIW architecture, so the lack of a proper assembler is not as noticeable, but the r600g LLVM backend assumes non-VLIW hardware, which makes the lack of a good assembler really noticeable. The main motivation for this LLVM backend is to help bring compute/OpenCL support to r600g by making it easier to support different compiler frontends. I don't have a concrete plan for integrating this into mainline Mesa yet, but I don't expect it to be in the next release. I would really like to make it compatible with LLVM 3.0 before it gets merged (it
Re: [Mesa-dev] [PATCH 3/3] mesa: remove gl_renderbuffer::PutMonoRow() and PutMonoValues()
On Sat, 10 Dec 2011 12:00:54 -0700, Brian Paul bri...@vmware.com wrote: The former was only used for clearing buffers. The latter wasn't used anywhere! Remove them and all implementations of those functions. Reviewed-by: Eric Anholt e...@anholt.net
Re: [Mesa-dev] [PATCH] mesa: add const flags to skip MaxVarying and MaxUniform linker checks (v2)
On 12/11/2011 11:07 PM, Marek Olšák wrote: This is only temporary until a better solution is available. v2: print warnings and add gallium CAPs Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/gallium/drivers/r300/r300_screen.c |  2 +
 src/gallium/include/pipe/p_defines.h   |  4 ++-
 src/glsl/linker.cpp                    | 43 ---
 src/mesa/main/mtypes.h                 |  9 ++
 src/mesa/state_tracker/st_extensions.c |  6
 5 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/r300/r300_screen.c b/src/gallium/drivers/r300/r300_screen.c
index e734ff2..0bae065 100644
--- a/src/gallium/drivers/r300/r300_screen.c
+++ b/src/gallium/drivers/r300/r300_screen.c
@@ -100,6 +100,8 @@ static int r300_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
         case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
         case PIPE_CAP_CONDITIONAL_RENDER:
         case PIPE_CAP_TEXTURE_BARRIER:
+        case PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS:
+        case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
             return 1;

         /* r300 cannot do swizzling of compressed textures. Supported otherwise.
          */
diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
index f00077c..30f1d7f 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -465,7 +465,9 @@ enum pipe_cap {
    PIPE_CAP_MIN_TEXEL_OFFSET = 50,
    PIPE_CAP_MAX_TEXEL_OFFSET = 51,
    PIPE_CAP_CONDITIONAL_RENDER = 52,
-   PIPE_CAP_TEXTURE_BARRIER = 53
+   PIPE_CAP_TEXTURE_BARRIER = 53,
+   PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS = 54, /* temporary */
+   PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS = 55 /* temporary */
 };

 /**
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 3527088..b8a7126 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -1815,18 +1815,34 @@ assign_varying_locations(struct gl_context *ctx,

    if (ctx->API == API_OPENGLES2 || prog->Version == 100) {
       if (varying_vectors > ctx->Const.MaxVarying) {
-        linker_error(prog, "shader uses too many varying vectors "
-                     "(%u > %u)\n",
-                     varying_vectors, ctx->Const.MaxVarying);
-        return false;
+        if (ctx->Const.GLSLSkipStrictMaxVaryingLimitCheck) {
+           linker_warning(prog, "shader uses too many varying vectors "
+                          "(%u > %u), but the driver will try to optimize "
+                          "them out; this is non-portable out-of-spec "
+                          "behavior\n",
+                          varying_vectors, ctx->Const.MaxVarying);
+        } else {
+           linker_error(prog, "shader uses too many varying vectors "
+                        "(%u > %u)\n",
+                        varying_vectors, ctx->Const.MaxVarying);
+           return false;
+        }
       }
    } else {
       const unsigned float_components = varying_vectors * 4;
       if (float_components > ctx->Const.MaxVarying * 4) {
-        linker_error(prog, "shader uses too many varying components "
-                     "(%u > %u)\n",
-                     float_components, ctx->Const.MaxVarying * 4);
-        return false;
+        if (ctx->Const.GLSLSkipStrictMaxVaryingLimitCheck) {
+           linker_warning(prog, "shader uses too many varying components "
+                          "(%u > %u), but the driver will try to optimize "
+                          "them out; this is non-portable out-of-spec "
+                          "behavior\n",
+                          float_components, ctx->Const.MaxVarying * 4);
+        } else {
+           linker_error(prog, "shader uses too many varying components "
+                        "(%u > %u)\n",
+                        float_components, ctx->Const.MaxVarying * 4);
+           return false;
+        }
       }
    }

@@ -1960,8 +1976,15 @@ check_resources(struct gl_context *ctx, struct gl_shader_program *prog)
    }

    if (sh->num_uniform_components > max_uniform_components[i]) {
-      linker_error(prog, "Too many %s shader uniform components",
-                   shader_names[i]);
+      if (ctx->Const.GLSLSkipStrictMaxUniformLimitCheck) {
+         linker_warning(prog, "Too many %s shader uniform components, "
+                        "but the driver will try to optimize them out; "
+                        "this is non-portable out-of-spec behavior\n",
+                        shader_names[i]);
+      } else {
+         linker_error(prog, "Too many %s shader uniform components",
+                      shader_names[i]);
+      }
    }
 }

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index fc494f7..2073c8f 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2829,6 +2829,15 @@ struct gl_constants
     * Texture borders are deprecated in GL 3.0.
     **/
    GLboolean StripTextureBorder;
+
+   /**
+    * For drivers
Re: [Mesa-dev] [PATCH] mesa: add const flags to skip MaxVarying and MaxUniform linker checks (v2)
On 12/11/2011 11:07 PM, Marek Olšák wrote: This is only temporary until a better solution is available. v2: print warnings and add gallium CAPs Thanks, Marek -- this looks good to me. Reviewed-by: Kenneth Graunke kenn...@whitecape.org
[Mesa-dev] [PATCH] gallivm: Fix build with llvm-3.1svn.
llvm-3.1svn r145714 moved global variables into a new TargetOptions class. The TargetMachine constructor now needs a TargetOptions object as well. Signed-off-by: Vinson Lee v...@vmware.com
---
 src/gallium/auxiliary/gallivm/lp_bld_debug.cpp | 14 +-
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp  |  2 ++
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp b/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp
index 62825a2..a50a51d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp
@@ -250,7 +250,19 @@ lp_disassemble(const void* func)
       return;
    }

-#if HAVE_LLVM >= 0x0300
+#if HAVE_LLVM >= 0x0301
+   TargetOptions options;
+#if defined(DEBUG)
+   options.JITEmitDebugInfo = true;
+#endif
+#if defined(PIPE_ARCH_X86)
+   options.StackAlignmentOverride = 4;
+#endif
+#if defined(DEBUG) || defined(PROFILE)
+   options.NoFramePointerElim = true;
+#endif
+   TargetMachine *TM = T->createTargetMachine(Triple, sys::getHostCPUName(), "", options);
+#elif HAVE_LLVM == 0x0300
    TargetMachine *TM = T->createTargetMachine(Triple, sys::getHostCPUName(), "");
 #else
    TargetMachine *TM = T->createTargetMachine(Triple, "");
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index 41a..fe7616b 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -69,6 +69,7 @@ lp_register_oprofile_jit_event_listener(LLVMExecutionEngineRef EE)
 extern "C" void
 lp_set_target_options(void)
 {
+#if HAVE_LLVM <= 0x0300
 #if defined(DEBUG)
 #if HAVE_LLVM >= 0x0207
    llvm::JITEmitDebugInfo = true;
@@ -102,6 +103,7 @@ lp_set_target_options(void)
 #if 0
    llvm::UnsafeFPMath = true;
 #endif
+#endif /* HAVE_LLVM <= 0x0300 */

 #if HAVE_LLVM < 0x0209
    /*
--
1.7.7.3
[Mesa-dev] [PATCH] Add mismatch check for glGetTexImage or it will return -1 and may lead to segment fault.
From: Jian Zhao jian.j.z...@intel.com
---
 src/mesa/main/texgetimage.c | 8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c
index ae0d51f..3f24187 100644
--- a/src/mesa/main/texgetimage.c
+++ b/src/mesa/main/texgetimage.c
@@ -708,6 +708,14 @@ getteximage_error_check(struct gl_context *ctx, GLenum target, GLint level,
       return GL_TRUE;
    }

+   if (!_mesa_is_legal_format_and_type(ctx, format, type)) {
+      /* GL_INVALID_OPERATION is generated by a format/type
+       * mismatch (see the 1.2 spec, page 94, sec 3.6.4.)
+       */
+      _mesa_error(ctx, GL_INVALID_OPERATION, "glGetTexImage(target)");
+      return GL_TRUE;
+   }
+
    baseFormat = _mesa_get_format_base_format(texImage->TexFormat);

    /* Make sure the requested image format is compatible with the
--
1.7.0.1