Re: [Mesa-dev] [PATCH 2/2] vdpau: Handle destination rectangles correctly

2011-12-12 Thread Christian König

On 11.12.2011 07:13, Younes Manton wrote:

On Tue, Dec 6, 2011 at 4:51 PM, Andy Furniss andy...@ukfsn.org wrote:

Maarten Lankhorst wrote:

The brokenness in vlVdpVideoMixerRender was compensating for
brokenness in vlVdpPresentationQueueDisplay, so fix both at
the same time.


These fix the two remaining issues (aspect not maintained when fullscreen
and subtitle position getting changed when toggling back from fullscreen to
windowed) I had with R600 + mplayer + -vo vdpau (sw decode).

Can you provide a tested-by line for this and any other patches you've
tried that you can confirm don't regress? That way I can push the ones
that don't require discussion.
It is at least breaking the XvMC sub-picture case, and if I understand 
Maarten's changes correctly he mixed up the destination rectangle in the source 
coordinate system (which is only used for XvMC sub-pictures) with the 
destination rectangle in the destination coordinate system (which should be 
used for VDPAU and practically everything else).


Currently looking into it and trying to figure out how to reproduce the 
bug that he tried to fix in the first place.


Christian.


Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24

2011-12-12 Thread Jose Fonseca
I saw this email yesterday, but did not have time to reply.

- Original Message -
 On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt e...@anholt.net wrote:
  On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák mar...@gmail.com
  wrote:
  unpack_float_z_Z24_X8 fails with -O2 in:
  - fbo-blit-d24s8
  - fbo-depth-sample-compare
  - fbo-readpixels-depth-formats
  - glean/depthStencil
 
  With -O0, it works fine.
 
  I am removing all the assertions. There's not much point in having
  them,
  is there?
 
  Is -ffast-math involved at all?
 
 Yes, -fno-fast-math makes the problem go away.
 
 
  At a guess, replacing * scale with / (float)0xffffff makes the
  failure happen either way?
 
 Yes. Only using GLdouble fixes it.
 
 It makes sense. The mantissa without the sign has 23 bits, but 24
 bits
 are required to exactly represent 0xffffff if I am not mistaken.

I'm afraid this is wrong: single precision floating point can describe 24bits 
uints (and therefore unorms) without any loss, because although the mantissa 
has 23bits, the mantissa is only used to represent all but the most significant 
bit, ie., there's an implicit 24th bit always set to one.

The fact that -fno-fast-math makes the problem go away is yet another clear 
proof that the issue is _not_ lack of precision of single-precision floating 
point -- as -fno-fast-math will still use single-precision floating point 
numbers.  Actually, -fno-fast-math will ensure that all intermediate steps 
are done in single precision instead of higher intel x87 80bit floating points, 
on the 32bit x86 architecture.  And I suspect the problem here is that the 
rounding w/ x80 yields wrong results.
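To make the point concrete, here is a minimal standalone check (not from the 
thread; the unpack pattern and constants are assumptions based on the Z24 
discussion above):

   #include <assert.h>
   #include <stdint.h>

   int main(void)
   {
      uint32_t z;

      /* Any 24-bit unsigned integer fits exactly in a float:
       * 23 stored mantissa bits plus the implicit leading bit. */
      for (z = 0xfffff0; z <= 0xffffff; z++) {
         float f = (float) z;
         assert((uint32_t) f == z);   /* round-trips without loss */
      }

      /* The Z24 unpack pattern under discussion: scale to [0, 1].
       * In strict single precision this rounds to exactly 1.0; the
       * claim above is that -ffast-math plus x87 80-bit temporaries
       * is what pushes intermediates past 1.0f and trips the
       * assertions that the patch removed. */
      {
         const float scale = 1.0f / 0xffffff;
         float depth = (float) 0xffffff * scale;
         assert(depth <= 1.0f);
      }
      return 0;
   }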


I also disagree that using double precision is a good solution. Either 
fast-math serves our purposes, or it doesn't.  Using double precision to 
work-around issues with single-precision fast-math is the *worst* thing we 
could do.


Does the assertion failure even happen on 64bits? I doubt, as x64 ABI 
establishes the use of sse for all floating point, and x87 80bit floating point 
will never be used. So effectively this is making all archs slower.


Honestly, I think the patch should be recalled.  And we should consider 
disabling fast-math.  And maybe enabling -mfpmath=sse on 32bits x86 (i.e., 
require sse).


Jose


Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24

2011-12-12 Thread Marek Olšák
On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca jfons...@vmware.com wrote:
 I saw this email yesterday, but did not have time to reply.

 - Original Message -
 On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt e...@anholt.net wrote:
  On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák mar...@gmail.com
  wrote:
  unpack_float_z_Z24_X8 fails with -O2 in:
  - fbo-blit-d24s8
  - fbo-depth-sample-compare
  - fbo-readpixels-depth-formats
  - glean/depthStencil
 
  With -O0, it works fine.
 
  I am removing all the assertions. There's not much point in having
  them,
  is there?
 
  Is -ffast-math involved at all?

 Yes, -fno-fast-math makes the problem go away.

 
   At a guess, replacing * scale with / (float)0xffffff makes the
  failure happen either way?

 Yes. Only using GLdouble fixes it.

 It makes sense. The mantissa without the sign has 23 bits, but 24
 bits
  are required to exactly represent 0xffffff if I am not mistaken.

 I'm afraid this is wrong: single precision floating point can describe 24bits 
 uints (and therefore unorms) without any loss, because although the mantissa 
 has 23bits, the mantissa is only used to represent all but the most 
 significant bit, ie., there's an implicit 24th bit always set to one.

 The fact that -fno-fast-math makes the problem go away is yet another clear 
 proof that the issue is _not_ lack of precision of single-precision floating 
 point -- as -fno-fast-math will still use single-precision floating point 
 numbers.  Actually, -fno-fast-math will ensure that all intermediate steps 
 are done in single precision instead of higher intel x87 80bit floating 
 points, on the 32bit x86 architecture.  And I suspect the problem here is 
 that the rounding w/ x80 yields wrong results.


 I also disagree that using double precision is a good solution. Either 
 fast-math serves our purposes, or it doesn't.  Using double precision to 
 work-around issues with single-precision fast-math is the *worst* thing we 
 could do.


 Does the assertion failure even happen on 64bits? I doubt, as x64 ABI 
 establishes the use of sse for all floating point, and x87 80bit floating 
 point will never be used. So effectively this is making all archs slower.


 Honestly, I think the patch should be recalled.  And we should consider 
 disabling fast-math.  And maybe enabling -mfpmath=sse on 32bits x86 (i.e., 
 require sse).

You're probably right. Feel free to revert & fix the issue in some other way.

Marek


Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24

2011-12-12 Thread Jose Fonseca


- Original Message -
 On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca jfons...@vmware.com
 wrote:
  I saw this email yesterday, but did not have time to reply.
 
  - Original Message -
  On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt e...@anholt.net
  wrote:
   On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák
   mar...@gmail.com
   wrote:
   unpack_float_z_Z24_X8 fails with -O2 in:
   - fbo-blit-d24s8
   - fbo-depth-sample-compare
   - fbo-readpixels-depth-formats
   - glean/depthStencil
  
   With -O0, it works fine.
  
   I am removing all the assertions. There's not much point in
   having
   them,
   is there?
  
   Is -ffast-math involved at all?
 
  Yes, -fno-fast-math makes the problem go away.
 
  
   At a guess, replacing * scale with / (float)0xffffff makes
   the
   failure happen either way?
 
  Yes. Only using GLdouble fixes it.
 
  It makes sense. The mantissa without the sign has 23 bits, but 24
  bits
  are required to exactly represent 0xffffff if I am not mistaken.
 
  I'm afraid this is wrong: single precision floating point can
  describe 24bits uints (and therefore unorms) without any loss,
  because although the mantissa has 23bits, the mantissa is only
  used to represent all but the most significant bit, ie., there's
  an implicit 24th bit always set to one.
 
  The fact that -fno-fast-math makes the problem go away is yet
  another clear proof that the issue is _not_ lack of precision of
  single-precision floating point -- as -fno-fast-math will still
  use single-precision floating point numbers.  Actually,
  -fno-fast-math will ensure that all intermediate steps are done
  in single precision instead of higher intel x87 80bit floating
  points, on the 32bit x86 architecture.  And I suspect the problem
  here is that the rounding w/ x80 yields wrong results.
 
 
  I also disagree that using double precision is a good solution.
  Either fast-math serves our purposes, or it doesn't.  Using double
  precision to work-around issues with single-precision fast-math is
  the *worst* thing we could do.
 
 
  Does the assertion failure even happen on 64bits? I doubt, as x64
  ABI establishes the use of sse for all floating point, and x87
  80bit floating point will never be used. So effectively this is
  making all archs slower.
 
 
  Honestly, I think the patch should be recalled.  And we should
  consider disabling fast-math.  And maybe enabling -mfpmath=sse on
  32bits x86 (i.e., require sse).
 
 You're probably right. Feel free to revert & fix the issue in some
 other way.

OK.

Could you please confirm whether the assertion failures you saw were in 32-bit 
or 64-bit mode?

Jose


Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24

2011-12-12 Thread Marek Olšák
On Mon, Dec 12, 2011 at 2:10 PM, Jose Fonseca jfons...@vmware.com wrote:


 - Original Message -
 On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca jfons...@vmware.com
 wrote:
  I saw this email yesterday, but did not have time to reply.
 
  - Original Message -
  On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt e...@anholt.net
  wrote:
   On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák
   mar...@gmail.com
   wrote:
   unpack_float_z_Z24_X8 fails with -O2 in:
   - fbo-blit-d24s8
   - fbo-depth-sample-compare
   - fbo-readpixels-depth-formats
   - glean/depthStencil
  
   With -O0, it works fine.
  
   I am removing all the assertions. There's not much point in
   having
   them,
   is there?
  
   Is -ffast-math involved at all?
 
  Yes, -fno-fast-math makes the problem go away.
 
  
   At a guess, replacing * scale with / (float)0xffffff makes
   the
   failure happen either way?
 
  Yes. Only using GLdouble fixes it.
 
  It makes sense. The mantissa without the sign has 23 bits, but 24
  bits
  are required to exactly represent 0xffffff if I am not mistaken.
 
  I'm afraid this is wrong: single precision floating point can
  describe 24bits uints (and therefore unorms) without any loss,
  because although the mantissa has 23bits, the mantissa is only
  used to represent all but the most significant bit, ie., there's
  an implicit 24th bit always set to one.
 
  The fact that -fno-fast-math makes the problem go away is yet
  another clear proof that the issue is _not_ lack of precision of
  single-precision floating point -- as -fno-fast-math will still
  use single-precision floating point numbers.  Actually,
  -fno-fast-math will ensure that all intermediate steps are done
  in single precision instead of higher intel x87 80bit floating
  points, on the 32bit x86 architecture.  And I suspect the problem
  here is that the rounding w/ x80 yields wrong results.
 
 
  I also disagree that using double precision is a good solution.
  Either fast-math serves our purposes, or it doesn't.  Using double
  precision to work-around issues with single-precision fast-math is
  the *worst* thing we could do.
 
 
  Does the assertion failure even happen on 64bits? I doubt, as x64
  ABI establishes the use of sse for all floating point, and x87
  80bit floating point will never be used. So effectively this is
  making all archs slower.
 
 
  Honestly, I think the patch should be recalled.  And we should
  consider disabling fast-math.  And maybe enabling -mfpmath=sse on
  32bits x86 (i.e., require sse).

 You're probably right. Feel free to revert & fix the issue in some
 other way.

 OK.

 Could you please confirm whether the assertion failures you saw were in 
 32-bit or 64-bit mode?

32-bit.

Marek


Re: [Mesa-dev] [PATCH 03/11] gallium: interface changes necessary to implement transform feedback (v5)

2011-12-12 Thread Jose Fonseca
- Original Message -
 From: Marek Olšák mar...@gmail.com
 
 Namely:
 - EXT_transform_feedback
 - ARB_transform_feedback2
 - ARB_transform_feedback_instanced
 
 The old interface was not useful for OpenGL and had to be reworked.
 
 This interface was originally designed for OpenGL, but additional
 changes have been made in order to make st/d3d1x support easier.
 
 The most notable change is the stream-out info must be linked
 with a vertex or geometry shader and cannot be set independently.
 This is due to limitations of existing hardware (special shader
 instructions must be used to write into stream-out buffers),
 and it's also how OpenGL works (stream outputs must be specified
 prior to linking shaders).
 
 Other than that, each stream output buffer has a view into it that
 internally maintains the number of bytes which have been written
 into it. (one buffer can be bound in several different transform
 feedback objects in OpenGL, so we must be able to have several views
 around) The set_stream_output_targets function contains a parameter
 saying whether new data should be appended or not.
 
 Also, the view can optionally be used to provide the vertex
 count for draw_vbo. Note that the count is supposed to be stored
 in device memory and the CPU never gets to know its value.
 
 OpenGL way | Gallium way
 
 BeginTF    = set_so_targets(append_bitmask = 0)
 PauseTF    = set_so_targets(num_targets = 0)
 ResumeTF   = set_so_targets(append_bitmask = ~0)
 EndTF      = set_so_targets(num_targets = 0)
 DrawTF     = use pipe_draw_info::count_from_stream_output
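As a sketch (not part of the patch; signature per the v2 note below, with its
append_bitmask parameter), the mapping above translates to calls like:

   /* BeginTF: bind the targets, discarding previously written data. */
   pipe->set_stream_output_targets(pipe, num_targets, targets, 0);

   /* PauseTF / EndTF: unbind everything. */
   pipe->set_stream_output_targets(pipe, 0, NULL, 0);

   /* ResumeTF: rebind and append to what was already written. */
   pipe->set_stream_output_targets(pipe, num_targets, targets, ~0);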
 
 v2: * removed the reset_stream_output_targets function
 * added a parameter append_bitmask to set_stream_output_targets,
   each bit specifies whether new data should be appended to each
   buffer or not.
 v3: * added PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME for ARB_tfb2,
   note that the draw-auto subset is always required (for d3d10),
   only the pause/resume functionality is limited if the CAP is
   not
   advertised
 v4: * update gallium/docs
 v5: * compactified struct pipe_stream_output_info, updated dump/trace
[...]

Thanks for the update. Just a minor suggestion:

 @@ -267,6 +268,24 @@ void trace_dump_shader_state(const struct
 pipe_shader_state *state)
 trace_dump_string(str);
 trace_dump_member_end();
  
 +   trace_dump_member_begin("stream_output");
 +   trace_dump_struct_begin("pipe_stream_output_info");
 +   trace_dump_member(uint, &state->stream_output, num_outputs);
 +   trace_dump_member(uint, &state->stream_output, stride);
 +   trace_dump_array_begin();
 +   for(i = 0; i < Elements(state->stream_output.output); ++i) {

It should be state->stream_output.num_outputs IIUC.

Other than that, 

Reviewed-By: Jose Fonseca jfons...@vmware.com

Jose


Re: [Mesa-dev] [PATCH 01/11] mesa: implement DrawTransformFeedback from ARB_transform_feedback2

2011-12-12 Thread Jose Fonseca
- Original Message -
 From: Marek Olšák mar...@gmail.com
 
 It's like DrawArrays, but the count is taken from a transform
 feedback
 object.
 
 This removes DrawTransformFeedback from dd_function_table and adds
 the same
 function to GLvertexformat (with the function parameters matching
 GL).
 
 The vbo_draw_func callback has a new parameter
 struct gl_transform_feedback_object *tfb_vertcount.
 
 The rest of the code just validates states and forwards the transform
 feedback object into vbo_draw_func.

I ventured reviewing this too, and I have a question: the unfinished code in 
master had comments saying

   /*
* Get number of vertices in obj's feedback buffer.
 * Call ctx->Exec.DrawArrays(mode, 0, count);
*/

which seems simpler than passing the new tfb_vertcount parameter everywhere, 
just for the sake of obtaining count, so what's the rationale for doing this 
differently here?

Jose


Re: [Mesa-dev] R600g LLVM shader backend

2011-12-12 Thread Jose Fonseca
- Original Message -
 Hi,
 
 I have just pushed a branch containing an LLVM shader backend for
 r600g to my
 personal git repo:
 
 http://cgit.freedesktop.org/~tstellar/mesa/ r600g-llvm-shader

Hi Tom,

This is pretty cool stuff.  The fact that you have a similar passing rate to 
the existing compiler makes it quite remarkable.  I was aware of several 
closed/open-source projects to develop GPU backends for LLVM, and LunarG made a 
middle end, but this is the first working OpenGL stack based on an LLVM GPU 
backend that I'm aware of.

 There are three main components to this branch:
 
 1. A TGSI->LLVM converter (commit
 ec9bb644cf7dde055c6c3ee5b8890a2d367337e5)

 The goal of this converter is to give all gallium drivers a
 convenient
 way to convert from TGSI to LLVM.  The interface is still evolving,
 and I'm interested in getting some feedback for it.

The interface looks like a good start, but I'd like it to be even more overridable.  
I don't even think there's anything GPU specific there -- I also had some plans 
to do TGSI translation in llvmpipe in two stages: first TGSI -> high-level LLVM 
IR w/ custom gallivm/tgsi intrinsics -> low-level LLVM IR w/ x86 SIMD 
intrinsics, to allow optimizations and other decisions to happen at a higher 
level, before starting to emit lower-level code.

So I'd like us to have a flexible hierarchy of TGSI translators that can be 
shared for GPU/CPUs alike.

BTW, I'd prefer that all reusable Gallium+LLVM code (be it meant for GPU or 
CPU) lives in src/gallium/auxiliary/gallivm , as it makes code maintenance and 
build integration simpler.  So tgsi_llvm.c should be moved into gallivm.  Also, 
beware that the ability to build core gallium/mesa without LLVM must be 
preserved. (Having particular drivers depend on LLVM is of course at the 
discretion of the drivers' developers though.) 

 2. Changes to gallivm so that code can be shared between it and
 the TGSI->LLVM converter.  These changes are attached, please review.

I'll review them separately.

 3. An LLVM backend for r600g.
 
 This backend is mostly based on AMD's AMDIL LLVM backend for OpenCL
 with a
 few changes added for emitting machine code.  Currently, it passes
 about
 99% of the piglit tests that pass with the current r600g shader
 backend.
 Most of the failures are due to some unimplemented texture
 instructions.
 Indirect addressing is also missing from the LLVM backend, and it
 relies
 on the current r600g shader code to do this.

There's a 30K line file src/gallium/drivers/radeon/macrodb_gen.h . Is this 
generated code?

Also, maybe it would make sense to have the AMDIL backend distributed separately 
from mesa, as it looks like a component that has other consumers beyond 
mesa/gallium/r600g, right?

 In reality, the LLVM backend does not emit actual machine code,
 but rather a byte stream that is converted to struct r600_bytecode.
 The final transformations are done by r600_asm.c, just like in the
 current
 shader backend.  The LLVM backend is not optimized for VLIW, and it
 only
 emits one instruction per group.  The optimizations in r600_asm.c are
 able to do some instruction packing, but the resulting code is not
 yet
 as good as the current backend.

Why is the resulting code worse: due to limitations in the assembler, in the AMDIL 
LLVM backend, or in LLVM itself?

 The main motivation for this LLVM backend is to help bring
 compute/OpenCL
 support to r600g by making it easier to support different compiler
 frontends.  I don't have a concrete plan for integrating this into
 mainline Mesa yet, but I don't expect it to be in the next release.
 I would really like to make it compatible with LLVM 3.0 before it
 gets
 merged (it only works with LLVM 2.9 now), but if compute support
 evolves
 quickly, I might be tempted to push the 2.9 version into the master
 branch.

What are the state trackers that you envision this will use? (e.g., Are you 
targeting clover? do you hope this will be the default compiler backend for 
Mesa? Or is Mesa/Gallium just a way to test AMDIL backend?)

I'm also interested in your general opinion on using LLVM for GPU.

Jose


Re: [Mesa-dev] [PATCH 13/15] gallivm: Move tgsi_soa helper macros to header file

2011-12-12 Thread Jose Fonseca


- Original Message -
 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi.h     |   12 ++++++++++++
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |   14 --------------
  2 files changed, 12 insertions(+), 14 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
 index 554b4cb..1258620 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
 @@ -51,6 +51,18 @@
  
  #define LP_MAX_INSTRUCTIONS 256
  
 +#define FOR_EACH_CHANNEL( CHAN )\
 +   for (CHAN = 0; CHAN < NUM_CHANNELS; CHAN++)
 +

Tom,

Symbols that are meant to be used by more than one .c/cpp file should have a 
unique name prefix to avoid name clashes.

For this particular case, these are not LLVM related, so it would make sense to 
put these macros somewhere in src/gallium/auxiliary/tgsi/ with a TGSI_ 
prefix so they can be used by all TGSI translation code.
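For instance, the move might look like this (a sketch; file location and names 
hypothetical):

   /* In a header under src/gallium/auxiliary/tgsi/, prefixed so that
    * every TGSI translator can include it without name clashes: */
   #define TGSI_NUM_CHANNELS 4

   #define TGSI_FOR_EACH_CHANNEL( CHAN ) \
      for (CHAN = 0; CHAN < TGSI_NUM_CHANNELS; CHAN++)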

Jose


Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24

2011-12-12 Thread Jose Fonseca
- Original Message -
 On Mon, Dec 12, 2011 at 2:10 PM, Jose Fonseca jfons...@vmware.com
 wrote:
 
 
  - Original Message -
  On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca
  jfons...@vmware.com
  wrote:
   I saw this email yesterday, but did not have time to reply.
  
   - Original Message -
   On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt e...@anholt.net
   wrote:
On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák
mar...@gmail.com
wrote:
unpack_float_z_Z24_X8 fails with -O2 in:
- fbo-blit-d24s8
- fbo-depth-sample-compare
- fbo-readpixels-depth-formats
- glean/depthStencil
   
With -O0, it works fine.
   
I am removing all the assertions. There's not much point in
having
them,
is there?
   
Is -ffast-math involved at all?
  
   Yes, -fno-fast-math makes the problem go away.
  
   
 At a guess, replacing * scale with / (float)0xffffff
makes
the
failure happen either way?
  
   Yes. Only using GLdouble fixes it.
  
   It makes sense. The mantissa without the sign has 23 bits, but
   24
   bits
 are required to exactly represent 0xffffff if I am not
   mistaken.
  
   I'm afraid this is wrong: single precision floating point can
   describe 24bits uints (and therefore unorms) without any loss,
   because although the mantissa has 23bits, the mantissa is only
   used to represent all but the most significant bit, ie., there's
   an implicit 24th bit always set to one.
  
   The fact that -fno-fast-math makes the problem go away is yet
   another clear proof that the issue is _not_ lack of precision of
   single-precision floating point -- as -fno-fast-math will still
   use single-precision floating point numbers.  Actually,
   -fno-fast-math will ensure that all intermediate steps are
   done
   in single precision instead of higher intel x87 80bit floating
   points, on the 32bit x86 architecture.  And I suspect the
   problem
   here is that the rounding w/ x80 yields wrong results.
  
  
   I also disagree that using double precision is a good solution.
   Either fast-math serves our purposes, or it doesn't.  Using
   double
   precision to work-around issues with single-precision fast-math
   is
   the *worst* thing we could do.
  
  
   Does the assertion failure even happen on 64bits? I doubt, as
   x64
   ABI establishes the use of sse for all floating point, and x87
   80bit floating point will never be used. So effectively this is
   making all archs slower.
  
  
   Honestly, I think the patch should be recalled.  And we should
   consider disabling fast-math.  And maybe enabling -mfpmath=sse
   on
   32bits x86 (i.e., require sse).
 
  You're probably right. Feel free to revert & fix the issue in some
  other way.
 
  OK.
 
  Could you please confirm whether the assertion failures you saw
  were in 32-bit or 64-bit mode?
 
 32-bit.

Thanks.

In that case I propose dropping fast-math, at least on x86 32bits, as it makes 
wrong (in particular the unorm24 -> f32 conversions happen in a lot of places in 
Mesa's source tree) and sometimes worse code [1], and simply use the subset of 
finer grained options [2]:

  -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, 
-fno-rounding-math, -fno-signaling-nans and -fcx-limited-range. 

which really meets our needs.

About -mfpmath=sse on 32bits, although I believe that depending on sse would be 
alright, it looks like -mfpmath=sse is not a clear winner on 32bits, 
because calls to libc still need to use x87 registers.


Jose


[1] http://programerror.com/2009/09/when-gccs-ffast-math-isnt/
[2] http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html


Re: [Mesa-dev] [PATCH] mesa: fix an out-of-bounds access in prog_print.c

2011-12-12 Thread Brian Paul
On Sun, Dec 11, 2011 at 10:30 PM, Marek Olšák mar...@gmail.com wrote:
 ---
  src/mesa/program/prog_print.c |    1 +
  1 files changed, 1 insertions(+), 0 deletions(-)

 diff --git a/src/mesa/program/prog_print.c b/src/mesa/program/prog_print.c
 index e9bf3aa..b4d142f 100644
 --- a/src/mesa/program/prog_print.c
 +++ b/src/mesa/program/prog_print.c
 @@ -112,6 +112,7 @@ arb_input_attrib_string(GLint index, GLenum progType)
       "vertex.texcoord[5]",
       "vertex.texcoord[6]",
       "vertex.texcoord[7]",
 +      "vertex.pointsize",
       "vertex.attrib[0]",
       "vertex.attrib[1]",
       "vertex.attrib[2]",
 --
 1.7.4.1



Reviewed-by: Brian Paul bri...@vmware.com


Re: [Mesa-dev] R600g LLVM shader backend

2011-12-12 Thread Tom Stellard
On Sat, 2011-12-10 at 03:16 -0800, Stéphane Marchesin wrote:
 On Fri, Dec 9, 2011 at 14:15, Tom Stellard tstel...@gmail.com wrote:
  Hi,
 
  I have just pushed a branch containing an LLVM shader backend for r600g to 
  my
  personal git repo:
 
  http://cgit.freedesktop.org/~tstellar/mesa/ r600g-llvm-shader
 
  There are three main components to this branch:
 
  1. A TGSI-LLVM converter (commit ec9bb644cf7dde055c6c3ee5b8890a2d367337e5)
 
  The goal of this converter is to give all gallium drivers a convenient
  way to convert from TGSI to LLVM.  The interface is still evolving,
  and I'm interested in getting some feedback for it.
 
  2. Changes to gallivm so that code can be shared between it and
  the TGSI-LLVM converter.  These changes are attached, please review.
 
  3. An LLVM backend for r600g.
 
 
 I can't help but notice the additional license restrictions in that
 code -- is this something that could be sorted out?
 

The AMDIL portion of the code (basically any file with an AMDIL prefix
plus the macrod* files) is currently licensed under the 3-clause BSD
license with an addition clause for complying with export laws.  I need
to get some more information from our legal team about the export law
clause before I discuss it further, but I'm hoping it's something we can
work out.

The non-AMDIL portion of the code is still licensed under the MIT
license and has that license included in the file.

-Tom

 Stéphane





Re: [Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24

2011-12-12 Thread Jose Fonseca


- Original Message -
 On Mon, Dec 12, 2011 at 8:20 AM, Jose Fonseca jfons...@vmware.com
 wrote:
  - Original Message -
  On Mon, Dec 12, 2011 at 2:10 PM, Jose Fonseca
  jfons...@vmware.com
  wrote:
  
  
   - Original Message -
   On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca
   jfons...@vmware.com
   wrote:
I saw this email yesterday, but did not have time to reply.
   
- Original Message -
On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt
e...@anholt.net
wrote:
 On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák
 mar...@gmail.com
 wrote:
 unpack_float_z_Z24_X8 fails with -O2 in:
 - fbo-blit-d24s8
 - fbo-depth-sample-compare
 - fbo-readpixels-depth-formats
 - glean/depthStencil

 With -O0, it works fine.

 I am removing all the assertions. There's not much point
 in
 having
 them,
 is there?

 Is -ffast-math involved at all?
   
Yes, -fno-fast-math makes the problem go away.
   

 At a guess, replacing * scale with / (float)0xffffff
 makes
 the
 failure happen either way?
   
Yes. Only using GLdouble fixes it.
   
It makes sense. The mantissa without the sign has 23 bits,
but
24
bits
 are required to exactly represent 0xffffff if I am not
mistaken.
   
I'm afraid this is wrong: single precision floating point can
describe 24bits uints (and therefore unorms) without any
loss,
because although the mantissa has 23bits, the mantissa is
only
used to represent all but the most significant bit, ie.,
there's
an implicit 24th bit always set to one.
   
The fact that -fno-fast-math makes the problem go away is yet
another clear proof that the issue is _not_ lack of precision
of
single-precision floating point -- as -fno-fast-math will
still
use single-precision floating point numbers.  Actually,
    -fno-fast-math will ensure that all intermediate steps are
done
in single precision instead of higher intel x87 80bit
floating
points, on the 32bit x86 architecture.  And I suspect the
problem
here is that the rounding w/ x80 yields wrong results.
   
   
I also disagree that using double precision is a good
solution.
Either fast-math serves our purposes, or it doesn't.  Using
double
precision to work-around issues with single-precision
fast-math
is
the *worst* thing we could do.
   
   
Does the assertion failure even happen on 64bits? I doubt, as
x64
ABI establishes the use of sse for all floating point, and
x87
80bit floating point will never be used. So effectively this
is
making all archs slower.
   
   
Honestly, I think the patch should be recalled.  And we
should
consider disabling fast-math.  And maybe enabling
-mfpmath=sse
on
32bits x86 (i.e., require sse).
  
   You're probably right. Feel free to revert & fix the issue in
   some
   other way.
  
   OK.
  
   Could you please confirm whether the assertion failures you saw
   were in 32-bit or 64-bit mode?
 
  32-bit.
 
  Thanks.
 
  In that case I propose dropping fast-math, at least on x86 32bits,
  as it makes wrong (in particular the unorm24 -> f32 conversions happen
  in a lot of places in Mesa's source tree) and sometimes worse
  code [1], and simply use the subset of finer grained options [2]:
 
   -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only,
   -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.
 
  which really meets our needs.
 
  About -mfpmath=sse on 32bits, although I believe that depending on
  sse would be alright, it looks like -mfpmath=sse is not a clear
  winner on 32bits, because calls to libc still need to use x87
  registers.
 
 
  Jose
 
 
  [1] http://programerror.com/2009/09/when-gccs-ffast-math-isnt/
  [2]
  http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html
 
 
 Thanks for digging into this, guys.
 
 I'm happy to drop -ffast-math (or use -fno-fast-math) but it would be
 interesting to do at least a few before/after comparisons just to
 make
 sure there's no surprises in performance.

Yep, that was my plan.

 In any case, don't we still need to use double-precision for Z32
 packing/unpacking?  I ran into a failure in _mesa_pack_float_z_row()
 for Z32 on Saturday (debug build on x64).

Agreed. I don't know an accurate & efficient way of converting unorm32 <-> f32 
without doubles.
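For reference, the double-based fallback itself is simple (a sketch, not Mesa's 
exact code): a double's 52-bit mantissa holds any 32-bit integer exactly, so 
only the final rounding to float loses anything.

   #include <stdint.h>

   static inline float
   unorm32_to_float(uint32_t z)
   {
      /* Exact divide in double, then a single rounding step to float. */
      return (float) ((double) z / (double) 0xffffffffu);
   }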

The trick I use in llvmpipe to convert f32 to unorm32, in 
src/gallium/auxiliary/gallivm/lp_bld_conv.c , only gets 31 bits right (i.e., 
drops one bit from values near 0.0):
 
  /*
   * The destination exceeds what can be represented in the floating point.
   * So multiply by the largest power two we get away with, and when
   * subtract the most significant bit to rescale to normalized values.
   *
   * The largest power of two factor we can get away is
   * (1 

Re: [Mesa-dev] shaders for g3dvl compositor (was vdpau: Handle destination rectangles correctly)

2011-12-12 Thread Christian König

Hi Maarten,

On 12.12.2011 13:25, Maarten Lankhorst wrote:
The problem is that the destination rectangle can be different for video and 
subpictures; this breaks mplayer OSD if you handle it incorrectly. 
ftp://download.nvidia.com/XFree86/vdpau/doxygen/html/group___vdp_video_mixer.html#ga62bf3bf8c5f01322a03b07065c5ea3db 
has info on how it works. The destination video rectangle is relative to the 
target image, not the source.
Yeah, but that is what the destination rectangle in the render call is 
good for. So you actually don't need to change anything here, you just 
need to add another parameter and everything starts to work fine.



Get mplayer to play any video fullscreen with -vo vdpau; if things work 
correctly, black borders should appear when the fullscreen aspect ratio doesn't 
match the aspect ratio of the video. The onscreen display should also be centered and 
scaled to the entire screen.
Ah, well I see the problem now. Let's split the discussion here a bit, 
and I'm also CCing Andy, Younes and the mailing list.


Please take a look at the attached patches, they should fix the 
mentioned problems, but without breaking XvMC or anything else (at least 
I hope so).


@Andy: Could you please confirm that they are also working? (They should 
apply on top of current master.)


Thanks,
Christian.
From d074aab1602c88ca18bb9472c6ab2afd99f2b53d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= deathsim...@vodafone.de
Date: Mon, 12 Dec 2011 15:27:34 +0100
Subject: [PATCH 1/3] g3dvl/compositor: improve dirty area handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Take viewport and scissors into account and make
the dirty area a parameter instead of a member.

Signed-off-by: Christian König deathsim...@vodafone.de
---
 src/gallium/auxiliary/vl/vl_compositor.c   |  104 
 src/gallium/auxiliary/vl/vl_compositor.h   |8 +-
 src/gallium/state_trackers/vdpau/mixer.c   |2 +-
 src/gallium/state_trackers/vdpau/output.c  |2 +-
 src/gallium/state_trackers/vdpau/presentation.c|4 +-
 src/gallium/state_trackers/vdpau/vdpau_private.h   |2 +
 src/gallium/state_trackers/xorg/xvmc/surface.c |4 +-
 .../state_trackers/xorg/xvmc/xvmc_private.h|2 +
 8 files changed, 81 insertions(+), 47 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_compositor.c b/src/gallium/auxiliary/vl/vl_compositor.c
index 322ef8e..98cb616 100644
--- a/src/gallium/auxiliary/vl/vl_compositor.c
+++ b/src/gallium/auxiliary/vl/vl_compositor.c
@@ -40,6 +40,9 @@
 #include "vl_types.h"
 #include "vl_compositor.h"
 
+#define MIN_DIRTY (0)
+#define MAX_DIRTY (1 << 15)
+
 typedef float csc_matrix[16];
 
 static void *
@@ -470,8 +473,27 @@ gen_rect_verts(struct vertex4f *vb, struct vl_compositor_layer *layer)
vb[3].w = layer->src.br.y;
 }
 
+static INLINE struct u_rect
+calc_drawn_area(struct vl_compositor *c, struct vl_compositor_layer *layer)
+{
+   struct u_rect result;
+
+   // scale
+   result.x0 = layer->dst.tl.x * c->viewport.scale[0] + c->viewport.translate[0];
+   result.y0 = layer->dst.tl.y * c->viewport.scale[1] + c->viewport.translate[1];
+   result.x1 = layer->dst.br.x * c->viewport.scale[0] + c->viewport.translate[0];
+   result.y1 = layer->dst.br.y * c->viewport.scale[1] + c->viewport.translate[1];
+
+   // and clip
+   result.x0 = MAX2(result.x0, c->scissor.minx);
+   result.y0 = MAX2(result.y0, c->scissor.miny);
+   result.x1 = MIN2(result.x1, c->scissor.maxx);
+   result.y1 = MIN2(result.y1, c->scissor.maxy);
+   return result;
+}
+
 static void
-gen_vertex_data(struct vl_compositor *c)
+gen_vertex_data(struct vl_compositor *c, struct u_rect *dirty)
 {
struct vertex4f *vb;
struct pipe_transfer *buf_transfer;
@@ -497,15 +519,18 @@ gen_vertex_data(struct vl_compositor *c)
  gen_rect_verts(vb, layer);
  vb += 4;
 
- if (layer->clearing &&
-     c->dirty_tl.x >= layer->dst.tl.x &&
-     c->dirty_tl.y >= layer->dst.tl.y &&
-     c->dirty_br.x <= layer->dst.br.x &&
-     c->dirty_br.y <= layer->dst.br.y) {
-
-    // We clear the dirty area anyway, no need for clear_render_target
-    c->dirty_tl.x = c->dirty_tl.y = 1.0f;
-    c->dirty_br.x = c->dirty_br.y = 0.0f;
+ if (dirty && layer->clearing) {
+    struct u_rect drawn = calc_drawn_area(c, layer);
+    if (
+     dirty->x0 >= drawn.x0 &&
+     dirty->y0 >= drawn.y0 &&
+     dirty->x1 <= drawn.x1 &&
+     dirty->y1 <= drawn.y1) {
+
+       // We clear the dirty area anyway, no need for clear_render_target
+       dirty->x0 = dirty->y0 = MAX_DIRTY;
+       dirty->x1 = dirty->y1 = MIN_DIRTY;
+    }
  }
   }
}
@@ -514,7 +539,7 @@ gen_vertex_data(struct vl_compositor *c)
 }
 
 static void
-draw_layers(struct vl_compositor *c)
+draw_layers(struct vl_compositor *c, struct u_rect *dirty)
 {
unsigned vb_index, i;
 
@@ -533,22 +558,25 @@ 

Re: [Mesa-dev] [PATCH] meta: rework dest image allocation in mipmap generation code

2011-12-12 Thread Eric Anholt
On Sat, 10 Dec 2011 12:00:27 -0700, Brian Paul bri...@vmware.com wrote:
 This fixes two things:
 1. If the texture object was created with glTexStorage2D, the call
to _mesa_TexImage2D() would generate INVALID_OPERATION since the
texture is marked as immutable.
 2. _mesa_TexImage2D() always frees any existing texture image memory
before allocating new memory.  That's inefficient since the existing
image is usually the right size already.

Reviewed-by: Eric Anholt e...@anholt.net




Re: [Mesa-dev] [PATCH 01/11] mesa: implement DrawTransformFeedback from ARB_transform_feedback2

2011-12-12 Thread Jose Fonseca
- Original Message -
 On 12/12/2011 02:49 PM, Jose Fonseca wrote:
  - Original Message -
  From: Marek Olšák mar...@gmail.com
 
  It's like DrawArrays, but the count is taken from a transform
  feedback
  object.
 
  This removes DrawTransformFeedback from dd_function_table and adds
  the same
  function to GLvertexformat (with the function parameters matching
  GL).
 
  The vbo_draw_func callback has a new parameter
  struct gl_transform_feedback_object *tfb_vertcount.
 
  The rest of the code just validates states and forwards the
  transform
  feedback object into vbo_draw_func.
  
  I ventured reviewing this too, and I have a question: the
  unfinished code in master had comments saying
  
 /*
  * Get number of vertices in obj's feedback buffer.
   * Call ctx->Exec.DrawArrays(mode, 0, count);
  */
  
  which seems simpler than passing the new tfb_vertcount parameter
  everywhere, just for the sake of obtaining count, so what's the
  rationale for doing this differently here?
  
 
 The count is contained in the TFB object (through its association
 with
 a pipe_stream_output_target which tracks the amount of vertices
 written
 to it) and is not supposed to be read back by the CPU because that
 would
 mean waiting.
 
 The driver will let the GPU write the count into the command buffer,
 so
 it only needs to sync command processing with TFB.

I see. Makes sense.
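In Gallium terms the idea is roughly this (a sketch; it assumes the
pipe_draw_info::count_from_stream_output field from the interface patch, with
so_target being a previously created stream-output target):

   /* Draw with a GPU-side vertex count: the CPU never reads the count
    * back; the driver resolves it from the stream-output target when
    * it builds the command buffer. */
   struct pipe_draw_info info = {0};

   info.mode = PIPE_PRIM_TRIANGLES;
   info.count_from_stream_output = so_target;   /* instead of info.count */
   pipe->draw_vbo(pipe, &info);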

I have no more objections against the series FWIW.

It would be nice to fix softpipe/llvmpipe though, but IIUC they could not 
possibly have worked with Mesa before, and I don't think anybody is relying on 
that softpipe/llvmpipe functionality w/ other state trackers at this point, so 
I suppose it could wait.

Jose


Re: [Mesa-dev] [PATCH] glsl_to_tgsi: make sure copied instructions don't lose texture target.

2011-12-12 Thread Brian Paul
On Sat, Dec 10, 2011 at 11:34 AM, Dave Airlie airl...@gmail.com wrote:
 From: Dave Airlie airl...@redhat.com

 The piglit draw-pixel-with-texture was asserting in the glsl->tgsi code,
 due to 0 texture target, this makes sure the texture target is copied over
 correctly when we copy instructions around.

 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |    4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)

 diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
 b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 index 6cc655d..68eaddc 100644
 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 @@ -3786,6 +3786,7 @@ get_pixel_transfer_visitor(struct st_fragment_program 
 *fp,
     * new visitor. */
    foreach_iter(exec_list_iterator, iter, original->instructions) {
       glsl_to_tgsi_instruction *inst = (glsl_to_tgsi_instruction *)iter.get();
 +      glsl_to_tgsi_instruction *newinst;
       st_src_reg src_regs[3];

       if (inst->dst.file == PROGRAM_OUTPUT)
 @@ -3803,7 +3804,8 @@ get_pixel_transfer_visitor(struct st_fragment_program 
 *fp,
             prog->InputsRead |= BITFIELD64_BIT(src_regs[i].index);
       }

 -      v->emit(NULL, inst->op, inst->dst, src_regs[0], src_regs[1], 
 src_regs[2]);
 +      newinst = v->emit(NULL, inst->op, inst->dst, src_regs[0], src_regs[1], 
 src_regs[2]);
 +      newinst->tex_target = inst->tex_target;
    }

    /* Make modifications to fragment program info. */

Looks good to me.  I had started to look into the glDrawPixels
failures and found tex_target to be zero.  I didn't know where it was
supposed to come from though.

Reviewed-by: Brian Paul bri...@vmware.com


Re: [Mesa-dev] [PATCH] meta: rework dest image allocation in mipmap generation code

2011-12-12 Thread Eric Anholt
On Sat, 10 Dec 2011 12:00:27 -0700, Brian Paul bri...@vmware.com wrote:
 This fixes two things:
 1. If the texture object was created with glTexStorage2D, the call
to _mesa_TexImage2D() would generate INVALID_OPERATION since the
texture is marked as immutable.
 2. _mesa_TexImage2D() always frees any existing texture image memory
before allocating new memory.  That's inefficient since the existing
image is usually the right size already.

Actually, scratch that review.  I think I see a bug:
update_fbo_texture() is no longer called with this, so FBOs attached to
this texture may have incorrect FBO completeness after a GenerateMipmap.




Re: [Mesa-dev] [PATCH 2/3] swrast: rewrite color buffer clearing to use Map/UnmapRenderbuffer()

2011-12-12 Thread Eric Anholt
On Sat, 10 Dec 2011 12:00:53 -0700, Brian Paul bri...@vmware.com wrote:
 -   /* Note that masking will change the color values, but only the
 -* channels for which the write mask is GL_FALSE.  The channels
 -* which are write-enabled won't get modified.
 -*/
 -   for (i = 0; i < height; i++) {
 -  span.x = x;
 -  span.y = y + i;
 -  _swrast_mask_rgba_span(ctx, rb, &span, buf);
 -  /* write masked row */
 -  rb->PutRow(ctx, rb, width, x, y + i, span.array->rgba, NULL);
 +   if (doMasking) {
 +  /* Convert the boolean mask to a color */

I think this would be more informative as

/* Convert the boolean mask to a color value that will be packed to
 * produce the bitmask for the renderbuffer's format.
 */

but overall, this looks like a pretty nice (if tricky) way to handle all
these formats.  Reviewed-by: Eric Anholt e...@anholt.net




Re: [Mesa-dev] shaders for g3dvl compositor (was vdpau: Handle destination rectangles correctly)

2011-12-12 Thread Andy Furniss

Christian König wrote:


@Andy: Could you please confirm that they are also working? (They should
apply on top of current master.)


Yes - they are working OK for me.

One thing I noticed while messing with mplayer subs/OSD is that vdpau in 
full screen positions them relative to the screen rather than the active 
picture, so they can appear over the black bars. This is the same with 
the previous patches, and also -vo gl does this.


Xvmc, xv and x11 all keep them in the active picture area - I don't know 
which, if any, is considered right or wrong behavior.





Re: [Mesa-dev] [PATCH 1/3] swrast: do depth/stencil clearing with Map/UnmapRenderbuffer()

2011-12-12 Thread Brian Paul
On Mon, Dec 12, 2011 at 10:28 AM, Eric Anholt e...@anholt.net wrote:
 On Sat, 10 Dec 2011 12:00:52 -0700, Brian Paul bri...@vmware.com wrote:
 Another step toward getting rid of the renderbuffer PutRow/etc functions.

 +   if (buffers & BUFFER_DS) {
 +      struct gl_renderbuffer *rb =
 +         ctx->DrawBuffer->Attachment[BUFFER_DEPTH].Renderbuffer;
 +
 +      if ((buffers & BUFFER_DS) == BUFFER_DS && rb &&
 +          _mesa_is_format_packed_depth_stencil(rb->Format)) {
 +         /* clear depth and stencil together */
 +         _swrast_clear_depth_stencil_buffer(ctx);

 I think this would be wrong for the case of

 Attachment[BUFFER_DEPTH] = depthstencil texture
 Attachment[BUFFER_STENCIL = separate stencil RB

 Granted, that's a pretty crazy setup, but I think this just needs an rb
 == Attachment[BUFFER_STENCIL].Renderbuffer check.
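Concretely, Eric's extra check might look like this (a sketch against the hunk
above):

   /* Only take the combined clear path when depth and stencil resolve
    * to the same renderbuffer. */
   if ((buffers & BUFFER_DS) == BUFFER_DS && rb &&
       rb == ctx->DrawBuffer->Attachment[BUFFER_STENCIL].Renderbuffer &&
       _mesa_is_format_packed_depth_stencil(rb->Format)) {
      /* clear depth and stencil together */
      _swrast_clear_depth_stencil_buffer(ctx);
   }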

Will do.


 +   mapMode = GL_MAP_WRITE_BIT;
 +   if (_mesa_get_format_bits(rb->Format, GL_STENCIL_BITS) > 0) {
 +      mapMode |= GL_MAP_READ_BIT;
 +   }

 You only set the READ bit if there are stencil bits...

 +   case MESA_FORMAT_S8_Z24:
 +   case MESA_FORMAT_X8_Z24:
 +   case MESA_FORMAT_Z24_S8:
 +   case MESA_FORMAT_Z24_X8:

           for (i = 0; i < height; i++) {
 -            rb->PutMonoRow(ctx, rb, width, x, y + i, clearVal16, NULL);
 +            GLuint *row = (GLuint *) map;
 +            for (j = 0; j < width; j++) {
 +               row[j] = (row[j] & mask) | clearVal;
 +            }
 +            map += rowStride;
           }
 +

 but then the S8_Z24 and X8_Z24 paths both do the same set of reads.

 +   case MESA_FORMAT_Z32_FLOAT_X24S8:
 +      /* XXX untested */
 +      {
 +         GLfloat clearVal = (GLfloat) ctx->Depth.Clear;
           for (i = 0; i < height; i++) {
 -            rb->PutMonoRow(ctx, rb, width, x, y + i, clearValue, NULL);
 +            GLfloat *row = (GLfloat *) map;
 +            for (j = 0; j < width; j++) {
 +               row[j * 2] = clearVal;
 +            }
 +            map += rowStride;
           }

 It looks good, at least.  Once I land my outstanding driver bits it
 should be easy to test.

 +/**
 + * Clear both depth and stencil values in a combined depth+stencil buffer.
 + */
 +void
 +_swrast_clear_depth_stencil_buffer(struct gl_context *ctx)
 +{

 +   case MESA_FORMAT_Z32_FLOAT_X24S8:
 +      /* XXX untested */
 +      {
 +         GLfloat zClear = (GLfloat) ctx->Depth.Clear;
 +         GLuint sClear = ctx->Stencil.Clear;
 +         for (i = 0; i < height; i++) {
 +            GLfloat *zRow = (GLfloat *) map;
 +            GLuint *sRow = (GLuint *) map;
 +            for (j = 0; j < width; j++) {
 +               zRow[j * 2 + 0] = zClear;
 +            }
 +            for (j = 0; j < width; j++) {
 +               sRow[j * 2 + 1] = sClear;
 +            }

 Missing stencil mask treatment.

Will fix.


 diff --git a/src/mesa/swrast/s_stencil.c b/src/mesa/swrast/s_stencil.c
 index 101ee50..5c5ebd4 100644
 --- a/src/mesa/swrast/s_stencil.c
 +++ b/src/mesa/swrast/s_stencil.c
 @@ -1124,121 +1124,108 @@ _swrast_write_stencil_span(struct gl_context *ctx, 
 GLint n, GLint x, GLint y,


  /**
 - * Clear the stencil buffer.
 + * Clear the stencil buffer.  If the buffer is a combined
 + * depth+stencil buffer, only the stencil bits will be touched.
   */
  void
 -_swrast_clear_stencil_buffer( struct gl_context *ctx, struct 
 gl_renderbuffer *rb )
 +_swrast_clear_stencil_buffer(struct gl_context *ctx)
  {
 +   struct gl_renderbuffer *rb =
 +      ctx->DrawBuffer->Attachment[BUFFER_STENCIL].Renderbuffer;
    const GLubyte stencilBits = ctx->DrawBuffer->Visual.stencilBits;
 -   const GLuint mask = ctx->Stencil.WriteMask[0];
 -   const GLuint invMask = ~mask;
 -   const GLuint clearVal = (ctx->Stencil.Clear & mask);
 +   const GLuint writeMask = ctx->Stencil.WriteMask[0];
 +   const GLuint clearVal = (ctx->Stencil.Clear & writeMask);
    const GLuint stencilMax = (1 << stencilBits) - 1;
     GLint x, y, width, height;
 +   GLubyte *map;
 +   GLint rowStride, i, j;
 +   GLbitfield mapMode;

 +   mapMode = GL_MAP_WRITE_BIT;
 +   if ((writeMask & stencilMax) != stencilMax) {
 +      /* need to mask stencil values */
 +      mapMode |= GL_MAP_READ_BIT;
 +   }
 +   else if (_mesa_get_format_bits(rb->Format, GL_DEPTH_BITS) > 0) {
 +      /* combined depth+stencil, need to mask Z values */
 +      mapMode |= GL_MAP_READ_BIT;
 +   }

 +   switch (rb->Format) {
 +   case MESA_FORMAT_S8:
 +      {
 +         GLubyte clear = clearVal & 0xff;
 +         GLubyte mask = (~writeMask) & 0xff;
 +         if (mask != 0) {

 mask != 0xff, right?  0 was already tested at the top, and there's that
 else case below.

I think I have that right.  mask is the complement of the stencil
WriteMask value here.  And if mask==0 it means that we're replacing
the dest values completely.  That's what the else clauses do.
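In other words (a sketch of the masking identity at work):

   GLubyte mask = (~writeMask) & 0xff;        /* bits to preserve */
   if (mask != 0)
      row[j] = (row[j] & mask) | clearVal;    /* partial stencil write */
   else
      row[j] = clearVal;                      /* full replace, can memset */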


 -                  _mesa_memset16((short unsigned int*) stencil, clearVal, 
 width);

 note, _mesa_memset16 is dead code now.

OK, I'll rm that later.

-Brian

Re: [Mesa-dev] [PATCH 2/3] swrast: rewrite color buffer clearing to use Map/UnmapRenderbuffer()

2011-12-12 Thread Brian Paul
On Mon, Dec 12, 2011 at 10:38 AM, Eric Anholt e...@anholt.net wrote:
 On Sat, 10 Dec 2011 12:00:53 -0700, Brian Paul bri...@vmware.com wrote:
 -   /* Note that masking will change the color values, but only the
 -    * channels for which the write mask is GL_FALSE.  The channels
  -    * which are write-enabled won't get modified.
  -    */
  -   for (i = 0; i < height; i++) {
  -      span.x = x;
  -      span.y = y + i;
  -      _swrast_mask_rgba_span(ctx, rb, &span, buf);
  -      /* write masked row */
  -      rb->PutRow(ctx, rb, width, x, y + i, span.array->rgba, NULL);
 +   if (doMasking) {
 +      /* Convert the boolean mask to a color */

 I think this would be more informative as

 /* Convert the boolean mask to a color value that will be packed to
  * produce the bitmask for the renderbuffer's format.
  */

 but overall, this looks like a pretty nice (if tricky) way to handle all
 these formats.  Reviewed-by: Eric Anholt e...@anholt.net

Actually, I found a couple issues with the code when I was looking at
it yesterday.  I'll post an updated patch later.

I'll update the comments too.

-Brian


Re: [Mesa-dev] R600g LLVM shader backend

2011-12-12 Thread Tom Stellard
On Mon, 2011-12-12 at 07:05 -0800, Jose Fonseca wrote:
 - Original Message -
  Hi,
  
  I have just pushed a branch containing an LLVM shader backend for
  r600g to my
  personal git repo:
  
  http://cgit.freedesktop.org/~tstellar/mesa/ r600g-llvm-shader
 
 Hi Tom,
 
 This is pretty cool stuff.  The fact that you have a similar passing rate to 
 the existing compiler makes it quite remarkable.  I was aware of several 
 closed/open-source projects to develop GPU backends for LLVM, and LunarG made 
 a middle end, but this is the first working OpenGL stack based on an LLVM GPU 
 backend that I'm aware of. 
 
  There are three main components to this branch:
  
  1. A TGSI->LLVM converter (commit
  ec9bb644cf7dde055c6c3ee5b8890a2d367337e5)
 
  The goal of this converter is to give all gallium drivers a
  convenient
  way to convert from TGSI to LLVM.  The interface is still evolving,
  and I'm interested in getting some feedback for it.
 
 The interface looks like a good start, but I'd like it to be even more 
 overridable.  I don't even think there's anything GPU specific there -- I 
 also had some plans to do TGSI translation in llvmpipe in two stages: first 
 TGSI -> high-level LLVM IR w/ custom gallivm/tgsi intrinsics -> low-level 
 LLVM IR w/ x86 SIMD intrinsics, to allow optimizations and other decisions 
 to happen at a higher level, before starting to emit lower-level code.
 
What else would you like to see overridable?

I think it might be nice to map TGSI opcodes to C functions rather than
intrinsic strings.
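Something along these lines, perhaps (a sketch; all names hypothetical):

   #include <llvm-c/Core.h>

   /* Dispatch TGSI opcodes through C callbacks instead of intrinsic
    * name strings, so a driver can override individual opcodes. */
   typedef LLVMValueRef (*tgsi_emit_func)(struct tgsi_llvm_context *ctx,
                                          LLVMValueRef *args,
                                          unsigned num_args);

   struct tgsi_opcode_action {
      unsigned opcode;        /* e.g. TGSI_OPCODE_MUL */
      tgsi_emit_func emit;    /* driver-overridable emitter */
   };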


 So I'd like us to have a flexible hierarchy of TGSI translators that can be 
 shared for GPU/CPUs alike.
 
 BTW, I'd prefer that all reusable Gallium+LLVM code (be it meant for GPU or 
 CPU) lives in src/gallium/auxiliary/gallivm , as it makes code maintenance and 
 build integration simpler.  So tgsi_llvm.c should be moved into gallivm.  
 Also, beware that the ability to build core gallium/mesa without LLVM must be 
 preserved. (Having particular drivers depend on LLVM is of 
 course at the discretion of the drivers' developers though.) 
 
  2. Changes to gallivm so that code can be shared between it and
  the TGSI->LLVM converter.  These changes are attached, please review.
 
 I'll review them separately.
 
  3. An LLVM backend for r600g.
  
  This backend is mostly based on AMD's AMDIL LLVM backend for OpenCL
  with a
  few changes added for emitting machine code.  Currently, it passes
  about
  99% of the piglit tests that pass with the current r600g shader
  backend.
  Most of the failures are due to some unimplemented texture
  instructions.
  Indirect addressing is also missing from the LLVM backend, and it
  relies
  on the current r600g shader code to do this.
 
 There's a 30K line file src/gallium/drivers/radeon/macrodb_gen.h . Is this 
 generated code?

I'm pretty sure this can be removed.  I think it's only useful for
generating AMDIL assembly, but I need to examine it more closely to make
sure.

 
 Also, maybe it would make sense to have amdil backend distributed separately 
 from mesa, as it looks like a component that has other consumers beyond 
 mesa/gallium/r600g, right?
 

Eventually, the AMDIL backend will be distributed as a part of llvm [1],
but we still have a lot of work to do to make that happen.  The r600g
backend is basically a subclass of the AMDIL backend, so if the AMDIL
backend is in LLVM the r600g backend would probably have to be too.

The AMDIL code is most likely to stay in Mesa until it's upstream in
LLVM.  I think it does make sense to explore having some sort of
libAMDIL, but whether or not that happens depends a lot on what other
people want to do with the AMDIL backend.


  In reality, the LLVM backend does not emit actual machine code,
  but rather a byte stream that is converted to struct r600_bytecode.
  The final transformations are done by r600_asm.c, just like in the
  current
  shader backend.  The LLVM backend is not optimized for VLIW, and it
  only
  emits one instruction per group.  The optimizations in r600_asm.c are
  able to do some instruction packing, but the resulting code is not
  yet
  as good as the current backend.
 
 Why is the resulting code worse: due to limitations in the assembler, in the 
 AMDIL LLVM backend, or in LLVM itself?
 

I guess it's due to limitations in the assembler.  When the code is
translated from TGSI, the vector instructions fit much better into the
VLIW architecture, so the lack of a proper assembler is not as
noticeable, but the r600g LLVM backend assumes non-VLIW hardware, which
makes the lack of a good assembler really noticeable.


  The main motivation for this LLVM backend is to help bring
  compute/OpenCL
  support to r600g by making it easier to support different compiler
  frontends.  I don't have a concrete plan for integrating this into
  mainline Mesa yet, but I don't expect it to be in the next release.
  I would really like to make it compatible with LLVM 3.0 before it
  gets
  merged (it 

Re: [Mesa-dev] [PATCH 3/3] mesa: remove gl_renderbufer::PutMonoRow() and PutMonoValues()

2011-12-12 Thread Eric Anholt
On Sat, 10 Dec 2011 12:00:54 -0700, Brian Paul bri...@vmware.com wrote:
 The former was only used for clearing buffers.  The latter wasn't used
 anywhere!  Remove them and all implementations of those functions.

Reviewed-by: Eric Anholt e...@anholt.net




Re: [Mesa-dev] [PATCH] mesa: add const flags to skip MaxVarying and MaxUniform linker checks (v2)

2011-12-12 Thread Ian Romanick

On 12/11/2011 11:07 PM, Marek Olšák wrote:

This is only temporary until a better solution is available.

v2: print warnings and add gallium CAPs


Reviewed-by: Ian Romanick ian.d.roman...@intel.com


---
  src/gallium/drivers/r300/r300_screen.c |2 +
  src/gallium/include/pipe/p_defines.h   |4 ++-
  src/glsl/linker.cpp|   43 ---
  src/mesa/main/mtypes.h |9 ++
  src/mesa/state_tracker/st_extensions.c |6 
  5 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/r300/r300_screen.c 
b/src/gallium/drivers/r300/r300_screen.c
index e734ff2..0bae065 100644
--- a/src/gallium/drivers/r300/r300_screen.c
+++ b/src/gallium/drivers/r300/r300_screen.c
@@ -100,6 +100,8 @@ static int r300_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
  case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
  case PIPE_CAP_CONDITIONAL_RENDER:
  case PIPE_CAP_TEXTURE_BARRIER:
+case PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS:
+case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
  return 1;

  /* r300 cannot do swizzling of compressed textures. Supported 
otherwise. */
diff --git a/src/gallium/include/pipe/p_defines.h 
b/src/gallium/include/pipe/p_defines.h
index f00077c..30f1d7f 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -465,7 +465,9 @@ enum pipe_cap {
 PIPE_CAP_MIN_TEXEL_OFFSET = 50,
 PIPE_CAP_MAX_TEXEL_OFFSET = 51,
 PIPE_CAP_CONDITIONAL_RENDER = 52,
-   PIPE_CAP_TEXTURE_BARRIER = 53
+   PIPE_CAP_TEXTURE_BARRIER = 53,
+   PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS = 54, /* temporary */
+   PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS = 55 /* temporary */
  };

  /**
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 3527088..b8a7126 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -1815,18 +1815,34 @@ assign_varying_locations(struct gl_context *ctx,

 if (ctx->API == API_OPENGLES2 || prog->Version == 100) {
    if (varying_vectors > ctx->Const.MaxVarying) {
-linker_error(prog, "shader uses too many varying vectors "
- "(%u > %u)\n",
- varying_vectors, ctx->Const.MaxVarying);
-return false;
+ if (ctx->Const.GLSLSkipStrictMaxVaryingLimitCheck) {
+linker_warning(prog, "shader uses too many varying vectors "
+   "(%u > %u), but the driver will try to optimize "
+   "them out; this is non-portable out-of-spec "
+   "behavior\n",
+   varying_vectors, ctx->Const.MaxVarying);
+ } else {
+linker_error(prog, "shader uses too many varying vectors "
+ "(%u > %u)\n",
+ varying_vectors, ctx->Const.MaxVarying);
+return false;
+ }
}
 } else {
const unsigned float_components = varying_vectors * 4;
if (float_components > ctx->Const.MaxVarying * 4) {
-linker_error(prog, "shader uses too many varying components "
- "(%u > %u)\n",
- float_components, ctx->Const.MaxVarying * 4);
-return false;
+ if (ctx->Const.GLSLSkipStrictMaxVaryingLimitCheck) {
+linker_warning(prog, "shader uses too many varying components "
+   "(%u > %u), but the driver will try to optimize "
+   "them out; this is non-portable out-of-spec "
+   "behavior\n",
+   float_components, ctx->Const.MaxVarying * 4);
+ } else {
+linker_error(prog, "shader uses too many varying components "
+ "(%u > %u)\n",
+ float_components, ctx->Const.MaxVarying * 4);
+return false;
+ }
}
 }

@@ -1960,8 +1976,15 @@ check_resources(struct gl_context *ctx, struct 
gl_shader_program *prog)
}

if (sh->num_uniform_components > max_uniform_components[i]) {
- linker_error(prog, "Too many %s shader uniform components",
- shader_names[i]);
+ if (ctx->Const.GLSLSkipStrictMaxUniformLimitCheck) {
+linker_warning(prog, "Too many %s shader uniform components, "
+   "but the driver will try to optimize them out; "
+   "this is non-portable out-of-spec behavior\n",
+   shader_names[i]);
+ } else {
+linker_error(prog, "Too many %s shader uniform components",
+ shader_names[i]);
+ }
}
 }

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index fc494f7..2073c8f 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2829,6 +2829,15 @@ struct gl_constants
  * Texture borders are deprecated in GL 3.0.
  **/
 GLboolean StripTextureBorder;
+
+   /**
+* For drivers 
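
The st_extensions.c hunk is cut off above; presumably it just maps the two
new CAPs onto the new gl_constants flags, roughly like this reconstruction
(not the verbatim hunk; assume it runs in st_init_limits(), where c points
at the context's gl_constants and screen is the pipe_screen):

   c->GLSLSkipStrictMaxVaryingLimitCheck =
      screen->get_param(screen, PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS);
   c->GLSLSkipStrictMaxUniformLimitCheck =
      screen->get_param(screen, PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS);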

Re: [Mesa-dev] [PATCH] mesa: add const flags to skip MaxVarying and MaxUniform linker checks (v2)

2011-12-12 Thread Kenneth Graunke
On 12/11/2011 11:07 PM, Marek Olšák wrote:
 This is only temporary until a better solution is available.
 
 v2: print warnings and add gallium CAPs

Thanks, Marek; this looks good to me.

Reviewed-by: Kenneth Graunke kenn...@whitecape.org



[Mesa-dev] [PATCH] gallivm: Fix build with llvm-3.1svn.

2011-12-12 Thread Vinson Lee
llvm-3.1svn r145714 moved several global variables into a new TargetOptions
class. The TargetMachine constructor now takes a TargetOptions object as
well.

Signed-off-by: Vinson Lee v...@vmware.com
---
 src/gallium/auxiliary/gallivm/lp_bld_debug.cpp |   14 +-
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp  |2 ++
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp 
b/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp
index 62825a2..a50a51d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp
@@ -250,7 +250,19 @@ lp_disassemble(const void* func)
   return;
}
 
-#if HAVE_LLVM >= 0x0300
+#if HAVE_LLVM >= 0x0301
+   TargetOptions options;
+#if defined(DEBUG)
+   options.JITEmitDebugInfo = true;
+#endif
+#if defined(PIPE_ARCH_X86)
+   options.StackAlignmentOverride = 4;
+#endif
+#if defined(DEBUG) || defined(PROFILE)
+   options.NoFramePointerElim = true;
+#endif
+   TargetMachine *TM = T->createTargetMachine(Triple, sys::getHostCPUName(), 
"", options);
+#elif HAVE_LLVM == 0x0300
   TargetMachine *TM = T->createTargetMachine(Triple, sys::getHostCPUName(), 
"");
 #else
   TargetMachine *TM = T->createTargetMachine(Triple, "");
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index 41a..fe7616b 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -69,6 +69,7 @@ 
lp_register_oprofile_jit_event_listener(LLVMExecutionEngineRef EE)
 extern "C" void
 lp_set_target_options(void)
 {
+#if HAVE_LLVM <= 0x0300
 #if defined(DEBUG)
#if HAVE_LLVM >= 0x0207
llvm::JITEmitDebugInfo = true;
@@ -102,6 +103,7 @@ lp_set_target_options(void)
 #if 0
llvm::UnsafeFPMath = true;
 #endif
+#endif  /* HAVE_LLVM <= 0x0300 */
 
 #if HAVE_LLVM < 0x0209
/*
-- 
1.7.7.3



[Mesa-dev] [PATCH] Add a format/type mismatch check to glGetTexImage, or it will return -1 and may lead to a segmentation fault.

2011-12-12 Thread jian.j.zhao
From: Jian Zhao jian.j.z...@intel.com

---
 src/mesa/main/texgetimage.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c
index ae0d51f..3f24187 100644
--- a/src/mesa/main/texgetimage.c
+++ b/src/mesa/main/texgetimage.c
@@ -708,6 +708,14 @@ getteximage_error_check(struct gl_context *ctx, GLenum 
target, GLint level,
   return GL_TRUE;
}
 
+   if (!_mesa_is_legal_format_and_type(ctx, format, type)) {
+  /* GL_INVALID_OPERATION is generated by a format/type
+   * mismatch (see the 1.2 spec, page 94, sec 3.6.4).
+   */
+  _mesa_error(ctx, GL_INVALID_OPERATION, "glGetTexImage(target)");
+  return GL_TRUE;
+   }
+
baseFormat = _mesa_get_format_base_format(texImage->TexFormat);
   
/* Make sure the requested image format is compatible with the
-- 
1.7.0.1
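
For reference, the kind of call the new check catches: the packed type
GL_UNSIGNED_SHORT_5_6_5 is only legal with format GL_RGB, so the mismatched
pair below should now fail cleanly with GL_INVALID_OPERATION instead of
reading through a bogus size.  A sketch, assuming a complete 64x64
GL_TEXTURE_2D is already bound:

   #include <assert.h>
   #include <GL/gl.h>

   static void check_mismatch_rejected(void)
   {
      GLushort pixels[64 * 64];
      /* format/type mismatch: rejected by the new error check */
      glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA,
                    GL_UNSIGNED_SHORT_5_6_5, pixels);
      assert(glGetError() == GL_INVALID_OPERATION);
   }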
