On Thu, 15 Jan 2015 15:32:59 +0100, Roland Scheidegger <srol...@vmware.com> wrote:

On 15.01.2015 at 10:05, Iago Toral wrote:
Hi,

We have 16 deqp tests that fail, at least on i965, because of
insufficient precision of the mod GLSL function.

Mesa lowers mod(x,y) to y * fract(x,y), so some precision can be lost
in the fract operation. Since the result is multiplied by y, the total
precision loss usually grows with the value of y.

Did you mean fract(x/y) here?


Below are some examples to give an idea of the magnitude of this error.
The values on the right represent the precision error for each case:

mod(-1.951171875, 1.9980468750) =>  0.0000000447
mod(121.57, 13.29)              =>  0.0000023842
mod(3769.12, 321.99)            =>  0.0000762939
mod(3769.12, 1321.99)           =>  0.0001220703
mod(-987654.125, 123456.984375) =>  0.0160663128
mod( 987654.125, 123456.984375) =>  0.0312500000

As you see, for large enough values, the precision error becomes
significant.

This can be fixed by lowering mod(x,y) to x - y * floor(x/y) instead,
which is the suggested implementation in the GLSL docs. I have a local
patch in my tree that does this and it does indeed fix the problem. The
downside is that this implementation adds an extra ADD instruction to
the generated code (besides replacing fract with floor, which I guess
has a similar cost).

Since this is a case where there is some trade-off to the fix, I wonder
if we are interested in doing this or not. Is the precision fix worth
the additional ADD?
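
For anyone who wants to poke at the numbers without a GPU, here is a rough CPU-side sketch in Python. It is not Mesa code: f32, mod_fract, mod_floor, mod_exact and errors are names made up for illustration. It assumes fract(q) is q - floor(q), uses a true division for x/y rather than a reciprocal followed by a multiply, and rounds every operation to binary32 separately (no fused multiply-add). Real hardware differs on all of these points, so the errors it reports need not match the table above.

```python
import math
import struct
from fractions import Fraction


def f32(v):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def mod_fract(x, y):
    """Current lowering: y * fract(x / y), with fract(q) = q - floor(q)."""
    q = f32(x / y)
    fr = f32(q - math.floor(q))
    return f32(y * fr)


def mod_floor(x, y):
    """Proposed lowering: x - y * floor(x / y), multiply and subtract unfused."""
    q = f32(x / y)
    prod = f32(y * math.floor(q))
    return f32(x - prod)


def mod_exact(x, y):
    """Reference value computed on exact rationals, with no rounding at all."""
    fx, fy = Fraction(x), Fraction(y)
    return fx - fy * math.floor(fx / fy)


def errors(x, y):
    """Return (fract-lowering error, floor-lowering error) as exact Fractions."""
    x, y = f32(x), f32(y)  # snap the inputs to binary32 first
    ref = mod_exact(x, y)
    return (abs(Fraction(mod_fract(x, y)) - ref),
            abs(Fraction(mod_floor(x, y)) - ref))


if __name__ == "__main__":
    for x, y in [(121.57, 13.29), (3769.12, 321.99), (987654.125, 123456.984375)]:
        e_fract, e_floor = errors(x, y)
        print(f"mod({x}, {y}): fract {float(e_fract):.10f}  floor {float(e_floor):.10f}")
```

Running the file prints one line per input pair. errors(x, y) returns exact Fractions so that small differences are not hidden by another rounding step when comparing the two lowerings.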


Well I can tell you that llvmpipe implements frc(x) as x - floor(x), so
this change looks good to me :-).
On a more serious note though, it looks to me like the cost of this
expression would be mostly dominated by the division, hence one more
add shouldn't be that bad. And if the test is legit, I don't think
there's much choice (unless you could make this optional for some old
glsl versions if they didn't require that much precision but even then
it's probably not worth bothering imho).


FWIW, I just typed out the following little piglit test and tried it on R600:

[require]
GLSL >= 3.30

[vertex shader passthrough]
[fragment shader]
uniform float a;
uniform float b;
out vec4 colour;

void
main(void)
{
//      colour = vec4(b * fract(a / b)); // current lowering of mod(x,y)
    colour = vec4(a - b * floor(a/b)); // proposed lowering
}

[test]
clear color 0.5 0.5 0.5 0.5
clear

uniform float a 4.2
uniform float b 3.5
draw rect -1 -1 2 2
probe rgba 1 1 0.7 0.7 0.7 0.7


Resulting R600 assembly:

// y * fract(x/y)
// KC0[0].x is x and KC0[1].x is y
1      t: RECIP_IEEE         T0.x,  KC0[1].x
2      x: MUL                T0.x,  KC0[0].x, T0.x
3      x: FRACT              T0.x,  T0.x
4      x: MUL                R0.x,  KC0[1].x, T0.x
EXPORT_DONE        PIXEL 0     R0.xxxx  EOP

// x - y * floor(x/y)
1      t: RECIP_IEEE         T0.x,  KC0[1].x
2      x: MUL                T0.x,  KC0[0].x, T0.x
3      x: FLOOR              T0.x,  T0.x
4      x: MULADD             R0.x,  KC0[1].x, -T0.x, KC0[0].x
EXPORT_DONE        PIXEL 0     R0.xxxx  EOP

Same number of cycles/length of dependency chain/ALU pipe usage for both methods.


I'd expect most architectures that can do a multiply-add with source negate in a single operation to get similar results, with no extra cost for the subtraction.
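
To illustrate what a multiply-add can buy beyond the instruction count, here is a small Python sketch of x - y * floor(x/y) with the multiply and the subtraction done as one fused step. Whether a particular GPU's MULADD is actually fused (a single rounding at the end) is hardware-specific and is not something this thread establishes; the sketch simply assumes it is. f32 and mod_floor_fused are illustrative names, and the fused step is emulated with exact rationals followed by one rounding.

```python
import math
import struct
from fractions import Fraction


def f32(v):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def mod_floor_fused(x, y):
    """x - y * floor(x / y) with multiply and subtract fused (an assumption)."""
    x, y = f32(x), f32(y)
    n = math.floor(f32(x / y))             # FLOOR of the binary32 quotient
    exact = Fraction(x) - Fraction(y) * n  # (-n) * y + x, no intermediate rounding
    # Round the exact value once. Going through binary64 first can in principle
    # differ from a direct binary32 rounding in rare tie cases.
    return f32(float(exact))
```

With the inputs from the last row of the error table, mod_floor_fused(987654.125, 123456.984375) works out to 123455.234375, which is the exact result: floor(x/y) is 7 and the product 7 * y is never rounded on its own.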


/Glenn
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
