On January 4, 2018 12:51:15 Karol Herbst <[email protected]> wrote:
On Thu, Jan 4, 2018 at 7:06 PM, Ilia Mirkin <[email protected]> wrote:
On Thu, Jan 4, 2018 at 10:01 AM, Karol Herbst <[email protected]> wrote:
significant changes to last series:
* arb_gpu_shader5 interpolateat* (those nir ops don't map well to nvir)
no good plan on how to properly implement those
What's the issue? They should map as well as the TGSI ones. (Since the
TGSI ones are just the GLSL ones.)
it is a bit ugly, because usually all input vars are lowered away into
load_input intrinsics, while the interpolateAt* ones still reference the
variable. So they need special handling:
lowered (input is centroid):
vec1 32 ssa_25 = intrinsic load_input (ssa_24) () (0, 0) /* base=0 */ /* component=0 */ /* packed:centroid_qualified */
vec1 32 ssa_27 = intrinsic load_input (ssa_26) () (0, 1) /* base=0 */ /* component=1 */ /* packed:centroid_qualified */
not lowered:
decl_var INTERP_MODE_NONE vec2 in@unqualified-temp
vec2 32 ssa_11 = intrinsic interp_var_at_centroid () (in@unqualified-temp) ()
I kind of wish I had a load_input intrinsic with a flag, or a
load_input_at_centroid, so that I end up with the same code in the
end.
In i965, we use the NIR explicit input interpolation intrinsics. I'm on my
phone so I can't give more details easily.
* arb_gpu_shader5.texturegatheroffsets (nir internal assert)
glsl_to_nir.cpp:2082: virtual void
{anonymous}::nir_visitor::visit(ir_texture*): Assertion
`ir->offset->type->is_vector() || ir->offset->type->is_scalar()' failed.
This is because nir doesn't support the 4-offset tg4 variant. This is
expected (by nir) to be lowered in GLSL to 4 separate gathers, but
isn't because nvc0 doesn't set the caps to make st/mesa do that.
Either set that cap based on whether NIR is used, or teach nir about
the 4-offset tg4 (which the nvidia hw supports directly btw).
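To illustrate the lowering described above, here is a minimal sketch on a toy texture. Everything in it (the `texel`, `gather` and `gatherOffsetsLowered` helpers, the 4x4 wrap-addressed texture) is hypothetical and is not Mesa code. It assumes the GLSL rule that textureGatherOffsets takes the i0j0 texel of each offset footprint, which a single-offset gather returns in its last component.

```cpp
#include <array>
#include <cassert>

// Hypothetical toy model: a single-channel 4x4 texture with wrap addressing.
struct Offset { int x, y; };

static int texel(int x, int y) {
    const int W = 4, H = 4;
    int wx = ((x % W) + W) % W;
    int wy = ((y % H) + H) % H;
    return wy * W + wx; // each texel stores its own linear index
}

// Single-offset gather of the 2x2 footprint anchored at (x, y) + off.
// Component order follows GLSL textureGather:
// (i0,j1), (i1,j1), (i1,j0), (i0,j0).
static std::array<int, 4> gather(int x, int y, Offset off) {
    int bx = x + off.x, by = y + off.y;
    return { texel(bx, by + 1), texel(bx + 1, by + 1),
             texel(bx + 1, by), texel(bx, by) };
}

// Lowered 4-offset gather: one single-offset gather per offset, keeping
// only the (i0,j0) texel (component 3) of each result.
static std::array<int, 4> gatherOffsetsLowered(
        int x, int y, const std::array<Offset, 4> &offs) {
    std::array<int, 4> result{};
    for (int i = 0; i < 4; ++i)
        result[i] = gather(x, y, offs[i])[3];
    return result;
}
```

A native 4-offset tg4 would produce the same four texels with one instruction instead of four, which is why keeping it unlowered is attractive on hardware that supports it.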
well I would prefer the latter obviously, but nir gives me a
nir_texop_tg4 in other tests, it is just the ones mentioned above where
it fails.
I would prefer that as well. There's no reason NIR can't support it so we
may as well add support. We should also move the lowering from
spirv_to_nir to nir_lower_tex so that spirv_to_nir can give you the
unlowered version you want.
* some int64 stuff related to compound types
As I mentioned, you either have to fix RA (I don't recommend this), or
you have to stop using 64-bit Values for storage. Use 32-bit Values,
and merge/split them all the time around 64-bit ops like the TGSI FE
does (which was implemented that way largely due to the way TGSI
works, but is a happy coincidence that it also works around some of
the RA shortcomings). And additionally you may need to improve the
merge splits pass to avoid some of the pain.
You could also just disable int64 for now - it's not important.
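The "store as 32-bit, merge/split around 64-bit ops" idea can be sketched as follows. This is only an illustration in plain C++ on integers. `Split64`, `split64`, `merge64` and `add64` are hypothetical names and do not correspond to nv50_ir instructions or to the TGSI frontend code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical storage form: a 64-bit quantity kept as two 32-bit halves.
struct Split64 { uint32_t lo, hi; };

// Split a 64-bit value into its low and high 32-bit halves.
static Split64 split64(uint64_t v) {
    return { static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32) };
}

// Merge two 32-bit halves back into one 64-bit value.
static uint64_t merge64(Split64 s) {
    return (static_cast<uint64_t>(s.hi) << 32) | s.lo;
}

// A 64-bit op wrapped in merge/split: operands and result live as
// 32-bit halves, and the wide value exists only around the op itself.
static Split64 add64(Split64 a, Split64 b) {
    uint64_t wide = merge64(a) + merge64(b);
    return split64(wide);
}
```

The point of the pattern is that no 64-bit value has to stay live between operations, so register allocation only ever sees long-lived 32-bit values.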
* various extensions
* variable-indexing (related to above mentioned packing issue)
* glsl-4.20.execution.vs_in
* some variable-indexing issues related to unaligned memory accesses
The variable-indexing stuff is extremely important to work out, since
it points to a fundamental problem in the approach to the conversion.
well the normal variable indexing stuff works if I disable
nir_compact_varyings, which we might want to do anyway for nouveau for
now. Or I teach MemoryOpt to not merge things for unaligned addresses.
I have to take a more focused look at the fails anyway
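As a rough illustration of the "do not merge for unaligned addresses" option, a merge check could look like the sketch below. `Access` and `canMerge` are hypothetical and are not the MemoryOpt implementation. The sketch assumes constant byte offsets and a 16-byte maximum access size; with an indirect index the alignment would have to be established some other way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one memory access: byte offset and size.
struct Access { uint32_t offset, size; };

static bool isPow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Two accesses may be merged only if they are adjacent, the combined
// size is a power of two no larger than 16 bytes (assumed limit), and
// the combined access is naturally aligned.
static bool canMerge(Access a, Access b) {
    if (a.offset + a.size != b.offset)
        return false;
    uint32_t total = a.size + b.size;
    if (!isPow2(total) || total > 16)
        return false;
    return a.offset % total == 0;
}
```

For example, two 4-byte loads at offsets 4 and 8 are adjacent but would form an 8-byte access at offset 4, so this check refuses to merge them.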
* some geometry shader fails
Have you done any testing with nv50? It should largely work out, but
there are some things you have to be careful about. The TGSI frontend
generates IR that is capable of being processed by both the nv50 and
nvc0 lowering/RA/emission logic, and we would want to ensure that a NIR
frontend would be able to do this too. If you don't have access to a
Tesla-era GPU, I can act as a tester in a limited capacity.
I have a Tesla GPU.
Sounds like this is still all pretty experimental and has a lot of
deep issues given the fail/crash count... IMHO not ready for merging.
Also you really need to come up with a workable solution to the
immediates issue.
well I could just store them like it is done with TGSI and just put
loadImms where accessed, but this doesn't really fit the NIR logic
here. Maybe there is a NIR pass to move them around, so that the issue
is less significant. Or maybe I always check if the source contains a
const value and use loadImm instead of getting the stored immediate
value. Yeah I think the last idea would be less painful, we just end
up with more dead instructions after converting.
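The last idea (check whether a source is constant and emit a loadImm at the point of use) could be sketched like this. `Source`, `Builder` and `getSrc` are hypothetical stand-ins, not the NIR or nv50_ir types, and the string "instructions" exist only to make the behavior observable.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical source operand: either a constant or a reference to an
// already converted value, identified by its SSA index.
struct Source {
    int ssaIndex;
    bool isConst;
    uint32_t constValue;
};

// Hypothetical builder that records emitted instructions as strings.
struct Builder {
    std::vector<std::string> insns;
    int nextValue = 0;

    // Emit a fresh immediate load and return the id of the new value.
    int loadImm(uint32_t imm) {
        insns.push_back("loadImm " + std::to_string(imm));
        return nextValue++;
    }
};

// Resolve a source at its point of use: constants get a fresh loadImm
// every time, everything else reuses the stored value.
static int getSrc(Builder &bld, const Source &src,
                  const std::vector<int> &storedValues) {
    if (src.isConst)
        return bld.loadImm(src.constValue);
    return storedValues.at(src.ssaIndex);
}
```

Each use of a constant produces its own loadImm, so this sketch relies on later passes to clean up or fold the extra instructions.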
What is the nature of the immediate problem? We may have a similar issue.
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev