Re: [Mesa-dev] Reply: Re: screen blurred and glxgears segmentation fault on ubuntu17.10 for arm architecture server with amdgpu(AMD RADEON PRO WX7100)

2017-12-05 Thread Matt Turner
On Tue, Dec 5, 2017 at 8:18 PM, Lvzhihong (ReJohn) wrote:
> Hi Michel,
>
> We solved the glxgears segmentation fault by adding the compile flag
> -fsigned-char.
> On the ARM platform, plain char defaults to unsigned, which is what
> caused the glxgears segmentation fault.

Well done.

On x86/amd64 "char" is signed, but on arm it is unsigned. See
https://ajax.fedorapeople.org/is-char-signed-or-not.txt and
https://trofi.github.io/posts/203-signed-char-or-unsigned-char.html

I wonder if Coccinelle is capable of detecting cases where chars are
compared with -1, etc.
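
For reference, a minimal C illustration of that pattern (the variable name
is made up):

#include <stdio.h>

int main(void)
{
   char index = -1;   /* stored as 255 where plain char is unsigned */

   if (index != -1)   /* index promotes to int: 255 on ARM, -1 on x86 */
      printf("char is unsigned here: index == %d\n", index);
   else
      printf("char is signed here: index == %d\n", index);

   return 0;
}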


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 12/01/2017 01:52 PM, Miguel Angel Vico wrote:



On Fri, 1 Dec 2017 13:38:41 -0500
Rob Clark  wrote:


On Fri, Dec 1, 2017 at 12:09 PM, Nicolai Hähnle  wrote:

On 01.12.2017 16:06, Rob Clark wrote:


On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle wrote:


Hi,

I've had a chance to look a bit more closely at the allocator prototype
repository now. There's a whole bunch of low-level API design feedback,
but for now let's focus on the high-level stuff first.


Thanks for taking a look.


Going by the 4.5 major object types (as also seen on slide 5 of your
presentation [0]), assertions and usages make sense to me.

Capabilities and capability sets should be cleaned up in my opinion, as
the status quo is overly obfuscating things. What capability sets really
represent, as far as I understand them, is *memory layouts*, and so
that's what they should be called.

This conceptually simplifies `derive_capabilities` significantly without
any loss of expressiveness as far as I can see. Given two lists of memory
layouts, we simply look for which memory layouts appear in both lists,
and then merge their constraints and capabilities.

Merging constraints looks good to me.

Capabilities need some more thought. The prototype removes capabilities
when merging layouts, but I'd argue that that is often undesirable. (In
fact, I cannot think of capabilities which we'd always want to remove.)

A typical example for this is compression (i.e. DCC in our case). For
rendering usage, we'd return something like:

Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)

For display usage, we might return (depending on hardware):

Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)

Merging these in the prototype would remove the DCC capability, even
though it might well make sense to keep it there for rendering. Dealing
with the fact that display usage does not have this capability is
precisely one of the two things that transitions are about! The other
thing that transitions are about is caches.

I think this is kind of what Rob was saying in one of his mails.



Perhaps "layout" is a better name than "caps".. either way I think of
both AMD/tiled and AMD/DCC as the same type of "thing".. the
difference between AMD/tiled and AMD/DCC is that a transition can be
provided for AMD/DCC.  Other than that they are both things describing
the layout.



The reason that a transition can be provided is that they aren't quite the
same thing, though. In a very real sense, AMD/DCC is a "child" property of
AMD/tiled: DCC is implemented as a meta surface whose memory layout depends
on the layout of the main surface.


I suppose this is six-of-one, half-dozen of the other..

what you are calling a layout is what I'm calling a cap that just
happens not to have an associated transition


Although, if there are GPUs that can do an in-place "transition" between
different tiling layouts, then the distinction is perhaps really not as
clear-cut. I guess that would only apply to tiled renderers.


I suppose the advantage of just calling both layout and caps the same
thing, and just saying that a "cap" (or "layout" if you prefer that
name) can optionally have one or more associated transitions, is that
you can deal with cases where sometimes a tiled format might actually
have an in-place transition ;-)

  

So let's say you have a setup where both display and GPU supported
FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
(FOO/cached).  But the GPU supported the following transitions:

trans_a: FOO/CC -> null
trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
3: caps(FOO/tiled); constraints(alignment=32k)

Display:
1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
   transition(GPU->display: trans_a, trans_b; display->GPU: none)
2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
   transition(GPU->display: trans_a; display->GPU: none)
3: caps(FOO/tiled); constraints(alignment=64k);
   transition(GPU->display: none; display->GPU: none)
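
For what it's worth, a rough C sketch of that merge rule (hypothetical
types and names, not the prototype's actual API) might look like:

#include <stdbool.h>
#include <stdint.h>

#define FOO_TILED  (1u << 0)
#define FOO_CC     (1u << 1)
#define FOO_CACHED (1u << 2)

struct cap_set {
   uint32_t caps;        /* bitmask of FOO_* capabilities/layouts */
   uint32_t alignment;   /* constraint: required alignment in bytes */
};

/* Merge one GPU entry with one display entry.  Caps the display doesn't
 * understand are only kept if the GPU can transition them away before the
 * hand-off; constraints are combined by taking the stricter (larger) one. */
static bool
merge_entries(const struct cap_set *gpu, const struct cap_set *display,
              uint32_t gpu_removable_caps, struct cap_set *out)
{
   uint32_t gpu_only = gpu->caps & ~display->caps;

   if (gpu_only & ~gpu_removable_caps)
      return false;   /* no compatible merged entry */

   out->caps = gpu->caps;
   out->alignment = gpu->alignment > display->alignment ?
                    gpu->alignment : display->alignment;
   return true;
}

Calling merge_entries() with gpu_removable_caps = FOO_CC | FOO_CACHED (i.e.
trans_a and trans_b exist) reproduces the merged results listed above.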



We definitely don't want to expose a way of getting uncached rendering
surfaces for radeonsi. I mean, I think we are supposed to be able to program
our hardware so that the backend bypasses all caches, but (a) nobody
validates that and (b) it's basically suicide in terms of performance. Let's
build fewer footguns :)


sure, this was just a hypothetical example.  But to take this case as
another example, if you didn't want to expose uncached rendering (or
cached w/ cache flushes after each draw), you would exclude the entry
from the GPU set which didn't have FOO/cached (I'm adding back a
cached but not CC config just to make it 

Re: [Mesa-dev] [PATCH v3 24/25] i965: Call spirv_to_nir() instead of glsl_to_nir() for SPIR-V shaders

2017-12-05 Thread Eduardo Lima Mitev
On 12/05/2017 06:10 AM, Timothy Arceri wrote:
> Reviewed-by: Timothy Arceri 
>

Thanks, Timothy.

Any chance to review the remaining patches in the series?
It would be nice to land this batch soon to focus on the actual linker
stuff.

cheers,
Eduardo

> On 04/12/17 20:21, Eduardo Lima Mitev wrote:
>> This is the main fork of the shader compilation code-path, where a NIR
>> shader is obtained by calling spirv_to_nir() or glsl_to_nir(),
>> depending on its nature..
>>
>> v2: Use 'spirv_data' member from gl_linked_shader to know which method
>>     to call. (Timothy Arceri)
>> ---
>>   src/mesa/drivers/dri/i965/brw_program.c | 10 --
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_program.c
>> b/src/mesa/drivers/dri/i965/brw_program.c
>> index 755d4973cc0..4043253a653 100644
>> --- a/src/mesa/drivers/dri/i965/brw_program.c
>> +++ b/src/mesa/drivers/dri/i965/brw_program.c
>> @@ -31,6 +31,7 @@
>>     #include 
>>   #include "main/imports.h"
>> +#include "main/glspirv.h"
>>   #include "program/prog_parameter.h"
>>   #include "program/prog_print.h"
>>   #include "program/prog_to_nir.h"
>> @@ -73,9 +74,14 @@ brw_create_nir(struct brw_context *brw,
>>     ctx->Const.ShaderCompilerOptions[stage].NirOptions;
>>  nir_shader *nir;
>>   -   /* First, lower the GLSL IR or Mesa IR to NIR */
>> +   /* First, lower the GLSL/Mesa IR or SPIR-V to NIR */
>>  if (shader_prog) {
>> -  nir = glsl_to_nir(shader_prog, stage, options);
>> +  if (shader_prog->_LinkedShaders[stage]->spirv_data)
>> + nir = _mesa_spirv_to_nir(ctx, shader_prog, stage, options);
>> +  else
>> + nir = glsl_to_nir(shader_prog, stage, options);
>> +  assert (nir);
>> +
>>     nir_remove_dead_variables(nir, nir_var_shader_in |
>> nir_var_shader_out);
>>     nir_lower_returns(nir);
>>     nir_validate_shader(nir);
>>
>



Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 12/01/2017 10:34 AM, Nicolai Hähnle wrote:

On 01.12.2017 18:09, Nicolai Hähnle wrote:
[snip]

As for the actual transition API, I accept that some metadata may be
required, and the metadata probably needs to depend on the memory
layout, which is often vendor-specific. But even linear layouts need
some transitions for caches. We probably need at least some generic
"off-device usage" bit.


I've started thinking of cached as a capability with a transition.. I
think that helps.  Maybe it needs to somehow be more specific (ie. if
you have two devices both with their own cache with no coherency
between the two)


As I wrote above, I'd prefer not to think of "cached" as a capability 
at least for radeonsi.


 From the desktop perspective, I would say let's ignore caches, the 
drivers know which caches they need to flush to make data visible to 
other devices on the system.


On the other hand, there are probably SoC cases where non-coherent 
caches are shared between some but not all devices, and in that case 
perhaps we do need to communicate this.


So perhaps we should have two kinds of "capabilities".

The first, like framebuffer compression, is a capability of the 
allocated memory layout (because the compression requires a meta 
surface), and devices that expose it may opportunistically use it.


The second, like caches, is a capability that the device/driver will 
use and you don't get a say in it, but other devices/drivers also 
don't need to be aware of them.


So then you could theoretically have a system that gives you:

GPU: FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache)
Display: FOO/tiled(layout-caps=FOO/cc)
Video:   FOO/tiled(dev-caps=FOO/vid-cache)
Camera:  FOO/tiled(dev-caps=FOO/vid-cache)
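
In very rough C terms (hypothetical names, just to illustrate the split,
not a proposal for the actual structures), that might look like:

#include <stdint.h>

/* Capabilities of the allocated memory layout itself: every user of the
 * allocation has to agree on these, so they participate in the merge. */
struct layout_caps {
   uint64_t bits;   /* e.g. FOO/cc */
};

/* Capabilities private to one device/driver: used opportunistically by
 * that driver, never merged and never visible to other devices. */
struct device_caps {
   uint64_t bits;   /* e.g. FOO/gpu-cache, FOO/vid-cache */
};

struct capability_set {
   struct layout_caps layout;   /* shared, merged across devices */
   struct device_caps device;   /* per-device, ignored by the merge */
};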

[snip]

FWIW, I think all that stuff about different caches quite likely 
over-complicates things. At the end of each "command submission" of 
whichever type of engine, the buffer must be in a state where the kernel 
is free to move it around for memory management purposes. This already 
puts a big constraint on the kind of (non-coherent) caches that can be 
supported anyway, so I wouldn't be surprised if we could get away with a 
*much* simpler approach.


I'd rather not depend on this type of cleverness if possible.  Other 
kernels/OS's may not behave this way, and I'd like the allocator 
mechanism to be something we can use across all or at least most of the 
POSIX and POSIX-like OS's we support.  Also, this particular example is 
not true of our proprietary Linux driver, and I suspect it won't always 
be the case for other drivers.  If a particular driver or OS fits this 
assumption, the driver is always free to return no-op transitions in 
that case.


Thanks,
-James


Cheers,
Nicolai




Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 11/30/2017 12:06 PM, Lyude Paul wrote:

On Thu, 2017-11-30 at 13:20 -0500, Rob Clark wrote:

On Thu, Nov 30, 2017 at 12:59 AM, James Jones  wrote:

On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:


On Wed, 29 Nov 2017 16:28:15 -0500
Rob Clark  wrote:


Do we need to define both in-place and copy transitions?  Ie. what if
GPU is still reading a tiled or compressed texture (ie. sampling from
previous frame for some reason), but we need to untile/uncompress for
display.. or maybe there are some other cases like that we should
think about..

Maybe you already have some thoughts about that?



This is the next thing I'll be working on. I haven't given it much
thought myself so far, but I think James might have had some insights.
I'll read through some of his notes to double-check.



A couple of notes on usage transitions:

While chatting about transitions, a few assertions were made by others
that I've come to accept, despite the fact that they reduce the
generality of the allocator mechanisms:

-GPUs are the only things that actually need usage transitions as far as I
know thus far.  Other engines either share the GPU representations of
data, or use more limited representations; the latter being the reason
non-GPU usage transitions are a useful thing.

-It's reasonable to assume that a GPU is required to perform a usage
transition.  This follows from the above postulate.  If only GPUs are
using more advanced representations, you don't need any transitions
unless you have a GPU available.


This seems reasonable.  I can't think of any non-gpu related case
where you would need a transition, other than perhaps cache flush/inv.


From that, I derived the rough API proposal for transitions presented on
my XDC 2017 slides.  Transition "metadata" is queried from the allocator
given a pair of usages (which may refer to more than one device), but the
realization of the transition is left to existing GPU APIs.  I think I put
Vulkan-like pseudo-code in the slides, but the GL external objects
extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work as well.


I haven't quite wrapped my head around how this would work in the
cross-device case.. I mean from the API standpoint for the user, it
seems straightforward enough.  Just not sure how to implement that and
what the driver interface would look like.

I guess we need a capability-conversion (?).. I mean take for example
the fb compression capability from your slide #12[1].  If we knew
there was an available transition to go from "Dev2 FB compression" to
"normal", then we could have allowed the "Dev2 FB compression" valid
set?

[1] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf


Regarding in-place vs. copy: To me a transition is something that happens
in-place, at least semantically.  If you need to make copies, that's a
format conversion blit, not a transition, and graphics APIs are already
capable of expressing that without any special transitions or help from
the allocator.  However, I understand some chipsets perform transitions
using something that looks kind of like a blit using on-chip caches and
constrained usage semantics.  There's probably some work to do to see
whether those need to be accommodated as conversion blits or usage
transitions.


I guess part of what I was thinking of is what happens if the
producing device is still reading from the buffer.  For example, the
viddec -> gpu use case, where the video decoder is also still hanging
on to the frame to use as a reference frame to decode future frames?

I guess if transition from devA -> devB can be done in parallel with
devA still reading the buffer, it isn't a problem.  I guess that
limits (non-blit) transitions to decompression and cache op's?  Maybe
that is ok..


I don't know of a real case where it would be a problem.  Note you can
transition to multiple usages in the proposed API, so for the video 
decoder example, you would transition from [video decode target] to 
[video decode target, GPU sampler source] for simultaneous texturing and 
reference frame usage.



For our hardware's purposes, transitions are just various levels of
decompression or compression reconfiguration and potentially cache
flushing/invalidation, so our transition metadata will just be some bits
signaling which compression operation is needed, if any.  That's the sort
of operation I modeled the API around, so if things are much more exotic
than that for others, it will probably require some adjustments.
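
As a rough illustration of how small such metadata could be (hypothetical
names, loosely following the XDC 2017 slides rather than the prototype's
real entry points):

#include <stdbool.h>
#include <stdint.h>

struct usage_desc {
   uint32_t device;      /* which device the usage refers to */
   uint32_t usage_bits;  /* e.g. render target, sampling, display scanout */
};

struct transition_metadata {
   uint32_t decompress_bits;  /* which (de)compression step is needed, if any */
   bool     flush_caches;     /* whether a cache flush/invalidate is required */
};

/* Queried from the allocator for a (source usages, destination usages) pair;
 * realizing the transition is then left to the existing GPU APIs. */
int allocator_get_transition(const struct usage_desc *src, unsigned src_count,
                             const struct usage_desc *dst, unsigned dst_count,
                             struct transition_metadata *out_metadata);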




[snip]



Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing, is one of my
primary goals.  However, it's a pretty heavy thing to prototype.  If
someone has the time though, I think it would be a great experiment.  It
would help flesh out the paltry list of usages, constraints, and
capabilities in the existing prototype codebase.  The kmscube example
really should have added at least a "render" usage, but I got lazy and
just re-used texture for now.
That 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 11/30/2017 10:48 AM, Rob Clark wrote:

On Thu, Nov 30, 2017 at 1:28 AM, James Jones  wrote:

On 11/29/2017 01:10 PM, Rob Clark wrote:


On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand wrote:


On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:



On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand wrote:


I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing
on its own so we have two competing solutions.



I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are thinking about implementing $new_thing for non-gpu
devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
about a (for ex.) camera.  I kinda don't want to throw out the baby
with the bathwater here.




Agreed.  GBM is very EGLish and we don't want the new allocator to be
that.



*maybe* GBM could be partially implemented on top of $new_thing.  I
don't quite see how that would work.  Possibly we could deprecate
parts of GBM that are no longer needed?  idk..  Either way, I fully
expect that GBM and mesa's implementation of $new_thing could perhaps
sit on to of some of the same set of internal APIs.  The public
interface can be decoupled from the internal implementation.




Maybe I should restate things a bit.  My real point was that modifiers +
$new_thing + kernel blob should be a complete and more powerful
replacement for GBM.  I don't know that we really can implement GBM on
top of it because GBM has lots of wishy-washy concepts such as "cursor
plane" which may not map well, at least not without querying the kernel
about specific display planes.  In particular, I don't want someone to
feel like they need to use $new_thing and GBM at the same time or
together.  Ideally, I'd like them to never do that unless we decide
gbm_bo is a useful abstraction for $new_thing.



(just to repeat what I mentioned on irc)

I think the main thing is how you create a swapchain/surface and know
which is the current front buffer after SwapBuffers()..  those are the
only bits of GBM that seem like they would still be useful.  idk, maybe
there is some other idea.



I don't view this as terribly useful except for legacy apps that need an EGL
window surface and can't be updated to use new methods.  Wayland compositors
certainly don't fall in that category.  I don't know that any GBM apps do.


kmscube doesn't count?  :-P

Hmm, I assumed weston and the other wayland compositors were still
using gbm to create EGL surfaces, but I confess to not having actually
looked at weston src code for quite a few years now.

Anyways, I think it is perfectly fine for GBM to stay as-is in its
current form.  It can already import dma-buf fd's, and those can
certainly come from $new_thing.

So I guess we want an EGL extension to return the allocator device
instance for the GPU.  That also takes care of the non-bare-metal
case.


Rather, I think the way forward for the classes of apps that need something
like GBM or the generic allocator is more or less the path ChromeOS took
with their graphics architecture: Render to individual buffers (using FBOs
bound to imported buffers in GL) and manage buffer exchanges/blits manually.

The useful abstraction surfaces provide isn't so much deciding which buffer
is currently "front" and "back", but rather handling the transition/hand-off
to the window system/display device/etc. in SwapBuffers(), and the whole
idea of the allocator proposals is to make that something the application or
at least some non-driver utility library handles explicitly based on where
exactly the buffer is being handed off to.
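
For reference, a condensed sketch of that flow, assuming
EGL_EXT_image_dma_buf_import and OES_EGL_image; the dma-buf fd, fourcc,
stride and dimensions are presumed to come from the allocator, the
extension entry points are presumed to have been resolved via
eglGetProcAddress, and error handling is omitted:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

static GLuint
fbo_from_dmabuf(EGLDisplay dpy, int dmabuf_fd, int fourcc,
                int width, int height, int stride)
{
   const EGLint attribs[] = {
      EGL_WIDTH, width,
      EGL_HEIGHT, height,
      EGL_LINUX_DRM_FOURCC_EXT, fourcc,
      EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_fd,
      EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
      EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
      EGL_NONE
   };
   EGLImageKHR img = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                                       EGL_LINUX_DMA_BUF_EXT, NULL, attribs);

   /* Bind the imported buffer to a texture and wrap it in an FBO. */
   GLuint tex, fbo;
   glGenTextures(1, &tex);
   glBindTexture(GL_TEXTURE_2D, tex);
   glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, (GLeglImageOES)img);

   glGenFramebuffers(1, &fbo);
   glBindFramebuffer(GL_FRAMEBUFFER, fbo);
   glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                          GL_TEXTURE_2D, tex, 0);

   /* Render here; the hand-off that eglSwapBuffers() used to perform is then
    * done explicitly (fence + usage transition) before dmabuf_fd moves on. */
   return fbo;
}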


Hmm, ok..  I guess the transition will need some hook into the driver.
For freedreno and vc4 (and I suspect this is not uncommon for tiler
GPUs), switching FBOs doesn't necessarily flush rendering to hw.
Maybe it would work out if you requested the sync file descriptor
from an EGL fence before passing things to the next device, as that
would flush rendering.


This "flush" is exactly what usage transitions are for:

1) Perform rendering or texturing
2) Insert a transition into the command stream by feeding metadata
extracted from the allocator library into the rendering/texturing API via
a new entry point.  This instructs the driver to perform any
flushes/decompressions/etc. needed to transition to the next usage in the
pipeline.
3) Insert/extract your fence (potentially this is combined with above 
entry point like it is in GL_EXT_semaphore).
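
In GL terms, a sketch of steps 2 and 3 could look like the following
(GL_EXT_semaphore / GL_EXT_semaphore_fd; 'sem_fd' and 'tex' are assumed to
exist, and the choice of destination layout would be driven by the
allocator's transition metadata rather than hard-coded):

#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

static void
hand_off_to_next_device(int sem_fd, GLuint tex)
{
   GLuint sem;
   glGenSemaphoresEXT(1, &sem);
   glImportSemaphoreFdEXT(sem, GL_HANDLE_TYPE_OPAQUE_FD_EXT, sem_fd);

   /* Signalling with a destination "layout" is where the driver folds in
    * any flush/decompression needed for the next usage (step 2), and the
    * semaphore itself is the fence handed to the next device (step 3). */
   const GLenum dst_layout = GL_LAYOUT_TRANSFER_SRC_EXT;
   glSignalSemaphoreEXT(sem, 0, NULL, 1, &tex, &dst_layout);
   glFlush();
}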



I wonder a bit 

[Mesa-dev] Reply: Re: screen blurred and glxgears segmentation fault on ubuntu17.10 for arm architecture server with amdgpu(AMD RADEON PRO WX7100)

2017-12-05 Thread Lvzhihong (ReJohn)
Hi Michel,

We solved the glxgears segmentation fault by adding the compile flag
-fsigned-char.
On the ARM platform, plain char defaults to unsigned, which is what
caused the glxgears segmentation fault.

But after recompiling Mesa with -fsigned-char, the screen output is
still tearing (blurred); the display problem still exists.
Do you have any ideas about the differences between arm (aarch64) and
x86 (amd64)? What may cause the screen tearing?
 

-Original Message-
From: Lvzhihong (ReJohn)
Sent: December 6, 2017 10:20
To: 'Michel Dänzer'
Cc: 'mesa-dev@lists.freedesktop.org'; Liuchen (Greentea); sunnanyong; 'alexdeuc...@gmail.com'
Subject: Re: [Mesa-dev] screen blurred and glxgears segment fault on ubuntu17.10 for
arm architecture server with amdgpu(AMD RADEON PRO WX7100)

Hi Michel,
I debugged deeper into the source code and found out that:

In si_shader.c, at about line 7039:

if (key->ps_prolog.color_interp_vgpr_index[i] != -1) {
        unsigned interp_vgpr = key->ps_prolog.num_input_sgprs +
                               key->ps_prolog.color_interp_vgpr_index[i];

        /* Get the (i,j) updated by bc_optimize handling. */
        interp[0] = LLVMBuildExtractValue(gallivm->builder, ret,
                                          interp_vgpr, "");
        interp[1] = LLVMBuildExtractValue(gallivm->builder, ret,
                                          interp_vgpr + 1, "");
        interp_ij = lp_build_gather_values(gallivm, interp, 2);
}

For 'key->ps_prolog.color_interp_vgpr_index[i]': when i=0 the value is '255
\377', and when i=1 the value is '0 \000'. I suspect
ps_prolog.color_interp_vgpr_index[0] should be '-1', but since it is an
unsigned variable it turns into 255?

By the way, I ran the same test on an x86 server with exactly the same
configuration as the ARM server. glxgears runs well and the screen output
is normal. It's strange indeed.

I hope the information can help. 
Looking forward to your reply.

Thanks.

rejohn


-Original Message-
From: Lvzhihong (ReJohn)
Sent: December 5, 2017 11:09
To: 'Michel Dänzer'
Cc: mesa-dev@lists.freedesktop.org; Liuchen (Greentea); sunnanyong
Subject: Re: [Mesa-dev] screen blurred and glxgears segment fault on ubuntu17.10 for
arm architecture server with amdgpu(AMD RADEON PRO WX7100)

Thanks for your reply.

The attachment is the Xorg log. 
If you need other information, please let me know.

Thanks.


-Original Message-
From: Michel Dänzer [mailto:mic...@daenzer.net]
Sent: December 5, 2017 1:02
To: Lvzhihong (ReJohn)
Cc: mesa-dev@lists.freedesktop.org; Liuchen (Greentea); sunnanyong
Subject: Re: [Mesa-dev] screen blurred and glxgears segment fault on ubuntu17.10 for
arm architecture server with amdgpu(AMD RADEON PRO WX7100)

On 2017-12-04 03:49 AM, Lvzhihong (ReJohn) wrote:
> Hi all,
> 
> We hit a problem on Ubuntu 17.10 on an ARM server with an amdgpu
> (AMD RADEON PRO WX7100). We use the open source drivers integrated in
> Ubuntu 17.10, and the architecture is aarch64-linux-gnu.
> 
> We installed:
> 
>     apt-get install xserver-xorg xinit xfce4 and mesa-utils glmark2
> 
> We started the X server:
> 
>     startx
> 
> and then the monitor shows the screen and the screen is blurred.

Can you share the corresponding Xorg log file?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Jordan Justen
On 2017-12-05 18:30:30, Mark Janes wrote:
> Timothy Arceri  writes:
> 
> > On 06/12/17 12:04, Mark Janes wrote:
> >> Jordan Justen  writes:
> >> 
> >>> On 2017-12-05 14:49:28, Mark Janes wrote:
>  Jordan Justen  writes:
> > It could be interesting to define a MESA extension to control or query
> > the shader cache. Today, the shader cache is a nearly 'invisible'
> > feature. There are a few env-vars, but I wouldn't call that a public
> > interface.
> >
> > The downside of defining an extension is that it might constrain
> > changes to the shader cache implementation. Or, if the interface is
> > too complex, then it may be tough for some drivers to implement.
> 
>  Why is an extension necessary?  By comparison, GCC has no interface to
>  query the statistics for ccache.  A utility that can read the cache
>  files and report hits/misses would at least inform us that the cache is
>  functioning.  The utility would be useful in writing real unit tests for
>  the feature.
> >>>
> >>> If we don't have an extension, then how would this work? i965 unit
> >>> tests during the build? If we don't actually test the feature against
> >>> real hardware, I think we'll miss a whole class of potential issues.
> >> 
> >> Unit tests are good, and I know we already have some of those.  We'll
> >> have better data on what the classes of issues are once we find more of
> >> them.  So far, it's just a few hardware-independent piglit tests.
> >> 
> >>> If we don't have an extension, then how can an external (piglit) test
> >>> hope to interact or probe the cache?
> >> 
> >> Piglit queries a utility or lib to make assertions about the cache?  For
> >> that matter, this whole cache verification could be implemented within
> >> piglit as a profile that executes tests twice with appropriate
> >> verification.
> >
> > That would mean manual work to add tests to a profile. The manual part
> > makes it unlikely to ever happen, and trying to manually update a list
> > to test every shader variation is doomed to fail. Alternatively,
> > adding only tests that have found a problem in the past makes it not
> > very useful.
> 
> Are there not some cpu tests that can be easily dropped from the
> profile?

Looking in tests/cpu.py, I guess the asmparsertests are probably not
affected. (At least today, and that's unlikely to change.)

For the GLSLParserTest tests, they do interact with the shader cache,
so they should still be run.

Uh, not sure if I mentioned it previously, but there is no need to run
any Vulkan tests for the shader cache.

-Jordan


Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Mark Janes
Timothy Arceri  writes:

> On 06/12/17 12:04, Mark Janes wrote:
>> Jordan Justen  writes:
>> 
>>> On 2017-12-05 14:49:28, Mark Janes wrote:
 Jordan Justen  writes:

> On 2017-12-05 09:13:11, Mark Janes wrote:
>> Jordan Justen  writes:
>>
>>> On 2017-11-08 17:26:47, Timothy Arceri wrote:
 Reviewed-by: Timothy Arceri 

 Mark may want to consider adding some of the once a day type CI runs 
 for
 this. For example running the test suite for two consecutive runs on 
 the
 same build so that the second run uses the shader cache and also a
 second run the uses MESA_GLSL=cache_fb to force testing of the cache
 fallback path.
>>>
>>> Yeah. We discussed this previously, but I don't think it's been
>>> implemented yet. My opinion is that it could perhaps be a weekly test.
>>
>> This automation is implemented now. It found the issues reported in
>> https://bugs.freedesktop.org/show_bug.cgi?id=103988
>>
>> My opinion is that this test strategy is inadequate.
>
> Meaning you think we cannot enable i965 shader cache for the 18.0
> release?

 I haven't heard anyone express confidence that this feature is bug-free,
 and I don't know on what basis that claim could be made.  I appreciate
 that a lot of manual testing has been done.  Do you feel confident
 that the cache can be enabled for all mesa customers without disruption?
>>>
>>> If we had enabled it two months ago, then yes, we would have worked
>>> through the niche issues, or discovered the so-far-hidden major land
>>> mine. At this point, it's getting risky. In a month, I will say no.
>>>
> We are already ~1.5 months away from the next stable branch point. If
> we want to enable this in 18.0, we should be using this time to see if
> enabling the cache by default has some large unexpected side effect in
> our graphics stack, or breaks real-world apps.

 I agree.  We should encourage as many users as possible to enable the
 shader cache in their environments.  At least one stable release should
 be out with a working cache, where the feature can be enabled by those
 who are eager to benefit from it.  I assume there will be a lot of them,
 and they could flush out issues for everyone who has no idea what a
 shader cache is.

>> Adding a dimension to the test matrix has high cost, especially when
>> combined with other dimensions of the test matrix (does shader cache
>> need to be tested for 32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).
>
> Are you saying this is too high cost to run per check-in? Do you need
> to disable it for the health of CI? I think I proposed that daily, or
> perhaps even weekly would be adequate.

 Certainly, the test time per line of shader cache code is massively
 higher than any other feature, even if you run it just once a month.
 Other features have tests that run in milliseconds, not 30min * 20
 machines.
>>>
>>> The scope of this feature is nearly the entire API. It is justified to
>>> throw the entire GL suite of tests at it on a regular basis. The cost
>>> of running this once per week ought to be reasonable.
>>>
>>> Is the cost of running this per check-in too high for the system
>>> today? Or, is it that you think it is too high for the feature? I'm
>>> sympathetic to the former, and not so much to the latter. :)
>>>
>>> By the way, enabling these in CI has been helpful in highlighting a
>>> few corner case issues. So, even if you don't like it, *Thank You* for
>>> enabling it. :)
>>>
 Beyond poor return on execution time, there is the simple fact that
 whoever is running the CI needs to manually look at shader cache results
 separately from the rest of the tests.  Unit testing is effective
 because coverage can be added at approximately zero marginal cost.

 3 years from now, will we still be looking at separate weekly shader
 cache test runs to make sure it's working?
>>>
>>> I actually think that long term this should become part of the main
>>> daily and weekly tests. (An idea you've already rejected.) In the long
>>> term it should be stable enough for that, and if it does ever regress,
>>> we'd what to see something sooner rather than later.
>>>
> These tests are already identifying some corner case issues. I'm not
> sure these would have impacted real-world apps yet, but I do think it
> is a good idea to run these tests regularly.
>
> You say this test strategy is inadequate. Perhaps. I do think it needs
> to be part of our test strategy. There is no way we are going to hit
> all the corners of the API better than running all of our tests with
> the cache enabled. Do you agree?

 

Re: [Mesa-dev] screen blurred and glxgears segment fault on ubuntu17.10 for arm architecture server with amdgpu(AMD RADEON PRO WX7100)

2017-12-05 Thread Lvzhihong (ReJohn)
Hi Michel,
I debugged deeper into the source code and found out that:

In si_shader.c, at about line 7039:

if (key->ps_prolog.color_interp_vgpr_index[i] != -1) {
        unsigned interp_vgpr = key->ps_prolog.num_input_sgprs +
                               key->ps_prolog.color_interp_vgpr_index[i];

        /* Get the (i,j) updated by bc_optimize handling. */
        interp[0] = LLVMBuildExtractValue(gallivm->builder, ret,
                                          interp_vgpr, "");
        interp[1] = LLVMBuildExtractValue(gallivm->builder, ret,
                                          interp_vgpr + 1, "");
        interp_ij = lp_build_gather_values(gallivm, interp, 2);
}

For 'key->ps_prolog.color_interp_vgpr_index[i]': when i=0 the value is '255
\377', and when i=1 the value is '0 \000'.
I suspect ps_prolog.color_interp_vgpr_index[0] should be '-1', but since it
is an unsigned variable it turns into 255?

By the way, I ran the same test on an x86 server with exactly the same
configuration as the ARM server. glxgears runs well and the screen output
is normal. It's strange indeed.

I hope the information can help. 
Looking forward to your reply.

Thanks.

rejohn


-Original Message-
From: Lvzhihong (ReJohn)
Sent: December 5, 2017 11:09
To: 'Michel Dänzer'
Cc: mesa-dev@lists.freedesktop.org; Liuchen (Greentea); sunnanyong
Subject: Re: [Mesa-dev] screen blurred and glxgears segment fault on ubuntu17.10 for
arm architecture server with amdgpu(AMD RADEON PRO WX7100)

Thanks for your reply.

The attachment is the Xorg log. 
If you need other information, please let me know.

Thanks.


-Original Message-
From: Michel Dänzer [mailto:mic...@daenzer.net]
Sent: December 5, 2017 1:02
To: Lvzhihong (ReJohn)
Cc: mesa-dev@lists.freedesktop.org; Liuchen (Greentea); sunnanyong
Subject: Re: [Mesa-dev] screen blurred and glxgears segment fault on ubuntu17.10 for
arm architecture server with amdgpu(AMD RADEON PRO WX7100)

On 2017-12-04 03:49 AM, Lvzhihong (ReJohn) wrote:
> Hi all,
> 
> We hit a problem on Ubuntu 17.10 on an ARM server with an amdgpu
> (AMD RADEON PRO WX7100). We use the open source drivers integrated in
> Ubuntu 17.10, and the architecture is aarch64-linux-gnu.
> 
> We installed:
> 
>     apt-get install xserver-xorg xinit xfce4 and mesa-utils glmark2
> 
> We started the X server:
> 
>     startx
> 
> and then the monitor shows the screen and the screen is blurred.

Can you share the corresponding Xorg log file?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Timothy Arceri

On 06/12/17 12:04, Mark Janes wrote:

Jordan Justen  writes:


On 2017-12-05 14:49:28, Mark Janes wrote:

Jordan Justen  writes:


On 2017-12-05 09:13:11, Mark Janes wrote:

Jordan Justen  writes:


On 2017-11-08 17:26:47, Timothy Arceri wrote:

Reviewed-by: Timothy Arceri 

Mark may want to consider adding some of the once a day type CI runs for
this. For example running the test suite for two consecutive runs on the
same build so that the second run uses the shader cache and also a
second run the uses MESA_GLSL=cache_fb to force testing of the cache
fallback path.


Yeah. We discussed this previously, but I don't think it's been
implemented yet. My opinion is that it could perhaps be a weekly test.


This automation is implemented now. It found the issues reported in
https://bugs.freedesktop.org/show_bug.cgi?id=103988

My opinion is that this test strategy is inadequate.


Meaning you think we cannot enable i965 shader cache for the 18.0
release?


I haven't heard anyone express confidence that this feature is bug-free,
and I don't know on what basis that claim could be made.  I appreciate
that a lot of manual testing has been done.  Do you feel confident
that the cache can be enabled for all mesa customers without disruption?


If we had enabled it two months ago, then yes, we would have worked
through the niche issues, or discovered the so-far-hidden major land
mine. At this point, it's getting risky. In a month, I will say no.


We are already ~1.5 months away from the next stable branch point. If
we want to enable this in 18.0, we should be using this time to see if
enabling the cache by default has some large unexpected side effect in
our graphics stack, or breaks real-world apps.


I agree.  We should encourage as many users as possible to enable the
shader cache in their environments.  At least one stable release should
be out with a working cache, where the feature can be enabled by those
who are eager to benefit from it.  I assume there will be a lot of them,
and they could flush out issues for everyone who has no idea what a
shader cache is.


Adding a dimension to the test matrix has high cost, especially when
combined with other dimensions of the test matrix (does shader cache
need to be tested for 32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).


Are you saying this is too high cost to run per check-in? Do you need
to disable it for the health of CI? I think I proposed that daily, or
perhaps even weekly would be adequate.


Certainly, the test time per line of shader cache code is massively
higher than any other feature, even if you run it just once a month.
Other features have tests that run in milliseconds, not 30min * 20
machines.


The scope of this feature is nearly the entire API. It is justified to
throw the entire GL suite of tests at it on a regular basis. The cost
of running this once per week ought to be reasonable.

Is the cost of running this per check-in too high for the system
today? Or, is it that you think it is too high for the feature? I'm
sympathetic to the former, and not so much to the latter. :)

By the way, enabling these in CI has been helpful in highlighting a
few corner case issues. So, even if you don't like it, *Thank You* for
enabling it. :)


Beyond poor return on execution time, there is the simple fact that
whoever is running the CI needs to manually look at shader cache results
separately from the rest of the tests.  Unit testing is effective
because coverage can be added at approximately zero marginal cost.

3 years from now, will we still be looking at separate weekly shader
cache test runs to make sure it's working?


I actually think that long term this should become part of the main
daily and weekly tests. (An idea you've already rejected.) In the long
term it should be stable enough for that, and if it does ever regress,
we'd want to see something sooner rather than later.


These tests are already identifying some corner case issues. I'm not
sure these would have impacted real-world apps yet, but I do think it
is a good idea to run these tests regularly.

You say this test strategy is inadequate. Perhaps. I do think it needs
to be part of our test strategy. There is no way we are going to hit
all the corners of the API better than running all of our tests with
the cache enabled. Do you agree?


Tests should strive to cover the implementation, not the API.


I don't think any of our test suites operate this way. Very little of
piglit or deqp focuses on i965.


Shader
cache is a unique feature, because it affects a large part of the API.
It doesn't necessarily follow that covering the API will cover the
feature, or that that is an effective test strategy.


As you say, the scope of the feature is nearly the entire API. You
want us to start building a test suite that assumes the current
implementation details of the i965 shader cache, and tests it?


Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Mark Janes
Jordan Justen  writes:

> On 2017-12-05 14:49:28, Mark Janes wrote:
>> Jordan Justen  writes:
>> 
>> > On 2017-12-05 09:13:11, Mark Janes wrote:
>> >> Jordan Justen  writes:
>> >> 
>> >> > On 2017-11-08 17:26:47, Timothy Arceri wrote:
>> >> >> Reviewed-by: Timothy Arceri 
>> >> >> 
>> >> >> Mark may want to consider adding some of the once a day type CI runs 
>> >> >> for 
>> >> >> this. For example running the test suite for two consecutive runs on 
>> >> >> the 
>> >> >> same build so that the second run uses the shader cache and also a 
>> >> >> second run the uses MESA_GLSL=cache_fb to force testing of the cache 
>> >> >> fallback path.
>> >> >
>> >> > Yeah. We discussed this previously, but I don't think it's been
>> >> > implemented yet. My opinion is that it could perhaps be a weekly test.
>> >> 
>> >> This automation is implemented now. It found the issues reported in
>> >> https://bugs.freedesktop.org/show_bug.cgi?id=103988
>> >> 
>> >> My opinion is that this test strategy is inadequate.
>> >
>> > Meaning you think we cannot enable i965 shader cache for the 18.0
>> > release?
>> 
>> I haven't heard anyone express confidence that this feature is bug-free,
>> and I don't know on what basis that claim could be made.  I appreciate
>> that a lot of manual testing has been done.  Do you feel confident
>> that the cache can be enabled for all mesa customers without disruption?
>
> If we had enabled it two months ago, then yes, we would have worked
> through the niche issues, or discovered the so-far-hidden major land
> mine. At this point, it's getting risky. In a month, I will say no.
>
>> > We are already ~1.5 months away from the next stable branch point. If
>> > we want to enable this in 18.0, we should be using this time to see if
>> > enabling the cache by default has some large unexpected side effect in
>> > our graphics stack, or breaks real-world apps.
>> 
>> I agree.  We should encourage as many users as possible to enable the
>> shader cache in their environments.  At least one stable release should
>> be out with a working cache, where the feature can be enabled by those
>> who are eager to benefit from it.  I assume there will be a lot of them,
>> and they could flush out issues for everyone who has no idea what a
>> shader cache is.
>> 
>> >> Adding a dimension to the test matrix has high cost, especially when
>> >> combined with other dimensions of the test matrix (does shader cache
>> >> need to be tested for 32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).
>> >
>> > Are you saying this is too high cost to run per check-in? Do you need
>> > to disable it for the health of CI? I think I proposed that daily, or
>> > perhaps even weekly would be adequate.
>> 
>> Certainly, the test time per line of shader cache code is massively
>> higher than any other feature, even if you run it just once a month.
>> Other features have tests that run in milliseconds, not 30min * 20
>> machines.
>
> The scope of this feature is nearly the entire API. It is justified to
> throw the entire GL suite of tests at it on a regular basis. The cost
> of running this once per week ought to be reasonable.
>
> Is the cost of running this per check-in too high for the system
> today? Or, is it that you think it is too high for the feature? I'm
> sympathetic to the former, and not so much to the latter. :)
>
> By the way, enabling these in CI has been helpful in highlighting a
> few corner case issues. So, even if you don't like it, *Thank You* for
> enabling it. :)
>
>> Beyond poor return on execution time, there is the simple fact that
>> whoever is running the CI needs to manually look at shader cache results
>> separately from the rest of the tests.  Unit testing is effective
>> because coverage can be added at approximately zero marginal cost.
>> 
>> 3 years from now, will we still be looking at separate weekly shader
>> cache test runs to make sure it's working?
>
> I actually think that long term this should become part of the main
> daily and weekly tests. (An idea you've already rejected.) In the long
> term it should be stable enough for that, and if it does ever regress,
> we'd want to see something sooner rather than later.
>
>> > These tests are already identifying some corner case issues. I'm not
>> > sure these would have impacted real-world apps yet, but I do think it
>> > is a good idea to run these tests regularly.
>> >
>> > You say this test strategy is inadequate. Perhaps. I do think it needs
>> > to be part of our test strategy. There is no way we are going to hit
>> > all the corners of the API better than running all of our tests with
>> > the cache enabled. Do you agree?
>> 
>> Tests should strive to cover the implementation, not the API.
>
> I don't think any of our test suites operate this way. Very little of
> piglit or deqp focuses on i965.
>
>> Shader
>> cache is a unique feature, 

Re: [Mesa-dev] [PATCH 3/3] spirv: Allow OpPtrAccessChain for block indices

2017-12-05 Thread Kristian Høgsberg
On Thu, Nov 30, 2017 at 9:15 PM, Jason Ekstrand  wrote:
> After talking with Kristian a bit on IRC, I think there is a bit more
> explanation needed.  I would be happy to add something like this to the
> commit message or as a comment upon request.
>
> There is an interesting edge case in the variable pointers extension when
> you do an OpPtrAccessChain on a pointer to a struct decorated block.  In
> this case it is not entirely clear whether you should look for an array stride
> and treat it as an implicit array of such structs or if you should treat it
> as a single block in an array of blocks, which would imply incrementing the
> index portion of the descriptor tuple.  We choose the latter approach and
> assume that it's an array of blocks and therefore an array of SSBOs.  This
> has two advantages:
>
>  1) The SPIR-V spec for the OpPtrAccessChain instruction says "Base is
> treated as the address of the first element of an array, and the Element
> element’s address is computed to be the base for the Indexes, as per
> OpAccessChain."  Taken literally, that would mean that your struct type is
> supposed to be treated as an array of such a struct and, since it's
> decorated block, that means an array of blocks which corresponds to an array
> descriptor.
>
>  2) Taking this approach ensures that any array dereference in an
> OpAccessChain can be replaced with an OpAccessChain that selects element 0
> of the array together with an OpPtrAccessChain taking that result and adding
> to the index.  In particular, it ensures that this works when the array
> dereference is on an array of SSBOs.
>
> The downside (there always is one) is that this might be somewhat surprising
> behavior to apps.  I can easily see an app slapping an ArrayStride on the
> pointer and expecting that struct to implicitly turn into an array of
> structs and being a bit shocked when they get GPU hangs because of trying to
> access some random texture as an SSBO.  We could attempt to be a bit
> "smarter" and go looking for an ArrayStride decoration and, if we find one,
> use that instead of incrementing the block index.  However, that seems like
> a recipe for disaster because the behavior suddenly depends on adding a
> decoration.
>
> Really, this whole mess is an unfortunate complication arising from
> attempting to add variable pointers onto a description of resource
> descriptors that's based on GLSL syntax.  I'm in the process of trying to
> get things clarified within the SPIR-V working group in Khronos.  In the
> mean time, I believe this to be the best and most reasonable interpretation
> of the spec as-written today.

Series

Reviewed-by: Kristian H. Kristensen 

with the caveat that the behavior is still up in the air, but the
series looks like it does what it's intended to do, and it seems that
this is what the Khronos discussion and CTS is converging on.

> --Jason
>
>
> On Thu, Nov 30, 2017 at 7:06 PM, Jason Ekstrand wrote:
>>
>> The SPIR-V spec is a bit underspecified when it comes to exactly how
>> you're allowed to use OpPtrAccessChain and what it means in certain edge
>> cases.  In particular, what if the base pointer of the OpPtrAccessChain
>> points to the base struct of an SSBO instead of an element in that SSBO.
>> The original variable pointers implementation in mesa assumed that you
>> weren't allowed to do an OpPtrAccessChain that adjusted the block index
>> and asserted such.  However, there are some CTS tests that do this and,
>> if the CTS does it, someone will do it in the wild so we should probably
>> handle it.  With this commit, we significantly reduce our assumptions
>> and should be able to handle more-or-less anything.
>>
>> The one assumption we still make for correctness is that if we see an
>> OpPtrAccessChain on a pointer to a struct decorated block that the block
>> index should be adjusted.  In theory, someone could try to put an array
>> stride on such a pointer and try to make the SSBO an implicit array of
>> the base struct and we would not give them what they want.  That said,
>> any index other than 0 would count as an out-of-bounds access which is
>> invalid.
>> ---
>>  src/compiler/spirv/vtn_variables.c | 128
>> -
>>  1 file changed, 83 insertions(+), 45 deletions(-)
>>
>> diff --git a/src/compiler/spirv/vtn_variables.c
>> b/src/compiler/spirv/vtn_variables.c
>> index 09b3d35..89ce939 100644
>> --- a/src/compiler/spirv/vtn_variables.c
>> +++ b/src/compiler/spirv/vtn_variables.c
>> @@ -157,6 +157,22 @@ vtn_variable_resource_index(struct vtn_builder *b,
>> struct vtn_variable *var,
>> return &instr->dest.ssa;
>>  }
>>
>> +static nir_ssa_def *
>> +vtn_resource_reindex(struct vtn_builder *b, nir_ssa_def *base_index,
>> + nir_ssa_def *offset_index)
>> +{
>> +   nir_intrinsic_instr *instr =
>> +  nir_intrinsic_instr_create(b->nb.shader,
>> + 

Re: [Mesa-dev] [PATCH 04/29] anv/blorp: Rework HiZ ops to look like MCS and CCS

2017-12-05 Thread Nanley Chery
On Mon, Nov 27, 2017 at 07:05:54PM -0800, Jason Ekstrand wrote:
> ---
>  src/intel/vulkan/anv_blorp.c   | 38 
> ++
>  src/intel/vulkan/anv_private.h |  9 +
>  src/intel/vulkan/genX_cmd_buffer.c | 11 ++-
>  3 files changed, 33 insertions(+), 25 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index f10adf0..da273d6 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1568,26 +1568,30 @@ anv_image_copy_to_shadow(struct anv_cmd_buffer 
> *cmd_buffer,
> blorp_batch_finish(&batch);
>  }
>  
> -void
> -anv_gen8_hiz_op_resolve(struct anv_cmd_buffer *cmd_buffer,
> -const struct anv_image *image,
> -enum blorp_hiz_op op)
> +static enum blorp_hiz_op
> +isl_to_blorp_hiz_op(enum isl_aux_op isl_op)
>  {
> -   assert(image);
> +   switch (isl_op) {

Is the NONE case missing?

> +   case ISL_AUX_OP_FAST_CLEAR:   return BLORP_HIZ_OP_DEPTH_CLEAR;
> +   case ISL_AUX_OP_FULL_RESOLVE: return BLORP_HIZ_OP_DEPTH_RESOLVE;
> +   case ISL_AUX_OP_AMBIGUATE:return BLORP_HIZ_OP_HIZ_RESOLVE;
> +   default:
> +  unreachable("Unsupported HiZ aux op");
> +   }
> +}
>  
> +void
> +anv_image_hiz_op(struct anv_cmd_buffer *cmd_buffer,
> + const struct anv_image *image,
> + VkImageAspectFlagBits aspect, uint32_t level,
> + uint32_t base_layer, uint32_t layer_count,
> + enum isl_aux_op hiz_op)
> +{
> +   assert(aspect == VK_IMAGE_ASPECT_DEPTH_BIT);

Assuming this will handle fast depth/stencil clears, do we want to
future-proof this assert by replacing `==` with `&`?

> +   assert(base_layer + layer_count <= anv_image_aux_layers(image, aspect, 
> 0));
 ^
I think we should replace `0` with `level`.

-Nanley

> assert(anv_image_aspect_to_plane(image->aspects,
>  VK_IMAGE_ASPECT_DEPTH_BIT) == 0);
>  
> -   /* Don't resolve depth buffers without an auxiliary HiZ buffer and
> -* don't perform such a resolve on gens that don't support it.
> -*/
> -   if (cmd_buffer->device->info.gen < 8 ||
> -   image->planes[0].aux_usage != ISL_AUX_USAGE_HIZ)
> -  return;
> -
> -   assert(op == BLORP_HIZ_OP_HIZ_RESOLVE ||
> -  op == BLORP_HIZ_OP_DEPTH_RESOLVE);
> -
> struct blorp_batch batch;
> blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, 0);
>  
> @@ -1597,7 +1601,9 @@ anv_gen8_hiz_op_resolve(struct anv_cmd_buffer 
> *cmd_buffer,
>  ISL_AUX_USAGE_HIZ, &surf);
> surf.clear_color.f32[0] = ANV_HZ_FC_VAL;
>  
> -   blorp_hiz_op(&batch, &surf, 0, 0, 1, op);
> +   blorp_hiz_op(&batch, &surf, level, base_layer, layer_count,
> +isl_to_blorp_hiz_op(hiz_op));
> +
> blorp_batch_finish(&batch);
>  }
>  
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> index dc44ab6..5dd95a3 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -2530,10 +2530,11 @@ anv_can_sample_with_hiz(const struct gen_device_info 
> * const devinfo,
>  }
>  
>  void
> -anv_gen8_hiz_op_resolve(struct anv_cmd_buffer *cmd_buffer,
> -const struct anv_image *image,
> -enum blorp_hiz_op op);
> -
> +anv_image_hiz_op(struct anv_cmd_buffer *cmd_buffer,
> + const struct anv_image *image,
> + VkImageAspectFlagBits aspect, uint32_t level,
> + uint32_t base_layer, uint32_t layer_count,
> + enum isl_aux_op hiz_op);
>  void
>  anv_image_mcs_op(struct anv_cmd_buffer *cmd_buffer,
>   const struct anv_image *image,
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 2e7a2cc..0c1ae83 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -388,19 +388,20 @@ transition_depth_buffer(struct anv_cmd_buffer 
> *cmd_buffer,
>anv_layout_to_aux_usage(&cmd_buffer->device->info, image,
>VK_IMAGE_ASPECT_DEPTH_BIT, final_layout);
>  
> -   enum blorp_hiz_op hiz_op;
> +   enum isl_aux_op hiz_op;
> if (hiz_enabled && !enable_hiz) {
> -  hiz_op = BLORP_HIZ_OP_DEPTH_RESOLVE;
> +  hiz_op = ISL_AUX_OP_FULL_RESOLVE;
> } else if (!hiz_enabled && enable_hiz) {
> -  hiz_op = BLORP_HIZ_OP_HIZ_RESOLVE;
> +  hiz_op = ISL_AUX_OP_AMBIGUATE;
> } else {
>assert(hiz_enabled == enable_hiz);
>/* If the same buffer will be used, no resolves are necessary. */
> -  hiz_op = BLORP_HIZ_OP_NONE;
> +  hiz_op = ISL_AUX_OP_NONE;
> }
>  
> if (hiz_op != BLORP_HIZ_OP_NONE)
> -  anv_gen8_hiz_op_resolve(cmd_buffer, image, hiz_op);
> +  anv_image_hiz_op(cmd_buffer, image, VK_IMAGE_ASPECT_DEPTH_BIT,
> +   0, 0, 1, hiz_op);
>  }
>  
>  

Re: [Mesa-dev] [PATCH v4 00/44] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-12-05 Thread Chema Casanova
On 05/12/17 18:31, Chema Casanova wrote:
> On 05/12/17 at 06:16, Jason Ekstrand wrote:
>> A couple of notes:
>>
>>  1) I *think* I gave you enough reviews to land the UBO/SSBO part and
>> the optimizations in 26-28.  If reviews are still missing anywhere,
>> please let me know.  If not, let's try and get that part landed.
> 
> The series is almost ready to land; the only thing pending is addressing your
> feedback about using untyped_read for reading vec3 SSBOs.
> 
> The only patches missing an explicit R-b are "[PATCH v4 28/44] i965/fs: Use
> untyped_surface_read for 16-bit load_ssbo" and "[PATCH v4 23/44]
> i965/fs: Enables 16-bit load_ubo with sampler"; I've just answered your
> review to confirm the R-b.
> 
> I expect to finish the vec3 SSBO work today and send the series to Jenkins
> before landing, confirm your "pending" R-b, do a last rebase onto master and
> ask for a push.

I've just prepared a rebased branch with the reviewed commits ready to
land to enable VK_KHR_16bit_storage support for SSBO/UBO.

https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-v4-ubo-ssbo-to-land

As I still don't have commit access to Mesa, maybe Eduardo or Alejandro
can land it for me tomorrow. But Jason, feel free to push it if you want.

Chema

>>  2) I sent out a patch to rewrite assign_constant_locations which I
>> think should make it automatically handle 8 and 16-bit values as
>> well.  I'd rather do that than more special casing if everything works
>> out ok.
> 
> I'm testing this patch with 16-bit and making sure whatever is needed to
> have 16-bit working is in place.
> 
>>
>>  3) I sent out a series of patches to enable pushing of UBOs in
>> Vulkan.  If we're not careful, these will clash with 16bit storage as
>> UBO support suddenly has to imply push constant support.  That said,
>> I"m willing to wait at least a little while before landing them to let
>> us get 16bit push constant support sorted out.  The UBO pushing
>> patches give us a nice little performance boost but we're nowhere near
>> a release and I don't want it blocking you.
> 
> That would be my next priority, so we would only have pending to land
> the 16-bit input/output support to finish this extension.
> 
> Chema
> 
>> On Wed, Nov 29, 2017 at 6:07 PM, Jose Maria Casanova Crespo wrote:
>>
>> Hello,
>>
>> this is the V4 series for the implementation of the
>> SPV_KHR_16bit_storage
>> and VK_KHR_16bit_storage extensions on the anv vulkan driver, in
>> addition
>> to the GLSL and NIR support needed.
>>
>> The original series can be found here [1], the following v2 [2]
>> and v3 [3].
>>
>> In short V4 includes the following:
>>
>>  * Reorder the series to enable features as they are implemented: the
>>    series now enables UBO and SSBO support first, then inputs/outputs and
>>    finally push constants.
>>  * Support the byte scattered read/write messages with different bit
>>    sizes: byte/word/dword.
>>  * Refactor the store_ssbo code and also fix stores when the writemask
>>    was .yz.
>>  * Use the sampler for load_ubo, avoiding the initial implementation of
>>    the series that used byte_scattered_read.
>>  * Address all the feedback provided by Jason and Topi on the v3 review.
>>
>> This series is also available at:
>>
>> https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4
>> 
>>
>> The objective is to start landing part of this series; all feedback has
>> been addressed for SSBO and UBO. The input/output features will probably
>> need another iteration as they were not completely reviewed. We also need
>> to define the approach for the push constant issues, before or after
>> landing the support with this implementation.
>>
>> Patches 1-5 and 8-17 have already been reviewed. Patch 7 was already
>> reviewed, but as it has changed substantially I would appreciate another
>> review. Once the patches up to 25 or 28 are reviewed we could land the
>> UBO and SSBO support.
>>
>> Finally an updated overview of the patches:
>>
>> Patches 1-2 add 16-bit float, int and uint types to GLSL. This is
>> needed because NIR uses GLSL types internally. We use the enums
>> already defined by the AMD_gpu_shader_half_float and NV_gpu_shader
>> extensions. Patch 2 updates mesa/st in order to avoid warnings for
>> types not handled in a switch.
>>
>> Patches 3-6 add NIR support for those new GLSL 16-bit types,
>> conversion opcodes, and rounding modes for float to half-float
>> conversions.
>>
>> Patches 7-9 add the SPIR-V (SPV_KHR_16bit_storage) to NIR support.
>>
>> Patches 10-12 add general 16-bit support for i965. This includes
>> handling the new types in several general-purpose methods and
>> updating/removing some asserts.
>>
>> Patches 

Re: [Mesa-dev] [PATCH 02/29] anv/blorp: Rework image clear/resolve helpers

2017-12-05 Thread Nanley Chery
On Mon, Nov 27, 2017 at 07:05:52PM -0800, Jason Ekstrand wrote:
> This replaces image_fast_clear and ccs_resolve with two new helpers that
> simply perform an isl_aux_op whatever that may be on CCS or MCS.  This
> is a bit cleaner as it separates performing the aux operation from which
> blorp helper we have to call to do it.
> ---
>  src/intel/vulkan/anv_blorp.c   | 218 
> ++---
>  src/intel/vulkan/anv_private.h |  23 ++--
>  src/intel/vulkan/genX_cmd_buffer.c |  28 +++--
>  3 files changed, 165 insertions(+), 104 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index e244468..7c8a673 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1439,75 +1439,6 @@ fast_clear_aux_usage(const struct anv_image *image,
>  }
>  
>  void
> -anv_image_fast_clear(struct anv_cmd_buffer *cmd_buffer,
> - const struct anv_image *image,
> - VkImageAspectFlagBits aspect,
> - const uint32_t base_level, const uint32_t level_count,
> - const uint32_t base_layer, uint32_t layer_count)
> -{
> -   assert(image->type == VK_IMAGE_TYPE_3D || image->extent.depth == 1);
> -
> -   if (image->type == VK_IMAGE_TYPE_3D) {
> -  assert(base_layer == 0);
> -  assert(layer_count == anv_minify(image->extent.depth, base_level));
> -   }
> -
> -   struct blorp_batch batch;
> -   blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, 0);
> -
> -   struct blorp_surf surf;
> -   get_blorp_surf_for_anv_image(cmd_buffer->device, image, aspect,
> -fast_clear_aux_usage(image, aspect),
> -&surf);
> -
> -   /* From the Sky Lake PRM Vol. 7, "Render Target Fast Clear":
> -*
> -*"After Render target fast clear, pipe-control with color cache
> -*write-flush must be issued before sending any DRAW commands on
> -*that render target."
> -*
> -* This comment is a bit cryptic and doesn't really tell you what's going
> -* or what's really needed.  It appears that fast clear ops are not
> -* properly synchronized with other drawing.  This means that we cannot
> -* have a fast clear operation in the pipe at the same time as other
> -* regular drawing operations.  We need to use a PIPE_CONTROL to ensure
> -* that the contents of the previous draw hit the render target before we
> -* resolve and then use a second PIPE_CONTROL after the resolve to ensure
> -* that it is completed before any additional drawing occurs.
> -*/
> -   cmd_buffer->state.pending_pipe_bits |=
> -  ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;
> -
> -   uint32_t plane = anv_image_aspect_to_plane(image->aspects, aspect);
> -   uint32_t width_div = image->format->planes[plane].denominator_scales[0];
> -   uint32_t height_div = image->format->planes[plane].denominator_scales[1];
> -
> -   for (uint32_t l = 0; l < level_count; l++) {
> -  const uint32_t level = base_level + l;
> -
> -  const VkExtent3D extent = {
> - .width = anv_minify(image->extent.width, level),
> - .height = anv_minify(image->extent.height, level),
> - .depth = anv_minify(image->extent.depth, level),
> -  };
> -
> -  if (image->type == VK_IMAGE_TYPE_3D)
> - layer_count = extent.depth;
> -
> -  assert(level < anv_image_aux_levels(image, aspect));
> -  assert(base_layer + layer_count <= anv_image_aux_layers(image, aspect, 
> level));
> -  blorp_fast_clear(&batch, &surf, surf.surf->format,
> -   level, base_layer, layer_count,
> -   0, 0,
> -   extent.width / width_div,
> -   extent.height / height_div);
> -   }
> -
> -   cmd_buffer->state.pending_pipe_bits |=
> -  ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;
> -}
> -
> -void
>  anv_cmd_buffer_resolve_subpass(struct anv_cmd_buffer *cmd_buffer)
>  {
> struct anv_framebuffer *fb = cmd_buffer->state.framebuffer;
> @@ -1681,36 +1612,153 @@ anv_gen8_hiz_op_resolve(struct anv_cmd_buffer 
> *cmd_buffer,
>  }
>  
>  void
> -anv_ccs_resolve(struct anv_cmd_buffer * const cmd_buffer,
> -const struct anv_image * const image,
> -VkImageAspectFlagBits aspect,
> -const uint8_t level,
> -const uint32_t start_layer, const uint32_t layer_count,
> -const enum blorp_fast_clear_op op)
> +anv_image_mcs_op(struct anv_cmd_buffer *cmd_buffer,
> + const struct anv_image *image,
> + VkImageAspectFlagBits aspect,
> + uint32_t base_layer, uint32_t layer_count,
> + enum isl_aux_op mcs_op, bool predicate)
>  {
> -   assert(cmd_buffer && image);
> +   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> +   assert(image->samples > 1);
> +   assert(base_layer + layer_count 

Re: [Mesa-dev] [PATCH 03/29] anv/blorp: Support ISL_AUX_USAGE_HIZ in surf_for_anv_image

2017-12-05 Thread Nanley Chery
On Mon, Nov 27, 2017 at 07:05:53PM -0800, Jason Ekstrand wrote:
> If the function gets passed ANV_AUX_USAGE_DEFAULT, it still has the old
> behavior of setting ISL_AUX_USAGE_NONE for depth/stencil which is what
> we want for blits/copies.
> ---
>  src/intel/vulkan/anv_blorp.c | 22 ++
>  1 file changed, 6 insertions(+), 16 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 7c8a673..f10adf0 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -193,12 +193,13 @@ get_blorp_surf_for_anv_image(const struct anv_device 
> *device,
>  {
> uint32_t plane = anv_image_aspect_to_plane(image->aspects, aspect);
>  
> -   if (aux_usage == ANV_AUX_USAGE_DEFAULT)
> +   if (aux_usage == ANV_AUX_USAGE_DEFAULT) {
>aux_usage = image->planes[plane].aux_usage;
>  
> -   if (aspect == VK_IMAGE_ASPECT_STENCIL_BIT ||
> -   aux_usage == ISL_AUX_USAGE_HIZ)
> -  aux_usage = ISL_AUX_USAGE_NONE;
> +  if (aspect == VK_IMAGE_ASPECT_STENCIL_BIT ||
> +  aux_usage == ISL_AUX_USAGE_HIZ)

I think the predicate no longer needs a check on the aspect. If the
aspect is stencil, the aux_usage will be either NONE or HIZ. If the
aux_usage is HIZ, it'll be caught by the aux_usage check; if it's NONE,
there's no work to be done. Either way, this patch is
Reviewed-by: Nanley Chery 
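As a rough sketch of the simplification being suggested here (my reading of the review comment, mirroring the hunk below; not the actual patch):

   if (aux_usage == ANV_AUX_USAGE_DEFAULT) {
      aux_usage = image->planes[plane].aux_usage;

      /* No aspect check needed: a stencil plane's aux_usage is already
       * either NONE or HIZ, and HIZ is caught right here.
       */
      if (aux_usage == ISL_AUX_USAGE_HIZ)
         aux_usage = ISL_AUX_USAGE_NONE;
   }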


> + aux_usage = ISL_AUX_USAGE_NONE;
> +   }
>  
> const struct anv_surface *surface = &image->planes[plane].surface;
> *blorp_surf = (struct blorp_surf) {
> @@ -1593,18 +1594,7 @@ anv_gen8_hiz_op_resolve(struct anv_cmd_buffer 
> *cmd_buffer,
> struct blorp_surf surf;
> get_blorp_surf_for_anv_image(cmd_buffer->device,
>  image, VK_IMAGE_ASPECT_DEPTH_BIT,
> -ISL_AUX_USAGE_NONE, &surf);
> -
> -   /* Manually add the aux HiZ surf */
> -   surf.aux_surf = &image->planes[0].aux_surface.isl,
> -   surf.aux_addr = (struct blorp_address) {
> -  .buffer = image->planes[0].bo,
> -  .offset = image->planes[0].bo_offset +
> -image->planes[0].aux_surface.offset,
> -  .mocs = cmd_buffer->device->default_mocs,
> -   };
> -   surf.aux_usage = ISL_AUX_USAGE_HIZ;
> -

Thanks for getting rid of that!

> +ISL_AUX_USAGE_HIZ, &surf);
> surf.clear_color.f32[0] = ANV_HZ_FC_VAL;
>  
> blorp_hiz_op(&batch, &surf, 0, 0, 1, op);
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] V3 i965/Gallium ARB_get_program_binary support

2017-12-05 Thread Mike Lothian
I can confirm this fixes the Dead Island crash, which makes this the first
time I've been able to play the game since purchasing it :D

The patches don't apply cleanly to master, so I used the following fixup:
https://github.com/FireBurn/mesa/tree/gallium-program-binary

If you're happy with that feel free to add my Tested-by

On Tue, 5 Dec 2017 at 05:03 Timothy Arceri  wrote:

> Ping! The outstanding patches for review are:
>
> 4, 12, 22, 23
>
> Gallium specific patches:
>
> 17-21
>
> The following have a v1 r-b Nicolai but have changed since:
>
> 13, 14, 15
>
> Branch available here:
>
> https://github.com/tarceri/Mesa.git gallium-program-binary
>
> On 29/11/17 12:24, Timothy Arceri wrote:
> > V3:
> > This is basically the V2 that Jordan sent with feedback addressed,
> > gallium support added, some minor functional changes such as only
> > storing the default uniforms to either disk or program binary cache
> > (rather than fixing them up later) and some refactoring to allow
> > greater code sharing between gallium and i965.
> >
> > This series adds i965/gallium support for ARB_get_program_binary
> > with greater than 0 supported formats. Today we support this extension,
> > but advertise support for 0 formats. This series allows i965/gallium
> > to advertise support for 1 format.
> >
> > This series defines a common Mesa format for ARB_get_program_binary,
> > along with helper functions to read and write the format. The binary
> > saved can only be reloaded on the exact same Mesa build using the
> > exact same hardware.
> >
> > The i965/gallium implementation saves out a serialized nir/tgsi
> > representation of the program. For i965 we can later add support for
> > saving the gen binary program as well. (We will still need the nir
> > program for state based recompiles.)
> >
> > This implementation passes the piglit, deqp and glcts tests. It also
> > works with (and fixes a crash in) Dead Island, which makes use of the
> > extension.
> >
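To make the application-facing side of this concrete, here is a minimal usage sketch of ARB_get_program_binary (standard GL API, not code from the series; it assumes "prog" is a linked program and that a loader such as libepoxy provides the prototypes). With this series, glGetIntegerv(GL_NUM_PROGRAM_BINARY_FORMATS, ...) reports 1 instead of 0:

   #include <epoxy/gl.h>
   #include <stdlib.h>

   static void *
   save_binary(GLuint prog, GLenum *format, GLsizei *length)
   {
      GLint size = 0;
      glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &size);

      void *blob = malloc(size);
      glGetProgramBinary(prog, size, length, format, blob);
      /* Only valid when reloaded on the exact same Mesa build and GPU. */
      return blob;
   }

   static void
   restore_binary(GLuint prog, GLenum format, const void *blob, GLsizei length)
   {
      glProgramBinary(prog, format, blob, length);
      /* Check GL_LINK_STATUS here and fall back to a normal compile/link
       * if the driver rejects the binary (e.g. after a Mesa update).
       */
   }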
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler (v2)

2017-12-05 Thread Jason Ekstrand
Looks good to me.

On Tue, Dec 5, 2017 at 3:47 PM, Jose Maria Casanova Crespo <
jmcasan...@igalia.com> wrote:

> From: Jason Ekstrand 
>
> load_ubo is using 32-bit loads as uniform surfaces have a 32-bit
> surface format defined. So when reading 16-bit components with the
> sampler we need to unshuffle two 16-bit components from each 32-bit
> component.
>
> Using the sampler avoids the use of the byte_scattered_read message
> that needs one message for each component and is supposed to be
> slower.
>
> v2: (Jason Ekstrand)
> - Simplify component selection and unshuffling for different bitsizes
> - Remove SKL optimization of reading only two 32-bit components when
>   reading 16-bits types.
>
> Reviewed-by: Jose Maria Casanova Crespo 
> ---
>  src/intel/compiler/brw_fs.cpp | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 91399c6c1d..93bb6b4673 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -191,14 +191,21 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
> fs_builder &bld,
>  vec4_result, surf_index, vec4_offset);
> inst->size_written = 4 * vec4_result.component_size(inst->exec_size);
>
> -   if (type_sz(dst.type) == 8) {
> -  shuffle_32bit_load_result_to_64bit_data(
> - bld, retype(vec4_result, dst.type), vec4_result, 2);
> +   fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
> +   switch (type_sz(dst.type)) {
> +   case 2:
> +  shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
> +  bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
> +  break;
> +   case 4:
> +  bld.MOV(dst, retype(dw, dst.type));
> +  break;
> +   case 8:
> +  shuffle_32bit_load_result_to_64bit_data(bld, dst, dw, 1);
> +  break;
> +   default:
> +  unreachable("Unsupported bit_size");
> }
> -
> -   vec4_result.type = dst.type;
> -   bld.MOV(dst, offset(vec4_result, bld,
> -   (const_offset & 0xf) / type_sz(vec4_result.type)));
>  }
>
>  /**
> --
> 2.14.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] intel/compiler/gen10: Disable push constants.

2017-12-05 Thread Rafael Antognolli
We still have gpu hangs on Cannonlake when using push constants, so
disable them for now until we have a proper fix for these hangs.

Signed-off-by: Rafael Antognolli 
---
 src/intel/compiler/brw_fs.cpp | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 6772c0d5a54..37706667562 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -2033,6 +2033,15 @@ fs_visitor::assign_constant_locations()
if (subgroup_id_index >= 0)
   max_push_components--; /* Save a slot for the thread ID */
 
+   /* FIXME: We currently have some GPU hangs that happen apparently when using
+* push constants. Since we have no solution for such hangs yet, just
+* go ahead and use pull constants for now.
+*/
+   if (devinfo->gen == 10 && compiler->supports_pull_constants) {
+  compiler->shader_perf_log(log_data, "Disabling push constants.");
+  max_push_components = 0;
+   }
+
/* We push small arrays, but no bigger than 16 floats.  This is big enough
 * for a vec4 but hopefully not large enough to push out other stuff.  We
 * should probably use a better heuristic at some point.
-- 
2.13.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/1] radv: use a faster version for nir_op_pack_half_2x16

2017-12-05 Thread Dave Airlie
On 6 December 2017 at 03:02, Samuel Pitoiset  wrote:
> This patch is ported from RadeonSI and it has two effects.
>
> It fixes a rendering issue which affects F1 2017 and Dawn
> of War 3 (Vega only) because LLVM ended up generating
> the new v_mad_mix_{hi,lo} instructions which appear to be
> buggy in some way. Not sure if Mesa is generating something
> wrong or if the issue is in LLVM only. Anyway, that explains why
> the DOW3 issue can't be reproduced with GL on Vega.
>
> It also improves performance because v_cvt_pkrtz_f16 is faster,
> and because I guess the rounding mode behaviour is similar between
> GL and VK, we can use it. About performance, it improves Talos
> by +3/4% but I don't see any other impacts.
>
> No CTS regressions on Polaris (Vega in progress).

Seems like a good plan.

Reviewed-by: Dave Airlie 

Thanks for hunting that down.

Dave.
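For reference, a minimal sketch of the semantics involved: packHalf2x16 converts two floats to half precision and packs them into one 32-bit word (x in the low half), which is exactly what v_cvt_pkrtz_f16 does in a single instruction with round-toward-zero. The helper below is only an illustration (half denormals are flushed for brevity), not driver code:

   #include <stdint.h>
   #include <string.h>

   /* float -> half with the mantissa truncated (round toward zero). */
   static uint16_t
   f32_to_f16_rtz(float f)
   {
      uint32_t u;
      memcpy(&u, &f, sizeof(u));

      uint32_t sign = (u >> 16) & 0x8000;
      int32_t exp = (int32_t)((u >> 23) & 0xff) - 127;
      uint32_t mant = u & 0x7fffff;

      if (exp == 128)      /* Inf or NaN */
         return sign | 0x7c00 | (mant ? 0x200 | (mant >> 13) : 0);
      if (exp > 15)        /* too large for half: RTZ clamps to max finite */
         return sign | 0x7bff;
      if (exp < -14)       /* half denormal or zero: flushed to zero here */
         return sign;
      return sign | ((uint32_t)(exp + 15) << 10) | (mant >> 13);
   }

   /* packHalf2x16: x ends up in the low 16 bits, y in the high 16 bits. */
   static uint32_t
   pack_half_2x16(float x, float y)
   {
      return (uint32_t)f32_to_f16_rtz(x) | ((uint32_t)f32_to_f16_rtz(y) << 16);
   }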
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 12 +---
>  1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 96ba289a81..663b27d265 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -1426,23 +1426,13 @@ static LLVMValueRef emit_bitfield_insert(struct 
> ac_llvm_context *ctx,
>  static LLVMValueRef emit_pack_half_2x16(struct ac_llvm_context *ctx,
> LLVMValueRef src0)
>  {
> -   LLVMValueRef const16 = LLVMConstInt(ctx->i32, 16, false);
> -   int i;
> LLVMValueRef comp[2];
>
> src0 = ac_to_float(ctx, src0);
> comp[0] = LLVMBuildExtractElement(ctx->builder, src0, ctx->i32_0, "");
> comp[1] = LLVMBuildExtractElement(ctx->builder, src0, ctx->i32_1, "");
> -   for (i = 0; i < 2; i++) {
> -   comp[i] = LLVMBuildFPTrunc(ctx->builder, comp[i], ctx->f16, 
> "");
> -   comp[i] = LLVMBuildBitCast(ctx->builder, comp[i], ctx->i16, 
> "");
> -   comp[i] = LLVMBuildZExt(ctx->builder, comp[i], ctx->i32, "");
> -   }
> -
> -   comp[1] = LLVMBuildShl(ctx->builder, comp[1], const16, "");
> -   comp[0] = LLVMBuildOr(ctx->builder, comp[0], comp[1], "");
>
> -   return comp[0];
> +   return ac_build_cvt_pkrtz_f16(ctx, comp);
>  }
>
>  static LLVMValueRef emit_unpack_half_2x16(struct ac_llvm_context *ctx,
> --
> 2.15.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler (v2)

2017-12-05 Thread Jose Maria Casanova Crespo
From: Jason Ekstrand 

load_ubo is using 32-bit loads as uniform surfaces have a 32-bit
surface format defined. So when reading 16-bit components with the
sampler we need to unshuffle two 16-bit components from each 32-bit
component.

Using the sampler avoids the use of the byte_scattered_read message
that needs one message for each component and is supposed to be
slower.
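
As a plain-C illustration of the unshuffling referred to above (a sketch only, assuming the usual little-endian packing; the real code uses shuffle_32bit_load_result_to_16bit_data() and subscript() as in the hunk below): each 32-bit dword returned by the sampler carries two 16-bit components, the even-indexed one in the low half, so (const_offset & 0xf) / 4 selects the dword and (const_offset / 2) & 1 selects the half:

   #include <stdint.h>

   static void
   unshuffle_16bit_components(const uint32_t *dwords, uint16_t *out,
                              unsigned num_components)
   {
      for (unsigned i = 0; i < num_components; i++) {
         uint32_t dw = dwords[i / 2];
         out[i] = (i & 1) ? (uint16_t)(dw >> 16)      /* odd:  high half */
                          : (uint16_t)(dw & 0xffff);  /* even: low half  */
      }
   }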

v2: (Jason Ekstrand)
- Simplify component selection and unshuffling for different bitsizes
- Remove SKL optimization of reading only two 32-bit components when
  reading 16-bits types.

Reviewed-by: Jose Maria Casanova Crespo 
---
 src/intel/compiler/brw_fs.cpp | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 91399c6c1d..93bb6b4673 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -191,14 +191,21 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
&bld,
 vec4_result, surf_index, vec4_offset);
inst->size_written = 4 * vec4_result.component_size(inst->exec_size);
 
-   if (type_sz(dst.type) == 8) {
-  shuffle_32bit_load_result_to_64bit_data(
- bld, retype(vec4_result, dst.type), vec4_result, 2);
+   fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
+   switch (type_sz(dst.type)) {
+   case 2:
+  shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
+  bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
+  break;
+   case 4:
+  bld.MOV(dst, retype(dw, dst.type));
+  break;
+   case 8:
+  shuffle_32bit_load_result_to_64bit_data(bld, dst, dw, 1);
+  break;
+   default:
+  unreachable("Unsupported bit_size");
}
-
-   vec4_result.type = dst.type;
-   bld.MOV(dst, offset(vec4_result, bld,
-   (const_offset & 0xf) / type_sz(vec4_result.type)));
 }
 
 /**
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler

2017-12-05 Thread Chema Casanova


On 05/12/17 22:25, Chema Casanova wrote:
> On 05/12/17 19:53, Jason Ekstrand wrote:
>> On Tue, Dec 5, 2017 at 9:08 AM, Chema Casanova > > wrote:
>>
>> On 30/11/17 at 23:58, Jason Ekstrand wrote:
>> > On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo
>> > 
>> >> wrote:
>> >
>> >     load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit
>> >     surface format defined. So when reading 16-bit components with the
>> >     sampler we need to unshuffle two 16-bit components from each
>> 32-bit
>> >     component.
>> >
>> >     Using the sampler avoids the use of the byte_scattered_read
>> message
>> >     that needs one message for each component and is supposed to be
>> >     slower.
>> >
>> >     In the case of SKL+ we take advantage of a hardware feature that
>> >     automatically defines a channel mask based on the rlen value,
>> so on
>> >     SKL+ we only use half of the registers without using a header
>> in the
>> >     payload.
>> >     ---
>> >      src/intel/compiler/brw_fs.cpp           | 31
>> >     +++
>> >      src/intel/compiler/brw_fs_generator.cpp | 10 --
>> >      2 files changed, 35 insertions(+), 6 deletions(-)
>> >
>> >     diff --git a/src/intel/compiler/brw_fs.cpp
>> >     b/src/intel/compiler/brw_fs.cpp
>> >     index 1ca4d416b2..9c543496ba 100644
>> >     --- a/src/intel/compiler/brw_fs.cpp
>> >     +++ b/src/intel/compiler/brw_fs.cpp
>> >     @@ -184,9 +184,17 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
>> >     fs_builder ,
>> >          * a double this means we are only loading 2 elements worth of
>> >     data.
>> >          * We also want to use a 32-bit data type for the dst of the
>> >     load operation
>> >          * so other parts of the driver don't get confused about the
>> >     size of the
>> >     -    * result.
>> >     +    * result. On the case of 16-bit data we only need half of the
>> >     32-bit
>> >     +    * components on SKL+ as we take advance of using message
>> >     return size to
>> >     +    * define an xy channel mask.
>> >          */
>> >     -   fs_reg vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
>> >     +   fs_reg vec4_result;
>> >     +   if (type_sz(dst.type) == 2 && (devinfo->gen >= 9)) {
>> >     +      vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
>> >     +      vec4_result = retype(vec4_result, BRW_REGISTER_TYPE_HF);
>> >     +   } else {
>> >     +      vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
>> >     +   }
>> >
>> >         fs_inst *inst =
>> >     bld.emit(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_LOGICAL,
>> >                                  vec4_result, surf_index,
>> vec4_offset);
>> >         inst->size_written = 4 *
>> >     vec4_result.component_size(inst->exec_size);
>> >     @@ -197,8 +205,23 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
>> >     fs_builder ,
>> >         }
>> >
>> >         vec4_result.type = dst.type;
>> >     -   bld.MOV(dst, offset(vec4_result, bld,
>> >     -                       (const_offset & 0xf) /
>> >     type_sz(vec4_result.type)));
>> >     +
>> >     +   if (type_sz(dst.type) == 2) {
>> >     +      /* 16-bit types need to be unshuffled as each pair of
>> >     16-bit components
>> >     +       * is packed on a 32-bit compoment because we are using a
>> >     32-bit format
>> >     +       * in the surface of uniform that is read by the sampler.
>> >     +       * TODO: On BDW+ mark when an uniform has 16-bit type so we
>> >     could setup a
>> >     +       * surface format of 16-bit and use the 16-bit return
>> >     format at the
>> >     +       * sampler.
>> >     +       */
>> >     +      vec4_result.stride = 2;
>> >     +      bld.MOV(dst, byte_offset(offset(vec4_result, bld,
>> >     +                                      (const_offset & 0x7) / 4),
>> >     +                               (const_offset & 0x7) / 2 % 2 *
>> 2));
>> >     +   } else {
>> >     +      bld.MOV(dst, offset(vec4_result, bld,
>> >     +                          (const_offset & 0xf) /
>> >     type_sz(vec4_result.type)));
>> >     +   }
>> >
>> >
>> > This seems overly complicated.  How about something like
>>
>> > fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
>> > switch (type_sz(dst.type)) {
>> > case 2:
>> >    shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
>> >    bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
>> >  

Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Jordan Justen
On 2017-12-05 14:49:28, Mark Janes wrote:
> Jordan Justen  writes:
> 
> > On 2017-12-05 09:13:11, Mark Janes wrote:
> >> Jordan Justen  writes:
> >> 
> >> > On 2017-11-08 17:26:47, Timothy Arceri wrote:
> >> >> Reviewed-by: Timothy Arceri 
> >> >> 
> >> >> Mark may want to consider adding some of the once a day type CI runs 
> >> >> for 
> >> >> this. For example running the test suite for two consecutive runs on 
> >> >> the 
> >> >> same build so that the second run uses the shader cache and also a 
> >> >> second run the uses MESA_GLSL=cache_fb to force testing of the cache 
> >> >> fallback path.
> >> >
> >> > Yeah. We discussed this previously, but I don't think it's been
> >> > implemented yet. My opinion is that it could perhaps be a weekly test.
> >> 
> >> This automation is implemented now. It found the issues reported in
> >> https://bugs.freedesktop.org/show_bug.cgi?id=103988
> >> 
> >> My opinion is that this test strategy is inadequate.
> >
> > Meaning you think we cannot enable i965 shader cache for the 18.0
> > release?
> 
> I haven't heard anyone express confidence that this feature is bug-free,
> and I don't know on what basis that claim could be made.  I appreciate
> that a lot of manual testing has been done.  Do you feel confident
> that the cache can be enabled for all mesa customers without disruption?

If we had enabled it two months ago, then yes, we would have worked
through the niche issues, or discovered the so-far-hidden major land
mine. At this point, it's getting risky. In a month, I will say no.

> > We are already ~1.5 months away from the next stable branch point. If
> > we want to enable this in 18.0, we should be using this time to see if
> > enabling the cache by default has some large unexpected side effect in
> > our graphics stack, or breaks real-world apps.
> 
> I agree.  We should encourage as many users as possible to enable the
> shader cache in their environments.  At least one stable release should
> be out with a working cache, where the feature can be enabled by those
> who are eager to benefit from it.  I assume there will be a lot of them,
> and they could flush out issues for everyone who has no idea what a
> shader cache is.
> 
> >> Adding a dimension to the test matrix has high cost, especially when
> >> combined with other dimensions of the test matrix (does shader cache
> >> need to be tested for 32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).
> >
> > Are you saying this is too high cost to run per check-in? Do you need
> > to disable it for the health of CI? I think I proposed that daily, or
> > perhaps even weekly would be adequate.
> 
> Certainly, the test time per line of shader cache code is massively
> higher than any other feature, even if you run it just once a month.
> Other features have tests that run in milliseconds, not 30min * 20
> machines.

The scope of this feature is nearly the entire API. It is justified to
throw the entire GL suite of tests at it on a regular basis. The cost
of running this once per week ought to be reasonable.

Is the cost of running this per check-in too high for the system
today? Or, is it that you think it is too high for the feature? I'm
sympathetic to the former, and not so much to the latter. :)

By the way, enabling these in CI has been helpful in highlighting a
few corner case issues. So, even if you don't like it, *Thank You* for
enabling it. :)

> Beyond poor return on execution time, there is the simple fact that
> whoever is running the CI needs to manually look at shader cache results
> separately from the rest of the tests.  Unit testing is effective
> because coverage can be added at approximately zero marginal cost.
> 
> 3 years from now, will we still be looking at separate weekly shader
> cache test runs to make sure it's working?

I actually think that long term this should become part of the main
daily and weekly tests. (An idea you've already rejected.) In the long
term it should be stable enough for that, and if it does ever regress,
we'd want to see something sooner rather than later.

> > These tests are already identifying some corner case issues. I'm not
> > sure these would have impacted real-world apps yet, but I do think it
> > is a good idea to run these tests regularly.
> >
> > You say this test strategy is inadequate. Perhaps. I do think it needs
> > to be part of our test strategy. There is no way we are going to hit
> > all the corners of the API better than running all of our tests with
> > the cache enabled. Do you agree?
> 
> Tests should strive to cover the implementation, not the API.

I don't think any of our test suites operate this way. Very little of
piglit or deqp focus on i965.

> Shader
> cache is a unique feature, because it affects a large part of the API.
> It doesn't necessarily follow that covering the API will cover the
> feature, or that that is an effective test strategy.

As you 

Re: [Mesa-dev] [PATCH v2 3/7] intel: emit first_vertex and reorder the VE' components

2017-12-05 Thread Kenneth Graunke
On Monday, December 4, 2017 12:12:49 PM PST Antia Puentes wrote:
> The new order is:
> * VE 1: <firstvertex, gl_BaseInstance, gl_VertexID, gl_InstanceID>
> * VE 2: <gl_DrawID, gl_BaseVertex>
> 
> Previously it was:
> * VE 1: <gl_BaseVertex, gl_BaseInstance, gl_VertexID, gl_InstanceID>
> * VE 2: <gl_DrawID>
> 
> The gl_BaseVertex is in a new location now, and firstvertex occupies
> the old gl_BaseVertex place. This way we can keep pointing to the
> indirect buffer for indirect draw calls.
> 
> Reviewed-by: Neil Roberts 
> ---
>  src/intel/compiler/brw_nir.c  | 11 +---
>  src/intel/compiler/brw_vec4.cpp   | 13 +
>  src/mesa/drivers/dri/i965/brw_context.h   | 36 ++---
>  src/mesa/drivers/dri/i965/brw_draw.c  | 26 +++---
>  src/mesa/drivers/dri/i965/brw_draw_upload.c   | 13 -
>  src/mesa/drivers/dri/i965/genX_state_upload.c | 39 
> +++
>  6 files changed, 88 insertions(+), 50 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> index 8f3f77f89ae..f702f5b8534 100644
> --- a/src/intel/compiler/brw_nir.c
> +++ b/src/intel/compiler/brw_nir.c
> @@ -240,7 +240,7 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
>  */
> const bool has_sgvs =
>nir->info.system_values_read &
> -  (BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX) |
> +  (BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX) |
> BITFIELD64_BIT(SYSTEM_VALUE_BASE_INSTANCE) |
> BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) |
> BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID));
> @@ -262,6 +262,7 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
>  nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
>  
>  switch (intrin->intrinsic) {
> +case nir_intrinsic_load_first_vertex:
>  case nir_intrinsic_load_base_vertex:
>  case nir_intrinsic_load_base_instance:
>  case nir_intrinsic_load_vertex_id_zero_base:
> @@ -279,7 +280,7 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
>  
> nir_intrinsic_set_base(load, num_inputs);
> switch (intrin->intrinsic) {
> -   case nir_intrinsic_load_base_vertex:
> +   case nir_intrinsic_load_first_vertex:
>nir_intrinsic_set_component(load, 0);
>break;
> case nir_intrinsic_load_base_instance:
> @@ -292,11 +293,15 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
>nir_intrinsic_set_component(load, 3);
>break;
> case nir_intrinsic_load_draw_id:
> +   case nir_intrinsic_load_base_vertex:
>/* gl_DrawID is stored right after gl_VertexID and friends
> * if any of them exist.
> */
>nir_intrinsic_set_base(load, num_inputs + has_sgvs);
> -  nir_intrinsic_set_component(load, 0);
> +  if (intrin->intrinsic ==  nir_intrinsic_load_draw_id)
> + nir_intrinsic_set_component(load, 0);
> +  else
> + nir_intrinsic_set_component(load, 1);
>break;
> default:
>unreachable("Invalid system value intrinsic");
> diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
> index 14f930e0264..70a197a9fa0 100644
> --- a/src/intel/compiler/brw_vec4.cpp
> +++ b/src/intel/compiler/brw_vec4.cpp
> @@ -2789,13 +2789,19 @@ brw_compile_vs(const struct brw_compiler *compiler, 
> void *log_data,
>  * incoming vertex attribute.  So, add an extra slot.
>  */
> if (shader->info.system_values_read &
> -   (BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX) |
> +   (BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX) |
>  BITFIELD64_BIT(SYSTEM_VALUE_BASE_INSTANCE) |
>  BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) |
>  BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID))) {
>nr_attribute_slots++;
> }
>  
> +   if (shader->info.system_values_read &
> +   (BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX) |
> +BITFIELD64_BIT(SYSTEM_VALUE_DRAW_ID))) {
> +  nr_attribute_slots++;
> +   }
> +
> if (shader->info.system_values_read &
> BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX))
>prog_data->uses_basevertex = true;
> @@ -2816,12 +2822,9 @@ brw_compile_vs(const struct brw_compiler *compiler, 
> void *log_data,
> BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID))
>prog_data->uses_instanceid = true;
>  
> -   /* gl_DrawID has its very own vec4 */
> if (shader->info.system_values_read &
> -   BITFIELD64_BIT(SYSTEM_VALUE_DRAW_ID)) {
> +   BITFIELD64_BIT(SYSTEM_VALUE_DRAW_ID))
>prog_data->uses_drawid = true;
> -  nr_attribute_slots++;
> -   }
>  
> /* The 3DSTATE_VS documentation lists the lower bound on "Vertex URB Entry
>  * Read Length" as 1 in vec4 mode, and 0 in SIMD8 mode.  

Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Timothy Arceri

On 06/12/17 09:49, Mark Janes wrote:

Jordan Justen  writes:


On 2017-12-05 09:13:11, Mark Janes wrote:

Jordan Justen  writes:


On 2017-11-08 17:26:47, Timothy Arceri wrote:

Reviewed-by: Timothy Arceri 

Mark may want to consider adding some of the once a day type CI runs for
this. For example running the test suite for two consecutive runs on the
same build so that the second run uses the shader cache and also a
second run the uses MESA_GLSL=cache_fb to force testing of the cache
fallback path.


Yeah. We discussed this previously, but I don't think it's been
implemented yet. My opinion is that it could perhaps be a weekly test.


This automation is implemented now. It found the issues reported in
https://bugs.freedesktop.org/show_bug.cgi?id=103988

My opinion is that this test strategy is inadequate.


Meaning you think we cannot enable i965 shader cache for the 18.0
release?


I haven't heard anyone express confidence that this feature is bug-free,
and I don't know on what basis that claim could be made.  I appreciate
that a lot of manual testing has been done.  Do you feel confident
that the cache can be enabled for all mesa customers without disruption?



I would suggest enabling it now in master; it can be switched off again 
before release. For what it's worth, besides 1 or 2 dev-branch-related 
issues we still haven't had any shader-cache-related regressions 
reported for radeonsi (or any of the other gallium drivers with a shader 
cache), and it's been enabled for a couple of releases now.




We are already ~1.5 months away from the next stable branch point. If
we want to enable this in 18.0, we should be using this time to see if
enabling the cache by default has some large unexpected side effect in
our graphics stack, or breaks real-world apps.


I agree.  We should encourage as many users as possible to enable the
shader cache in their environments.  At least one stable release should
be out with a working cache, where the feature can be enabled by those
who are eager to benefit from it.  I assume there will be a lot of them,
and they could flush out issues for everyone who has no idea what a
shader cache is.


Adding a dimension to the test matrix has high cost, especially when
combined with other dimensions of the test matrix (does shader cache
need to be tested for 32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).


Are you saying this is too high cost to run per check-in? Do you need
to disable it for the health of CI? I think I proposed that daily, or
perhaps even weekly would be adequate.


Certainly, the test time per line of shader cache code is massively
higher than any other feature, even if you run it just once a month.
Other features have tests that run in milliseconds, not 30min * 20
machines.

Beyond poor return on execution time, there is the simple fact that
whoever is running the CI needs to manually look at shader cache results
separately from the rest of the tests.  Unit testing is effective
because coverage can be added at approximately zero marginal cost.

3 years from now, will we still be looking at separate weekly shader
cache test runs to make sure it's working?


These tests are already identifying some corner case issues. I'm not
sure these would have impacted real-world apps yet, but I do think it
is a good idea to run these tests regularly.

You say this test strategy is inadequate. Perhaps. I do think it needs
to be part of our test strategy. There is no way we are going to hit
all the corners of the API better than running all of our tests with
the cache enabled. Do you agree?


Tests should strive to cover the implementation, not the API.  Shader
cache is a unique feature, because it affects a large part of the API.
It doesn't necessarily follow that covering the API will cover the
feature, or that that is an effective test strategy.

Knowing nothing of the implementation, I would look to similar projects
(eg: ccache) to see what test strategies they employ.  There are
probably issues they've encountered that we have not thought of.

Another approach would be to write real unit tests for bugs found during
the course of the shader cache implementation.  Those test cases would
help developers know that they broke a cache feature without having to
rely on a weekly CI run.


The problem is these tests would just be:

1. Compile shaders
2. Run program
3. Recompile all the same shaders from Step 1
4. Re-run the program in the same way as Step 2

Shader-cache is not like other features in that you need to exercise 
pretty much every other feature (hence re-running piglit) in order to be 
confident you have full coverage.





Since 103988 is not hardware specific, and is not a regression, it's not
something that could only have been caught by CI.  I'm surprised to find
it at this point, considering piglit was used to validate the feature.

I'd be happy if there was at least a smoke test where 

Re: [Mesa-dev] [PATCH v4 28/44] i965/fs: Use untyped_surface_read for 16-bit load_ssbo (v2)

2017-12-05 Thread Chema Casanova
On 05/12/17 23:47, Jason Ekstrand wrote:
> On Tue, Dec 5, 2017 at 1:36 PM, Jose Maria Casanova Crespo
> > wrote:
> 
> SSBO loads were using byte_scattered read messages as they allow
> reading 16-bit size components. byte_scattered messages can only
> operate one component at a time so we needed to emit as many messages
> as components.
> 
> But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the
> untyped_surface_read message to read pairs of 16-bit components using
> only one message. Once each pair is read it is unshuffled to return the
> proper 16-bit components. vec3 case is assimilated to vec4 but the 4th
> component is ignored.
> 
> 16-bit scalars are read using one byte_scattered_read message.
> 
> v2: Removed use of stride = 2 on sources (Jason Ekstrand)
>     Rework optimization using unshuffle 16 reads (Chema Casanova)
> v3: Use W and D types instead of HF and F in shuffle to avoid rounding
>     errors (Jason Ekstrand)
>     Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
> 
> CC: Jason Ekstrand >
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index e11e75e6332..8deec082d59 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -2303,16 +2303,31 @@ do_untyped_vector_read(const fs_builder &bld,
>                         unsigned num_components)
>  {
>     if (type_sz(dest.type) <= 2) {
> -      fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
> -      bld.MOV(read_offset, offset_reg);
> -      for (unsigned i = 0; i < num_components; i++) {
> -         fs_reg read_reg =
> -            emit_byte_scattered_read(bld, surf_index, read_offset,
> +      assert(dest.stride == 1);
> +
> +      if (num_components > 1) {
> +         /* Pairs of 16-bit components can be read with untyped
> read, for 16-bit
> +          * vec3 4th component is ignored.
> +          */
> +         fs_reg read_result =
> +            emit_untyped_read(bld, surf_index, offset_reg,
> +                              1 /* dims */,
> DIV_ROUND_UP(num_components, 2),
> +                              BRW_PREDICATE_NONE);
> +         shuffle_32bit_load_result_to_16bit_data(bld,
> +               retype(dest, BRW_REGISTER_TYPE_W),
> +               retype(read_result, BRW_REGISTER_TYPE_D),
> +               num_components);
> +      } else {
> +         assert(num_components == 1);
> +         /* scalar 16-bit are read using one byte_scattered_read
> message */
> +         fs_reg read_result =
> +            emit_byte_scattered_read(bld, surf_index, offset_reg,
>                                       1 /* dims */, 1,
>                                       type_sz(dest.type) * 8 /*
> bit_size */,
>                                       BRW_PREDICATE_NONE);
> -         bld.MOV(offset(dest, bld, i), subscript(read_reg,
> dest.type, 0));
> -         bld.ADD(read_offset, read_offset,
> brw_imm_ud(type_sz(dest.type)));
> +         read_result.type = dest.type;
> +         read_result.stride = 2;
> +         bld.MOV(dest, read_result);
> 
> 
> If read_reg has a 32-bit type, you could use subscript here.  Meh.

Fixed locally.

> Reviewed-by: Jason Ekstrand   
> 
>        }
>     } else if (type_sz(dest.type) == 4) {
>        fs_reg read_result = emit_untyped_read(bld, surf_index,
> offset_reg,
> --
> 2.11.0
> 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Mark Janes
Jordan Justen  writes:

> On 2017-12-05 09:13:11, Mark Janes wrote:
>> Jordan Justen  writes:
>> 
>> > On 2017-11-08 17:26:47, Timothy Arceri wrote:
>> >> Reviewed-by: Timothy Arceri 
>> >> 
>> >> Mark may want to consider adding some of the once a day type CI runs for 
>> >> this. For example running the test suite for two consecutive runs on the 
>> >> same build so that the second run uses the shader cache and also a 
>> >> second run the uses MESA_GLSL=cache_fb to force testing of the cache 
>> >> fallback path.
>> >
>> > Yeah. We discussed this previously, but I don't think it's been
>> > implemented yet. My opinion is that it could perhaps be a weekly test.
>> 
>> This automation is implemented now. It found the issues reported in
>> https://bugs.freedesktop.org/show_bug.cgi?id=103988
>> 
>> My opinion is that this test strategy is inadequate.
>
> Meaning you think we cannot enable i965 shader cache for the 18.0
> release?

I haven't heard anyone express confidence that this feature is bug-free,
and I don't know on what basis that claim could be made.  I appreciate
that a lot of manual testing has been done.  Do you feel confident
that the cache can be enabled for all mesa customers without disruption?

> We are already ~1.5 months away from the next stable branch point. If
> we want to enable this in 18.0, we should be using this time to see if
> enabling the cache by default has some large unexpected side effect in
> our graphics stack, or breaks real-world apps.

I agree.  We should encourage as many users as possible to enable the
shader cache in their environments.  At least one stable release should
be out with a working cache, where the feature can be enabled by those
who are eager to benefit from it.  I assume there will be a lot of them,
and they could flush out issues for everyone who has no idea what a
shader cache is.

>> Adding a dimension to the test matrix has high cost, especially when
>> combined with other dimensions of the test matrix (does shader cache
>> need to be tested for 32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).
>
> Are you saying this is too high cost to run per check-in? Do you need
> to disable it for the health of CI? I think I proposed that daily, or
> perhaps even weekly would be adequate.

Certainly, the test time per line of shader cache code is massively
higher than any other feature, even if you run it just once a month.
Other features have tests that run in milliseconds, not 30min * 20
machines.

Beyond poor return on execution time, there is the simple fact that
whoever is running the CI needs to manually look at shader cache results
separately from the rest of the tests.  Unit testing is effective
because coverage can be added at approximately zero marginal cost.

3 years from now, will we still be looking at separate weekly shader
cache test runs to make sure it's working?

> These tests are already identifying some corner case issues. I'm not
> sure these would have impacted real-world apps yet, but I do think it
> is a good idea to run these tests regularly.
>
> You say this test strategy is inadequate. Perhaps. I do think it needs
> to be part of our test strategy. There is no way we are going to hit
> all the corners of the API better than running all of our tests with
> the cache enabled. Do you agree?

Tests should strive to cover the implementation, not the API.  Shader
cache is a unique feature, because it affects a large part of the API.
It doesn't necessarily follow that covering the API will cover the
feature, or that that is an effective test strategy.

Knowing nothing of the implementation, I would look to similar projects
(eg: ccache) to see what test strategies they employ.  There are
probably issues they've encountered that we have not thought of.

Another approach would be to write real unit tests for bugs found during
the course of the shader cache implementation.  Those test cases would
help developers know that they broke a cache feature without having to
rely on a weekly CI run.

>> Since 103988 is not hardware specific, and is not a regression, it's not
>> something that could only have been caught by CI.  I'm surprised to find
>> it at this point, considering piglit was used to validate the feature.
>> 
>> I'd be happy if there was at least a smoke test where complex programs
>> are cached, with the intent of exercising the shader cache mechanism
>> directly.  It doesn't have to be exhaustive to be effective.  Seems like
>> a good idea to at least directly test the issue that was fixed in
>> 103988 tests.
>
> It could be interesting to define a MESA extension to control or query
> the shader cache. Today, the shader cache is a nearly 'invisible'
> feature. There are a few env-vars, but I wouldn't call a public
> interface.
>
> The downside of defining an extension is that it might constrain
> changes to the shader cache implementation. Or, if the 

Re: [Mesa-dev] [PATCH v4 28/44] i965/fs: Use untyped_surface_read for 16-bit load_ssbo (v2)

2017-12-05 Thread Jason Ekstrand
On Tue, Dec 5, 2017 at 1:36 PM, Jose Maria Casanova Crespo <
jmcasan...@igalia.com> wrote:

> SSBO loads were using byte_scattered read messages as they allow
> reading 16-bit size components. byte_scattered messages can only
> operate one component at a time so we needed to emit as many messages
> as components.
>
> But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the
> untyped_surface_read message to read pairs of 16-bit components using
> only one message. Once each pair is read it is unshuffled to return the
> proper 16-bit components. vec3 case is assimilated to vec4 but the 4th
> component is ignored.
>
> 16-bit scalars are read using one byte_scattered_read message.
>
> v2: Removed use of stride = 2 on sources (Jason Ekstrand)
> Rework optimization using unshuffle 16 reads (Chema Casanova)
> v3: Use W and D types instead of HF and F in shuffle to avoid rounding
> errors (Jason Ekstrand)
> Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
>
> CC: Jason Ekstrand 
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index e11e75e6332..8deec082d59 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -2303,16 +2303,31 @@ do_untyped_vector_read(const fs_builder &bld,
> unsigned num_components)
>  {
> if (type_sz(dest.type) <= 2) {
> -  fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
> -  bld.MOV(read_offset, offset_reg);
> -  for (unsigned i = 0; i < num_components; i++) {
> - fs_reg read_reg =
> -emit_byte_scattered_read(bld, surf_index, read_offset,
> +  assert(dest.stride == 1);
> +
> +  if (num_components > 1) {
> + /* Pairs of 16-bit components can be read with untyped read, for
> 16-bit
> +  * vec3 4th component is ignored.
> +  */
> + fs_reg read_result =
> +emit_untyped_read(bld, surf_index, offset_reg,
> +  1 /* dims */, DIV_ROUND_UP(num_components,
> 2),
> +  BRW_PREDICATE_NONE);
> + shuffle_32bit_load_result_to_16bit_data(bld,
> +   retype(dest, BRW_REGISTER_TYPE_W),
> +   retype(read_result, BRW_REGISTER_TYPE_D),
> +   num_components);
> +  } else {
> + assert(num_components == 1);
> + /* scalar 16-bit are read using one byte_scattered_read message
> */
> + fs_reg read_result =
> +emit_byte_scattered_read(bld, surf_index, offset_reg,
>   1 /* dims */, 1,
>   type_sz(dest.type) * 8 /* bit_size
> */,
>   BRW_PREDICATE_NONE);
> - bld.MOV(offset(dest, bld, i), subscript(read_reg, dest.type, 0));
> - bld.ADD(read_offset, read_offset, brw_imm_ud(type_sz(dest.type))
> );
> + read_result.type = dest.type;
> + read_result.stride = 2;
> + bld.MOV(dest, read_result);
>

If read_reg has a 32-bit type, you could use subscript here.  Meh.

Reviewed-by: Jason Ekstrand 


>}
> } else if (type_sz(dest.type) == 4) {
>fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
> --
> 2.11.0
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa: add const qualifier on _mesa_is_renderable_texture_format()

2017-12-05 Thread Timothy Arceri

Series:

Reviewed-by: Timothy Arceri 

On 06/12/17 06:08, Brian Paul wrote:

---
  src/mesa/main/teximage.c | 3 ++-
  src/mesa/main/teximage.h | 3 ++-
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 572e380..e5f8bb0 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -5712,7 +5712,8 @@ _mesa_TextureBufferRange(GLuint texture, GLenum 
internalFormat, GLuint buffer,
  }
  
  GLboolean

-_mesa_is_renderable_texture_format(struct gl_context *ctx, GLenum 
internalformat)
+_mesa_is_renderable_texture_format(const struct gl_context *ctx,
+   GLenum internalformat)
  {
 /* Everything that is allowed for renderbuffers,
  * except for a base format of GL_STENCIL_INDEX, unless supported.
diff --git a/src/mesa/main/teximage.h b/src/mesa/main/teximage.h
index 18452db..2e950bf 100644
--- a/src/mesa/main/teximage.h
+++ b/src/mesa/main/teximage.h
@@ -216,7 +216,8 @@ bool
  _mesa_format_no_online_compression(const struct gl_context *ctx, GLenum 
format);
  
  GLboolean

-_mesa_is_renderable_texture_format(struct gl_context *ctx, GLenum 
internalformat);
+_mesa_is_renderable_texture_format(const struct gl_context *ctx,
+   GLenum internalformat);
  
  extern void

  _mesa_texture_sub_image(struct gl_context *ctx, GLuint dims,


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 28/44] i965/fs: Use untyped_surface_read for 16-bit load_ssbo (v2)

2017-12-05 Thread Jose Maria Casanova Crespo
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate one component at a time so we needed to emit as many messages
as components.

But for 16-bit vec2 and vec4, whose sizes are multiples of 32 bits, we
can use the untyped_surface_read message to read pairs of 16-bit
components with a single message. Once each pair is read, it is
unshuffled to produce the proper 16-bit components. The vec3 case is
handled like vec4, but the 4th component is ignored.

16-bit scalars are read using one byte_scattered_read message.
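To illustrate the unshuffle step, here is a hedged C sketch of the data
movement only (names are invented; this is not the driver code): each
32-bit dword returned by untyped_surface_read carries two packed 16-bit
components.

   #include <stdint.h>

   /* Minimal sketch: extract 16-bit components that arrive packed in
    * 32-bit dwords. Even components sit in the low 16 bits, odd ones
    * in the high 16 bits of each dword (little-endian packing). */
   static void
   unshuffle_16bit_from_dwords(const uint32_t *dwords, uint16_t *out,
                               unsigned num_components)
   {
      for (unsigned i = 0; i < num_components; i++) {
         uint32_t dw = dwords[i / 2];
         out[i] = (i & 1) ? (uint16_t)(dw >> 16) : (uint16_t)(dw & 0xffff);
      }
   }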

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Reworked the optimization to unshuffle pairs of 16-bit reads (Chema Casanova)
v3: Use W and D types instead of HF and F in the shuffle to avoid rounding
errors (Jason Ekstrand)
Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)

CC: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index e11e75e6332..8deec082d59 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2303,16 +2303,31 @@ do_untyped_vector_read(const fs_builder &bld,
unsigned num_components)
 {
if (type_sz(dest.type) <= 2) {
-  fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
-  bld.MOV(read_offset, offset_reg);
-  for (unsigned i = 0; i < num_components; i++) {
- fs_reg read_reg =
-emit_byte_scattered_read(bld, surf_index, read_offset,
+  assert(dest.stride == 1);
+
+  if (num_components > 1) {
+ /* Pairs of 16-bit components can be read with untyped read, for 
16-bit
+  * vec3 4th component is ignored.
+  */
+ fs_reg read_result =
+emit_untyped_read(bld, surf_index, offset_reg,
+  1 /* dims */, DIV_ROUND_UP(num_components, 2),
+  BRW_PREDICATE_NONE);
+ shuffle_32bit_load_result_to_16bit_data(bld,
+   retype(dest, BRW_REGISTER_TYPE_W),
+   retype(read_result, BRW_REGISTER_TYPE_D),
+   num_components);
+  } else {
+ assert(num_components == 1);
+ /* scalar 16-bit are read using one byte_scattered_read message */
+ fs_reg read_result =
+emit_byte_scattered_read(bld, surf_index, offset_reg,
  1 /* dims */, 1,
  type_sz(dest.type) * 8 /* bit_size */,
  BRW_PREDICATE_NONE);
- bld.MOV(offset(dest, bld, i), subscript(read_reg, dest.type, 0));
- bld.ADD(read_offset, read_offset, brw_imm_ud(type_sz(dest.type)));
+ read_result.type = dest.type;
+ read_result.stride = 2;
+ bld.MOV(dest, read_result);
   }
} else if (type_sz(dest.type) == 4) {
   fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler

2017-12-05 Thread Chema Casanova
On 05/12/17 19:53, Jason Ekstrand wrote:
> On Tue, Dec 5, 2017 at 9:08 AM, Chema Casanova  > wrote:
> 
> El 30/11/17 a las 23:58, Jason Ekstrand escribió:
> > On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo
> > 
> >> wrote:
> >
> >     load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit
> >     surface format defined. So when reading 16-bit components with the
> >     sampler we need to unshuffle two 16-bit components from each
> 32-bit
> >     component.
> >
> >     Using the sampler avoids the use of the byte_scattered_read
> message
> >     that needs one message for each component and is supposed to be
> >     slower.
> >
> >     In the case of SKL+ we take advantage of a hardware feature that
> >     automatically defines a channel mask based on the rlen value,
> so on
> >     SKL+ we only use half of the registers without using a header
> in the
> >     payload.
> >     ---
> >      src/intel/compiler/brw_fs.cpp           | 31
> >     +++
> >      src/intel/compiler/brw_fs_generator.cpp | 10 --
> >      2 files changed, 35 insertions(+), 6 deletions(-)
> >
> >     diff --git a/src/intel/compiler/brw_fs.cpp
> >     b/src/intel/compiler/brw_fs.cpp
> >     index 1ca4d416b2..9c543496ba 100644
> >     --- a/src/intel/compiler/brw_fs.cpp
> >     +++ b/src/intel/compiler/brw_fs.cpp
> >     @@ -184,9 +184,17 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
> >     fs_builder ,
> >          * a double this means we are only loading 2 elements worth of
> >     data.
> >          * We also want to use a 32-bit data type for the dst of the
> >     load operation
> >          * so other parts of the driver don't get confused about the
> >     size of the
> >     -    * result.
> >     +    * result. On the case of 16-bit data we only need half of the
> >     32-bit
> >     +    * components on SKL+ as we take advance of using message
> >     return size to
> >     +    * define an xy channel mask.
> >          */
> >     -   fs_reg vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> >     +   fs_reg vec4_result;
> >     +   if (type_sz(dst.type) == 2 && (devinfo->gen >= 9)) {
> >     +      vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
> >     +      vec4_result = retype(vec4_result, BRW_REGISTER_TYPE_HF);
> >     +   } else {
> >     +      vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> >     +   }
> >
> >         fs_inst *inst =
> >     bld.emit(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_LOGICAL,
> >                                  vec4_result, surf_index,
> vec4_offset);
> >         inst->size_written = 4 *
> >     vec4_result.component_size(inst->exec_size);
> >     @@ -197,8 +205,23 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
> >     fs_builder ,
> >         }
> >
> >         vec4_result.type = dst.type;
> >     -   bld.MOV(dst, offset(vec4_result, bld,
> >     -                       (const_offset & 0xf) /
> >     type_sz(vec4_result.type)));
> >     +
> >     +   if (type_sz(dst.type) == 2) {
> >     +      /* 16-bit types need to be unshuffled as each pair of
> >     16-bit components
> >     +       * is packed on a 32-bit compoment because we are using a
> >     32-bit format
> >     +       * in the surface of uniform that is read by the sampler.
> >     +       * TODO: On BDW+ mark when an uniform has 16-bit type so we
> >     could setup a
> >     +       * surface format of 16-bit and use the 16-bit return
> >     format at the
> >     +       * sampler.
> >     +       */
> >     +      vec4_result.stride = 2;
> >     +      bld.MOV(dst, byte_offset(offset(vec4_result, bld,
> >     +                                      (const_offset & 0x7) / 4),
> >     +                               (const_offset & 0x7) / 2 % 2 *
> 2));
> >     +   } else {
> >     +      bld.MOV(dst, offset(vec4_result, bld,
> >     +                          (const_offset & 0xf) /
> >     type_sz(vec4_result.type)));
> >     +   }
> >
> >
> > This seems overly complicated.  How about something like
> 
> > fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
> > switch (type_sz(dst.type)) {
> > case 2:
> >    shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
> >    bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
> >    break;
> > case 4:
> >    bld.MOV(dst, dw);
> >    break;
> > case 8:
> >    shuffle_32bit_load_result_to_64bit_data(bld, dst, 

[Mesa-dev] [PATCH 3/3] glx: Implement GLX_EXT_no_config_context (v4)

2017-12-05 Thread Adam Jackson
This more or less ports EGL_KHR_no_config_context to GLX.
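For illustration only (not part of the patch), a hedged sketch of how a
client would use this, assuming the extension is advertised; the
extension-string check and error handling are omitted, and the helper
name is invented:

   #include <X11/Xlib.h>
   #include <GL/glx.h>

   static GLXContext
   create_no_config_context(Display *dpy)
   {
      PFNGLXCREATECONTEXTATTRIBSARBPROC create_context_attribs =
         (PFNGLXCREATECONTEXTATTRIBSARBPROC)
         glXGetProcAddress((const GLubyte *) "glXCreateContextAttribsARB");

      const int attribs[] = {
         GLX_SCREEN, DefaultScreen(dpy),
         GLX_CONTEXT_MAJOR_VERSION_ARB, 3,
         GLX_CONTEXT_MINOR_VERSION_ARB, 0,
         None
      };

      /* config == NULL is what the extension allows; the screen is
       * then taken from the GLX_SCREEN attribute. */
      return create_context_attribs(dpy, NULL, NULL, True, attribs);
   }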

v2: Enable the extension only for those backends that support it.
v3: Fix glvnd path and dri2_convert_glx_attribs()
v4: Screeching signedness correctness, and disable a now
inappropriate test.

Khronos: https://github.com/KhronosGroup/OpenGL-Registry/pull/102
Reviewed-by: Kenneth Graunke 
Signed-off-by: Adam Jackson 
Reviewed-by: Ian Romanick 
---
 src/glx/applegl_glx.c |  3 +++
 src/glx/create_context.c  | 37 +++
 src/glx/dri2_glx.c|  1 +
 src/glx/dri3_glx.c|  1 +
 src/glx/dri_common.c  |  4 
 src/glx/drisw_glx.c   |  1 +
 src/glx/driwindows_glx.c  |  2 +-
 src/glx/g_glxglvnddispatchfuncs.c | 14 +++-
 src/glx/glxcmds.c |  2 +-
 src/glx/glxextensions.c   |  1 +
 src/glx/glxextensions.h   |  1 +
 src/glx/tests/create_context_unittest.cpp |  9 
 12 files changed, 50 insertions(+), 26 deletions(-)

diff --git a/src/glx/applegl_glx.c b/src/glx/applegl_glx.c
index c086e5146a..85f70058c6 100644
--- a/src/glx/applegl_glx.c
+++ b/src/glx/applegl_glx.c
@@ -134,6 +134,9 @@ applegl_create_context(struct glx_screen *psc,
/* TODO: Integrate this with apple_glx_create_context and make
 * struct apple_glx_context inherit from struct glx_context. */
 
+   if (!config)
+  return NULL;
+
gc = calloc(1, sizeof(*gc));
if (gc == NULL)
   return NULL;
diff --git a/src/glx/create_context.c b/src/glx/create_context.c
index 38e949ab4c..bc39ffef1d 100644
--- a/src/glx/create_context.c
+++ b/src/glx/create_context.c
@@ -47,21 +47,11 @@ glXCreateContextAttribsARB(Display *dpy, GLXFBConfig config,
xcb_generic_error_t *err;
xcb_void_cookie_t cookie;
unsigned dummy_err = 0;
+   int screen = -1;
 
-
-   if (dpy == NULL || cfg == NULL)
-  return NULL;
-
-   /* This means that either the caller passed the wrong display pointer or
-* one of the internal GLX data structures (probably the fbconfig) has an
-* error.  There is nothing sensible to do, so return an error.
-*/
-   psc = GetGLXScreenConfigs(dpy, cfg->screen);
-   if (psc == NULL)
+   if (dpy == NULL)
   return NULL;
 
-   assert(cfg->screen == psc->scr);
-
/* Count the number of attributes specified by the application.  All
 * attributes appear in pairs, except the terminating None.
 */
@@ -70,6 +60,25 @@ glXCreateContextAttribsARB(Display *dpy, GLXFBConfig config,
 /* empty */ ;
}
 
+   if (cfg) {
+  screen = cfg->screen;
+   } else {
+  for (unsigned int i = 0; i < num_attribs; i++) {
+ if (attrib_list[i * 2] == GLX_SCREEN)
+screen = attrib_list[i * 2 + 1];
+  }
+   }
+
+   /* This means that either the caller passed the wrong display pointer or
+* one of the internal GLX data structures (probably the fbconfig) has an
+* error.  There is nothing sensible to do, so return an error.
+*/
+   psc = GetGLXScreenConfigs(dpy, screen);
+   if (psc == NULL)
+  return NULL;
+
+   assert(screen == psc->scr);
+
if (direct && psc->vtable->create_context_attribs) {
   /* GLX drops the error returned by the driver.  The expectation is that
* an error will also be returned by the server.  The server's error
@@ -104,8 +113,8 @@ glXCreateContextAttribsARB(Display *dpy, GLXFBConfig config,
cookie =
   xcb_glx_create_context_attribs_arb_checked(c,
 gc->xid,
-cfg->fbconfigID,
-cfg->screen,
+cfg ? cfg->fbconfigID : 0,
+screen,
 gc->share_xid,
 gc->isDirect,
 num_attribs,
diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index 0f44635725..eeec4f0d60 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -1129,6 +1129,7 @@ dri2BindExtensions(struct dri2_screen *psc, struct 
glx_display * priv,
 
   __glXEnableDirectExtension(&psc->base, "GLX_ARB_create_context");
   __glXEnableDirectExtension(&psc->base, "GLX_ARB_create_context_profile");
+  __glXEnableDirectExtension(&psc->base, "GLX_EXT_no_config_context");
 
   if ((mask & ((1 << __DRI_API_GLES) |
(1 << __DRI_API_GLES2) |
diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c
index f280a8cef7..f0560edeb5 100644
--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -720,6 +720,7 @@ dri3_bind_extensions(struct dri3_screen *psc, struct 
glx_display * priv,
 
 __glXEnableDirectExtension(&psc->base, "GLX_ARB_create_context");

[Mesa-dev] [PATCH 0/3] GLX_EXT_no_config_context v4

2017-12-05 Thread Adam Jackson
With one exception (noted in 2/3) this is now regression-free in piglit,
and passes make check.

- ajax

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] glx: Lift sending the MakeCurrent request to top-level code (v2)

2017-12-05 Thread Adam Jackson
Somewhat terrifyingly, we never sent this for direct contexts, which
means the server never knew the context/drawable bindings.

To handle this sanely, pull the request code up out of the indirect
backend, and rewrite the context switch path to call it as appropriate.
This attempts to preserve the existing behavior of not calling unbind()
on the context if its refcount would not drop to zero, which is why the
diff is a little uglier than I'd like.

Fixes glx-make-context, glx-multi-context-single-window and
glx-query-drawable-glx_fbconfig_id-window with both direct and indirect
contexts. glx-make-glxdrawable-current regresses, but I believe the test
is in error, as the fbconfig it uses to create the context is just
"something double-buffered and rgba" as opposed to "corresponds to the
visual I used to create the windows".

v2: Style fixups (Ian Romanick)

Signed-off-by: Adam Jackson 
---
 src/glx/glxcurrent.c   | 182 +
 src/glx/indirect_glx.c | 140 +++--
 2 files changed, 162 insertions(+), 160 deletions(-)

diff --git a/src/glx/glxcurrent.c b/src/glx/glxcurrent.c
index 9f8bf7cee1..0ce80670d2 100644
--- a/src/glx/glxcurrent.c
+++ b/src/glx/glxcurrent.c
@@ -165,17 +165,85 @@ glXGetCurrentDrawable(void)
return gc->currentDrawable;
 }
 
-/**
- * Make a particular context current.
- *
- * \note This is in this file so that it can access dummyContext.
- */
+static Bool
+SendMakeCurrentRequest(Display * dpy, GLXContextID gc_id,
+   GLXContextTag gc_tag, GLXDrawable draw,
+   GLXDrawable read, GLXContextTag *out_tag)
+{
+   xGLXMakeCurrentReply reply;
+   Bool ret;
+   int opcode = __glXSetupForCommand(dpy);
+
+   LockDisplay(dpy);
+
+   if (draw == read) {
+  xGLXMakeCurrentReq *req;
+
+  GetReq(GLXMakeCurrent, req);
+  req->reqType = opcode;
+  req->glxCode = X_GLXMakeCurrent;
+  req->drawable = draw;
+  req->context = gc_id;
+  req->oldContextTag = gc_tag;
+   }
+   else {
+  struct glx_display *priv = __glXInitialize(dpy);
+
+  if ((priv->majorVersion > 1) || (priv->minorVersion >= 3)) {
+ xGLXMakeContextCurrentReq *req;
+
+ GetReq(GLXMakeContextCurrent, req);
+ req->reqType = opcode;
+ req->glxCode = X_GLXMakeContextCurrent;
+ req->drawable = draw;
+ req->readdrawable = read;
+ req->context = gc_id;
+ req->oldContextTag = gc_tag;
+  }
+  else {
+ xGLXVendorPrivateWithReplyReq *vpreq;
+ xGLXMakeCurrentReadSGIReq *req;
+
+ GetReqExtra(GLXVendorPrivateWithReply,
+ sz_xGLXMakeCurrentReadSGIReq -
+ sz_xGLXVendorPrivateWithReplyReq, vpreq);
+ req = (xGLXMakeCurrentReadSGIReq *) vpreq;
+ req->reqType = opcode;
+ req->glxCode = X_GLXVendorPrivateWithReply;
+ req->vendorCode = X_GLXvop_MakeCurrentReadSGI;
+ req->drawable = draw;
+ req->readable = read;
+ req->context = gc_id;
+ req->oldContextTag = gc_tag;
+  }
+   }
+
+   ret = _XReply(dpy, (xReply *) &reply, 0, False);
+
+   if (out_tag)
+  *out_tag = reply.contextTag;
+
+   UnlockDisplay(dpy);
+   SyncHandle();
+
+   return ret;
+}
+
+static void
+SetGC(struct glx_context *gc, Display *dpy, GLXDrawable draw, GLXDrawable read)
+{
+   gc->currentDpy = dpy;
+   gc->currentDrawable = draw;
+   gc->currentReadable = read;
+}
+
 static Bool
 MakeContextCurrent(Display * dpy, GLXDrawable draw,
GLXDrawable read, GLXContext gc_user)
 {
struct glx_context *gc = (struct glx_context *) gc_user;
struct glx_context *oldGC = __glXGetCurrentContext();
+   Bool ret = GL_FALSE;
 
/* Make sure that the new context has a nonzero ID.  In the request,
 * a zero context ID is used only to mean that we bind to no current
@@ -186,59 +254,87 @@ MakeContextCurrent(Display * dpy, GLXDrawable draw,
}
 
_glapi_check_multithread();
-
__glXLock();
+
if (oldGC == gc &&
-   gc->currentDrawable == draw && gc->currentReadable == read) {
-  __glXUnlock();
-  return True;
+   gc->currentDrawable == draw &&
+   gc->currentReadable == read) {
+  /* Same context and drawables: no op, just return */
+  ret = GL_TRUE;
}
 
-   if (oldGC != &dummyContext) {
-  if (--oldGC->thread_refcount == 0) {
-oldGC->vtable->unbind(oldGC, gc);
-oldGC->currentDpy = 0;
+   else if (oldGC == gc) {
+  /* Same context and new drawbles: update drawable bindings */
+  if (!SendMakeCurrentRequest(dpy, gc->xid, gc->currentContextTag,
+  draw, read, &gc->currentContextTag)) {
+ goto out;
   }
-   }
 
-   if (gc) {
-  /* Attempt to bind the context.  We do this before mucking with
-   * gc and __glXSetCurrentContext to properly handle our state in
-   * case of an error.
-   *
-   * If an error occurs, 

[Mesa-dev] [PATCH 1/3] glx: Move vertex array protocol state into the indirect backend (v2)

2017-12-05 Thread Adam Jackson
Only relevant for indirect contexts, so let's get that code out of the
common path.

v2: Add the necessary context setup before calling GetString

Signed-off-by: Adam Jackson 
---
 src/glx/glxcurrent.c   | 12 
 src/glx/indirect_glx.c | 26 ++
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/src/glx/glxcurrent.c b/src/glx/glxcurrent.c
index fd04929b89..9f8bf7cee1 100644
--- a/src/glx/glxcurrent.c
+++ b/src/glx/glxcurrent.c
@@ -238,18 +238,6 @@ MakeContextCurrent(Display * dpy, GLXDrawable draw,
 
__glXUnlock();
 
-   /* The indirect vertex array state must to be initialised after we
-* have setup the context, as it needs to query server attributes.
-*/
-   if (gc && !gc->isDirect) {
-  __GLXattribute *state = gc->client_state_private;
-  if (state && state->array_state == NULL) {
- glGetString(GL_EXTENSIONS);
- glGetString(GL_VERSION);
- __glXInitVertexArrayState(gc);
-  }
-   }
-
return GL_TRUE;
 }
 
diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
index cfae12f6c0..5482d768ff 100644
--- a/src/glx/indirect_glx.c
+++ b/src/glx/indirect_glx.c
@@ -34,7 +34,7 @@
 
 #include "glapi.h"
 #include "glxclient.h"
-
+#include "indirect.h"
 #include "util/debug.h"
 
 #ifndef GLX_USE_APPLEGL
@@ -148,9 +148,27 @@ indirect_bind_context(struct glx_context *gc, struct 
glx_context *old,
sent = SendMakeCurrentRequest(dpy, gc->xid, tag, draw, read,
 &gc->currentContextTag);
 
-   if (!IndirectAPI)
-  IndirectAPI = __glXNewIndirectAPI();
-   _glapi_set_dispatch(IndirectAPI);
+   if (sent) {
+  if (!IndirectAPI)
+ IndirectAPI = __glXNewIndirectAPI();
+  _glapi_set_dispatch(IndirectAPI);
+
+  /* The indirect vertex array state must to be initialised after we
+   * have setup the context, as it needs to query server attributes.
+   *
+   * At the point this is called gc->currentDpy is not initialized
+   * nor is the thread's current context actually set. Hence the
+   * cleverness before the GetString calls.
+   */
+  __GLXattribute *state = gc->client_state_private;
+  if (state && state->array_state == NULL) {
+ gc->currentDpy = gc->psc->dpy;
+ __glXSetCurrentContext(gc);
+ __indirect_glGetString(GL_EXTENSIONS);
+ __indirect_glGetString(GL_VERSION);
+ __glXInitVertexArrayState(gc);
+  }
+   }
 
return !sent;
 }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] anv: fix a case statement in GetMemoryFdPropertiesKHR

2017-12-05 Thread Jason Ekstrand
Oops.  Both are

Reviewed-by: Jason Ekstrand 

On Tue, Dec 5, 2017 at 12:51 PM, Fredrik Höglund  wrote:

> The handle type in the case statement is supposed to be VK_EXTERNAL_-
> MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.
>
> Signed-off-by: Fredrik Höglund 
> ---
>  src/intel/vulkan/anv_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> index 97124154b69..af804612654 100644
> --- a/src/intel/vulkan/anv_device.c
> +++ b/src/intel/vulkan/anv_device.c
> @@ -1714,7 +1714,7 @@ VkResult anv_GetMemoryFdPropertiesKHR(
> struct anv_physical_device *pdevice = &device->instance->physicalDevice;
>
> switch (handleType) {
> -   case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR:
> +   case VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT:
>/* dma-buf can be imported as any memory type */
>pMemoryFdProperties->memoryTypeBits =
>   (1 << pdevice->memory.type_count) - 1;
> --
> 2.15.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] anv: fix a case statement in GetMemoryFdPropertiesKHR

2017-12-05 Thread Fredrik Höglund
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

Signed-off-by: Fredrik Höglund 
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 97124154b69..af804612654 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1714,7 +1714,7 @@ VkResult anv_GetMemoryFdPropertiesKHR(
 struct anv_physical_device *pdevice = &device->instance->physicalDevice;
 
switch (handleType) {
-   case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR:
+   case VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT:
   /* dma-buf can be imported as any memory type */
   pMemoryFdProperties->memoryTypeBits =
  (1 << pdevice->memory.type_count) - 1;
-- 
2.15.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] radv: fix a case statement in GetMemoryFdPropertiesKHR

2017-12-05 Thread Fredrik Höglund
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

Signed-off-by: Fredrik Höglund 
---
 src/amd/vulkan/radv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 1b7cd355938..2538472bea6 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -3557,7 +3557,7 @@ VkResult radv_GetMemoryFdPropertiesKHR(VkDevice _device,
   VkMemoryFdPropertiesKHR 
*pMemoryFdProperties)
 {
switch (handleType) {
-   case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR:
+   case VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT:
   pMemoryFdProperties->memoryTypeBits = (1 << RADV_MEM_TYPE_COUNT) - 1;
   return VK_SUCCESS;
 
-- 
2.15.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Jordan Justen
On 2017-12-05 09:13:11, Mark Janes wrote:
> Jordan Justen  writes:
> 
> > On 2017-11-08 17:26:47, Timothy Arceri wrote:
> >> Reviewed-by: Timothy Arceri 
> >> 
> >> Mark may want to consider adding some of the once a day type CI runs for 
> >> this. For example running the test suite for two consecutive runs on the 
> >> same build so that the second run uses the shader cache and also a 
> >> second run the uses MESA_GLSL=cache_fb to force testing of the cache 
> >> fallback path.
> >
> > Yeah. We discussed this previously, but I don't think it's been
> > implemented yet. My opinion is that it could perhaps be a weekly test.
> 
> This automation is implemented now. It found the issues reported in
> https://bugs.freedesktop.org/show_bug.cgi?id=103988
> 
> My opinion is that this test strategy is inadequate.

Meaning you think we cannot enable i965 shader cache for the 18.0
release?

We are already ~1.5 months away from the next stable branch point. If
we want to enable this in 18.0, we should be using this time to see if
enabling the cache by default has some large unexpected side effect in
our graphics stack, or breaks real-world apps.

> Adding a dimension
> to the test matrix has high cost, especially when combined with other
> dimensions of the test matrix (does shader cache need to be tested for
> 32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).

Are you saying this is too high a cost to run per check-in? Do you need
to disable it for the health of CI? I think I proposed that daily, or
perhaps even weekly would be adequate.

These tests are already identifying some corner case issues. I'm not
sure these would have impacted real-world apps yet, but I do think it
is a good idea to run these tests regularly.

You say this test strategy is inadequate. Perhaps. I do think it needs
to be part of our test strategy. There is no way we are going to hit
all the corners of the API better than running all of our tests with
the cache enabled. Do you agree?

> Since 103988 is not hardware specific, and is not a regression, it's not
> something that could only have been caught by CI.  I'm surprised to find
> it at this point, considering piglit was used to validate the feature.
> 
> I'd be happy if there was at least a smoke test where complex programs
> are cached, with the intent of exercising the shader cache mechanism
> directly.  It doesn't have to be exhaustive to be effective.  Seems like
> a good idea to at least directly test the issue that was fixed in
> 103988 tests.

It could be interesting to define a MESA extension to control or query
the shader cache. Today, the shader cache is a nearly 'invisible'
feature. There are a few env-vars, but I wouldn't call that a public
interface.
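To make that concrete, a purely hypothetical sketch of what such an
interface could look like (none of these entry points or tokens exist;
they are invented here for illustration only):

   #include <GL/gl.h>

   /* Hypothetical GL_MESA_shader_cache sketch; names and token values
    * are placeholders, nothing here is implemented anywhere. */
   #define GL_SHADER_CACHE_HIT_COUNT_MESA   0x9FA0  /* invented token */
   #define GL_SHADER_CACHE_MISS_COUNT_MESA  0x9FA1  /* invented token */

   GLboolean glIsShaderCacheEnabledMESA(void);
   void glShaderCacheEnableMESA(GLboolean enable);
   void glGetShaderCacheIntegerMESA(GLenum pname, GLint *params);

A piglit test could then compile a shader, query the miss count, link
the same shader again, and assert a cache hit, without poking at
environment variables or on-disk state.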

The downside of defining an extension is that it might constrain
changes to the shader cache implementation. Or, if the interface is
too complex, then it may be tough for some drivers to implement.

But, what if we get around to defining and implementing this extension,
(and a few piglit tests that make use of it), by mid-January? It'll
still be pretty difficult to argue we should enable the shader cache
on i965 for 18.0, since we will have burned all the time we had to let
it be tested on master against real-world apps.

Also, while you say our current test strategy is inadequate, I do
think it is still far more comprehensive than anything we will
accomplish by defining a new extension and writing a few, or even a
few hundred tests against the new extension.

-Jordan
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/android: Partially handle HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED

2017-12-05 Thread Rob Herring
On Tue, Dec 5, 2017 at 11:01 AM, Robert Foss  wrote:
> On Tue, 2017-12-05 at 18:22 +0900, Tomasz Figa wrote:
>> On Sat, Dec 2, 2017 at 4:43 AM, Rob Herring  wrote:
>> > On Fri, Dec 1, 2017 at 8:44 AM, Tomasz Figa 
>> > wrote:
>> > > On Fri, Dec 1, 2017 at 11:20 PM, Rob Herring 
>> > > wrote:
>> > > > On Fri, Dec 1, 2017 at 7:30 AM, Robert Foss > > > > ora.com> wrote:
>> > > > > On Thu, 2017-11-30 at 11:14 -0600, Rob Herring wrote:
>> > > > > > On Thu, Nov 30, 2017 at 12:11 AM, Tapani Pälli > > > > > > i...@intel.co
>> > > > > > m> wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > > On 11/30/2017 06:13 AM, Tomasz Figa wrote:
>> > > > > > > >
>> > > > > > > > On Thu, Nov 30, 2017 at 3:43 AM, Robert Foss > > > > > > > > ss@collabo
>> > > > > > > > ra.com>
>> > > > > > > > wrote:
>> >
>> > [...]


>> > > However I fully agree that there are other upstream components
>> > > (e.g.
>> > > drm_hwcomposer), which would benefit from it and nobody wants to
>> > > include Mesa in the build just for one header. Should we have a
>> > > separate freedesktop project for it?
>> >
>> > I'm still going to say libdrm. If that's really a problem, then we
>> > can
>> > split it out later.
>>
>> At least for Chromium OS, libdrm would work fine indeed.
>
> Excellent, so with the question of where (libdrm) covered, that leaves
> us with what.
>
> Currently this is what the XXX_handle structs look like:
> gbm_gralloc: https://github.com/robherring/gbm_gralloc/blob/master/gral
> loc_drm_handle.h#L36
>
> minigbm: https://chromium.googlesource.com/chromiumos/platform/minigbm/
> +/master/cros_gralloc/cros_gralloc_handle.h#20
>
> intel-minigbm: https://github.com/intel/minigbm/blob/master/cros_grallo
> c/cros_gralloc_handle.h#L20
>
> A lot seems to be shared between the 3, gbm_gralloc seems to have
> modifier support, but lack multi-planar YUV support.
>
> DRV_MAX_PLANES sounds to be a bit of a misnomer, as (I think) it refers
> to YUV planes and not planes supported by the driver/hw.
>
> I would assume that all shared variables would be available in the
> libdrm struct, as well as the ones needed for planar YUV support.
>
> As for the non-obvious ones, maybe the should be listed so that we can
> dig into if they are really needed, implementation specific or unused.

My plan is similar to what I was thinking in the move to hwc. Move the
struct as is to libdrm and get the dependencies (gbm_gralloc,
drm_gralloc?, drm_hwc, mesa) switched over to use it. Then start
modifying the struct contents. Here's my list:

- versioning of the handle
- remove .name (GEM handles. only dmabuf support)
- better way to do buffer registration tracking (than a pid and ptr)?
- multi-planar
- switch format to fourcc (or add the fourcc format)
- switch to accessor functions or library calls

Rob
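
To make that list a little more concrete, here is a purely hypothetical
C sketch of what such a common, versioned, fourcc-based handle could
look like once it lives in libdrm. Every name, field and constant below
is invented for illustration; it is not a proposal for the actual
layout:

   #include <stdint.h>
   #include <cutils/native_handle.h>  /* Android's native_handle_t */

   #define HYPOTHETICAL_HANDLE_VERSION 1
   #define HYPOTHETICAL_MAX_PLANES 4   /* YUV planes, not KMS planes */

   struct hypothetical_drm_gralloc_handle {
      native_handle_t base;

      uint32_t version;                /* versioning of the handle */

      /* dma-buf fds only, one per plane; no GEM flink names */
      int prime_fds[HYPOTHETICAL_MAX_PLANES];

      uint32_t fourcc_format;          /* DRM fourcc instead of HAL format */
      uint64_t modifier;               /* format modifier */

      uint32_t width;
      uint32_t height;
      uint32_t strides[HYPOTHETICAL_MAX_PLANES];
      uint32_t offsets[HYPOTHETICAL_MAX_PLANES];

      uint64_t usage;
   };

Accessor functions or library calls, as suggested in the list above,
would then let the struct layout change behind a stable entry point
instead of being baked into every consumer.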
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/6] glx: Implement GLX_EXT_no_config_context (v3)

2017-12-05 Thread Adam Jackson
On Thu, 2017-11-30 at 16:10 +, Emil Velikov wrote:
> On 14 November 2017 at 20:13, Adam Jackson  wrote:
> 
> > @@ -562,6 +562,10 @@ dri2_convert_glx_attribs(unsigned num_attribs,
> > const uint32_t *attribs,
> >  return false;
> >   }
> >   break;
> > +  case GLX_SCREEN:
> > + /* Implies GLX_EXT_no_config_context */
> > + *render_type = GLX_DONT_CARE;
> > + break;
> 
> We should fall-through (and fail) when GLX_SCREEN is set but the
> extension is missing.

Nah. This function does not make reference to the screen we're trying
to create a context for. And, for direct contexts (that call this
conversion helper) this extension is now always present anyway.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] radv: enable lowering of nir_op_bitfield_extract

2017-12-05 Thread Connor Abbott
Same comment as the previous patch.

On Tue, Dec 5, 2017 at 12:50 PM, Samuel Pitoiset
 wrote:
> This instruction should also be lowered correctly.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_shader.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 0b19d23fa2..044bcd0641 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -67,6 +67,7 @@ static const struct nir_shader_compiler_options nir_options 
> = {
> .lower_extract_word = true,
> .lower_ffma = true,
> .lower_bitfield_insert = true,
> +   .lower_bitfield_extract = true,
> .max_unroll_iterations = 32
>  };
>
> --
> 2.15.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] radv: enable lowering of nir_op_bitfield_insert

2017-12-05 Thread Connor Abbott
lower_bitfield_insert lowers nir_op_bitfield_insert to DX10-style
nir_op_bfi and nir_op_bfm, both of which aren't handled by
ac_nir_to_llvm, so unless I'm missing something this will just break
them even harder. We probably should use this lowering after adding
support for bfi and bfm, since AMD does have native instructions for
bfi and bfm, but first I'd like to see the actual bug fixed. Have you
tried running it with NIR_PRINT=true to pin down which optimization
pass is going wrong?
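
For reference, the DX10-style decomposition this lowering produces
behaves roughly like the following C sketch. This is my reading of the
opcode semantics, simplified (the bits >= 32 special case that the
lowering also handles via bcsel is ignored); it is not the NIR code
itself:

   #include <stdint.h>

   /* bfm(bits, offset): 'bits' ones starting at bit 'offset'. */
   static uint32_t bfm(uint32_t bits, uint32_t offset)
   {
      return ((1u << (bits & 0x1f)) - 1u) << (offset & 0x1f);
   }

   /* bfi(mask, insert, base): merge 'insert' into 'base' under 'mask',
    * with 'insert' shifted up to the mask's lowest set bit. */
   static uint32_t bfi(uint32_t mask, uint32_t insert, uint32_t base)
   {
      if (mask == 0)
         return base;
      return (base & ~mask) | ((insert << __builtin_ctz(mask)) & mask);
   }

   static uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                                   uint32_t offset, uint32_t bits)
   {
      return bfi(bfm(bits, offset), insert, base);
   }

Until ac_nir_to_llvm handles nir_op_bfm and nir_op_bfi, enabling this
lowering just trades one unhandled opcode for two.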

On Tue, Dec 5, 2017 at 12:50 PM, Samuel Pitoiset
 wrote:
> Otherwise it's replaced by
> "vec1 32 ssa_108 = load_const (0x /* 0.00 */)", which
> looks clearly wrong.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_shader.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 4a3fdfa80e..0b19d23fa2 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -66,6 +66,7 @@ static const struct nir_shader_compiler_options nir_options 
> = {
> .lower_extract_byte = true,
> .lower_extract_word = true,
> .lower_ffma = true,
> +   .lower_bitfield_insert = true,
> .max_unroll_iterations = 32
>  };
>
> --
> 2.15.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] mesa: add const qualifier on _mesa_is_renderable_texture_format()

2017-12-05 Thread Brian Paul
---
 src/mesa/main/teximage.c | 3 ++-
 src/mesa/main/teximage.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 572e380..e5f8bb0 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -5712,7 +5712,8 @@ _mesa_TextureBufferRange(GLuint texture, GLenum 
internalFormat, GLuint buffer,
 }
 
 GLboolean
-_mesa_is_renderable_texture_format(struct gl_context *ctx, GLenum 
internalformat)
+_mesa_is_renderable_texture_format(const struct gl_context *ctx,
+   GLenum internalformat)
 {
/* Everything that is allowed for renderbuffers,
 * except for a base format of GL_STENCIL_INDEX, unless supported.
diff --git a/src/mesa/main/teximage.h b/src/mesa/main/teximage.h
index 18452db..2e950bf 100644
--- a/src/mesa/main/teximage.h
+++ b/src/mesa/main/teximage.h
@@ -216,7 +216,8 @@ bool
 _mesa_format_no_online_compression(const struct gl_context *ctx, GLenum 
format);
 
 GLboolean
-_mesa_is_renderable_texture_format(struct gl_context *ctx, GLenum 
internalformat);
+_mesa_is_renderable_texture_format(const struct gl_context *ctx,
+   GLenum internalformat);
 
 extern void
 _mesa_texture_sub_image(struct gl_context *ctx, GLuint dims,
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] mesa: add const qualifier on _mesa_base_fbo_format()

2017-12-05 Thread Brian Paul
---
 src/mesa/main/fbobject.c | 2 +-
 src/mesa/main/fbobject.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index 30287ab..d23916d 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -1804,7 +1804,7 @@ _mesa_CreateRenderbuffers(GLsizei n, GLuint 
*renderbuffers)
  * \return the base internal format, or 0 if internalFormat is illegal
  */
 GLenum
-_mesa_base_fbo_format(struct gl_context *ctx, GLenum internalFormat)
+_mesa_base_fbo_format(const struct gl_context *ctx, GLenum internalFormat)
 {
/*
 * Notes: some formats such as alpha, luminance, etc. were added
diff --git a/src/mesa/main/fbobject.h b/src/mesa/main/fbobject.h
index e4846e8..f52fd15 100644
--- a/src/mesa/main/fbobject.h
+++ b/src/mesa/main/fbobject.h
@@ -112,7 +112,7 @@ extern GLboolean
 _mesa_is_legal_color_format(const struct gl_context *ctx, GLenum baseFormat);
 
 extern GLenum
-_mesa_base_fbo_format(struct gl_context *ctx, GLenum internalFormat);
+_mesa_base_fbo_format(const struct gl_context *ctx, GLenum internalFormat);
 
 extern bool
 _mesa_detach_renderbuffer(struct gl_context *ctx,
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/6] radeonsi: make const and stream uploaders allocate read-only memory

2017-12-05 Thread Marek Olšák
From: Marek Olšák 

and anything that clones these uploaders, like u_threaded_context.
---
 src/gallium/drivers/radeon/r600_pipe_common.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 9090e65..9e45a9f 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -438,26 +438,29 @@ bool si_common_context_init(struct r600_common_context 
*rctx,
return false;
}
 
rctx->allocator_zeroed_memory =
 u_suballocator_create(&rctx->b, sscreen->info.gart_page_size,
  0, PIPE_USAGE_DEFAULT, 0, true);
if (!rctx->allocator_zeroed_memory)
return false;
 
 rctx->b.stream_uploader = u_upload_create(&rctx->b, 1024 * 1024,
- 0, PIPE_USAGE_STREAM, 0);
+ 0, PIPE_USAGE_STREAM,
+ R600_RESOURCE_FLAG_READ_ONLY);
if (!rctx->b.stream_uploader)
return false;
 
 rctx->b.const_uploader = u_upload_create(&rctx->b, 128 * 1024,
- 0, PIPE_USAGE_DEFAULT, 0);
+ 0, PIPE_USAGE_DEFAULT,
+ sscreen->cpdma_prefetch_writes_memory ?
+ 0 : R600_RESOURCE_FLAG_READ_ONLY);
if (!rctx->b.const_uploader)
return false;
 
 rctx->cached_gtt_allocator = u_upload_create(&rctx->b, 16 * 1024,
 0, PIPE_USAGE_STAGING, 0);
if (!rctx->cached_gtt_allocator)
return false;
 
rctx->ctx = rctx->ws->ctx_create(rctx->ws);
if (!rctx->ctx)
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/6] radeonsi: allow DMABUF exports for local buffers

2017-12-05 Thread Marek Olšák
From: Marek Olšák 

Cc: 17.3 
---
 src/gallium/drivers/radeon/r600_texture.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 2aa47b5..07f7c33 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -739,22 +739,26 @@ static boolean r600_texture_get_handle(struct 
pipe_screen* screen,
stride = rtex->surface.u.gfx9.surf_pitch *
 rtex->surface.bpe;
slice_size = rtex->surface.u.gfx9.surf_slice_size;
} else {
offset = rtex->surface.u.legacy.level[0].offset;
stride = rtex->surface.u.legacy.level[0].nblk_x *
 rtex->surface.bpe;
slice_size = 
(uint64_t)rtex->surface.u.legacy.level[0].slice_size_dw * 4;
}
} else {
+   /* Buffer exports are for the OpenCL interop. */
/* Move a suballocated buffer into a non-suballocated 
allocation. */
-   if (sscreen->ws->buffer_is_suballocated(res->buf)) {
+   if (sscreen->ws->buffer_is_suballocated(res->buf) ||
+   /* A DMABUF export always fails if the BO is local. */
+   (rtex->resource.flags & RADEON_FLAG_NO_INTERPROCESS_SHARING 
&&
+whandle->type != DRM_API_HANDLE_TYPE_KMS)) {
assert(!res->b.is_shared);
 
/* Allocate a new buffer with PIPE_BIND_SHARED. */
struct pipe_resource templ = res->b.b;
templ.bind |= PIPE_BIND_SHARED;
 
struct pipe_resource *newb =
 screen->resource_create(screen, &templ);
if (!newb)
return false;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/6] radeonsi/gfx9: make shader binaries use read-only memory

2017-12-05 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_buffer_common.c | 3 +++
 src/gallium/drivers/radeon/r600_pipe_common.h   | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c  | 2 ++
 src/gallium/drivers/radeonsi/si_pipe.h  | 1 +
 src/gallium/drivers/radeonsi/si_shader.c| 9 ++---
 5 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index ec282d5..55400ab 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -167,20 +167,23 @@ void si_init_resource_fields(struct si_screen *sscreen,
 
/* Displayable and shareable surfaces are not suballocated. */
if (res->b.b.bind & (PIPE_BIND_SHARED | PIPE_BIND_SCANOUT))
res->flags |= RADEON_FLAG_NO_SUBALLOC; /* shareable */
else
res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
 
if (sscreen->debug_flags & DBG(NO_WC))
res->flags &= ~RADEON_FLAG_GTT_WC;
 
+   if (res->b.b.flags & R600_RESOURCE_FLAG_READ_ONLY)
+   res->flags |= RADEON_FLAG_READ_ONLY;
+
/* Set expected VRAM and GART usage for the buffer. */
res->vram_usage = 0;
res->gart_usage = 0;
res->max_forced_staging_uploads = 0;
res->b.max_forced_staging_uploads = 0;
 
if (res->domains & RADEON_DOMAIN_VRAM) {
res->vram_usage = size;
 
res->max_forced_staging_uploads =
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 498a741..d1fdea0 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -46,20 +46,21 @@
 
 struct u_log_context;
 struct si_screen;
 struct si_context;
 
 #define R600_RESOURCE_FLAG_TRANSFER(PIPE_RESOURCE_FLAG_DRV_PRIV << 
0)
 #define R600_RESOURCE_FLAG_FLUSHED_DEPTH   (PIPE_RESOURCE_FLAG_DRV_PRIV << 
1)
 #define R600_RESOURCE_FLAG_FORCE_TILING
(PIPE_RESOURCE_FLAG_DRV_PRIV << 2)
 #define R600_RESOURCE_FLAG_DISABLE_DCC (PIPE_RESOURCE_FLAG_DRV_PRIV << 
3)
 #define R600_RESOURCE_FLAG_UNMAPPABLE  (PIPE_RESOURCE_FLAG_DRV_PRIV << 
4)
+#define R600_RESOURCE_FLAG_READ_ONLY   (PIPE_RESOURCE_FLAG_DRV_PRIV << 
5)
 
 /* Debug flags. */
 enum {
/* Shader logging options: */
DBG_VS = PIPE_SHADER_VERTEX,
DBG_PS = PIPE_SHADER_FRAGMENT,
DBG_GS = PIPE_SHADER_GEOMETRY,
DBG_TCS = PIPE_SHADER_TESS_CTRL,
DBG_TES = PIPE_SHADER_TESS_EVAL,
DBG_CS = PIPE_SHADER_COMPUTE,
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 5d7837d..676d199 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -822,20 +822,22 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws,
!(sscreen->debug_flags & DBG(NO_RB_PLUS)) &&
(sscreen->info.family == CHIP_STONEY ||
 sscreen->info.family == CHIP_RAVEN);
}
 
sscreen->dcc_msaa_allowed =
!(sscreen->debug_flags & DBG(NO_DCC_MSAA)) &&
(sscreen->debug_flags & DBG(DCC_MSAA) ||
 sscreen->info.chip_class == VI);
 
+   sscreen->cpdma_prefetch_writes_memory = sscreen->info.chip_class <= VI;
+
 (void) mtx_init(&sscreen->shader_parts_mutex, mtx_plain);
sscreen->use_monolithic_shaders =
(sscreen->debug_flags & DBG(MONOLITHIC_SHADERS)) != 0;
 
sscreen->barrier_flags.cp_to_L2 = SI_CONTEXT_INV_SMEM_L1 |
SI_CONTEXT_INV_VMEM_L1;
if (sscreen->info.chip_class <= VI) {
sscreen->barrier_flags.cp_to_L2 |= SI_CONTEXT_INV_GLOBAL_L2;
sscreen->barrier_flags.L2_to_cp |= 
SI_CONTEXT_WRITEBACK_GLOBAL_L2;
}
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 7a09937..3a959f9 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -116,20 +116,21 @@ struct si_screen {
booldpbb_allowed;
booldfsm_allowed;
boolllvm_has_working_vgpr_indexing;
 
/* Whether shaders are monolithic (1-part) or separate (3-part). */
booluse_monolithic_shaders;
boolrecord_llvm_ir;
boolhas_rbplus; /* if RB+ registers 
exist */
boolrbplus_allowed; /* if RB+ is allowed */
booldcc_msaa_allowed;
+   boolcpdma_prefetch_writes_memory;
 
struct 

[Mesa-dev] [PATCH 3/6] winsys/amdgpu: make IBs use read-only memory

2017-12-05 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 089a358..63cd632 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -642,20 +642,21 @@ static bool amdgpu_ib_new_buffer(struct amdgpu_winsys 
*ws, struct amdgpu_ib *ib,
   buffer_size = MAX2(buffer_size, 8 * 1024 * 4);
   break;
default:
   unreachable("unhandled IB type");
}
 
 pb = ws->base.buffer_create(&ws->base, buffer_size,
ws->info.gart_page_size,
RADEON_DOMAIN_GTT,
RADEON_FLAG_NO_INTERPROCESS_SHARING |
+   RADEON_FLAG_READ_ONLY |
(ring_type == RING_GFX ||
 ring_type == RING_COMPUTE ||
 ring_type == RING_DMA ?
RADEON_FLAG_GTT_WC : 0));
if (!pb)
   return false;
 
mapped = ws->base.buffer_map(pb, NULL, PIPE_TRANSFER_WRITE);
if (!mapped) {
    pb_reference(&pb, NULL);
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/6] radeonsi: use a separate allocator for fine fences

2017-12-05 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_pipe_common.c | 7 +++
 src/gallium/drivers/radeon/r600_pipe_common.h | 1 +
 src/gallium/drivers/radeonsi/si_fence.c   | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index d85f9f0..9090e65 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -447,20 +447,25 @@ bool si_common_context_init(struct r600_common_context 
*rctx,
 rctx->b.stream_uploader = u_upload_create(&rctx->b, 1024 * 1024,
  0, PIPE_USAGE_STREAM, 0);
if (!rctx->b.stream_uploader)
return false;
 
 rctx->b.const_uploader = u_upload_create(&rctx->b, 128 * 1024,
 0, PIPE_USAGE_DEFAULT, 0);
if (!rctx->b.const_uploader)
return false;
 
+   rctx->cached_gtt_allocator = u_upload_create(&rctx->b, 16 * 1024,
+0, PIPE_USAGE_STAGING, 0);
+   if (!rctx->cached_gtt_allocator)
+   return false;
+
rctx->ctx = rctx->ws->ctx_create(rctx->ws);
if (!rctx->ctx)
return false;
 
if (sscreen->info.num_sdma_rings && !(sscreen->debug_flags & 
DBG(NO_ASYNC_DMA))) {
rctx->dma.cs = rctx->ws->cs_create(rctx->ctx, RING_DMA,
   r600_flush_dma_ring,
   rctx);
rctx->dma.flush = r600_flush_dma_ring;
}
@@ -491,20 +496,22 @@ void si_common_context_cleanup(struct r600_common_context 
*rctx)
rctx->ws->cs_destroy(rctx->gfx.cs);
if (rctx->dma.cs)
rctx->ws->cs_destroy(rctx->dma.cs);
if (rctx->ctx)
rctx->ws->ctx_destroy(rctx->ctx);
 
if (rctx->b.stream_uploader)
u_upload_destroy(rctx->b.stream_uploader);
if (rctx->b.const_uploader)
u_upload_destroy(rctx->b.const_uploader);
+   if (rctx->cached_gtt_allocator)
+   u_upload_destroy(rctx->cached_gtt_allocator);
 
 slab_destroy_child(&rctx->pool_transfers);
 slab_destroy_child(&rctx->pool_transfers_unsync);
 
if (rctx->allocator_zeroed_memory) {
u_suballocator_destroy(rctx->allocator_zeroed_memory);
}
 rctx->ws->fence_reference(&rctx->last_gfx_fence, NULL);
 rctx->ws->fence_reference(&rctx->last_sdma_fence, NULL);
 r600_resource_reference(&rctx->eop_bug_scratch, NULL);
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index d1fdea0..a8e632c 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -388,20 +388,21 @@ struct r600_common_context {
struct si_screen*screen;
struct radeon_winsys*ws;
struct radeon_winsys_ctx*ctx;
enum radeon_family  family;
enum chip_class chip_class;
struct r600_ringgfx;
struct r600_ringdma;
struct pipe_fence_handle*last_gfx_fence;
struct pipe_fence_handle*last_sdma_fence;
struct r600_resource*eop_bug_scratch;
+   struct u_upload_mgr *cached_gtt_allocator;
unsignednum_gfx_cs_flushes;
unsignedinitial_gfx_cs_size;
unsignedgpu_reset_counter;
unsignedlast_dirty_tex_counter;
unsignedlast_compressed_colortex_counter;
unsignedlast_num_draw_calls;
 
struct threaded_context *tc;
struct u_suballocator   *allocator_zeroed_memory;
struct slab_child_pool  pool_transfers;
diff --git a/src/gallium/drivers/radeonsi/si_fence.c 
b/src/gallium/drivers/radeonsi/si_fence.c
index 0d165a1..3c4d754 100644
--- a/src/gallium/drivers/radeonsi/si_fence.c
+++ b/src/gallium/drivers/radeonsi/si_fence.c
@@ -144,21 +144,21 @@ static bool si_fine_fence_signaled(struct radeon_winsys 
*rws,
 
 static void si_fine_fence_set(struct si_context *ctx,
  struct si_fine_fence *fine,
  unsigned flags)
 {
uint32_t *fence_ptr;
 
assert(util_bitcount(flags & (PIPE_FLUSH_TOP_OF_PIPE | 
PIPE_FLUSH_BOTTOM_OF_PIPE)) == 1);
 
/* Use uncached system memory for the fence. */
-   u_upload_alloc(ctx->b.b.stream_uploader, 0, 4, 4,
+   u_upload_alloc(ctx->b.cached_gtt_allocator, 0, 4, 4,
    &fine->offset, (struct pipe_resource **)&fine->buf,
    (void **)&fence_ptr);
if (!fine->buf)
return;
 
   

[Mesa-dev] [PATCH 2/6] radeonsi: print the buffer list for CHECK_VM

2017-12-05 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index 22609b7..385ce39 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -1069,20 +1069,21 @@ void si_check_vm_faults(struct r600_common_context *ctx,
fprintf(f, "Last apitrace call: %u\n\n",
sctx->apitrace_call_number);
 
switch (ring) {
case RING_GFX: {
struct u_log_context log;
 u_log_context_init(&log);

 si_log_draw_state(sctx, &log);
 si_log_compute_state(sctx, &log);
+   si_log_cs(sctx, &log, true);

 u_log_new_page_print(&log, f);
 u_log_context_destroy(&log);
break;
}
case RING_DMA:
si_dump_dma(sctx, saved, f);
break;
 
default:
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler

2017-12-05 Thread Jason Ekstrand
On Tue, Dec 5, 2017 at 9:08 AM, Chema Casanova 
wrote:

> El 30/11/17 a las 23:58, Jason Ekstrand escribió:
> > On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo
> > > wrote:
> >
> > load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit
> > surface format defined. So when reading 16-bit components with the
> > sampler we need to unshuffle two 16-bit components from each 32-bit
> > component.
> >
> > Using the sampler avoids the use of the byte_scattered_read message
> > that needs one message for each component and is supposed to be
> > slower.
> >
> > In the case of SKL+ we take advantage of a hardware feature that
> > automatically defines a channel mask based on the rlen value, so on
> > SKL+ we only use half of the registers without using a header in the
> > payload.
> > ---
> >  src/intel/compiler/brw_fs.cpp   | 31
> > +++
> >  src/intel/compiler/brw_fs_generator.cpp | 10 --
> >  2 files changed, 35 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_fs.cpp
> > b/src/intel/compiler/brw_fs.cpp
> > index 1ca4d416b2..9c543496ba 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -184,9 +184,17 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
> > fs_builder ,
> >  * a double this means we are only loading 2 elements worth of
> > data.
> >  * We also want to use a 32-bit data type for the dst of the
> > load operation
> >  * so other parts of the driver don't get confused about the
> > size of the
> > -* result.
> > +* result. On the case of 16-bit data we only need half of the
> > 32-bit
> > +* components on SKL+ as we take advance of using message
> > return size to
> > +* define an xy channel mask.
> >  */
> > -   fs_reg vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> > +   fs_reg vec4_result;
> > +   if (type_sz(dst.type) == 2 && (devinfo->gen >= 9)) {
> > +  vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
> > +  vec4_result = retype(vec4_result, BRW_REGISTER_TYPE_HF);
> > +   } else {
> > +  vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> > +   }
> >
> > fs_inst *inst =
> > bld.emit(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_LOGICAL,
> >  vec4_result, surf_index, vec4_offset);
> > inst->size_written = 4 *
> > vec4_result.component_size(inst->exec_size);
> > @@ -197,8 +205,23 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
> > fs_builder ,
> > }
> >
> > vec4_result.type = dst.type;
> > -   bld.MOV(dst, offset(vec4_result, bld,
> > -   (const_offset & 0xf) /
> > type_sz(vec4_result.type)));
> > +
> > +   if (type_sz(dst.type) == 2) {
> > +  /* 16-bit types need to be unshuffled as each pair of
> > 16-bit components
> > +   * is packed on a 32-bit compoment because we are using a
> > 32-bit format
> > +   * in the surface of uniform that is read by the sampler.
> > +   * TODO: On BDW+ mark when an uniform has 16-bit type so we
> > could setup a
> > +   * surface format of 16-bit and use the 16-bit return
> > format at the
> > +   * sampler.
> > +   */
> > +  vec4_result.stride = 2;
> > +  bld.MOV(dst, byte_offset(offset(vec4_result, bld,
> > +  (const_offset & 0x7) / 4),
> > +   (const_offset & 0x7) / 2 % 2 * 2));
> > +   } else {
> > +  bld.MOV(dst, offset(vec4_result, bld,
> > +  (const_offset & 0xf) /
> > type_sz(vec4_result.type)));
> > +   }
> >
> >
> > This seems overly complicated.  How about something like
>
> > fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
> > switch (type_sz(dst.type)) {
> > case 2:
> >shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
> >bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
> >break;
> > case 4:
> >bld.MOV(dst, dw);
> >break;
> > case 8:
> >shuffle_32bit_load_result_to_64bit_data(bld, dst, dw, 1);
> >break;
> > default:
> >unreachable();
> > }
>
> This implementation is really much clearer. Tested and works perfectly
> fine.
>
> >
> >
> >  }
> >
> >  /**
> > diff --git a/src/intel/compiler/brw_fs_generator.cpp
> > b/src/intel/compiler/brw_fs_generator.cpp
> > index a3861cd68e..00a4e29147 100644
> > --- a/src/intel/compiler/brw_fs_generator.cpp
> > +++ b/src/intel/compiler/brw_fs_generator.cpp
> > @@ -1381,12 +1381,18 @@
> > fs_generator::generate_varying_pull_constant_load_gen7(fs_inst
> *inst,
> > uint32_t 

Re: [Mesa-dev] [PATCH 3/3] st/mesa: remove unneeded #include in st_format.h

2017-12-05 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Tue, Dec 5, 2017 at 5:59 PM, Brian Paul  wrote:
> ---
>  src/mesa/state_tracker/st_format.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/src/mesa/state_tracker/st_format.h 
> b/src/mesa/state_tracker/st_format.h
> index 3dd9c10..466b5d0 100644
> --- a/src/mesa/state_tracker/st_format.h
> +++ b/src/mesa/state_tracker/st_format.h
> @@ -33,7 +33,6 @@
>  #include "main/formats.h"
>  #include "main/glheader.h"
>
> -#include "pipe/p_defines.h"
>  #include "pipe/p_format.h"
>
>  #ifdef __cplusplus
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Jason Ekstrand
On Tue, Dec 5, 2017 at 10:34 AM, Rogovin, Kevin 
wrote:

> Hi,
>
> >This isn't true.  100% of the intel_mipmap_tree -> blorp_surf
> translations are handled by that function.
> >  It's a perfectly reasonable place to handle these things.  It could
> also be handled in genX(blorp_exec) if that makes someone more comfortable.
>
> This is where I placed the ASTC enumeration setting, in genX(blorp_exec).
> I added a boolean signaling if the caller to blorp believed it would sample
> from an ASTC, if it did it sets the enum as ASTC otherwise as AUX.
>
> I confess I am not super familiar with blorp, but as far as I can tell,
> the only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage
> since an ASTC surface cannot be used for an FBO.
>

I'd have to think about it harder but even those are not likely to actually
get ASTC decode.  Maybe for some sort of decompression thing but if you're
doing GetCompressedTexImage, it'll probably turn into a blorp_copy which
will use R32G32B32A32_UINT.


> > The problem is that a single invalidate doesn't actually cause a
> synchronization point in the rendering operation.  All it does is torch the
> texture cache and it
> > does so immediately and in parallel with currently active rendering.  If
> you can't have ASTC5x5 in the texture cache with a CCS_E surface, then we
> need to
> > do a CS stall to ensure that the previous draw is complete before we
> invalidate.  Otherwise, if the draw with ASTC5x5 is still in-flight, the
> texture cache will
> > immediately start to get populated with ASTC5x5 data again.
>
> Ahh futz, I forgot to add that stall.. by luck the guilty application
> worked anyways (when I first wrote the work I did intel_batchbuffer_flush()
> and relaxed it down to texture invalidate).
>
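
For what it's worth, the stall-then-invalidate sequence being described
can be summarized as a small state machine. This is a hedged sketch
with invented names, not the actual driver code; the PIPE_CONTROL
emission is abstracted behind callbacks:

   enum astc5x5_wa_mode {
      ASTC5X5_WA_NONE,
      ASTC5X5_WA_HAS_ASTC5X5,
      ASTC5X5_WA_HAS_AUX_COMPRESSED,
   };

   struct astc5x5_wa_state {
      enum astc5x5_wa_mode texture_cache_contents;
   };

   static void
   astc5x5_wa_transition(struct astc5x5_wa_state *state,
                         enum astc5x5_wa_mode next,
                         void (*emit_cs_stall)(void),
                         void (*emit_texture_cache_invalidate)(void))
   {
      if (next == ASTC5X5_WA_NONE ||
          next == state->texture_cache_contents)
         return;

      if (state->texture_cache_contents != ASTC5X5_WA_NONE) {
         /* Wait for in-flight draws first so they cannot repopulate
          * the texture cache with the old kind of data, then throw
          * the cache away. */
         emit_cs_stall();
         emit_texture_cache_invalidate();
      }
      state->texture_cache_contents = next;
   }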
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] mesa: s/%u/%d/ in _mesa_error() call in check_layer()

2017-12-05 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Tue, Dec 5, 2017 at 6:00 PM, Brian Paul  wrote:
> The layer parameter is signed.  Fixes the error message seen when
> running the arb_texture_multisample-errors test which checks a
> negative layer value.
> ---
>  src/mesa/main/fbobject.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
> index f7702f1..30287ab 100644
> --- a/src/mesa/main/fbobject.c
> +++ b/src/mesa/main/fbobject.c
> @@ -3189,8 +3189,7 @@ check_layer(struct gl_context *ctx, GLenum target, 
> GLint layer,
>  * and layer is negative."
>  */
> if (layer < 0) {
> -  _mesa_error(ctx, GL_INVALID_VALUE,
> -  "%s(layer %u < 0)", caller, layer);
> +  _mesa_error(ctx, GL_INVALID_VALUE, "%s(layer %d < 0)", caller, layer);
>return false;
> }
>
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] st/mesa: swizzle argument when there's a vector size mismatch

2017-12-05 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Dec 5, 2017 at 5:26 AM, Ilia Mirkin  wrote:
> GLSL IR operation arguments can sometimes have an implicit swizzle as a
> result of a vector arg and a scalar arg, where the scalar argument is
> implicitly expanded to the size of the vector argument.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103955
> Signed-off-by: Ilia Mirkin 
> ---
>
> v1 -> v2:
>  - fix typo that caused compilation failure
>  - remove ir_binop_interpolate_at_* from getting the resizing treatment
>
> I went through all the ir_binops and ir_triops, and those two seem like the
> only ones with weird arguments. I think.
>
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 8eeae86dabb..740c197c74b 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -1341,10 +1341,33 @@ glsl_to_tgsi_visitor::visit_expression(ir_expression* 
> ir, st_src_reg *op)
> st_dst_reg result_dst;
>
> int vector_elements = ir->operands[0]->type->vector_elements;
> -   if (ir->operands[1]) {
> +   if (ir->operands[1] &&
> +   ir->operation != ir_binop_interpolate_at_offset &&
> +   ir->operation != ir_binop_interpolate_at_sample) {
> +  st_src_reg *swz_op = NULL;
> +  if (vector_elements > ir->operands[1]->type->vector_elements) {
> + assert(ir->operands[1]->type->vector_elements == 1);
> + swz_op = &op[1];
> +  } else if (vector_elements < ir->operands[1]->type->vector_elements) {
> + assert(ir->operands[0]->type->vector_elements == 1);
> + swz_op = &op[0];
> +  }
> +  if (swz_op) {
> + uint16_t swizzle_x = GET_SWZ(swz_op->swizzle, 0);
> + swz_op->swizzle = MAKE_SWIZZLE4(swizzle_x, swizzle_x,
> + swizzle_x, swizzle_x);
> +  }
>vector_elements = MAX2(vector_elements,
>   ir->operands[1]->type->vector_elements);
> }
> +   if (ir->operands[2] &&
> +   ir->operands[2]->type->vector_elements != vector_elements) {
> +  /* This can happen with ir_triop_lrp, i.e. glsl mix */
> +  assert(ir->operands[2]->type->vector_elements == 1);
> +  uint16_t swizzle_x = GET_SWZ(op[2].swizzle, 0);
> +  op[2].swizzle = MAKE_SWIZZLE4(swizzle_x, swizzle_x,
> +swizzle_x, swizzle_x);
> +   }
>
> this->result.file = PROGRAM_UNDEFINED;
>
> --
> 2.13.6
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin
Hi,

>This isn't true.  100% of the intel_mipmap_tree -> blorp_surf translations are 
>handled by that function. 
>  It's a perfectly reasonable place to handle these things.  It could also be 
> handled in genX(blorp_exec) if that makes someone more comfortable.

This is where I placed the ASTC enumeration setting, in genX(blorp_exec). I 
added a boolean signaling whether the caller to blorp believed it would sample from 
an ASTC; if it did, it sets the enum to ASTC, otherwise to AUX.

I confess I am not super familiar with blorp, but as far as I can tell, the 
only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage since an 
ASTC surface cannot be used for an FBO.

> The problem is that a single invalidate doesn't actually cause a 
> synchronization point in the rendering operation.  All it does is torch the 
> texture cache and it 
> does so immediately and in parallel with currently active rendering.  If you 
> can't have ASTC5x5 in the texture cache with a CCS_E surface, then we need to
> do a CS stall to ensure that the previous draw is complete before we 
> invalidate.  Otherwise, if the draw with ASTC5x5 is still in-flight, the 
> texture cache will 
> immediately start to get populated with ASTC5x5 data again.

Ahh futz, I forgot to add that stall.. by luck the guilty application worked 
anyways (when I first wrote the work I did intel_batchbuffer_flush() and 
relaxed it down to texture invalidate).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin

> Are you saying that this bug extends over hardware context?

Different HW contexts imply different execbuffer2 ioctls. The kernel inserts a 
full-blown flush of everything after (or before, I cannot remember) each 
execbuffer2 call. This way there is context isolation with respect to the HW bugginess.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Jason Ekstrand
On Tue, Dec 5, 2017 at 10:17 AM, Pohjolainen, Topi <
topi.pohjolai...@gmail.com> wrote:

> On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote:
> > Hi,
> >
> >
> > >> Here are my comments of the patch posted:
> > >>
> > >>  1.  it is essentially replication and moving around of the code of
> the patch series posted earlier but missing various
> > >>   important bits: preventing the sampler from using the auxiliary
> buffer (this requires to modify surface state
> > >>   sent in brw_wm_surfaces.c). It also does not cover blorp
> sufficiently (blorp might read from an ASTC5x5
> > >>   and there are more paths in blorp than blorp_surf_for_miptree()
> that sample from surfaces.
> > >>
> >
> > >Can you explain both more in detail? Resolves done in
> > >brw_predraw_resolve_inputs() mean that there is nothing interesting in
> the aux buffers and surface setup won't therefore enable auxiliary for
> texture surfaces.
> >
> > That there is nothing interesting is irrelevant to the sampler if the
> SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the
> SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the
> GPU command; The sampler will always try to read the auxiliary buffer if it
> is given the opportunity to do so. Indeed, it is quite feasible that less
> bandwidth is consumed if the sampler is given the chance to read an
> auxiliary buffer in place of the buffer; as such even if the surface is
> resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for
> HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer
> is fully resolved (see intel_mipmap_tree_sample_with_hiz() in
> intel_mipmap_tree.c).
>

We do have code which checks for the resolve state and disables the
auxiliary buffer.  However, I don't like relying on it for correctness as
that's bitten us before.  I think it'd be better to check the W/A state for
ASTC5x5, assert that everything is resolved, and manually disable aux in
that case.
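
To illustrate the idea, a rough sketch only (not the actual patch): the
brw->astc5x5_wa flag is hypothetical, while ISL_AUX_USAGE_NONE is the existing
enum, and aux_usage stands for whatever local ends up in the texture
SURFACE_STATE setup:

   /* When the ASTC5x5 workaround is active for this draw, rely on the
    * resolves done in brw_predraw_resolve_inputs() and never advertise an
    * auxiliary surface to the sampler.
    */
   if (brw->astc5x5_wa) {
      /* Everything is expected to be resolved already, so dropping aux
       * here only costs bandwidth, not correctness.
       */
      aux_usage = ISL_AUX_USAGE_NONE;
   }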


> > > In case of blorp, as far as I know all operations sampling something
> should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
> >
> > Blorp is used in more than blorp_surf_for_miptree(), for example
> implementing GetTexImage(). Indeed, it is possible for blorp to sample from
> an ASTC5x5 (you can see this handled in the patch series I posted). I chose
> the hammer that the default is to just assume blorp is going to access
> auxiliary buffers unless a flag is set when the caller knows that blorp is
> going to sample from an astc5x5 (again, see my patch series).
>

This isn't true.  100% of the intel_mipmap_tree -> blorp_surf translations
are handled by that function.  It's a perfectly reasonable place to handle
these things.  It could also be handled in genX(blorp_exec) if that makes
someone more comfortable.

> >Right. In the case of sampling both aux and astc5x5 in the same draw
> cycle the only thing to do is to disable aux. With my question of direction
> I meant the texture
> > > cache flush between two cycles. Do we need to flush in both cases
> > > 1) ASTC5x5 in first cycle and AUX in the following
> > > 2) AUX in first cycle and ASTC5x5 in the following
> >
> > YES we need to flush in both cases. What is happening is that the
> sampler hardware is bugged. Let us suppose it was bugged in only 1
> direction, take 1. Then if the sampler first samples from an ASTC5x5 then
> an AUX it would not hang, but the other way it would. However, if there are
> multiple draws in flight where one samples from an ASTC5x5 and the other
> does not, the command buffer order gives ZERO guarantee that the sampler
> will sample in that order because fragments get executed out-of-order even
> across draw calls even within a subslice (this is why sendc is needed at
> end of shader in GEN).
> >
>
> I need to clarify this bit a little more. In the current setup we have only
> one draw cycle in flight per context that uses sample engine. There may be
> blitter calls in parallel (although I'm not sure) but they don't use
> sampling
> engine anyway.
>
> The draw cycle itself may have multiple draws programmed into it but as
> they
> all are based on the same surface setup there is naturally no flushing
> problem.
> Surfaces with auxiliary would be resolved before the draw and programmed
> without auxiliary buffer.
>

The problem is that a single invalidate doesn't actually cause a
synchronization point in the rendering operation.  All it does is torch the
texture cache and it does so immediately and in parallel with currently
active rendering.  If you can't have ASTC5x5 in the texture cache with a
CCS_E surface, then we need to do a CS stall to ensure that the previous
draw is complete before we invalidate.  Otherwise, if the draw with ASTC5x5
is still in-flight, the texture cache will immediately start to get
populated with ASTC5x5 data again.
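
A minimal sketch of that ordering, assuming the existing
brw_emit_pipe_control_flush() helper and PIPE_CONTROL flags; the wrapper name
and call site are illustrative, not the actual patch:

   static void
   gen9_astc5x5_texture_cache_wa(struct brw_context *brw)
   {
      /* Stall until any in-flight draw that may still be sampling has
       * completed...
       */
      brw_emit_pipe_control_flush(brw, PIPE_CONTROL_CS_STALL);

      /* ...and only then torch the texture cache, so it cannot be
       * repopulated with ASTC5x5 (or CCS_E) data by outstanding work.
       */
      brw_emit_pipe_control_flush(brw, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
   }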
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Pohjolainen, Topi
On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote:
> Hi,
> 
> 
> >> Here are my comments of the patch posted:
> >> 
> >>  1.  it is essentially replication and moving around of the code of the 
> >> patch series posted earlier but missing various
> >>   important bits: preventing the sampler from using the auxiliary 
> >> buffer (this requires to modify surface state
> >>   sent in brw_wm_surfaces.c). It also does not cover blorp 
> >> sufficiently (blorp might read from an ASTC5x5
> >>   and there are more paths in blorp than blorp_surf_for_miptree() that 
> >> sample from surfaces.
> >> 
> 
> >Can you explain both more in detail? Resolves done in
> >brw_predraw_resolve_inputs() mean that there is nothing interesting in the 
> >aux buffers and surface setup won't therefore enable auxiliary for texture 
> >surfaces.
> 
> That there is nothing interesting is irrelevant to the sampler if the 
> SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the 
> SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the 
> GPU command; The sampler will always try to read the auxiliary buffer if it 
> is given the opportunity to do so. Indeed, it is quite feasible that less 
> bandwidth is consumed if the sampler is given the chance to read an auxiliary 
> buffer in place of the buffer; as such even if the surface is resolved one 
> may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 
> programs to use the HiZ auxiliary buffer even if the depth buffer is fully 
> resolved (see intel_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).
> 
> > In case of blorp, as far as I know all operations sampling something should 
> > go thru blorp_surf_for_miptree(). Can you point out cases that don't?
> 
> Blorp is used in more than blorp_surf_for_miptree(), for example implementing 
> GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 
> (you can see this handled in the patch series I posted). I chose the hammer 
> that the default is to just assume blorp is going to access auxiliary buffers 
> unless a flag is set when the caller knows that blorp is going to sample from 
> an astc5x5 (again, see my patch series).
> 
> >Right. In the case of sampling both aux and astc5x5 in the same draw cycle 
> >the only thing to do is to disable aux. With my question of direction I 
> >meant the texture 
> > cache flush between two cycles. Do we need to flush in both cases
> > 1) ASTC5x5 in first cycle and AUX in the following
> > 2) AUX in first cycle and ASTC5x5 in the following
> 
> YES we need to flush in both cases. What is happening is that the sampler 
> hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. 
> Then if the sampler first samples from an ASTC5x5 then an AUX it would not 
> hang, but the other way it would. However, if there are multiple draws in 
> flight where one samples from an ASTC5x5 and the other does not, the command 
> buffer order gives ZERO guarantee that the sampler will sample in that order 
> because fragments get executed out-of-order even across draw calls even 
> within a subslice (this is why sendc is needed at end of shader in GEN).
> 

I need to clarify this bit a little more. In the current setup we have only
one draw cycle in flight per context that uses the sampling engine. There may be
blitter calls in parallel (although I'm not sure) but they don't use the sampling
engine anyway.

The draw cycle itself may have multiple draws programmed into it but as they
all are based on the same surface setup there is naturally no flushing problem.
Surfaces with auxiliary would be resolved before the draw and programmed
without auxiliary buffer.

Are you saying that this bug extends over hardware context?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 104035] When will the egl introp for vaapi be implemented

2017-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104035

Vedran Miletić  changed:

   What|Removed |Added

 CC||ved...@miletic.net

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: move cloning of NIR shader for compute

2017-12-05 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Dec 5, 2017 at 4:05 PM, Rob Clark  wrote:
> Since in the NIR case, driver takes ownership of the NIR shader, we need
> to clone what is passed to the driver.  Normally this is done as part of
> creating the shader variant (where a clone is anyways needed).  But
> compute shaders have no variants, so we were cloning earlier.
>
> The problem is that after the NIR linking optimizations, we ended up
> cloning *before* all the lowering passes were done.
>
> So move this into st_get_cp_variant(), to make compute shaders work more
> like other shader stages.
>
> Signed-off-by: Rob Clark 
> ---
>  src/mesa/state_tracker/st_glsl_to_nir.cpp | 2 +-
>  src/mesa/state_tracker/st_program.c   | 5 -
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
> b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> index 5d18e7b62bf..36adf55cd45 100644
> --- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> @@ -431,7 +431,7 @@ set_st_program(struct gl_program *prog,
>stcp = (struct st_compute_program *)prog;
>stcp->shader_program = shader_program;
>stcp->tgsi.ir_type = PIPE_SHADER_IR_NIR;
> -  stcp->tgsi.prog = nir_shader_clone(NULL, nir);
> +  stcp->tgsi.prog = nir;
>break;
> default:
>unreachable("unknown shader stage");
> diff --git a/src/mesa/state_tracker/st_program.c 
> b/src/mesa/state_tracker/st_program.c
> index 5c0a58104fc..258f5e47cbe 100644
> --- a/src/mesa/state_tracker/st_program.c
> +++ b/src/mesa/state_tracker/st_program.c
> @@ -1659,7 +1659,10 @@ st_get_cp_variant(struct st_context *st,
>v = CALLOC_STRUCT(st_basic_variant);
>if (v) {
>   /* fill in new variant */
> - v->driver_shader = pipe->create_compute_state(pipe, tgsi);
> + struct pipe_compute_state cs = *tgsi;
> + if (tgsi->ir_type == PIPE_SHADER_IR_NIR)
> +cs.prog = nir_shader_clone(NULL, tgsi->prog);;
> + v->driver_shader = pipe->create_compute_state(pipe, &cs);
>   v->key = key;
>
>   /* insert into list */
> --
> 2.13.6
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 104119] radv: OpBitFieldInsert produces 0 with a loop counter for Insert

2017-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104119

--- Comment #5 from Samuel Pitoiset  ---
Patch is on the list.

https://patchwork.freedesktop.org/series/34930/

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] radv: enable lowering of nir_op_bitfield_insert

2017-12-05 Thread Samuel Pitoiset
Otherwise it's replaced by
"vec1 32 ssa_108 = load_const (0x /* 0.00 */)", which
looks clearly wrong.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_shader.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 4a3fdfa80e..0b19d23fa2 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -66,6 +66,7 @@ static const struct nir_shader_compiler_options nir_options = 
{
.lower_extract_byte = true,
.lower_extract_word = true,
.lower_ffma = true,
+   .lower_bitfield_insert = true,
.max_unroll_iterations = 32
 };
 
-- 
2.15.1
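
For reference, a standalone C sketch of what bitfieldInsert(base, insert,
offset, bits) is defined to compute, i.e. the semantics the lowered shift/mask
sequence has to preserve (illustrative only, not the NIR lowering code):

   #include <stdint.h>
   #include <stdio.h>

   static uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                                   unsigned offset, unsigned bits)
   {
      if (bits == 0)
         return base;
      /* Build a mask covering 'bits' bits starting at 'offset'. */
      uint32_t mask = (bits < 32 ? (1u << bits) - 1 : ~0u) << offset;
      /* Clear that field in 'base' and splice in the low bits of 'insert'. */
      return (base & ~mask) | ((insert << offset) & mask);
   }

   int main(void)
   {
      /* Insert the low 4 bits of 0xF at bit offset 8 of 0x12345678. */
      printf("0x%08x\n", bitfield_insert(0x12345678, 0xF, 8, 4)); /* 0x12345f78 */
      return 0;
   }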

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] radv: enable lowering of nir_op_bitfield_extract

2017-12-05 Thread Samuel Pitoiset
This instruction should also be lowered correctly.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_shader.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 0b19d23fa2..044bcd0641 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -67,6 +67,7 @@ static const struct nir_shader_compiler_options nir_options = 
{
.lower_extract_word = true,
.lower_ffma = true,
.lower_bitfield_insert = true,
+   .lower_bitfield_extract = true,
.max_unroll_iterations = 32
 };
 
-- 
2.15.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/miptree: Use the tiling from the modifier instead of the BO

2017-12-05 Thread Kristian Høgsberg
On Tue, Dec 5, 2017 at 9:43 AM, Kristian Høgsberg  wrote:
> On Tue, Dec 5, 2017 at 8:49 AM, Jason Ekstrand  wrote:
>> On Tue, Dec 5, 2017 at 8:23 AM, Kristian Høgsberg 
>> wrote:
>>>
>>> On Tue, Dec 5, 2017 at 7:57 AM, Jason Ekstrand 
>>> wrote:
>>> > On Tue, Dec 5, 2017 at 1:22 AM, Daniel Stone 
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> On 18 November 2017 at 00:10, Jason Ekstrand 
>>> >> wrote:
>>> >> > This fixes a bug where we were taking the tiling from the BO
>>> >> > regardless
>>> >> > of what the modifier said.  When we got images in from Vulkan where
>>> >> > it
>>> >> > doesn't set the tiling on the BO, we would treat them as linear even
>>> >> > though the modifier expressly said to treat it as Y-tiled.
>>> >>
>>> >> For some reason I thought Ken had already reviewed this and it landed,
>>> >> until Kristian mentioned last night.
>>> >
>>> >
>>> > There's been some discussion about what the right thing to do is here.
>>> > I've
>>> > got a patch (which is now landed) which will make us set the tiling from
>>> > Vulkan just like GL does.  It's rather annoying but I think that's
>>> > marginally better.
>>>
>>> I don't know that it's better - it reinforces the notion that you have
>>> to make the kernel-side tiling match for the dma-buf import extension
>>> to work. I think it'd be better to land these patches (btw, Rb: me
>>> (can you even do parenthetical Rbs?))
>>
>>
>> I'll allow it. :)
>>
>>>
>>> and call set-tiling in mesa.
>>
>>
>> Yeah, I think that's reasonable.  Really, this should only be a problem if
>> we have a bunch of users trying to use the same BO with different modifiers.
>> It can happen in theory (immagine two images in the same BO, one X and one
>> Y) but it's a pretty crazy case.
>
> It's not a complete solution, of course, but at least we're kicking
> the can down the road of increasingly esoteric use cases.
>
>>>
>>> The
>>> assumption being that if the tiling doesn't match the modifier, then
>>> maybe the producer didn't care about kernel tiling? Alternatively,
>>> could we set bo->tiling = INCONSISTENT or such in mesa and then use
>>> that to avoid the gtt map paths?

Actually, for compressed textures, you already must have a way to deal
with glTexSubImage2D and similar without falling back to GTT maps. Can
you just handle miptrees with mismatched modifiers (or perhaps just
any valid modifier) the same way?

Kristian
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/miptree: Use the tiling from the modifier instead of the BO

2017-12-05 Thread Kristian Høgsberg
On Tue, Dec 5, 2017 at 8:49 AM, Jason Ekstrand  wrote:
> On Tue, Dec 5, 2017 at 8:23 AM, Kristian Høgsberg 
> wrote:
>>
>> On Tue, Dec 5, 2017 at 7:57 AM, Jason Ekstrand 
>> wrote:
>> > On Tue, Dec 5, 2017 at 1:22 AM, Daniel Stone 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On 18 November 2017 at 00:10, Jason Ekstrand 
>> >> wrote:
>> >> > This fixes a bug where we were taking the tiling from the BO
>> >> > regardless
>> >> > of what the modifier said.  When we got images in from Vulkan where
>> >> > it
>> >> > doesn't set the tiling on the BO, we would treat them as linear even
>> >> > though the modifier expressly said to treat it as Y-tiled.
>> >>
>> >> For some reason I thought Ken had already reviewed this and it landed,
>> >> until Kristian mentioned last night.
>> >
>> >
>> > There's been some discussion about what the right thing to do is here.
>> > I've
>> > got a patch (which is now landed) which will make us set the tiling from
>> > Vulkan just like GL does.  It's rather annoying but I think that's
>> > marginally better.
>>
>> I don't know that it's better - it reinforces the notion that you have
>> to make the kernel-side tiling match for the dma-buf import extension
>> to work. I think it'd be better to land these patches (btw, Rb: me
>> (can you even do parenthetical Rbs?))
>
>
> I'll allow it. :)
>
>>
>> and call set-tiling in mesa.
>
>
> Yeah, I think that's reasonable.  Really, this should only be a problem if
> we have a bunch of users trying to use the same BO with different modifiers.
> It can happen in theory (immagine two images in the same BO, one X and one
> Y) but it's a pretty crazy case.

It's not a complete solution, of course, but at least we're kicking
the can down the road of increasingly esoteric use cases.

>>
>> The
>> assumption being that if the tiling doesn't match the modifier, then
>> maybe the producer didn't care about kernel tiling? Alternatively,
>> could we set bo->tiling = INCONSISTENT or such in mesa and then use
>> that to avoid the gtt map paths?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] meson: Fix building gallium media targets with gallium-xlib glx

2017-12-05 Thread Dylan Baker
Signed-off-by: Dylan Baker 
---
 meson.build | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 3e8ea7d17e0..bf6bd551f08 100644
--- a/meson.build
+++ b/meson.build
@@ -1107,9 +1107,9 @@ if with_platform_x11
 dep_xcb_glx = dependency('xcb-glx', version : '>= 1.8.1')
 dep_xxf86vm = dependency('xxf86vm', required : false)
   endif
-  if with_any_vk or with_glx == 'dri' or
-  (with_gallium_vdpau or with_gallium_xvmc or with_gallium_omx or
-   with_gallium_xa)
+  if (with_any_vk or with_glx == 'dri' or
+   (with_gallium_vdpau or with_gallium_xvmc or with_gallium_omx or
+with_gallium_xa))
 dep_xcb = dependency('xcb')
 dep_x11_xcb = dependency('x11-xcb')
   endif
-- 
2.15.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 103126] glthread doesn't offload anything in Witcher 2

2017-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103126

--- Comment #2 from Alexander Tsoy  ---
I had the same problem with other eON ports: Spec Ops: The Line and Overlord at
least. It turned out that it's due to assembly being disabled in my 32-bit mesa
build (--disable-asm). Not sure if it's worth fixing this. Leaving the bug open
for developers to decide.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 104119] radv: OpBitFieldInsert produces 0 with a loop counter for Insert

2017-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104119

--- Comment #4 from Samuel Pitoiset  ---
Possible fix:

diff --git i/src/amd/vulkan/radv_shader.c w/src/amd/vulkan/radv_shader.c
index 4a3fdfa80e..0b19d23fa2 100644
--- i/src/amd/vulkan/radv_shader.c
+++ w/src/amd/vulkan/radv_shader.c
@@ -66,6 +66,7 @@ static const struct nir_shader_compiler_options nir_options =
{
.lower_extract_byte = true,
.lower_extract_word = true,
.lower_ffma = true,
+   .lower_bitfield_insert = true,
.max_unroll_iterations = 32
 };

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] glapi/check_table: Remove 'extern "C"' block

2017-12-05 Thread Dylan Baker
Quoting Emil Velikov (2017-12-05 07:36:25)
> On 4 December 2017 at 23:57, Dylan Baker  wrote:
> > Quoting Emil Velikov (2017-11-23 11:04:34)
> >> On 20 November 2017 at 23:12, Dylan Baker  wrote:
> >> > This doesn't actually accomplish what it's meant to do, as extern C
> >> > doesn't undefine __cplusplus, so the included headers define a template
> >> > (because __cplusplus is defined), but then that code is in an 'extern
> >> > "C"' block, and explosion.
> >> >
> >> The commit does sound a bit off, admittedly I don't fully grok what
> >> you're trying to say.
> >>
> The extern "C" { } construct annotates anything within as if it were
> normal C code. Thus functions will have C linkage (otherwise
> >> they'll have the C++ mangling) and structs are using the C rules.
> >> So if you have a C++ header further down the chain it will be
> >> considered as C and hell will break loose.
> >>
> >> This patch is correct, just sent a fix for glapitable.h, while glapi.h
> >> was handled with d38839a498b38b745f5bdf6b96fbafa2792670be.
> >>
> >> As-is "struct _glapi_table" will be interpreted as a C++ one, which may
> >> not have the same guarantees about sizeof and pointer arithmetic that
> >> we're using in the test.
> >>
> >> Hope the above provides some inspiration towards a better commit message.
> >>
> >> Thanks
> >> Emil
> >
> > So here's the problem I run into:
> >
> > In file included from ../include/c99_compat.h:28:0,
> >  from ../src/util/macros.h:29,
> >  from ../src/mapi/glapi/glapi.h:47,
> >  from ../src/mapi/glapi/tests/check_table.cpp:28:
> > ../include/no_extern_c.h:47:1: error: template with C linkage
 template<class T> class _IncludeInsideExternCNotPortable;
> >  ^~~~
> > [54/93] Compiling C++ object 'src/gtest/gtest@sta/src_gtest-all.cc.o'.
> >
> > So, my commit message still makes sense to me, since you can't put a 
> > template in
> > an extern "C" block, and that template is guarded by __cplusplus, which is 
> > still
> > defined even in an extern "C" block.
> >
> This is one symptom of the issue described before. The code in
> c99_compat.h was explicitly added by Jose to make the issue obvious.
> 
> > Do you have a commit message in mind that is better?
> 
> Feel free to reuse as much or as little of my explanation. Borrowing
> from analogous fixes also works ;-)
> Some examples:
> 237dcb4aa7c39c59bfd225ae3d73caf709be216d
> a177a13033cf9356eb26e8757055037a54268a18
> 36ad8265489fc66ab45b9b74dafa353a93bdb03b
> 
> I'm going through 4/8 - seems like there's a bunch of other bugs around :-\
> Feel free to push the remainder of the series with the nitpicks addressed.
> 
> Thanks
> Emil

The APPLE extensions one?


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 00/44] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-12-05 Thread Chema Casanova
On 05/12/17 at 06:16, Jason Ekstrand wrote:
> A couple of notes:
>
>  1) I *think* I gave you enough reviews to land the UBO/SSBO part and
> the optimizations in 26-28.  If reviews are still missing anywhere,
> please let me know.  If not, let's try and get that part landed.

The series is almost ready to land; the only pending item is addressing your
feedback about using untyped_read for reading vec3 SSBOs.

The only missing explicit R-b's are for "[PATCH v4 28/44] i965/fs: Use
untyped_surface_read for 16-bit load_ssbo" and "[PATCH v4 23/44]
i965/fs: Enables 16-bit load_ubo with sampler"; I've just answered your
review to confirm the R-b.

I expect to finish the vec3 SSBO work today and send the series to Jenkins
before landing, confirm your "pending" R-b, do a last rebase to master and
ask for a push.

>
>  2) I sent out a patch to rewrite assign_constant_locations which I
> think should make it automatically handle 8 and 16-bit values as
> well.  I'd rather do that than more special casing if everything works
> out ok.

I'm testing this patch with 16-bit values and will make sure whatever is
needed to have 16-bit working is in place.

>
>  3) I sent out a series of patches to enable pushing of UBOs in
> Vulkan.  If we're not careful, these will clash with 16bit storage as
> UBO support suddenly has to imply push constant support.  That said,
> I'm willing to wait at least a little while before landing them to let
> us get 16bit push constant support sorted out.  The UBO pushing
> patches give us a nice little performance boost but we're nowhere near
> a release and I don't want it blocking you.

That would be my next priority; after that, only the 16-bit input/output
support would remain to land to finish this extension.

Chema

> On Wed, Nov 29, 2017 at 6:07 PM, Jose Maria Casanova Crespo
> > wrote:
>
> Hello,
>
> this is the V4 series for the implementation of the
> SPV_KHR_16bit_storage
> and VK_KHR_16bit_storage extensions on the anv vulkan driver, in
> addition
> to the GLSL and NIR support needed.
>
> The original series can be found here [1], the following v2 [2]
> and v3 [3].
>
> In short V4 includes the following:
>
>  * Reorder the series to enable features as they are implemented,
> the series
>    now enables first UBO and SSBO support, and then inputs/outputs and
>    finally push constants.
>  * Support the byte scattered read/write messages with different
> bit sizes
>    byte/word/dword.
>  * Refactor of the store_ssbo code and also fix stores when
> writemask was .yz
>  * Uses the sampler for load_ubo avoiding the initial
> implementation of
>    the series using byte_scattered_read.
>  * Addressed all the feedback provided by Jason and Topi on v3 review.
>
> This series is also available at:
>
> https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4
> 
>
> The objective is to start landing part of this series, all
> feedback has been
> addressed for SSBO and UBO. But for input/outputs features it will
> probably
> need another iteration as was not completely reviewed. It is also
> needed
> to define the approach for the push constants issues before or after landing
> landing
> the support with this implementation.
>
> Patches 1-5 and 8-17 have already been reviewed. Patch 7 was already
> reviewed but as it has changed too much I would appreciate another
> review. When patches until 25 or 28 are reviewed we could land
> UBOs and
> SSBOs support.
>
> Finally an updated overview of the patches:
>
> Patches 1-2 add 16-bit float, int and uint types to GLSL. This is
> needed because NIR uses GLSL types internally. We use the enums
> already defined at AMD_gpu_shader_half_float and NV_gpu_shader
> extensions. Patch 2 updates mesa/st, in order to avoid warnings for
> types not handled on a switch.
>
> Patches 3-6 add NIR support for those new GLSL 16-bit types,
> conversion opcodes, and rounding modes for float to half-float
> conversions.
>
> Patches 7-9 add the SPIR-V (SPV_KHR_16bit_storage) to NIR support.
>
> Patches 10-12 add general 16-bit support for i965. This includes
> handling of new types on several general purpose methods,
> update/remove some asserts.
>
> Patches 14-17 add support for 32 to 16-bit conversions for i965,
> including rounding mode opcodes (needed for float to half-float
> conversions), and an optimization that removes superfluous rounding
> mode sets.
>
> Patches 18-21 add and use two new messages: byte scattered read and
> write. Those were needed because untyped surface message has a fixed
> 32-bit write size. Those messages are used on the 16-bit support of
> store SSBO, load SSBO and load shared.
>
> Patch 22 adds helpers to allow un/shuffle 

[Mesa-dev] [Bug 104119] radv: OpBitFieldInsert produces 0 with a loop counter for Insert

2017-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104119

--- Comment #3 from Samuel Pitoiset  ---
Something is wrong in the NIR.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Enable disk shader cache by default

2017-12-05 Thread Mark Janes
Jordan Justen  writes:

> On 2017-11-08 17:26:47, Timothy Arceri wrote:
>> Reviewed-by: Timothy Arceri 
>> 
>> Mark may want to consider adding some of the once a day type CI runs for 
>> this. For example running the test suite for two consecutive runs on the 
>> same build so that the second run uses the shader cache and also a 
>> second run the uses MESA_GLSL=cache_fb to force testing of the cache 
>> fallback path.
>
> Yeah. We discussed this previously, but I don't think it's been
> implemented yet. My opinion is that it could perhaps be a weekly test.

This automation is implemented now. It found the issues reported in
https://bugs.freedesktop.org/show_bug.cgi?id=103988

My opinion is that this test strategy is inadequate.  Adding a dimension
to the test matrix has high cost, especially when combined with other
dimensions of the test matrix (does shader cache need to be tested for
32/64 bit builds? For hswgt1/hswgt2/hswgt3e?).

Since 103988 is not hardware specific, and is not a regression, it's not
something that could only have been caught by CI.  I'm surprised to find
it at this point, considering piglit was used to validate the feature.

I'd be happy if there was at least a smoke test where complex programs
are cached, with the intent of exercising the shader cache mechanism
directly.  It doesn't have to be exhaustive to be effective.  Seems like
a good idea to at least directly test the issue that was fixed in
bug 103988.

> We also discussed a nir serialization test, similar to our current nir
> clone daily test. I don't think this is implemented yet either.
>
> -Jordan
>
>> 
>> On 09/11/17 11:58, Jordan Justen wrote:
>> > f9d5a7add42af5a2e4410526d1480a08f41317ae along with
>> > a16dc04ad51c32e5c7d136e4dd6273d983385d3f appears to have fixed the one
>> > known regression with shader cache. (Deus Ex instability.)
>> > 
>> > We should enable the shader cache by default to stabilize it before
>> > the next major Mesa release.
>> > 
>> > Signed-off-by: Jordan Justen 
>> > ---
>> >   docs/relnotes/17.4.0.html  | 2 +-
>> >   src/mesa/drivers/dri/i965/brw_disk_cache.c | 3 ---
>> >   2 files changed, 1 insertion(+), 4 deletions(-)
>> > 
>> > diff --git a/docs/relnotes/17.4.0.html b/docs/relnotes/17.4.0.html
>> > index f81b5bd62d3..48dcd5cce38 100644
>> > --- a/docs/relnotes/17.4.0.html
>> > +++ b/docs/relnotes/17.4.0.html
>> > @@ -44,7 +44,7 @@ Note: some of the new features are only available with 
>> > certain drivers.
>> >   
>> >   
>> >   
>> > -Disk shader cache support for i965 when MESA_GLSL_CACHE_DISABLE 
>> > environment variable is set to "0" or "false"
>> > +Disk shader cache support for i965
>> >   
>> >   
>> >   Bug fixes
>> > diff --git a/src/mesa/drivers/dri/i965/brw_disk_cache.c 
>> > b/src/mesa/drivers/dri/i965/brw_disk_cache.c
>> > index 853ea98af03..cd0524c5cbf 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_disk_cache.c
>> > +++ b/src/mesa/drivers/dri/i965/brw_disk_cache.c
>> > @@ -420,9 +420,6 @@ void
>> >   brw_disk_cache_init(struct brw_context *brw)
>> >   {
>> >   #ifdef ENABLE_SHADER_CACHE
>> > -   if (env_var_as_boolean("MESA_GLSL_CACHE_DISABLE", true))
>> > -  return;
>> > -
>> >  char renderer[10];
>> >  MAYBE_UNUSED int len = snprintf(renderer, sizeof(renderer), 
>> > "i965_%04x",
>> >  brw->screen->deviceID);
>> > 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler

2017-12-05 Thread Chema Casanova
On 30/11/17 at 23:58, Jason Ekstrand wrote:
> On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo
> > wrote:
>
> load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit
> surface format defined. So when reading 16-bit components with the
> sampler we need to unshuffle two 16-bit components from each 32-bit
> component.
>
> Using the sampler avoids the use of the byte_scattered_read message
> that needs one message for each component and is supposed to be
> slower.
>
> In the case of SKL+ we take advantage of a hardware feature that
> automatically defines a channel mask based on the rlen value, so on
> SKL+ we only use half of the registers without using a header in the
> payload.
> ---
>  src/intel/compiler/brw_fs.cpp           | 31
> +++
>  src/intel/compiler/brw_fs_generator.cpp | 10 --
>  2 files changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp
> b/src/intel/compiler/brw_fs.cpp
> index 1ca4d416b2..9c543496ba 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -184,9 +184,17 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
> fs_builder &bld,
>      * a double this means we are only loading 2 elements worth of
> data.
>      * We also want to use a 32-bit data type for the dst of the
> load operation
>      * so other parts of the driver don't get confused about the
> size of the
> -    * result.
> +    * result. On the case of 16-bit data we only need half of the
> 32-bit
> +    * components on SKL+ as we take advance of using message
> return size to
> +    * define an xy channel mask.
>      */
> -   fs_reg vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> +   fs_reg vec4_result;
> +   if (type_sz(dst.type) == 2 && (devinfo->gen >= 9)) {
> +      vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
> +      vec4_result = retype(vec4_result, BRW_REGISTER_TYPE_HF);
> +   } else {
> +      vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> +   }
>
>     fs_inst *inst =
> bld.emit(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_LOGICAL,
>                              vec4_result, surf_index, vec4_offset);
>     inst->size_written = 4 *
> vec4_result.component_size(inst->exec_size);
> @@ -197,8 +205,23 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const
> fs_builder &bld,
>     }
>
>     vec4_result.type = dst.type;
> -   bld.MOV(dst, offset(vec4_result, bld,
> -                       (const_offset & 0xf) /
> type_sz(vec4_result.type)));
> +
> +   if (type_sz(dst.type) == 2) {
> +      /* 16-bit types need to be unshuffled as each pair of
> 16-bit components
> +       * is packed in a 32-bit component because we are using a
> 32-bit format
> +       * in the uniform surface that is read by the sampler.
> +       * TODO: On BDW+ mark when a uniform has a 16-bit type so we
> could setup a
> +       * surface format of 16-bit and use the 16-bit return
> format at the
> +       * sampler.
> +       */
> +      vec4_result.stride = 2;
> +      bld.MOV(dst, byte_offset(offset(vec4_result, bld,
> +                                      (const_offset & 0x7) / 4),
> +                               (const_offset & 0x7) / 2 % 2 * 2));
> +   } else {
> +      bld.MOV(dst, offset(vec4_result, bld,
> +                          (const_offset & 0xf) /
> type_sz(vec4_result.type)));
> +   }
>
>
> This seems overly complicated.  How about something like

> fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
> switch (type_sz(dst.type)) {
> case 2:
>    shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
>    bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
>    break;
> case 4:
>    bld.MOV(dst, dw);
>    break;
> case 8:
>    shuffle_32bit_load_result_to_64bit_data(bld, dst, dw, 1);
>    break;
> default:
>    unreachable();
> }

This implementation is really much clearer. Tested, and it works perfectly
fine.
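
As a standalone illustration of the 16-in-32 packing being unshuffled here
(plain C, unrelated to the actual fs_builder code; the values are made up):

   #include <stdint.h>
   #include <stdio.h>

   int main(void)
   {
      /* Two 16-bit components stored in one 32-bit dword, as the 32-bit
       * uniform surface format forces. */
      uint16_t lo = 0x1234, hi = 0xabcd;
      uint32_t dword = (uint32_t)lo | ((uint32_t)hi << 16);

      /* Select the low or high half, in the spirit of the
       * (const_offset / 2) & 1 expression above. */
      for (unsigned byte_offset = 0; byte_offset < 4; byte_offset += 2) {
         unsigned half = (byte_offset / 2) & 1;  /* 0 = low, 1 = high */
         uint16_t v = (uint16_t)(dword >> (16 * half));
         printf("offset %u -> 0x%04x\n", byte_offset, v);
      }
      return 0;
   }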

>  
>
>  }
>
>  /**
> diff --git a/src/intel/compiler/brw_fs_generator.cpp
> b/src/intel/compiler/brw_fs_generator.cpp
> index a3861cd68e..00a4e29147 100644
> --- a/src/intel/compiler/brw_fs_generator.cpp
> +++ b/src/intel/compiler/brw_fs_generator.cpp
> @@ -1381,12 +1381,18 @@
> fs_generator::generate_varying_pull_constant_load_gen7(fs_inst *inst,
>     uint32_t simd_mode, rlen, mlen;
>     if (inst->exec_size == 16) {
>        mlen = 2;
> -      rlen = 8;
> +      if (type_sz(dst.type) == 2 && (devinfo->gen >= 9))
> +         rlen = 4;
> +      else
> +         rlen = 8;
>
>
> I'm not sure what I think of this.  We intentionally use a vec4 today
> instead 

[Mesa-dev] [PATCH 1/1] radv: use a faster version for nir_op_pack_half_2x16

2017-12-05 Thread Samuel Pitoiset
This patch is ported from RadeonSI and it has two effects.

It fixes a rendering issue which affects F1 2017 and Dawn
of War 3 (Vega only) because LLVM was ending up generating
the new v_mad_mix_{hi,lo} instructions which appear to be
buggy in some way. Not sure if Mesa is generating something
wrong or if the issue is in LLVM only. Anyway, that explains why
the DOW3 issue can't be reproduced with GL on Vega.

It also improves performance because v_cvt_pkrtz_f16 is faster,
and because I guess the rounding mode behaviour is similar between
GL and VK, we can use it. About performance, it improves Talos
by +3/4% but I don't see any other impacts.

No CTS regressions on Polaris (Vega in progress).

Signed-off-by: Samuel Pitoiset 
---
 src/amd/common/ac_nir_to_llvm.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 96ba289a81..663b27d265 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1426,23 +1426,13 @@ static LLVMValueRef emit_bitfield_insert(struct 
ac_llvm_context *ctx,
 static LLVMValueRef emit_pack_half_2x16(struct ac_llvm_context *ctx,
LLVMValueRef src0)
 {
-   LLVMValueRef const16 = LLVMConstInt(ctx->i32, 16, false);
-   int i;
LLVMValueRef comp[2];
 
src0 = ac_to_float(ctx, src0);
comp[0] = LLVMBuildExtractElement(ctx->builder, src0, ctx->i32_0, "");
comp[1] = LLVMBuildExtractElement(ctx->builder, src0, ctx->i32_1, "");
-   for (i = 0; i < 2; i++) {
-   comp[i] = LLVMBuildFPTrunc(ctx->builder, comp[i], ctx->f16, "");
-   comp[i] = LLVMBuildBitCast(ctx->builder, comp[i], ctx->i16, "");
-   comp[i] = LLVMBuildZExt(ctx->builder, comp[i], ctx->i32, "");
-   }
-
-   comp[1] = LLVMBuildShl(ctx->builder, comp[1], const16, "");
-   comp[0] = LLVMBuildOr(ctx->builder, comp[0], comp[1], "");
 
-   return comp[0];
+   return ac_build_cvt_pkrtz_f16(ctx, comp);
 }
 
 static LLVMValueRef emit_unpack_half_2x16(struct ac_llvm_context *ctx,
-- 
2.15.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/1] radv: bugfix for F1 2017 and DOW3 on Vega

2017-12-05 Thread Samuel Pitoiset
Hi folks,

It took me a while to figure out the issue which is addressed by the
following patch, but I think it should be the right one (see the patch
description for more explanations).

Keep in mind that DOW3 is still affected by one other issue (Vega only).
Hopefully, I will be able to write a patch for that one soon.

Thanks.

Samuel Pitoiset (1):
  radv: use a faster version for nir_op_pack_half_2x16

 src/amd/common/ac_nir_to_llvm.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

-- 
2.15.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] st/mesa: remove unneeded #include in st_format.h

2017-12-05 Thread Ilia Mirkin
Series is

Reviewed-by: Ilia Mirkin 

On Tue, Dec 5, 2017 at 11:59 AM, Brian Paul  wrote:
> ---
>  src/mesa/state_tracker/st_format.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/src/mesa/state_tracker/st_format.h 
> b/src/mesa/state_tracker/st_format.h
> index 3dd9c10..466b5d0 100644
> --- a/src/mesa/state_tracker/st_format.h
> +++ b/src/mesa/state_tracker/st_format.h
> @@ -33,7 +33,6 @@
>  #include "main/formats.h"
>  #include "main/glheader.h"
>
> -#include "pipe/p_defines.h"
>  #include "pipe/p_format.h"
>
>  #ifdef __cplusplus
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/android: Partially handle HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED

2017-12-05 Thread Robert Foss
On Tue, 2017-12-05 at 18:22 +0900, Tomasz Figa wrote:
> On Sat, Dec 2, 2017 at 4:43 AM, Rob Herring  wrote:
> > On Fri, Dec 1, 2017 at 8:44 AM, Tomasz Figa 
> > wrote:
> > > On Fri, Dec 1, 2017 at 11:20 PM, Rob Herring 
> > > wrote:
> > > > On Fri, Dec 1, 2017 at 7:30 AM, Robert Foss  > > > ora.com> wrote:
> > > > > On Thu, 2017-11-30 at 11:14 -0600, Rob Herring wrote:
> > > > > > On Thu, Nov 30, 2017 at 12:11 AM, Tapani Pälli  > > > > > i...@intel.co
> > > > > > m> wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On 11/30/2017 06:13 AM, Tomasz Figa wrote:
> > > > > > > > 
> > > > > > > > On Thu, Nov 30, 2017 at 3:43 AM, Robert Foss  > > > > > > > ss@collabo
> > > > > > > > ra.com>
> > > > > > > > wrote:
> > 
> > [...]
> > 
> > > > > > > > (As a side note, I had an idea to create a new
> > > > > > > > interface,
> > > > > > > > standardized
> > > > > > > > by Mesa, let's say libdri_android, completely free of
> > > > > > > > any
> > > > > > > > gralloc-internals. It would have to be exposed
> > > > > > > > additionally by
> > > > > > > > any
> > > > > > > > Android that intends to run Mesa. Given the need to
> > > > > > > > deal with 3
> > > > > > > > different gralloc versions already, it could be
> > > > > > > > something easier
> > > > > > > > to
> > > > > > > > manage.)
> > > > > > > 
> > > > > > > 
> > > > > > > Makes sense, it is a bit messy and we have bit too much
> > > > > > > patches on
> > > > > > > our tree
> > > > > > > because of these differences.
> > > > > > 
> > > > > > Seems overly complicated to me. The information needed is
> > > > > > within the
> > > > > > ints in the native_handle in most/all implementations. I
> > > > > > don't think
> > > > > > there's another way to globally store dmabuf metadata
> > > > > > unless you have
> > > > > > a custom interface in your DRM driver. So standardizing to
> > > > > > a common
> > > > > > library implies a common handle struct here. I think the
> > > > > > options are:
> > > > > > 
> > > > > > - common struct definition (native_handle_t + dmabuf fd(s)
> > > > > > + width,
> > > > > > height, stride, format, usage, etc.)
> > > > > > - common struct plus inline accessor functions
> > > > > > - common opaque struct plus accessor library
> > > > > 
> > > > > So these common parts would be much like what currently
> > > > > exists in
> > > > > minigbm/cros_gralloc_handle.h and
> > > > > gbm_gralloc/gralloc_drm_handle.h
> > > > > then, but extended with the above suggestions?
> > > > 
> > > > Yes, but which part do you think needs to be extended?
> > > > 
> > > > As we discussed on IRC, I think perhaps we just need to change
> > > > the
> > > > handle format field in gralloc_drm_handle.h to use fourcc (aka
> > > > DRM and
> > > > GBM formats) instead of the Android format. I think all the
> > > > users just
> > > > end up converting it to their own internal format anyway.
> > > 
> > > We keep the handle opaque for consumers and even minigbm
> > > dereferences
> > > it only when creating/registering the buffer, further using the
> > > handle
> > > pointer only as a key to internal bookkeeping map.
> > 
> > What you say implies that you don't need any metadata in the
> > handle,
> > but you do have pretty much all the same data. Whether you
> > 
> > > Relying on the struct itself is not only error prone, as there is
> > > no
> > > way to check if the struct on gralloc implementation side matches
> > > what
> > > we expect, but also makes it difficult to change the handle
> > > struct at
> > > our convenience.
> > 
> > How does a library solve this?
> > 
> > Everything in Android gets built together and the handle pretty
> > much
> > has to stay the same across components in any implementation I've
> > seen. Maybe someday that will change and we'll need versioning and
> > backwards compatibility, but for now that's unnecessary complexity.
> > We'd have to get to a single, well controlled and defined handle
> > first
> > anyway before we start versioning.
> > 
> > Anyone is still free to change things downstream however they want.
> > We're only talking about what does mainline/upstream do.
> > 
> > > > > > Also, I don't think whatever is standardized should live in
> > > > > > Mesa.
> > > > > > There's a need to support drm_hwcomposer (which has the
> > > > > > same
> > > > > > dependencies as mesa) with non-Mesa GL implementations
> > > > > > (yes, vendor
> > > > > > binaries).
> > > > > 
> > > > > Excluding Mesa and the composer leaves us with the allocator
> > > > > or
> > > > > creating a new library.
> > > > > I would assume that creating a new library is the worse
> > > > > option.
> > > > 
> > > > Why excluding the composer? If we have to pick, I'd put it
> > > > there or
> > > > perhaps libdrm?
> > > 
> > > There is neither a single central composer nor libdrm is used on
> > > every
> > > system... (The latter is not even used by Intel driver in Mesa
> > > anymore.)
> 

[Mesa-dev] [PATCH 4/4] mesa: s/%u/%d/ in _mesa_error() call in check_layer()

2017-12-05 Thread Brian Paul
The layer parameter is signed.  Fixes the error message seen when
running the arb_texture_multisample-errors test which checks a
negative layer value.
---
 src/mesa/main/fbobject.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index f7702f1..30287ab 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -3189,8 +3189,7 @@ check_layer(struct gl_context *ctx, GLenum target, GLint 
layer,
 * and layer is negative."
 */
if (layer < 0) {
-  _mesa_error(ctx, GL_INVALID_VALUE,
-  "%s(layer %u < 0)", caller, layer);
+  _mesa_error(ctx, GL_INVALID_VALUE, "%s(layer %d < 0)", caller, layer);
   return false;
}
 
-- 
1.9.1
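
A quick standalone check of why the format specifier matters for a negative
GLint (illustrative only, not part of the patch):

   #include <stdio.h>

   int main(void)
   {
      int layer = -1;
      /* Deliberately wrong specifier, as in the code being fixed: */
      printf("layer %u < 0\n", layer);  /* prints "layer 4294967295 < 0" */
      /* The corrected one: */
      printf("layer %d < 0\n", layer);  /* prints "layer -1 < 0" */
      return 0;
   }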

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] mesa: add const qualifier in test_attachment_completeness()

2017-12-05 Thread Brian Paul
---
 src/mesa/main/fbobject.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index 8116563..f7702f1 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -793,7 +793,7 @@ test_attachment_completeness(const struct gl_context *ctx, 
GLenum format,
/* Look for reasons why the attachment might be incomplete */
if (att->Type == GL_TEXTURE) {
   const struct gl_texture_object *texObj = att->Texture;
-  struct gl_texture_image *texImage;
+  const struct gl_texture_image *texImage;
   GLenum baseFormat;
 
   if (!texObj) {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] mesa: trivial whitespace fixes in transformfeedback.c

2017-12-05 Thread Brian Paul
---
 src/mesa/main/transformfeedback.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/transformfeedback.c 
b/src/mesa/main/transformfeedback.c
index a5ea2a5..e4cc1db 100644
--- a/src/mesa/main/transformfeedback.c
+++ b/src/mesa/main/transformfeedback.c
@@ -346,7 +346,7 @@ _mesa_compute_max_transform_feedback_vertices(struct 
gl_context *ctx,
 
  /* Skip any inactive buffers, which have a stride of 0. */
  if (stride == 0)
-   continue;
+continue;
 
  max_for_this_buffer = obj->Size[i] / (4 * stride);
  max_index = MIN2(max_index, max_for_this_buffer);
@@ -605,7 +605,7 @@ _mesa_validate_buffer_range_xfb(struct gl_context *ctx,
   _mesa_error(ctx, GL_INVALID_VALUE, "%s(size=%d must be a multiple of "
   "four)", gl_methd_name, (int) size);
   return false;
-   }  
+   }
 
if (offset & 0x3) {
   /* OpenGL 4.5 core profile, 6.7, pdf page 103: multiple of 4 */
@@ -732,13 +732,13 @@ _mesa_TransformFeedbackBufferBase(GLuint xfb, GLuint 
index, GLuint buffer)
 
obj = lookup_transform_feedback_object_err(ctx, xfb,
   "glTransformFeedbackBufferBase");
-   if(!obj) {
+   if (!obj) {
   return;
}
 
bufObj = lookup_transform_feedback_bufferobj_err(ctx, buffer,
   "glTransformFeedbackBufferBase");
-   if(!bufObj) {
+   if (!bufObj) {
   return;
}
 
@@ -755,13 +755,13 @@ _mesa_TransformFeedbackBufferRange(GLuint xfb, GLuint 
index, GLuint buffer,
 
obj = lookup_transform_feedback_object_err(ctx, xfb,
   
"glTransformFeedbackBufferRange");
-   if(!obj) {
+   if (!obj) {
   return;
}
 
bufObj = lookup_transform_feedback_bufferobj_err(ctx, buffer,
   
"glTransformFeedbackBufferRange");
-   if(!bufObj) {
+   if (!bufObj) {
   return;
}
 
@@ -1337,7 +1337,7 @@ _mesa_GetTransformFeedbackiv(GLuint xfb, GLenum pname, 
GLint *param)
 
 obj = lookup_transform_feedback_object_err(ctx, xfb,
"glGetTransformFeedbackiv");
-if(!obj) {
+if (!obj) {
return;
 }
 
@@ -1363,7 +1363,7 @@ _mesa_GetTransformFeedbacki_v(GLuint xfb, GLenum pname, 
GLuint index,
 
obj = lookup_transform_feedback_object_err(ctx, xfb,
   "glGetTransformFeedbacki_v");
-   if(!obj) {
+   if (!obj) {
   return;
}
 
@@ -1392,7 +1392,7 @@ _mesa_GetTransformFeedbacki64_v(GLuint xfb, GLenum pname, 
GLuint index,
 
obj = lookup_transform_feedback_object_err(ctx, xfb,
   "glGetTransformFeedbacki64_v");
-   if(!obj) {
+   if (!obj) {
   return;
}
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] mesa: simplify/improve some _mesa_error() calls in teximage.c

2017-12-05 Thread Brian Paul
---
 src/mesa/main/teximage.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 41de966..572e380 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -5778,14 +5778,10 @@ texture_image_multisample(struct gl_context *ctx, 
GLuint dims,
}
 
if (!check_multisample_target(dims, target, dsa)) {
-  if (dsa) {
- _mesa_error(ctx, GL_INVALID_OPERATION, "%s(target)", func);
- return;
-  }
-  else {
- _mesa_error(ctx, GL_INVALID_ENUM, "%s(target)", func);
- return;
-  }
+  GLenum err = dsa ? GL_INVALID_OPERATION : GL_INVALID_ENUM;
+  _mesa_error(ctx, err, "%s(target=%s)", func,
+  _mesa_enum_to_string(target));
+  return;
}
 
/* check that the specified internalformat is 
color/depth/stencil-renderable;
@@ -5826,7 +5822,7 @@ texture_image_multisample(struct gl_context *ctx, GLuint 
dims,
 *However, if samples is not supported, then no error is generated.
 */
if (!samplesOK && !_mesa_is_proxy_texture(target)) {
-  _mesa_error(ctx, sample_count_error, "%s(samples)", func);
+  _mesa_error(ctx, sample_count_error, "%s(samples=%d)", func, samples);
   return;
}
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] st/mesa: rename a few vars to 'bindings'

2017-12-05 Thread Brian Paul
To be consistent.
---
 src/mesa/state_tracker/st_format.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/mesa/state_tracker/st_format.c 
b/src/mesa/state_tracker/st_format.c
index da8b5e2..3f7e55e 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -1988,13 +1988,13 @@ find_supported_format(struct pipe_screen *screen,
   const enum pipe_format formats[],
   enum pipe_texture_target target,
   unsigned sample_count,
-  unsigned tex_usage,
+  unsigned bindings,
   boolean allow_dxt)
 {
uint i;
for (i = 0; formats[i]; i++) {
   if (screen->is_format_supported(screen, formats[i], target,
-  sample_count, tex_usage)) {
+  sample_count, bindings)) {
  if (!allow_dxt && util_format_is_s3tc(formats[i])) {
 /* we can't return a dxt format, continue searching */
 continue;
@@ -2158,13 +2158,13 @@ enum pipe_format
 st_choose_renderbuffer_format(struct st_context *st,
   GLenum internalFormat, unsigned sample_count)
 {
-   uint usage;
+   unsigned bindings;
if (_mesa_is_depth_or_stencil_format(internalFormat))
-  usage = PIPE_BIND_DEPTH_STENCIL;
+  bindings = PIPE_BIND_DEPTH_STENCIL;
else
-  usage = PIPE_BIND_RENDER_TARGET;
+  bindings = PIPE_BIND_RENDER_TARGET;
return st_choose_format(st, internalFormat, GL_NONE, GL_NONE,
-   PIPE_TEXTURE_2D, sample_count, usage, FALSE);
+   PIPE_TEXTURE_2D, sample_count, bindings, FALSE);
 }
 
 
@@ -2411,17 +2411,17 @@ st_QueryInternalFormat(struct gl_context *ctx, GLenum 
target,
* the driver, and if so return the same internal format, otherwise
* return GL_NONE.
*/
-  uint usage;
+  unsigned bindings;
   if (_mesa_is_depth_or_stencil_format(internalFormat))
- usage = PIPE_BIND_DEPTH_STENCIL;
+ bindings = PIPE_BIND_DEPTH_STENCIL;
   else
- usage = PIPE_BIND_RENDER_TARGET;
+ bindings = PIPE_BIND_RENDER_TARGET;
   enum pipe_format pformat = st_choose_format(st,
   internalFormat,
   GL_NONE,
   GL_NONE,
   PIPE_TEXTURE_2D, 1,
-  usage, FALSE);
+  bindings, FALSE);
   if (pformat)
  params[0] = internalFormat;
   break;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] st/mesa: remove unneeded #include in st_format.h

2017-12-05 Thread Brian Paul
---
 src/mesa/state_tracker/st_format.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_format.h 
b/src/mesa/state_tracker/st_format.h
index 3dd9c10..466b5d0 100644
--- a/src/mesa/state_tracker/st_format.h
+++ b/src/mesa/state_tracker/st_format.h
@@ -33,7 +33,6 @@
 #include "main/formats.h"
 #include "main/glheader.h"
 
-#include "pipe/p_defines.h"
 #include "pipe/p_format.h"
 
 #ifdef __cplusplus
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] st/mesa: whitespace fixes in st_format.c

2017-12-05 Thread Brian Paul
---
 src/mesa/state_tracker/st_format.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/mesa/state_tracker/st_format.c 
b/src/mesa/state_tracker/st_format.c
index 1ae677d..da8b5e2 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -58,7 +58,8 @@
  * Translate Mesa format to Gallium format.
  */
 enum pipe_format
-st_mesa_format_to_pipe_format(const struct st_context *st, mesa_format 
mesaFormat)
+st_mesa_format_to_pipe_format(const struct st_context *st,
+  mesa_format mesaFormat)
 {
switch (mesaFormat) {
case MESA_FORMAT_A8B8G8R8_UNORM:
@@ -1687,7 +1688,7 @@ static const struct format_mapping format_map[] = {
   { PIPE_FORMAT_R8G8B8A8_SINT, 0 }
},
{
-  { GL_RGB_INTEGER_EXT, 
+  { GL_RGB_INTEGER_EXT,
 GL_BGR_INTEGER_EXT,
 GL_RGB8I_EXT,
 GL_BLUE_INTEGER_EXT, 0 },
@@ -2005,6 +2006,7 @@ find_supported_format(struct pipe_screen *screen,
return PIPE_FORMAT_NONE;
 }
 
+
 struct exact_format_mapping
 {
GLenum format;
@@ -2040,6 +2042,7 @@ static const struct exact_format_mapping rgbx_tbl[] =
{ 0,   0,  0  }
 };
 
+
 /**
  * For unsized/base internal formats, we may choose a convenient effective
  * internal format for {format, type}. If one exists, return that, otherwise
@@ -2074,6 +2077,7 @@ find_exact_format(GLint internalFormat, GLenum format, 
GLenum type)
return PIPE_FORMAT_NONE;
 }
 
+
 /**
  * Given an OpenGL internalFormat value for a texture or surface, return
  * the best matching PIPE_FORMAT_x, or PIPE_FORMAT_NONE if there's no match.
@@ -2173,7 +2177,7 @@ st_choose_renderbuffer_format(struct st_context *st,
  */
 enum pipe_format
 st_choose_matching_format(struct st_context *st, unsigned bind,
- GLenum format, GLenum type, GLboolean swapBytes)
+  GLenum format, GLenum type, GLboolean swapBytes)
 {
struct pipe_screen *screen = st->pipe->screen;
mesa_format mesa_format;
@@ -2253,7 +2257,7 @@ st_ChooseTextureFormat(struct gl_context *ctx, GLenum 
target,
 internalFormat == GL_RGBA16F ||
 internalFormat == GL_RGB32F ||
 internalFormat == GL_RGBA32F)
-bindings |= PIPE_BIND_RENDER_TARGET;
+  bindings |= PIPE_BIND_RENDER_TARGET;
 
/* GLES allows the driver to choose any format which matches
 * the format+type combo, because GLES only supports unsized internal
@@ -2279,7 +2283,9 @@ st_ChooseTextureFormat(struct gl_context *ctx, GLenum 
target,
 return st_pipe_format_to_mesa_format(pFormat);
 
  if (!is_renderbuffer) {
-/* try choosing format again, this time without render target 
bindings */
+/* try choosing format again, this time without render
+ * target bindings.
+ */
 pFormat = st_choose_matching_format(st, PIPE_BIND_SAMPLER_VIEW,
 format, type,
 ctx->Unpack.SwapBytes);
@@ -2369,6 +2375,7 @@ st_QuerySamplesForFormat(struct gl_context *ctx, GLenum 
target,
return num_sample_counts;
 }
 
+
 /**
  * ARB_internalformat_query2 driver hook.
  */
@@ -2428,6 +2435,7 @@ st_QueryInternalFormat(struct gl_context *ctx, GLenum 
target,
}
 }
 
+
 /**
  * This is used for translating texture border color and the clear
  * color.  For example, the clear color is interpreted according to
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] swr/scons: Fix another intermittent build failure

2017-12-05 Thread George Kyriazis
gen_BackendPixelRate*.cpp depends on gen_ar_eventhandler.hpp.
Fix missing dependency.
---
 src/gallium/drivers/swr/SConscript | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/SConscript 
b/src/gallium/drivers/swr/SConscript
index 9204ecb..eca4830 100644
--- a/src/gallium/drivers/swr/SConscript
+++ b/src/gallium/drivers/swr/SConscript
@@ -146,6 +146,7 @@ env.CodeGenerate(
 Depends(backendPixelRateFiles,
 ['rasterizer/core/backends/gen_BackendPixelRate.hpp',
  'rasterizer/archrast/gen_ar_event.hpp',
+ 'rasterizer/archrast/gen_ar_eventhandler.hpp',
  'rasterizer/codegen/gen_knobs.h']
 )
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/miptree: Use the tiling from the modifier instead of the BO

2017-12-05 Thread Jason Ekstrand
On Tue, Dec 5, 2017 at 8:23 AM, Kristian Høgsberg 
wrote:

> On Tue, Dec 5, 2017 at 7:57 AM, Jason Ekstrand 
> wrote:
> > On Tue, Dec 5, 2017 at 1:22 AM, Daniel Stone 
> wrote:
> >>
> >> Hi,
> >>
> >> On 18 November 2017 at 00:10, Jason Ekstrand 
> wrote:
> >> > This fixes a bug where we were taking the tiling from the BO regardless
> >> > of what the modifier said.  When we got images in from Vulkan where it
> >> > doesn't set the tiling on the BO, we would treat them as linear even
> >> > though the modifier expressly said to treat it as Y-tiled.
> >>
> >> For some reason I thought Ken had already reviewed this and it landed,
> >> until Kristian mentioned last night.
> >
> >
> > There's been some discussion about what the right thing to do is here.
> > I've got a patch (which is now landed) which will make us set the tiling
> > from Vulkan just like GL does.  It's rather annoying but I think that's
> > marginally better.
>
> I don't know that it's better - it reinforces the notion that you have
> to make the kernel-side tiling match for the dma-buf import extension
> to work. I think it'd be better to land these patches (btw, Rb: me
> (can you even do parenthetical Rbs?))


I'll allow it. :)


> and call set-tiling in mesa.


Yeah, I think that's reasonable.  Really, this should only be a problem if
we have a bunch of users trying to use the same BO with different
modifiers.  It can happen in theory (immagine two images in the same BO,
one X and one Y) but it's a pretty crazy case.


> The
> assumption being that if the tiling doesn't match the modifier, then
> maybe the producer didn't care about kernel tiling? Alternatively,
> could we set bo->tiling = INCONSISTENT or such in mesa and then use
> that to avoid the gtt map paths?
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/miptree: Use the tiling from the modifier instead of the BO

2017-12-05 Thread Kristian Høgsberg
On Tue, Dec 5, 2017 at 7:57 AM, Jason Ekstrand  wrote:
> On Tue, Dec 5, 2017 at 1:22 AM, Daniel Stone  wrote:
>>
>> Hi,
>>
>> On 18 November 2017 at 00:10, Jason Ekstrand  wrote:
>> > This fixes a bug where we were taking the tiling from the BO regardless
>> > of what the modifier said.  When we got images in from Vulkan where it
>> > doesn't set the tiling on the BO, we would treat them as linear even
>> > though the modifier expressly said to treat it as Y-tiled.
>>
>> For some reason I thought Ken had already reviewed this and it landed,
>> until Kristian mentioned last night.
>
>
> There's been some discussion about what the right thing to do is here.  I've
> got a patch (which is now landed) which will make us set the tiling from
> Vulkan just like GL does.  It's rather annoying but I think that's
> marginally better.

I don't know that it's better - it reinforces the notion that you have
to make the kernel-side tiling match for the dma-buf import extension
to work. I think it'd be better to land these patches (btw, Rb: me
(can you even do parenthetical Rbs?)) and call set-tiling in mesa. The
assumption being that if the tiling doesn't match the modifier, then
maybe the producer didn't care about kernel tiling? Alternatively,
could we set bo->tiling = INCONSISTENT or such in mesa and then use
that to avoid the gtt map paths?

Kristian

>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glx: GLX_MESA_multithread_makecurrent is direct-only

2017-12-05 Thread Adam Jackson
This extension is not defined for indirect contexts. Marking it as
"client only", as the old code did here, would make the extension
available in indirect contexts, even though the server would certainly
not have it in its extension list.

Cc: 
Signed-off-by: Adam Jackson 
---
 src/glx/glxextensions.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glx/glxextensions.c b/src/glx/glxextensions.c
index af6ffbf660..638d8bcbbe 100644
--- a/src/glx/glxextensions.c
+++ b/src/glx/glxextensions.c
@@ -152,7 +152,7 @@ static const struct extension_info known_glx_extensions[] = 
{
{ GLX(ATI_pixel_format_float),  VER(0,0), N, N, N, N },
{ GLX(INTEL_swap_event),VER(0,0), Y, N, N, N },
{ GLX(MESA_copy_sub_buffer),VER(0,0), Y, N, N, N },
-   { GLX(MESA_multithread_makecurrent),VER(0,0), Y, N, Y, N },
+   { GLX(MESA_multithread_makecurrent),VER(0,0), Y, N, N, Y },
{ GLX(MESA_query_renderer), VER(0,0), Y, N, N, Y },
{ GLX(MESA_swap_control),   VER(0,0), Y, N, N, Y },
{ GLX(NV_float_buffer), VER(0,0), N, N, N, N },
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/miptree: Use the tiling from the modifier instead of the BO

2017-12-05 Thread Jason Ekstrand
On Tue, Dec 5, 2017 at 1:22 AM, Daniel Stone  wrote:

> Hi,
>
> On 18 November 2017 at 00:10, Jason Ekstrand  wrote:
> > This fixes a bug where we were taking the tiling from the BO regardless
> > of what the modifier said.  When we got images in from Vulkan where it
> > doesn't set the tiling on the BO, we would treat them as linear even
> > though the modifier expressly said to treat it as Y-tiled.
>
> For some reason I thought Ken had already reviewed this and it landed,
> until Kristian mentioned last night.


There's been some discussion about what the right thing to do is here.
I've got a patch (which is now landed) which will make us set the tiling
from Vulkan just like GL does.  It's rather annoying but I think that's
marginally better.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 104119] radv: OpBitFieldInsert produces 0 with a loop counter for Insert

2017-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104119

--- Comment #2 from James Legg  ---
Created attachment 135982
  --> https://bugs.freedesktop.org/attachment.cgi?id=135982&action=edit
Output from RADV_DEBUG=shaders

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 104119] radv: OpBitFieldInsert produces 0 with a loop counter for Insert

2017-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104119

--- Comment #1 from James Legg  ---
Created attachment 135981
  --> https://bugs.freedesktop.org/attachment.cgi?id=135981&action=edit
SPIR-V disassembly of test shader

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 104119] radv: OpBitFieldInsert produces 0 with a loop counter for Insert

2017-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104119

Bug ID: 104119
   Summary: radv: OpBitFieldInsert produces 0 with a loop counter
for Insert
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Drivers/Vulkan/radeon
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: jl...@feralinteractive.com
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 135980
  --> https://bugs.freedesktop.org/attachment.cgi?id=135980&action=edit
test application that reproduces the bug

On RADV, the SPIR-V OpBitFieldInsert opcode produces 0 when the Insert
parameter derives from a variable used as a loop counter. For example, the
following GLSL compute shader writes 0 to the first 8 elements of the buffer at
binding 0:

#version 450 core
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout (std430, set = 0, binding = 0)
restrict writeonly buffer u2_cs { uint u2[]; };

void main()
{
for (int i = 0; i < 8; i++)
u2[i] = bitfieldInsert(0, i, 16, 2);
}
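
For reference, here is a minimal host-side sketch of the values the shader
above is expected to produce (my own illustration of the bitfieldInsert()
semantics, not code taken from the attached reproducer):

/* expected.c - what u2[0..7] should contain.
 * bitfieldInsert(0, i, 16, 2) takes the low 2 bits of i and places them
 * at bit offset 16, i.e. (i & 0x3) << 16.
 */
#include <stdio.h>
#include <stdint.h>

static uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                                unsigned offset, unsigned bits)
{
   uint32_t mask = ((1u << bits) - 1u) << offset;
   return (base & ~mask) | ((insert << offset) & mask);
}

int main(void)
{
   for (int i = 0; i < 8; i++)
      printf("u2[%d] = 0x%x\n", i, bitfield_insert(0, (uint32_t)i, 16, 2));
   /* prints 0x0, 0x10000, 0x20000, 0x30000, 0x0, 0x10000, 0x20000,
    * 0x30000, whereas on RADV every element currently comes back as 0. */
   return 0;
}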


I've attached a program that reproduces the bug.

I'm using the LLVM release_50 branch at revision 318947 and the Mesa master
branch at 20d37da597653201d2c524434907e817bd03b1d0.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nvir/nvc0: fix CVT lowering for dType == sType

2017-12-05 Thread Ilia Mirkin
On Tue, Dec 5, 2017 at 9:34 AM, Ilia Mirkin  wrote:
> And in any case, CVT with stype == dtype is illegal - whatever
> generates that should be fixed.

Without source modifiers, that is :) "cvt f32 dst neg src" is just
fine of course.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] glapi/check_table: Remove 'extern "C"' block

2017-12-05 Thread Emil Velikov
On 4 December 2017 at 23:57, Dylan Baker  wrote:
> Quoting Emil Velikov (2017-11-23 11:04:34)
>> On 20 November 2017 at 23:12, Dylan Baker  wrote:
>> > This doesn't actually accomplish what it's meant to do, as extern C
>> > doesn't undefine __cplusplus, so the included headers define a template
>> > (because __cplusplus is defined), but then that code is in an 'extern
>> > "C"' block, and explosion.
>> >
>> The commit does sound a bit off, admittedly I don't fully grok what
>> you're trying to say.
>>
>> The extern "C" { } construct annotates anything within as if it were a
>> normal C code. Thus functions will have the C linkage (otherwise
>> they'll have the C++ mangling) and structs are using the C rules.
>> So if you have a C++ header further down the chain it will be
>> considered as C and hell will break loose.
>>
>> This patch is correct, just sent a fix for glapitable.h, while glapi.h
>> was handled with d38839a498b38b745f5bdf6b96fbafa2792670be.
>>
>> As-is "struct _glapi_table" will be interpret as C++ one, which may
>> not have the same guarantees about sizeof and pointer arithmetic that
>> we're using in the test.
>>
>> Hope the above provides some inspiration towards a better commit message.
>>
>> Thanks
>> Emil
>
> So here's the problem I run into:
>
> In file included from ../include/c99_compat.h:28:0,
>  from ../src/util/macros.h:29,
>  from ../src/mapi/glapi/glapi.h:47,
>  from ../src/mapi/glapi/tests/check_table.cpp:28:
> ../include/no_extern_c.h:47:1: error: template with C linkage
>  template<class T> class _IncludeInsideExternCNotPortable;
>  ^~~~
> [54/93] Compiling C++ object 'src/gtest/gtest@sta/src_gtest-all.cc.o'.
>
> So, my commit message still makes sense to me, since you can't put a
> template in an extern "C" block, and that template is guarded by
> __cplusplus, which is still defined even in an extern "C" block.
>
This is one symptom of the issue described before. The code in
c99_compat.h was explicitly added by Jose to make the issue obvious.
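
As an aside, the portable pattern this all boils down to is roughly the
following (a minimal sketch with made-up header/function names, not the
actual glapi headers):

/* my_c_api.h - sketch: keep #includes outside the linkage block and only
 * wrap your own C declarations in extern "C". */
#include <stddef.h>            /* included headers stay outside extern "C" */

#ifdef __cplusplus
extern "C" {
#endif

size_t my_c_api_table_size(void);  /* gets C linkage when used from C++ */

#ifdef __cplusplus
}
#endif

Wrapping an #include itself in extern "C" is exactly what the
_IncludeInsideExternCNotPortable guard is there to catch, since any
template pulled in by that header would then end up with C linkage.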

> Do you have a commit message in mind that is better?

Feel free to reuse as much or as little of my explanation. Borrowing
from analogous fixes also works ;-)
Some examples:
237dcb4aa7c39c59bfd225ae3d73caf709be216d
a177a13033cf9356eb26e8757055037a54268a18
36ad8265489fc66ab45b9b74dafa353a93bdb03b

I'm going through 4/8 - seems like there's a bunch of other bugs around :-\
Feel free to push the remainder of the series with the nitpicks addressed.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa/st: move cloning of NIR shader for compute

2017-12-05 Thread Rob Clark
Since in the NIR case, the driver takes ownership of the NIR shader, we need
to clone what is passed to the driver.  Normally this is done as part of
creating the shader variant (where a clone is needed anyway).  But
compute shaders have no variants, so we were cloning earlier.

The problem is that after the NIR linking optimizations, we ended up
cloning *before* all the lowering passes were done.

So move this into st_get_cp_variant(), to make compute shaders work more
like other shader stages.

Signed-off-by: Rob Clark 
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 2 +-
 src/mesa/state_tracker/st_program.c   | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 5d18e7b62bf..36adf55cd45 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -431,7 +431,7 @@ set_st_program(struct gl_program *prog,
   stcp = (struct st_compute_program *)prog;
   stcp->shader_program = shader_program;
   stcp->tgsi.ir_type = PIPE_SHADER_IR_NIR;
-  stcp->tgsi.prog = nir_shader_clone(NULL, nir);
+  stcp->tgsi.prog = nir;
   break;
default:
   unreachable("unknown shader stage");
diff --git a/src/mesa/state_tracker/st_program.c 
b/src/mesa/state_tracker/st_program.c
index 5c0a58104fc..258f5e47cbe 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -1659,7 +1659,10 @@ st_get_cp_variant(struct st_context *st,
   v = CALLOC_STRUCT(st_basic_variant);
   if (v) {
  /* fill in new variant */
- v->driver_shader = pipe->create_compute_state(pipe, tgsi);
+ struct pipe_compute_state cs = *tgsi;
+ if (tgsi->ir_type == PIPE_SHADER_IR_NIR)
+cs.prog = nir_shader_clone(NULL, tgsi->prog);
+ v->driver_shader = pipe->create_compute_state(pipe, &cs);
  v->key = key;
 
  /* insert into list */
-- 
2.13.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Android: enable noreturn and returns_nonnull attributes

2017-12-05 Thread Rob Herring
On Tue, Dec 5, 2017 at 5:56 AM, Emil Velikov  wrote:
> On 5 December 2017 at 02:21, Rob Herring  wrote:
>> Commit 94ca8e04adf6 ("spirv: Add vtn_fail and vtn_assert helpers") broke
>> Android builds which have -Werror enabled with the following errors:
>>
>> external/mesa3d/src/compiler/spirv/spirv_to_nir.c:272:1: error: control may 
>> reach end of non-void function [-Werror,-Wreturn-type]
>> external/mesa3d/src/compiler/spirv/spirv_to_nir.c:810:1: error: control may 
>> reach end of non-void function [-Werror,-Wreturn-type]
>> ...
>>
>> The problem is the noreturn attribute is not enabled and we need to define
>> HAVE_FUNC_ATTRIBUTE_NORETURN.
>>
>> Auditing src/util/macros.h, we're also missing
>> HAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL, so add it too.
>>
>> Fixes: 94ca8e04adf6 ("spirv: Add vtn_fail and vtn_assert helpers")
>> Cc: Jason Ekstrand 
>> Signed-off-by: Rob Herring 
> Reviewed-by: Emil Velikov 
>
> The following shows the macros not set on Android. Feel free to check
> if they're supported and toggle them on.

I did this, but did another check of it.

> $ for i in `git grep "if.*def.*HAVE_FUNC_ATTRIBUTE_" origin/master  --
>  | grep -o "HAVE_FUNC_[A-Z_]*" | sort -u`; do git grep $i
> origin/master  | grep
> -qi android || echo missing $i; done
> missing HAVE_FUNC_ATTRIBUTE_CONST
> missing HAVE_FUNC_ATTRIBUTE_MALLOC
> missing HAVE_FUNC_ATTRIBUTE_NORETURN
> missing HAVE_FUNC_ATTRIBUTE_PURE
> missing HAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL
> missing HAVE_FUNC_ATTRIBUTE_VISIBILITY
> missing HAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT

At least according to documentation for clang, this is the only other
one that is supported. So I turned it on too and pushed the commit.
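
For anyone hitting the same warnings out of tree, the effect of the
missing define boils down to something like the sketch below (illustrative
only -- SKETCH_NORETURN stands in for the corresponding macro in
src/util/macros.h):

/* noreturn_sketch.c - why -Wreturn-type fires when the attribute is off. */
#include <stdio.h>
#include <stdlib.h>

#ifdef HAVE_FUNC_ATTRIBUTE_NORETURN
#define SKETCH_NORETURN __attribute__((noreturn))
#else
#define SKETCH_NORETURN  /* compiler can't see that fail() never returns */
#endif

SKETCH_NORETURN static void fail(const char *msg)
{
   fprintf(stderr, "%s\n", msg);
   abort();
}

static int checked_div(int a, int b)
{
   if (b == 0)
      fail("division by zero");  /* without noreturn: "control may reach
                                    end of non-void function" */
   else
      return a / b;
}

int main(void)
{
   printf("%d\n", checked_div(10, 2));
   return 0;
}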

> missing HAVE_FUNC_ATTRIBUTE_WEAK
>
> -Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   >