URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=357495b94dad4101a5491ed30782574162de58db
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 17:02:25 2016 -0700

    i965: Update compute workgroup size limit calculation for SIMD32.
    
    This should have the side effect of enabling the ARB_compute_shader
    extension on Gen8+ hardware and on all Gen7 platforms that didn't
    previously expose it (VLV and IVB GT1) because the number of hardware
    threads per subslice was insufficient in SIMD16 mode.
    
    v2: Bump workgroup size limit for GLES too (Jordan).
    
    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
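
The limit arithmetic behind this change can be sketched as below. This is a hypothetical model, not Mesa's code. It assumes that a workgroup must fit in the hardware threads of one subslice and that each thread runs `simd_width` invocations. It also uses the minimum of 1024 for `GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS` that ARB_compute_shader requires. The thread counts in the example are illustrative, not real hardware figures.

```cpp
#include <cassert>

// Minimum value of GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS required by
// ARB_compute_shader (and desktop GL 4.3).
constexpr unsigned kMinWorkGroupInvocations = 1024;

// Hypothetical model: a workgroup runs on a single subslice, with one
// hardware thread per `simd_width` invocations, so the invocation limit is
// the thread count times the widest supported SIMD width.
unsigned max_workgroup_invocations(unsigned threads_per_subslice,
                                   unsigned simd_width) {
   return threads_per_subslice * simd_width;
}

// The extension can only be exposed if that limit meets the GL minimum.
bool can_expose_compute(unsigned threads_per_subslice, unsigned simd_width) {
   return max_workgroup_invocations(threads_per_subslice, simd_width) >=
          kMinWorkGroupInvocations;
}
```

With an illustrative 48 threads per subslice, SIMD16 yields 768 invocations, which is too few. SIMD32 yields 1536, which clears the minimum.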

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=46ce93ed22891455dbe3eb4c69f5eddd2a7dcf00
Author: Francisco Jerez <[email protected]>
Date:   Thu May 26 21:28:45 2016 -0700

    i965: Add do32 debug option.
    
    The do32 INTEL_DEBUG option causes the back-end to try to generate a
    SIMD32 program when compiling a compute shader regardless of the
    specified compute shader workgroup size, which will be useful for
    testing SIMD32 code generation in the most common case in which the
    workgroup size doesn't exceed the SIMD16 limit, so SIMD32 codegen
    wouldn't otherwise be enabled automatically.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
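
The decision that do32 overrides can be modelled as follows. The function and parameter names are hypothetical. The only rule taken from the commit is this: SIMD32 is attempted automatically when the workgroup exceeds what SIMD16 threads can cover, and unconditionally when do32 is set.

```cpp
#include <cassert>

// Hypothetical sketch: decide whether to attempt a SIMD32 compile.
// `max_threads` is the number of hardware threads available to one
// workgroup; `do32` models the INTEL_DEBUG=do32 option.
bool should_try_simd32(unsigned workgroup_size, unsigned max_threads,
                       bool do32) {
   // A SIMD16 program covers at most 16 invocations per thread.
   const bool exceeds_simd16_limit = workgroup_size > 16 * max_threads;
   return exceeds_simd16_limit || do32;
}
```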

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=864737ce6cd5bae030079e749b8b18774a62d073
Author: Francisco Jerez <[email protected]>
Date:   Mon May 16 18:25:22 2016 -0700

    i965/fs: Build 32-wide compute shader when needed.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=37fd13ee2daf1dbd80cc7b43f7dcfdd1bb64bcc7
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 14:39:52 2016 -0700

    i965/fs: Extend back-end interface for limiting the shader dispatch width.
    
    This replaces the current fs_visitor::no16() interface with
    fs_visitor::limit_dispatch_width(), which takes an additional
    parameter allowing the caller to specify the maximum dispatch width a
    shader can be compiled with.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
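
A minimal model of the interface change is shown below. It assumes, hypothetically, that the visitor keeps a running maximum width and a failure reason. In this model the old no16() behaviour corresponds to a cap of 8. The class name and members are illustrative, not the back-end's.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a per-compile dispatch width cap. Each feature
// that cannot be compiled wider than `n` channels lowers the cap and
// records a message explaining why.
class DispatchWidthLimit {
public:
   void limit_dispatch_width(unsigned n, const std::string &msg) {
      if (n < max_width_) {
         max_width_ = n;
         reason_ = msg;
      }
   }

   // A compile at `dispatch_width` is only allowed up to the cap.
   bool allows(unsigned dispatch_width) const {
      return dispatch_width <= max_width_;
   }

   const std::string &reason() const { return reason_; }

private:
   unsigned max_width_ = 32;  // widest dispatch supported in this model
   std::string reason_;
};
```

A looser limit applied after a tighter one leaves the cap untouched, so the recorded reason always belongs to the tightest constraint.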

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=2d288cb9ea5b1b46eb4fe0061d694560bf54943f
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 13:52:25 2016 -0700

    i965/fs: Implement SIMD32 register allocation support.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=7f10d3983b1ef1bafbbb694c29430556122f4536
Author: Francisco Jerez <[email protected]>
Date:   Sat Apr 30 20:47:49 2016 -0700

    i965/fs: Remove pre-Gen7 register allocation class micro-optimization.
    
    This was trying to save some one-time init on pre-Gen7 hardware under
    the assumption that one would only ever need 1, 2, 4 and 8-wide
    registers on those platforms.  However nothing guarantees that those
    will be the only VGRF sizes used after lowering and optimization.  In
    some cases we may end up with a temporary of different size being
    allocated (e.g. by SIMD lowering to zip or unzip a multi-component
    register region of a logical send instruction), and there is no
    guarantee that they will be optimized away before register allocation
    (especially since the compute_to_mrf coalescing pass is
    rather... lacking...).  Instead just allocate classes for all possible
    VGRF sizes up to MAX_VGRF_SIZE to avoid a crash in pq_test() when we
    encounter a variable of any other size.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=1d5bf46ad1533ffdb30b5dc0f9244f60b0539285
Author: Francisco Jerez <[email protected]>
Date:   Sat Apr 30 21:54:47 2016 -0700

    i965/fs: Don't mutate multi-component arguments in sampler payload set-up.
    
    The Gen5+ sampler message payload construction code steps through the
    coordinate and derivative components by induction, like 'coordinate =
    offset(coordinate, bld, 1)'.  The problem is that while doing so it
    may step one past the end of the coordinate vector, causing an
    assertion failure in offset() if it happens to be a (single-component)
    immediate.  Right now coordinates and derivatives are typically passed
    as actual registers but that will no longer be the case when we start
    propagating constants into logical messages.
    
    Instead express coordinate components in closed form like
    'offset(coordinate, bld, i)' -- The end result seems slightly more
    readable that way and it allows passing the coordinate and derivative
    registers by const reference instead of by value, so it seems like a
    clean-up in its own right.
    
    v2: Fold a few post-increment operators into the last MOV
        statement. (Jason)
    
    Reviewed-by: Jason Ekstrand <[email protected]>
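
The failure mode can be reproduced with a toy model. `Reg` and `offset()` here are hypothetical stand-ins for the back-end's register type and helper. The one property borrowed from the commit is that offsetting a single-component immediate by a non-zero amount is an error.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Toy register: either a vector register (component index) or an immediate.
struct Reg {
   bool is_immediate;
   unsigned component;
};

// Stepping an immediate away from its only component is an error,
// modelling the assertion failure in offset().
Reg offset(const Reg &r, unsigned delta) {
   if (r.is_immediate && delta != 0)
      throw std::logic_error("offset() applied to an immediate");
   return Reg{r.is_immediate, r.component + delta};
}

// Induction form: read, then advance, even after the last component.
std::vector<unsigned> read_by_induction(Reg coord, unsigned n) {
   std::vector<unsigned> read;
   for (unsigned i = 0; i < n; i++) {
      read.push_back(coord.component);
      coord = offset(coord, 1);  // steps one past the end on the last pass
   }
   return read;
}

// Closed form: compute component i directly; never goes past n - 1.
std::vector<unsigned> read_closed_form(const Reg &coord, unsigned n) {
   std::vector<unsigned> read;
   for (unsigned i = 0; i < n; i++)
      read.push_back(offset(coord, i).component);
   return read;
}
```

Both forms read the same components of a real vector. Only the induction form computes an out-of-range offset, which is fatal for an immediate.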

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=ad8f66ed33172ab40d4679063780a501b6f80740
Author: Francisco Jerez <[email protected]>
Date:   Thu May 26 18:51:41 2016 -0700

    i965/fs: Fix multiple ACP interference during copy propagation.
    
    This is more fallout from cf375a3333e54a01462f192202d609436e5fbec8.
    It's possible for multiple ACP entries to interfere with a given VGRF
    write, so we need to continue iterating even if an overlapping entry
    has already been found.
    
    Cc: Samuel Iglesias Gonsálvez <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
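
The fix amounts to not stopping at the first match. The simplified model below uses hypothetical types, not the pass's real ACP entry. Each available-copy entry records a copy from `src` into `dst` as byte ranges, and a new write must invalidate every entry whose source or destination overlaps it.

```cpp
#include <cassert>
#include <vector>

// Half-open byte range [start, end) of a register region.
struct Range {
   unsigned start, end;
};

bool overlaps(Range a, Range b) { return a.start < b.end && b.start < a.end; }

// Hypothetical available-copy entry: "dst was copied from src".
struct CopyEntry {
   Range dst, src;
   bool valid;
};

// Invalidate every entry that a write to `written` interferes with.
// Breaking out of the loop after the first hit would leave any later
// interfering entries marked valid.
void kill_interfering(std::vector<CopyEntry> &acp, Range written) {
   for (CopyEntry &e : acp) {
      if (overlaps(e.dst, written) || overlaps(e.src, written))
         e.valid = false;
   }
}
```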

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=c88b52745c754619d3e7af73abb71adfcc63cc7a
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 15:39:29 2016 -0700

    i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
    
    The conditional mod of these instructions determines the semantics of
    the comparison itself (rather than being evaluated based on the result
    of the instruction as is usually the case for most other instructions
    that allow conditional mods), so it's in general not legal to
    propagate a conditional mod into a CMP instruction.  This prevents
    cmod propagation from (mis)optimizing:
    
     cmp.z.f0 tmp, ...
     mov.z.f0 null, tmp
    
    into:
    
     cmp.z.f0 tmp, ...
    
    which gives the negation of the flag result of the original sequence.
    I could reproduce this easily with SIMD32, but I don't see any reason
    why the problem would be SIMD32-specific; it was most likely working
    by luck.
    
    Cc: [email protected]
    Reviewed-by: Jason Ekstrand <[email protected]>
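
A per-channel simulation makes the negation visible. It models only the flag and destination behaviour relevant here. The flag is set when `cmp.z` finds its sources equal, and `mov.z` sets the flag when the moved value is zero. The all-ones encoding of `true` in `tmp` is an assumption, but any non-zero value gives the same result.

```cpp
#include <cassert>
#include <cstdint>

struct CmpResult {
   bool flag;     // f0 bit for this channel
   uint32_t tmp;  // value written to the destination
};

// cmp.z.f0 tmp, a, b: the conditional mod defines the comparison itself.
CmpResult cmp_z(uint32_t a, uint32_t b) {
   const bool equal = (a == b);
   return {equal, equal ? 0xffffffffu : 0u};  // assumed encoding of true
}

// Original sequence: cmp.z.f0 tmp, a, b ; mov.z.f0 null, tmp
bool original_flag(uint32_t a, uint32_t b) {
   const CmpResult r = cmp_z(a, b);
   return r.tmp == 0;  // mov.z tests the moved value against zero
}

// Mis-optimized sequence: the MOV is dropped and the CMP's flag is used.
bool misoptimized_flag(uint32_t a, uint32_t b) { return cmp_z(a, b).flag; }
```

For every input the two flags differ, which matches the negation described in the commit.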

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8476233ae22c77ca26d8109f0f0d6c74457969f8
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 17:09:00 2016 -0700

    i965/fs: Estimate number of registers written correctly in opt_register_renaming.
    
    The current estimate is incorrect for non-32b types.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
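
To illustrate why a 32-bit assumption goes wrong, the sketch below compares two estimates. One assumes 4-byte components; the other uses the actual component size and stride. Both functions are hypothetical, and the first is not claimed to be the code this commit replaced. The only hardware fact used is the 32-byte GRF size of the GPUs this back-end targets.

```cpp
#include <cassert>

constexpr unsigned REG_SIZE = 32;  // bytes per GRF

unsigned div_round_up(unsigned n, unsigned d) { return (n + d - 1) / d; }

// Estimate that implicitly assumes 32-bit components.
unsigned regs_written_assuming_32b(unsigned exec_size) {
   return div_round_up(exec_size * 4, REG_SIZE);
}

// Estimate based on the bytes the destination region actually spans.
unsigned regs_written(unsigned exec_size, unsigned type_size, unsigned stride) {
   return div_round_up(exec_size * type_size * stride, REG_SIZE);
}
```

The two agree for 32-bit types. For a 16-wide write of 64-bit values the size-aware estimate is 4 registers versus 2, and for 16-bit values it is 1 versus 2.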

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=437e65f9d93f8df0e6aaf1bcaf74c6a211498db8
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 17:25:48 2016 -0700

    i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg.
    
    These are completely ignored by the conversion to brw_reg, so they
    had better be zero.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=51dd6a60f5ef43a12d1b4384a2aded4d55d14056
Author: Francisco Jerez <[email protected]>
Date:   Thu May 19 21:12:32 2016 -0700

    i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf().
    
    Prevents an assertion failure in the following commit.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=b9eab911baa380fea1a3d3393f5944c00aa63076
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 17:09:39 2016 -0700

    i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.
    
    The pass is disabled in SIMD16 dispatch mode for the same reason: it
    cannot handle instructions that write multiple MRF registers at once.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=796238d9e6eee0b942d34c57bd8bdf0f9c98b6c3
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 14:27:20 2016 -0700

    i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=29e471725115edf941458c5be0bb7e93218ddd0f
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 14:17:48 2016 -0700

    i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=a55452530f7525e9cf5d2619bef66a61b488b4af
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 17:30:54 2016 -0700

    i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=ae730049c67cb882c3f936ba6a2c3b1449c45f5e
Author: Francisco Jerez <[email protected]>
Date:   Mon May 16 18:18:43 2016 -0700

    i965/fs: Return 32 bit mask from fs_builder::sample_mask().
    
    This doesn't actually handle the FS case yet; just add an assertion for
    the moment so I don't forget to update it later on for SIMD32 fragment
    shader dispatch.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8b6edee6790f5e196a815f7a149792279564871f
Author: Francisco Jerez <[email protected]>
Date:   Thu May 19 21:26:51 2016 -0700

    i965/fs: Emit fixed-width null register regardless of the dispatch width.
    
    brw_null_vec() cannot handle widths over 16, but it doesn't really
    matter what width we specify for null registers because destination
    regions have no width field at the hardware level.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=298320280f7255c6ed18a65780a93cb6a29e8644
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 17:37:25 2016 -0700

    i965/fs: Fix half() to handle more exotic register files.
    
    horiz_offset() is able to deal with a superset of the register files
    currently special-cased in half().  Just call horiz_offset() in all
    cases.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8c9601ef7b28b047af361126a8adc46c729493b2
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 17:32:55 2016 -0700

    i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.
    
    We'll hit these in some cases during SIMD lowering in 32-wide
    programs.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=7d430fc05e8f0a6211fb587f1bc7b2a76ed7de10
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 22:40:40 2016 -0700

    i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=ecd7a7255aa1d6c313ead14e1b472c073c7111ac
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 22:13:52 2016 -0700

    i965/fs: Keep track of flag dependencies with byte granularity during scheduling.
    
    This prevents false dependencies from being created between
    instructions that write disjoint 8-bit portions of the flag register
    and OTOH should make sure that the scheduler considers dependencies
    between instructions that write or read multiple flag subregisters
    at once (e.g. 32-wide predication or conditional mods).
    
    Reviewed-by: Jason Ekstrand <[email protected]>
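
The byte-granular bookkeeping can be sketched with a bit mask in which bit n stands for flag byte n (8 channels). The layout is an assumption of this sketch. Each 16-bit flag subregister (f0.0, f0.1, f1.0, ...) is placed consecutively, and an instruction covers channels [group, group + exec_size) starting at its flag subregister. The helper names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: mask of flag bytes touched by an instruction using
// flag subregister `flag_subreg` for channels [group, group + exec_size).
unsigned flag_byte_mask(unsigned flag_subreg, unsigned group,
                        unsigned exec_size) {
   const unsigned start = flag_subreg * 16 + group;  // first flag bit
   const unsigned end = start + exec_size;           // one past last bit
   const unsigned first_byte = start / 8;
   const unsigned end_byte = (end + 7) / 8;          // round up
   return ((1u << end_byte) - 1) & ~((1u << first_byte) - 1);
}

// Two instructions only depend on each other through the flag register if
// the bytes they touch overlap.
bool flags_overlap(unsigned mask_a, unsigned mask_b) {
   return (mask_a & mask_b) != 0;
}
```

Two 8-wide writes to different halves of f0.0 no longer look dependent. A 32-wide write, by contrast, conflicts with anything that touches f0.1.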

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=0fec265373f269d116f6d4de900b208fffabe2a1
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 21:34:27 2016 -0700

    i965/fs: Track flag register liveness with byte granularity.
    
    This is required for correctness in the presence of multiple 8-wide flag
    writes (e.g. 8-wide instructions with a conditional mod set) which
    update a different portion of the same 16-bit flag subregister.  Right
    now we keep track of flag dataflow with 16-bit granularity and
    consider flag writes to have killed any previous definition of the
    same subregister even if the write was less than 16 channels wide,
    which can cause live flag register updates to be dead code-eliminated
    incorrectly.
    
    Additionally, this makes sure that we handle 32-wide flag writes and
    reads, which may span multiple flag subregisters, so the current
    approach of just setting/testing a single bit from the live set
    wouldn't have worked for them.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
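
A backward liveness step over byte masks shows the difference from 16-bit tracking. Here bit n of a mask stands for flag byte n (8 channels). Both the transfer function and the widening helper are hypothetical. Widening every touched byte to its whole 16-bit subregister models the old behaviour described above.

```cpp
#include <cassert>
#include <cstdint>

// Backward transfer function: bytes written are killed, bytes read
// become live.
uint8_t live_before(uint8_t live_after, uint8_t written, uint8_t read) {
   return static_cast<uint8_t>((live_after & ~written) | read);
}

// Model of 16-bit granularity: touching either byte of a subregister
// counts as touching both.
uint8_t widen_to_subregs(uint8_t bytes) {
   return static_cast<uint8_t>(bytes | ((bytes & 0x55) << 1) |
                               ((bytes & 0xaa) >> 1));
}
```

Suppose byte 1 (channels 8-15 of f0.0) is live and an 8-wide write hits only byte 0. With byte masks, byte 1 stays live. With the widened mask it appears dead, so an earlier instruction defining it could be removed incorrectly.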

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=df1aec763eb972c69bc5127be102a9f281ce94f6
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 21:54:35 2016 -0700

    i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.
    
    v2: Codestyle fixes (Jason).
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=ece41df247af247fb573ae8ec208d50e895b7aef
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 16:14:13 2016 -0700

    i965/fs: Expose arbitrary channel execution groups to the IR.
    
    This generalizes the current fs_inst::force_sechalf flag to allow
    specifying channel enable groups other than 0 or 8.  At some point it
    will likely make sense to fix the vec4 generator to support arbitrary
    execution groups and then move the definition of fs_inst::group into
    backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=81bc6de8c0f7faafd0f3b0aee944a14ba3ef0b64
Author: Francisco Jerez <[email protected]>
Date:   Thu May 19 00:10:03 2016 -0700

    i965/ir: Make BROADCAST emit an unmasked single-channel move.
    
    Alternatively we could have extended the current semantics to 32-wide
    mode by changing brw_broadcast() to emit multiple indexed MOV
    instructions in the generator copying the selected value to all
    destination registers, but it seemed rather silly to waste EU cycles
    unnecessarily copying the exact same value 32 times in the GRF.
    
    The vstride change in the Align16 path is required to avoid assertions
    in validate_reg() since the change causes the execution size of the
    MOV and SEL instructions to be equal to the source region width.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=41562eb8f33558f02ff8f53b3094a0e6d54e4c49
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 16:25:42 2016 -0700

    i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL.
    
    This makes FIND_LIVE_CHANNEL behave like a normal instruction for
    non-zero quarter control.  On Gen8+ we just leave the quarter control
    field of the emitted FBL instruction set to the default value so the
    hardware applies the expected shift to the execution mask signals.  On
    Gen7 we apply the offset manually by specifying a non-zero subregister
    offset in the source region of the FBL instruction.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=a5a08109608406438109bfa5def5a2af788d2840
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 17:34:14 2016 -0700

    i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.
    
    Due to a Gen7-specific hardware bug, native 32-wide instructions get
    the lower 16 bits of the execution mask applied incorrectly to both
    halves of the instruction, so the MOV trick we currently use wouldn't
    work.  Instead emit multiple 16-wide MOV instructions in 32-wide mode
    in order to cover the whole execution mask.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=1e3c58ffaf35c6d37284b53c7b742c1bf7f2e67c
Author: Francisco Jerez <[email protected]>
Date:   Fri May 27 23:29:02 2016 -0700

    i965/fs: Lower 32-wide scratch writes in the generator.
    
    The hardware has messages that can write 32 32-bit components at once,
    but the channel enable mask gets messed up.  We need to split them
    into several 16-wide scratch writes for the channel enables to be
    applied correctly.  The SIMD lowering pass cannot be used for this
    because scratch writes are emitted rather late, during register
    allocation, long after SIMD lowering has been done.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=a7d319c00be425be219a101b5b4d48f1cbe4ec01
Author: Francisco Jerez <[email protected]>
Date:   Mon May 16 15:47:39 2016 -0700

    i965/fs: Implement scratch reads and writes of 4 GRFs at a time.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=fe5cdde2f9f84022b512de1fa42a036a371d31ba
Author: Francisco Jerez <[email protected]>
Date:   Mon May 16 16:03:33 2016 -0700

    i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7.
    
    Gen7 hardware expects the block size field in the message descriptor
    to be the number of registers minus one instead of the log2 of the
    number of registers.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
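
The two encodings agree for 1- and 2-register messages and first diverge at 4 registers. That plausibly explains why the problem only shows up alongside the 4-GRF scratch reads and writes from the commit listed just above. Both helpers are illustrative sketches of the field values, not Mesa functions.

```cpp
#include <cassert>

// Field value the commit describes as wrong on Gen7: log2 of the count.
unsigned block_size_log2(unsigned num_regs) {
   unsigned log2 = 0;
   while ((2u << log2) <= num_regs)
      log2++;
   return log2;
}

// Field value Gen7 expects: register count minus one.
unsigned block_size_gen7(unsigned num_regs) { return num_regs - 1; }
```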

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=fc7107de1d7cac6be817e8951e53f997c248c277
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 19:20:12 2016 -0700

    i965/eu: Set execution size explicitly for memory fence send message.
    
    We don't want to emit a 32-wide send message in 32-wide programs.  The
    memory fence message should have the same effect regardless of the
    execution size (as long as it's valid) so just set it to one.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=5c887326c516e2de710ff2d90ed608d834920688
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 19:18:30 2016 -0700

    i965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup.
    
    In SIMD32 programs the compiler is responsible for providing the
    appropriate half of the sample mask in the message header, so the
    first and third quarters both map to the first slot group of the
    provided 16-bit half, while the second and fourth quarters map to the
    second slot group -- IOW they should be equivalent to 1Q and 2Q modulo
    two.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
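
The mapping described above reduces to taking the quarter index modulo two. The 0-3 numbering of 1Q-4Q and the function name below are assumptions of this sketch.

```cpp
#include <cassert>

// Quarter control 1Q..4Q numbered 0..3 (assumed numbering). The header
// already holds the relevant 16-bit half of the sample mask, so only the
// quarter's position within that half selects the slot group.
unsigned slot_group_for_quarter(unsigned qtr) {
   assert(qtr < 4);
   return qtr % 2;
}
```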

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=448340d31f4d4d60fbd1935d5a50fe9ee22efd41
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 00:13:33 2016 -0700

    i965/fs: Clean up remaining uses of dispatch_width in the generator.
    
    Most of these are bugs because the intended execution size of an
    instruction and the dispatch width of the shader aren't necessarily
    the same (especially in SIMD32 programs).
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=7f28ad8c4d84d41db047e12ba56d86a6d5cf0fd7
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 15:25:28 2016 -0700

    i965/eu: Remove brw_codegen::compressed and ::compressed_stack.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=646213168ed1d2427f30cb92e783910a319cdbb4
Author: Francisco Jerez <[email protected]>
Date:   Fri May 27 23:28:46 2016 -0700

    i965/eu: Use current exec size instead of p->compressed in surface message generation.
    
    This was kind of an abuse of p->compressed: dataport send message
    instructions are always uncompressed.  Use the current execution size
    instead since p->compressed is on its way out.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=492286e90b4fe96ee247e88181446f7674fc8254
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 19:47:30 2016 -0700

    i965/fs: No need to reset predicate control after emitting some instructions.
    
    Trivial clean-up.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8ef5637729cc11cbdcb84990f5896a70a8fae3a9
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 19:36:03 2016 -0700

    i965/fs: Pass current execution size to brw_IF() and brw_DO().
    
    This gets IF and DO instructions working in SIMD32 programs.  brw_IF()
    and brw_DO() should probably behave in the same way as other generator
    functions that emit control flow instructions and just figure out the
    right execution size by themselves from the current execution controls
    specified through the brw_codegen argument.  Changing that will
    require updating lots of Gen4-5 clipper code though, so for the moment
    just pass the current value redundantly from the FS generator.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=fdae8b9f91089aea3d4b88ddb62a39ac687bb9be
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 19:17:31 2016 -0700

    i965/eu: Stop using p->compressed to specify the exec size of control flow instructions.
    
    p->compressed won't work for SIMD32; we should just be using the
    execution size value specified via p->current instead.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=0b4cd91071fdf9802559974aa9fd32ac4bbd7439
Author: Francisco Jerez <[email protected]>
Date:   Thu May 19 21:43:48 2016 -0700

    i965/fs: Extend region width calculation to allow arbitrary execution sizes.
    
    Instead of just halving the execution size when the instruction is
    compressed hoping that it will give a legal source region width, we
    can calculate the maximum legal width value in closed form from the
    component size and stride.  This makes sure that brw_reg_from_fs_reg()
    always returns a valid hardware region even for virtual 32-wide
    instructions (e.g. send-like instructions) that would seem to exceed
    the hardware region width limit after halving.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
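
The closed-form width calculation can be sketched as follows. It assumes 32-byte GRFs, a maximum region width of 16, and power-of-two component sizes and strides, the only ones the hardware accepts. `region_width()` is a hypothetical name. The width is the largest number of components that fit in one GRF at the given stride, clamped to the execution size and the hardware limit.

```cpp
#include <algorithm>
#include <cassert>

constexpr unsigned REG_SIZE = 32;      // bytes per GRF
constexpr unsigned MAX_HW_WIDTH = 16;  // widest encodable region width

// Widest legal source region width for `exec_size` channels of
// `type_size`-byte components spaced `stride` components apart.
unsigned region_width(unsigned exec_size, unsigned type_size, unsigned stride) {
   if (stride == 0)
      return 1;  // scalar region: every channel reads the same component
   const unsigned per_reg = REG_SIZE / (type_size * stride);
   return std::max(1u, std::min({exec_size, per_reg, MAX_HW_WIDTH}));
}
```

For a 32-wide instruction on 32-bit components this gives a width of 8. The result never exceeds 16, however wide the virtual instruction is.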

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=dabaf4fb9664a431014387cece356d5b64caf9b6
Author: Kenneth Graunke <[email protected]>
Date:   Wed May 18 19:02:45 2016 -0700

    i965/fs: Pass the compression mode to brw_reg_from_fs_reg().
    
    Curro is planning to eliminate p->compressed, so let's avoid using it
    here and just pass in the value directly.
    
    Signed-off-by: Kenneth Graunke <[email protected]>
    [ Francisco Jerez: Pass boolean flag instead of brw_compression enum. ]
    Reviewed-by: Francisco Jerez <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=3340a66fce9adad943fd3448fb703c27cebe7139
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 18:48:04 2016 -0700

    i965/fs: Simplify per-instruction compression control setup in generator.
    
    By using the new compression/group control interface.  This will allow
    easier extension to support arbitrary channel enable groups at the IR
    level.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=c78edcea8b256743fb38c7cd519b3324e4716143
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 03:59:20 2016 -0700

    i965/fs: No need to set compression control at the top of generate_code().
    
    The right value is dependent on the specific IR instruction being
    generated so it has to be reset in every iteration of the loop anyway.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=c19c3d3a5285af2936025568a91020f566ae768c
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 15:29:27 2016 -0700

    i965/eu: Fix a bunch of compression control bugs in the generator.
    
    Most of these were resetting quarter control to zero incorrectly even
    though all they needed to do was disable instruction
    compression -- The brw_SAMPLE() case was doing the right thing but it
    can be simplified slightly by using the new compression control
    interface.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=3dffd8158327ab55b23fe4f3ce0dae4ceda0af4a
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 15:29:07 2016 -0700

    i965/eu: Define alternative interface for setting compression and group controls.
    
    This implements some simple helper functions that can be used to
    specify the group of channel enable signals and compression enable
    that apply to a brw_inst instruction.
    
    It's intended to replace brw_set_default_compression_control
    eventually because the current interface has a number of shortcomings
    inherited from the Gen-4-5-centric representation of compression and
    group controls as a single non-orthogonal enum: On the one hand it
    doesn't work for specifying arbitrary group controls other than 1Q and
    2Q, which are frequently useful in SIMD32 and FP64 programs.  On the
    other hand the current interface forces you to update the compression
    *and* group controls simultaneously, which has been the source of a
    number of generator bugs (a bunch of them fixed in this series),
    because in many cases we would end up resetting the group controls to
    zero inadvertently even though all we wanted to do was disable
    instruction compression -- The latter seems especially unfortunate on
    Gen6+ hardware, which has no explicit compression control, so we would
    end up bashing the quarter control field of the instruction for no
    benefit.
    
    Instead of a single function that updates both at the same time,
    introduce separate interfaces to update one or the other independently,
    preserving the current value of the other (which typically comes from
    the back-end IR so it has to be respected).
    
    Reviewed-by: Jason Ekstrand <[email protected]>
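
The shortcoming of a combined setter is easiest to see in a toy model. `DefaultState`, the enum and the setters below are hypothetical and much simpler than the real code generator state. They only show how a combined setter clobbers a group that came from the IR, while separate setters preserve it.

```cpp
#include <cassert>

// Toy default instruction state: compression and channel group.
struct DefaultState {
   bool compressed;
   unsigned group;  // first channel of the channel-enable group
};

// Combined interface: one non-orthogonal enum sets both fields at once.
enum class Compression { None, Compressed, SecondHalf };

void set_compression_control(DefaultState &s, Compression c) {
   s.compressed = (c == Compression::Compressed);
   s.group = (c == Compression::SecondHalf) ? 8 : 0;  // only 0 or 8
}

// Separate interfaces: each updates one field and preserves the other.
void set_compression(DefaultState &s, bool on) { s.compressed = on; }
void set_group(DefaultState &s, unsigned group) { s.group = group; }
```

Starting from group 16 (for example, the third quarter of a SIMD32 program), disabling compression through the combined setter resets the group to 0. The separate setter leaves it at 16.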

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=5db4d623956ceb5ffa8599e7797bd13470898158
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 00:13:19 2016 -0700

    i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.
    
    It's just a byte MOV with strided source.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=29ce110be6d0d4e4df51be635810f528f7dd7f40
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 18:43:54 2016 -0700

    i965/fs: Remove extract virtual opcodes.
    
    These can be easily represented in the IR as a MOV instruction with
    strided source so they seem rather redundant.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
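
The equivalence is easy to check on a little-endian dword array. Extracting byte `i` of each dword is the same as reading a byte-typed region with a stride of 4 starting at byte offset `i`. The functions below are an illustrative sketch, not the back-end's implementation. They lay the dwords out in little-endian byte order explicitly, as on Intel GPUs, so the host's byte order does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lay out dwords as little-endian bytes, as in the GRF.
std::vector<uint8_t> as_bytes(const std::vector<uint32_t> &dwords) {
   std::vector<uint8_t> bytes;
   for (uint32_t d : dwords)
      for (unsigned b = 0; b < 4; b++)
         bytes.push_back(static_cast<uint8_t>(d >> (8 * b)));
   return bytes;
}

// Model of a byte MOV with a strided source: read every `stride`-th byte
// starting at byte offset `i`.
std::vector<uint8_t> strided_byte_mov(const std::vector<uint8_t> &bytes,
                                      std::size_t i, std::size_t stride) {
   std::vector<uint8_t> out;
   for (std::size_t off = i; off < bytes.size(); off += stride)
      out.push_back(bytes[off]);
   return out;
}
```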

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=9dcb8ff6a11e7071ab660cf53194783b93c8b8bf
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 17:35:52 2016 -0700

    i965: Define brw_int_type() helper.
    
    Intended as a (partial) inverse of type_sz().  Will be useful in the
    next commit and some other SIMD32 generator changes I have queued up.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
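
What "partial inverse" means here can be shown with a hypothetical type enum; the names are illustrative, not the back-end's real register types. Every integer size maps back to a type once signedness is given. Float types are never produced, and sizes with no matching integer type have no inverse.

```cpp
#include <cassert>

// Illustrative register types (hypothetical, not the back-end's enum).
enum class RegType { UB, B, UW, W, UD, D, UQ, Q, F, DF };

unsigned type_sz(RegType t) {
   switch (t) {
   case RegType::UB: case RegType::B: return 1;
   case RegType::UW: case RegType::W: return 2;
   case RegType::UD: case RegType::D: case RegType::F: return 4;
   case RegType::UQ: case RegType::Q: case RegType::DF: return 8;
   }
   return 0;
}

// Partial inverse of type_sz(): only integer types are produced, and only
// for sizes that have one.
RegType int_type(unsigned size, bool is_signed) {
   switch (size) {
   case 1: return is_signed ? RegType::B : RegType::UB;
   case 2: return is_signed ? RegType::W : RegType::UW;
   case 4: return is_signed ? RegType::D : RegType::UD;
   case 8: return is_signed ? RegType::Q : RegType::UQ;
   default:
      assert(!"no integer type of this size");
      return RegType::UD;
   }
}
```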

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=bb89beb26bafb69e23a91e14c62b10f40c4790f8
Author: Francisco Jerez <[email protected]>
Date:   Fri May 27 23:22:02 2016 -0700

    i965/fs: Remove manual splitting of DDY ops in the generator.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=982c48dc34170c9de5e56a6d525ac1f8b7e2a07c
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 20:02:29 2016 -0700

    i965/fs: Remove manual unrolling of BFI instructions from the generator.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=95272f5c7e6914fe8a85a4e37e07f1e8e3634446
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 19:59:18 2016 -0700

    i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=f14b9ea6e6aa3c688ac2be412b5cd86fbc2b9791
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 19:51:50 2016 -0700

    i965/fs: Drop lowering code for a few three-source instructions from the generator.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=117a9a0a6431a6c35aa1cf5fc5cb96d948045ce6
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 18:41:28 2016 -0700

    i965/fs: Set default access mode to Align1 for all instructions in the generator.
    
    Currently the generator code for most opcodes honours the default
    access mode (which should typically be Align1 in the scalar back-end),
    but generate_code() doesn't set it explicitly which means that the
    access mode from a previous instruction could leak into the following
    ones if you did something special and weren't careful enough to save
    and restore the previous access mode.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=3a541d0c0b821ee99761b8a251693862b33da509
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 19:10:48 2016 -0700

    i965/fs: Remove handcrafted math SIMD lowering from the generator.
    
    Most of this wouldn't have worked for SIMD32 and had various
    dispatch_width and compression control bugs.  It's mostly dead now
    with SIMD lowering of math instructions turned on in the compiler.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=cf5443f984da4eb500c9b1ad9b9f53bc8747fef3
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 13:34:46 2016 -0700

    i965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value.
    
    Which is 16 or 8 in most cases.  This will make sure that 32-wide
    virtual instructions get chopped up into chunks of their maximum
    execution size.
    
    Reviewed-by: Jason Ekstrand <[email protected]>
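
The chopping step can be sketched as splitting the channel range of a virtual instruction into pieces no wider than its maximum execution size. Each piece keeps the channel group it covers. `split_instruction()` and `Chunk` are hypothetical. The real SIMD lowering pass also has to split and recombine the instruction's operands, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Chunk {
   unsigned group;      // first channel covered
   unsigned exec_size;  // number of channels covered
};

// Split channels [group, group + exec_size) into pieces of at most
// `max_width` channels each.
std::vector<Chunk> split_instruction(unsigned group, unsigned exec_size,
                                     unsigned max_width) {
   std::vector<Chunk> chunks;
   for (unsigned off = 0; off < exec_size; off += max_width)
      chunks.push_back({group + off, std::min(max_width, exec_size - off)});
   return chunks;
}
```

A 32-wide instruction limited to SIMD16 becomes two 16-wide pieces, covering channels 0-15 and 16-31.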

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=197833caa3d684c092ee76d1e9ff3fac28576b04
Author: Francisco Jerez <[email protected]>
Date:   Thu May 19 23:44:23 2016 -0700

    i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width.
    
    Only per-channel LOAD_PAYLOAD instructions can be lowered, which
    should cover everything that comes in from the front-end.
    
    LOAD_PAYLOAD instructions used to construct actual message payloads
    cannot be easily lowered because they contain headers and vectors of
    variable type that aren't necessarily channel-aligned -- We shouldn't
    find any of them in the program at SIMD lowering time though because
    they're introduced during logical send lowering.
    
    An alternative that may be worth considering would be to re-run the
    SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=9eea3df29f21eb7507354c3b1d85d238b671a211
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 16:27:09 2016 -0700

    i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time
    
    ...on hardware lacking compressed Align16 support.  Will allow
    simplifying the generator code and fixing it for SIMD32 codegen.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=12ae87abb194e2fc5339d8944b6d0e9ddf54ea22
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 16:43:05 2016 -0700

    i965/fs: Apply usual FPU-like execution size restrictions to MULH.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=dea9c1df89cf58591cce83b67d3d905a28f0c101
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 16:10:38 2016 -0700

    i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=122e0315480704a7c6777b994c42448d360e6774
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 16:01:29 2016 -0700

    i965/fs: Assert that IF instruction with embedded compare has legal exec_size.
    
    We shouldn't encounter these right now but if we did it wouldn't be
    possible for the SIMD lowering pass to split it into multiple
    instructions because of its side effects on control flow, so just
    assert in order to kill the program.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=98c8bef01cae5fd70dda22fd7ac0b5694c4dfb5f
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 16:00:19 2016 -0700

    i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=88d9cc15637559229fe725c0531de8ad7a0a60a7
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 15:58:04 2016 -0700

    i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=a6bf5f88c7be5ba1d1d9ebf1412e99886e0cf75c
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 13:15:49 2016 -0700

    i965/fs: Enforce common regioning restrictions by SIMD splitting.
    
    This change addresses a number of hardware restrictions on the source
    and destination regions and other execution controls of regular
    FPU-like instructions that in some cases can be avoided by reducing
    the execution size of the instruction.  Some of these restrictions
    (e.g. the one about 3src instructions not supporting compression on
    some hardware) are currently being worked around case by case in the
    generator with ad-hoc splitting code that is buggy in several ways
    (e.g. doesn't handle non-trivial execution controls which would break
    SIMD32 code), but it seems cleaner to implement as many restrictions
    as we can in a single lowering pass since that will allow us to
    simplify some of the surrounding code considerably and also make sure
    that we don't forget to apply them in the future.
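
    A minimal sketch of how such restrictions could be folded into a
    single lowered execution size, modelling only the 3src example above.
    lowered_width(), REG_SIZE and the packed-destination assumption are
    illustrative, not taken from the pass; an instruction is treated as
    compressed when its destination spans more than one 32-byte register.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed register size in bytes for the hardware discussed. */
#define REG_SIZE 32u

static unsigned
min_u(unsigned a, unsigned b)
{
   return a < b ? a : b;
}

/* Hypothetical: return the widest execution size (at most exec_size)
 * that satisfies every modelled restriction.  Only the 3src compression
 * restriction is modelled; others would clamp `width` the same way. */
static unsigned
lowered_width(unsigned exec_size, unsigned type_size,
              bool is_3src, bool hw_has_3src_compression)
{
   unsigned width = exec_size;

   /* No compression: keep a packed destination within one register. */
   if (is_3src && !hw_has_3src_compression)
      width = min_u(width, REG_SIZE / type_size);

   return width;
}
```

    Taking the minimum over independent limits is what lets one pass
    replace the ad-hoc splitting code, since each new restriction only
    adds another clamp.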
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=2b5adb942bad418058d266c85c396040d558f680
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 13:14:20 2016 -0700

    i965/fs: Enforce extended math exec size limits during SIMD lowering.
    
    This teaches the SIMD lowering pass about the hardware limits on the
    execution size of math instructions, which will allow simplifying the
    generator code and at the same time get rid of a number of bugs in the
    manual SIMD unrolling done currently that prevent SIMD32 codegen from
    working.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=a8e7b4f1d9ec50d2214e7694da26af6a108e506f
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 00:37:37 2016 -0700

    i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.
    
    Seems like this texturing opcode was missing its logical counterpart,
    which would prevent it from taking advantage of the SIMD lowering
    infrastructure; define it and plumb it through the back-end.  At some
    point we'll likely want to emit a single SAMPLEINFO message shared
    among all channels irrespective of this change, but for the moment
    this should be enough to get the intrinsic working in SIMD32 mode.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=99b5476d33f967ac2a30c3f8f7f958a7169e7123
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 23:54:25 2016 -0700

    i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends.
    
    The benefit is we will be able to use the SIMD lowering pass to unroll
    math instructions of unsupported width and then remove some cruft from
    the generator.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=e531b7907a6a10922e09c42f9c78d3b59beab2b4
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 23:52:15 2016 -0700

    i965/fs: Add missing get_latency_gen7() cases for the Gen7 pull constant opcodes.
    
    This was causing the scheduler to be rather optimistic about the
    latency of pull constant opcodes on Gen7+.  This might seem to
    increase the cycle count estimated by the scheduler itself for some
    shaders, even though the actual cycle count should decrease.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=ed4d0e41acb78f268b8b5c2dd03f654d11c4460b
Author: Francisco Jerez <[email protected]>
Date:   Fri May 20 13:03:31 2016 -0700

    i965/fs: Rename Gen4 physical varying pull constant load opcode.
    
    For consistency with the Gen7 variant.  I'm not doing the same to the
    uniform pull constant message at this point because the non-GEN7 one
    is still overloaded to be either an expression-like logical
    instruction or a Gen4-specific physical send message.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=64a6cb87f1fbfe2e410d6a4087450c2d4eb72228
Author: Francisco Jerez <[email protected]>
Date:   Wed May 18 01:26:03 2016 -0700

    i965/fs: Implement promotion of varying pull loads on Gen4 during SIMD lowering.
    
    Varying pull constant loads inherit the same limitation of pre-ILK
    hardware that requires expanding SIMD8 texel fetch instructions to
    SIMD16, so we can deal with pull constant loads during SIMD lowering
    in the same way it's done for texturing.
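
    A sketch of the width choice this implies for varying pull constant
    loads, also applying a 16-wide maximum. pull_load_width() is
    hypothetical; gen is the hardware generation (ILK being Gen5), and a
    result wider than exec_size means the instruction is promoted rather
    than split.

```c
#include <assert.h>

/* Hypothetical: execution size a varying pull constant load of exec_size
 * channels should be lowered to on hardware generation `gen`.  A result
 * larger than exec_size means promotion, a smaller one means splitting. */
static unsigned
pull_load_width(unsigned gen, unsigned exec_size)
{
   /* Pre-ILK (Gen4): SIMD8 must be expanded to SIMD16, like texel fetches. */
   if (gen < 5)
      return 16;

   /* Otherwise keep the instruction as is, up to the 16-wide limit. */
   return exec_size < 16 ? exec_size : 16;
}
```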
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=d8a3294ac21741c3a78eef72b832902e15fbd948
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 23:18:38 2016 -0700

    i965/fs: Hide varying pull constant load message setup behind logical opcode.
    
    This will allow the SIMD lowering pass to split 32-wide varying pull
    constant loads (not natively supported by the hardware) into 16-wide
    instructions.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=0bc5ad8d1997fe33dd43bb476c67163039f065ff
Author: Francisco Jerez <[email protected]>
Date:   Thu May 19 21:32:14 2016 -0700

    i965/fs: Avoid constant propagation when the type sizes don't match.
    
    The case where the source type of the instruction is smaller than the
    immediate type could be handled by calculating the portion of the
    immediate read by the instruction (assuming that the source channels
    are aligned with the destination channels of the copy) and then
    representing the same value as an immediate of the source type
    (assuming such an immediate type exists), but the code below doesn't
    do that, so just bail for the moment.
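
    A sketch of the bail-out and of one case of the alternative described
    above, assuming a 64-bit immediate read by a 32-bit source at a byte
    offset of 0 or 4 within each channel, with little-endian register
    layout. Both helpers are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The conservative rule: only propagate an immediate whose type has the
 * same size as the source type of the instruction reading it. */
static bool
can_propagate_imm(unsigned src_type_size, unsigned imm_type_size)
{
   return src_type_size == imm_type_size;
}

/* One case of the alternative: the 32-bit portion of a 64-bit immediate
 * read at byte_offset (0 or 4) within each channel, assuming
 * little-endian layout and channels aligned with the copy's destination. */
static uint32_t
imm_portion_u32(uint64_t imm, unsigned byte_offset)
{
   assert(byte_offset == 0 || byte_offset == 4);
   return (uint32_t)(imm >> (8 * byte_offset));
}
```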
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=52cc80d85945f14d4556eb5df5b269338adf8299
Author: Francisco Jerez <[email protected]>
Date:   Mon Apr 25 17:25:26 2016 -0700

    i965/fs: Fix CSE temporary copy for some LOAD_PAYLOAD corner cases.
    
    If the LOAD_PAYLOAD instruction only has header sources, it's possible
    for the number of registers written to be less than or equal to the
    SIMD component size, in which case it would take the single-MOV path
    at the bottom, which would cause the channel enable masks to be applied
    incorrectly to the header contents and/or cause it to write past the
    end of the allocated temporary.  If the instruction is either
    LOAD_PAYLOAD or doesn't write exactly one component, the MOV path is
    going to mess up the program, so just don't use it.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=c5f224145a41079ddcc77c0d7df8b4b75ed2d4fe
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 16:48:32 2016 -0700

    i965/fs: Handle instruction predication in SIMD lowering pass.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=1760c24b4bcf028477404e283f5768f2b6f25123
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 16:54:16 2016 -0700

    i965/fs: No need to unzip SIMD-periodic sources during SIMD lowering.
    
    If the source value is going to be the same for all SIMD-lowered
    chunks of the instruction, there should be no need to unzip the value
    into multiple temporary registers, one for each lowered chunk.  As a
    side effect this fixes SIMD lowering of instructions with a vector
    immediate source.  In the long term it *might* still be worth fixing
    offset() to handle vector immediates correctly, but this should be
    good enough for the moment.
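
    A sketch of the idea, modelling a source as an explicit array of
    per-channel values (a simplification: the back-end reasons about
    register regions, not values). is_periodic() here is a hypothetical
    stand-in, not the back-end helper: a source is uniform when it has
    period 1, and it needs no unzipping when every lowered chunk sees the
    same values, i.e. when it is periodic with the chunk width.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: chan[i] is the value channel i reads from a source. */
static bool
is_periodic(const int *chan, unsigned exec_size, unsigned period)
{
   for (unsigned i = period; i < exec_size; i++) {
      if (chan[i] != chan[i % period])
         return false;
   }
   return true;
}

/* Uniform is the special case of period 1. */
static bool
is_uniform(const int *chan, unsigned exec_size)
{
   return is_periodic(chan, exec_size, 1);
}

/* Every chunk of chunk_width channels sees identical values exactly when
 * the source is periodic with period chunk_width; only otherwise does it
 * have to be unzipped into one temporary per chunk. */
static bool
needs_unzip(const int *chan, unsigned exec_size, unsigned chunk_width)
{
   return !is_periodic(chan, exec_size, chunk_width);
}
```

    For example, a source reading {0, 1, 2, 3, 0, 1, 2, 3} is not uniform
    but can be shared unchanged by two 4-wide chunks.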
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=168163f5f08ac1c0f3e3af20d0b2ac3391d358ab
Author: Francisco Jerez <[email protected]>
Date:   Tue May 17 17:45:41 2016 -0700

    i965/fs: Generalize is_uniform() to is_periodic().
    
    This will be useful in the SIMD lowering pass.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=b736e78ddbafc8f3d45ab110cef618c1514e9c64
Author: Francisco Jerez <[email protected]>
Date:   Mon May 16 17:19:17 2016 -0700

    i965/fs: Fix byte_offset() for MRF/ARF/FIXED_GRF regs.
    
    Reviewed-by: Jason Ekstrand <[email protected]>

URL:    
http://cgit.freedesktop.org/mesa/mesa/commit/?id=2db9dd5aeb9566c8480651989981cb1169957748
Author: Francisco Jerez <[email protected]>
Date:   Mon May 23 19:32:51 2016 -0700

    i965/fs: Fix off-by-one region overlap comparison in copy propagation.
    
    This was introduced in cf375a3333e54a01462f192202d609436e5fbec8 but
    the blame is mine because the pseudocode I sent in my review comment
    for the original patch suggesting to do things this way already had
    the off-by-one error.  This may have caused copy propagation to be
    unnecessarily strict while checking whether VGRF writes interfere with
    any ACP entries and possibly miss valid optimization opportunities in
    cases where multiple copy instructions write sequential locations of
    the same VGRF.
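
    A minimal sketch of the kind of check involved, assuming each VGRF
    write and ACP entry is described by a byte offset and size within the
    same VGRF, as half-open ranges. The function names are hypothetical;
    the point is that using <= instead of < makes two sequential, adjacent
    writes compare as overlapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open byte ranges [off, off + size) within the same VGRF. */
static bool
regions_overlap(unsigned a_off, unsigned a_size,
                unsigned b_off, unsigned b_size)
{
   return a_off < b_off + b_size && b_off < a_off + a_size;
}

/* Off-by-one variant: using <= also treats ranges that merely touch
 * (one ends where the other begins) as overlapping. */
static bool
regions_overlap_off_by_one(unsigned a_off, unsigned a_size,
                           unsigned b_off, unsigned b_size)
{
   return a_off <= b_off + b_size && b_off <= a_off + a_size;
}
```

    Copies writing bytes 0-31 and 32-63 of a VGRF don't interfere, but the
    off-by-one check could make the second write invalidate the ACP entry
    created by the first.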
    
    Cc: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>

_______________________________________________
mesa-commit mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-commit
