optimizer: try pushing all swizzles towards the output

Niklas Haas via ffmpeg-cvslog Thu, 19 Feb 2026 11:46:19 -0800

This is an automated email from the git hooks/post-receive script.

Git pushed a commit to branch master
in repository ffmpeg.


commit b01236d5fb4404cbcf633b05b5e8b83f9cbda135
Author:     Niklas Haas <[email protected]>
AuthorDate: Wed Feb 4 17:03:11 2026 +0100
Commit:     Niklas Haas <[email protected]>
CommitDate: Thu Feb 19 19:44:46 2026 +0000

    swscale/optimizer: try pushing all swizzles towards the output
    
    Now that we can directly promote these to plane swizzles, we generally want
    to try pushing them in one direction, ideally towards the output: in the
    case of split subpasses, the output is guaranteed to be planar (and there
    may not even be a read).
    
    Results in a lot of diffs, ranging from the benign, e.g.:
    
     rgb24 -> bgr48be:
       [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
       [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> u16 (expand)
    -  [u16 ...X -> +++X] SWS_OP_SWIZZLE      : 2103
       [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
    +  [u16 ...X -> zzzX] SWS_OP_SWIZZLE      : 2103
       [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
    
     rgb24 -> gbrp9be:
       [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
       [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> f32
       [f32 ...X -> ...X] SWS_OP_SCALE        : * 511/255
       [f32 ...X -> ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 5}
       [f32 ...X -> ...X] SWS_OP_MIN          : x <= {511 511 511 _}
       [f32 ...X -> +++X] SWS_OP_CONVERT      : f32 -> u16
    -  [u16 ...X -> +++X] SWS_OP_SWIZZLE      : 1203
       [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
    -  [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
    +  [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) planar >> 0, via {2, 0, 1}
    
    To the clear improvements, e.g.:
    
     bgr24 -> gbrp16be:
       [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
    -  [ u8 ...X -> +++X] SWS_OP_SWIZZLE      : 2103
       [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> u16 (expand)
    -  [u16 ...X -> +++X] SWS_OP_SWIZZLE      : 1203
       [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
    -  [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
    +  [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) planar >> 0, via {1, 0, 2}
    
    The only case worth careful consideration is when there are swizzled inputs
    that result in unusual plane patterns, e.g.:
    
     argb -> gbrp9be:
       [ u8 XXXX -> ++++] SWS_OP_READ         : 4 elem(s) packed >> 0
    -  [ u8 X... -> ++++] SWS_OP_SWIZZLE      : 1230
    -  [ u8 ...X -> ++++] SWS_OP_CONVERT      : u8 -> f32
    -  [f32 ...X -> ....] SWS_OP_SCALE        : * 511/255
    -  [f32 ...X -> ....] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 5}
    -  [f32 ...X -> ....] SWS_OP_MIN          : x <= {511 511 511 _}
    -  [f32 ...X -> ++++] SWS_OP_CONVERT      : f32 -> u16
    -  [u16 ...X -> ++++] SWS_OP_SWIZZLE      : 1203
    -  [u16 ...X -> zzzz] SWS_OP_SWAP_BYTES
    -  [u16 ...X -> zzzz] SWS_OP_WRITE        : 3 elem(s) planar >> 0
    +  [ u8 X... -> ++++] SWS_OP_CONVERT      : u8 -> f32
    +  [f32 X... -> ....] SWS_OP_SCALE        : * 511/255
    +  [f32 X... -> ....] SWS_OP_DITHER       : 16x16 matrix + {0 0 3 2}
    +  [f32 X... -> ....] SWS_OP_MIN          : x <= {511 511 511 511}
    +  [f32 X... -> ++++] SWS_OP_CONVERT      : f32 -> u16
    +  [u16 X... -> zzzz] SWS_OP_SWAP_BYTES
    +  [u16 X... -> zzzz] SWS_OP_SWIZZLE      : 3120
    +  [u16 ...X -> zzzz] SWS_OP_WRITE        : 3 elem(s) planar >> 0, via {1, 2, 0}
         (X = unused, z = byteswapped, + = exact, 0 = zero)
    
    Observe the change from ...X to X..., which is a pattern that doesn't
    necessarily have a fast path and would usually end up falling back to the
    generic 4-component implementations (rather than the 3-component ones).
    
    That said, this is not a big deal, since we can ultimately re-align the
    set of implementations with what's actually needed once we're done with
    plane splitting and so forth.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <[email protected]>
---
 libswscale/ops_optimizer.c  | 22 ++--------------------
 tests/ref/fate/sws-ops-list |  2 +-
 2 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/libswscale/ops_optimizer.c b/libswscale/ops_optimizer.c
index 5319a954df..2abaaf37d9 100644
--- a/libswscale/ops_optimizer.c
+++ b/libswscale/ops_optimizer.c
@@ -623,30 +623,12 @@ retry:
      * too aggressively */
     for (int n = 0; n < ops->num_ops - 1; n++) {
         SwsOp *op = &ops->ops[n];
-        SwsOp *prev = &ops->ops[n - 1];
         SwsOp *next = &ops->ops[n + 1];
 
         switch (op->op) {
         case SWS_OP_SWIZZLE: {
-            bool seen[4] = {0};
-            bool has_duplicates = false;
-            for (int i = 0; i < 4; i++) {
-                if (next->comps.unused[i])
-                    continue;
-                has_duplicates |= seen[op->swizzle.in[i]];
-                seen[op->swizzle.in[i]] = true;
-            }
-
-            /* Try to push swizzles with duplicates towards the output */
-            if (has_duplicates && op_commute_swizzle(op, next)) {
-                FFSWAP(SwsOp, *op, *next);
-                goto retry;
-            }
-
-            /* Move swizzle out of the way between two converts so that
-             * they may be merged */
-            if (prev->op == SWS_OP_CONVERT && next->op == SWS_OP_CONVERT) {
-                op->type = next->convert.to;
+            /* Try to push swizzles towards the output */
+            if (op_commute_swizzle(op, next)) {
                 FFSWAP(SwsOp, *op, *next);
                 goto retry;
             }
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
index 429b46b371..6111cc4cbd 100644
--- a/tests/ref/fate/sws-ops-list
+++ b/tests/ref/fate/sws-ops-list
@@ -1 +1 @@
-30ceeaa73f093642f28c1f17b3ee4e3e
+1c8369d53a092dd41f88f333f6a8e426
