This is an automated email from the git hooks/post-receive script. Git pushed a commit to branch master in repository ffmpeg.

commit b01236d5fb4404cbcf633b05b5e8b83f9cbda135
Author:     Niklas Haas <[email protected]>
AuthorDate: Wed Feb 4 17:03:11 2026 +0100
Commit:     Niklas Haas <[email protected]>
CommitDate: Thu Feb 19 19:44:46 2026 +0000

    swscale/optimizer: try pushing all swizzles towards the output

    Now that we can directly promote these to plane swizzles, we generally
    want to try pushing them in one direction - ideally towards the output,
    as in the case of split subpasses, the output is guaranteed to be planar.
    (And there may not even be a read)

    Results in a lot of diffs, ranging from the benign, e.g.:

    rgb24 -> bgr48be:
      [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
      [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> u16 (expand)
    - [u16 ...X -> +++X] SWS_OP_SWIZZLE      : 2103
      [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
    + [u16 ...X -> zzzX] SWS_OP_SWIZZLE      : 2103
      [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) packed >> 0

    rgb24 -> gbrp9be:
      [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
      [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> f32
      [f32 ...X -> ...X] SWS_OP_SCALE        : * 511/255
      [f32 ...X -> ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 5}
      [f32 ...X -> ...X] SWS_OP_MIN          : x <= {511 511 511 _}
      [f32 ...X -> +++X] SWS_OP_CONVERT      : f32 -> u16
    - [u16 ...X -> +++X] SWS_OP_SWIZZLE      : 1203
      [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
    - [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
    + [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) planar >> 0, via {2, 0, 1}

    To the clear improvements, e.g.:

    bgr24 -> gbrp16be:
      [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
    - [ u8 ...X -> +++X] SWS_OP_SWIZZLE      : 2103
      [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> u16 (expand)
    - [u16 ...X -> +++X] SWS_OP_SWIZZLE      : 1203
      [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
    - [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
    + [u16 ...X -> zzzX] SWS_OP_WRITE        : 3 elem(s) planar >> 0, via {1, 0, 2}

    The only case worth careful consideration is when there are swizzled
    inputs that result in unusual plane patterns, e.g.:

    argb -> gbrp9be:
      [ u8 XXXX -> ++++] SWS_OP_READ         : 4 elem(s) packed >> 0
    - [ u8 X... -> ++++] SWS_OP_SWIZZLE      : 1230
    - [ u8 ...X -> ++++] SWS_OP_CONVERT      : u8 -> f32
    - [f32 ...X -> ....] SWS_OP_SCALE        : * 511/255
    - [f32 ...X -> ....] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 5}
    - [f32 ...X -> ....] SWS_OP_MIN          : x <= {511 511 511 _}
    - [f32 ...X -> ++++] SWS_OP_CONVERT      : f32 -> u16
    - [u16 ...X -> ++++] SWS_OP_SWIZZLE      : 1203
    - [u16 ...X -> zzzz] SWS_OP_SWAP_BYTES
    - [u16 ...X -> zzzz] SWS_OP_WRITE        : 3 elem(s) planar >> 0
    + [ u8 X... -> ++++] SWS_OP_CONVERT      : u8 -> f32
    + [f32 X... -> ....] SWS_OP_SCALE        : * 511/255
    + [f32 X... -> ....] SWS_OP_DITHER       : 16x16 matrix + {0 0 3 2}
    + [f32 X... -> ....] SWS_OP_MIN          : x <= {511 511 511 511}
    + [f32 X... -> ++++] SWS_OP_CONVERT      : f32 -> u16
    + [u16 X... -> zzzz] SWS_OP_SWAP_BYTES
    + [u16 X... -> zzzz] SWS_OP_SWIZZLE      : 3120
    + [u16 ...X -> zzzz] SWS_OP_WRITE        : 3 elem(s) planar >> 0, via {1, 2, 0}
      (X = unused, z = byteswapped, + = exact, 0 = zero)

    Observe the change from ...X to X..., which is a pattern that doesn't
    necessarily have a fast path and would usually end up falling back to
    the generic 4-component implementations (rather than the 3-component
    ones).

    That said, this is not a big deal, since we can ultimately re-align the
    set of implementations with what's actually needed; once we're done with
    plane splitting and so forth.

    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <[email protected]>
---
 libswscale/ops_optimizer.c  | 22 ++--------------------
 tests/ref/fate/sws-ops-list |  2 +-
 2 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/libswscale/ops_optimizer.c b/libswscale/ops_optimizer.c
index 5319a954df..2abaaf37d9 100644
--- a/libswscale/ops_optimizer.c
+++ b/libswscale/ops_optimizer.c
@@ -623,30 +623,12 @@ retry:
      * too aggressively */
     for (int n = 0; n < ops->num_ops - 1; n++) {
         SwsOp *op = &ops->ops[n];
-        SwsOp *prev = &ops->ops[n - 1];
         SwsOp *next = &ops->ops[n + 1];
 
         switch (op->op) {
         case SWS_OP_SWIZZLE: {
-            bool seen[4] = {0};
-            bool has_duplicates = false;
-            for (int i = 0; i < 4; i++) {
-                if (next->comps.unused[i])
-                    continue;
-                has_duplicates |= seen[op->swizzle.in[i]];
-                seen[op->swizzle.in[i]] = true;
-            }
-
-            /* Try to push swizzles with duplicates towards the output */
-            if (has_duplicates && op_commute_swizzle(op, next)) {
-                FFSWAP(SwsOp, *op, *next);
-                goto retry;
-            }
-
-            /* Move swizzle out of the way between two converts so that
-             * they may be merged */
-            if (prev->op == SWS_OP_CONVERT && next->op == SWS_OP_CONVERT) {
-                op->type = next->convert.to;
+            /* Try to push swizzles towards the output */
+            if (op_commute_swizzle(op, next)) {
                 FFSWAP(SwsOp, *op, *next);
                 goto retry;
             }
diff --git a/tests/ref/fate/sws-ops-list b/tests/ref/fate/sws-ops-list
index 429b46b371..6111cc4cbd 100644
--- a/tests/ref/fate/sws-ops-list
+++ b/tests/ref/fate/sws-ops-list
@@ -1 +1 @@
-30ceeaa73f093642f28c1f17b3ee4e3e
+1c8369d53a092dd41f88f333f6a8e426
