With an RTX 5070 Ti OC on Linux, transcoding from 6k5k 16-bit DPX to null/FFV1 using the FFV1 Vulkan encoder:
- DPX Software and files on SSD, ffmpeg (reference): 7.2 fps
- DPX Software and repeated frame, libavcodec (reference): 7.2 fps
- DPX Vulkan and files on SSD, ffmpeg: 11.5 fps (same as ffmpeg -c copy -f null - >nul)
- DPX Vulkan and files in RAM, ffmpeg: 16.5 fps (same as ffmpeg -c copy -f null - >nul)
- DPX Vulkan and repeated frame, libavcodec: 23.2 fps

It is very useful.
After this patch, image2 (used by ffmpeg but not by the libavcodec test) seems to be the new bottleneck: it struggles to feed libavcodec quickly enough (here, 24 fps = 4.5 GB/s).

Some DPX files are not decoded correctly: RGB and Y at 12/16 bit (for 12-bit, the problem seems to come from the FFV1 Vulkan encoder, which we tested at the same time).

On 23/11/2025 at 01:20, Lynne via ffmpeg-devel wrote:
PR #21000 opened by Lynne
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21000
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21000.patch

The format is raw and uncompressed, however unpacking is a memcpy, and in case 
of big 6k or even 12k frames being several hundred megabytes large, the 
bandwidth can be far too much for CPUs.
Also, most of DPX cannot be directly uploaded without a software conversion 
since almost nothing supports 3-component image formats like RGB48, and 10 and 
12-bit formats are stored either tightly packed or in 32-bit dwords.

Speedup over software is around 3x for Intel, 6x for AMD, and 185x (!) for 
Nvidia. You can use the [small program I 
wrote](https://github.com/cyanreg/dec_tx_test) to benchmark it, and encoding.

Nvidia hardware seems to really suck at uploading or downloading anything, but 
host image copies seem to be the only fast way to do so, so we exploit them a 
bit here. They're not yet stable enough for all uploads, but we'll get there.


From 41c50176c70ae80b9e49ddb89555acf4da6256cf Mon Sep 17 00:00:00 2001
From: Lynne <[email protected]>
Date: Thu, 13 Nov 2025 12:09:11 +0100
Subject: [PATCH 01/11] hwcontext_vulkan: enable runtime descriptor sizing

We were already using this in places, but it seems validation
layers finally got support to detect it.
---
  libavutil/hwcontext_vulkan.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/libavutil/hwcontext_vulkan.c b/libavutil/hwcontext_vulkan.c
index a6bf9a590b..0408b9c117 100644
--- a/libavutil/hwcontext_vulkan.c
+++ b/libavutil/hwcontext_vulkan.c
@@ -307,6 +307,7 @@ static void device_features_copy_needed(VulkanDeviceFeatures *dst, VulkanDeviceF
      COPY_VAL(vulkan_1_2.vulkanMemoryModel);
      COPY_VAL(vulkan_1_2.vulkanMemoryModelDeviceScope);
      COPY_VAL(vulkan_1_2.uniformBufferStandardLayout);
+    COPY_VAL(vulkan_1_2.runtimeDescriptorArray);
      COPY_VAL(vulkan_1_3.dynamicRendering);
      COPY_VAL(vulkan_1_3.maintenance4);

