[FFmpeg-devel] [PR] build: add amd hip sdk support for Windows (PR #21595)

Steven Xiao via ffmpeg-devel Mon, 26 Jan 2026 10:51:48 -0800

PR #21595 opened by Steven Xiao (younengxiao)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21595
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21595.patch


Add support for AMD's HIP (Heterogeneous-compute Interface for Portability)
SDK on Windows, enabling GPU-accelerated video processing on AMD GPUs.

This commit introduces the build infrastructure for HIP, similar to the
existing CUDA support. HIP allows writing portable GPU code that can run
on AMD GPUs with RDNA2 architecture or newer.

Build Instructions (MSYS2 MinGW):

1. Install MSYS2 from https://www.msys2.org/

2. Install build tools in MSYS2 MinGW 64-bit terminal:
   pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-nasm make diffutils 
pkg-config

3. Install AMD HIP SDK from:
   https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html

   Default installation path: C:\Program Files\AMD\ROCm\<version>\
   Ensure hipcc is available: C:\Program Files\AMD\ROCm\<version>\bin\hipcc.bat

4. Add HIP SDK to PATH (in MSYS2 terminal):
   export PATH="/c/Program Files/AMD/ROCm/6.4/bin:$PATH"

   (Adjust version number as needed)

5. Configure and Build:

   ./configure --enable-hip --enable-nonfree
   make -j$(nproc)

Note: The default target architecture is gfx900. To target a specific
GPU architecture, use --hipccflags="--offload-arch=<arch>" where <arch>
is your GPU's architecture code (e.g., gfx1030 for RDNA2, gfx1100 for RDNA3).

External Dependencies:

    Header files:
       - hip/hip_runtime.h (from HIP SDK, or compat/hip/hip_runtime.h)

    Tools:
       - hipcc (HIP compiler, from AMD HIP SDK)

    Runtime:
       - AMD GPU driver with ROCm/HIP support
       - amdhip64.dll (HIP runtime library)

Notes:

- HIP enables GPU-accelerated code that can run on AMD GPUs.

- The --enable-hip option enables the hipcc compiler for building
  GPU kernels (.hip files) into GPU code objects (.hsaco files).

- HIP support provides infrastructure for GPU-accelerated filters.
  Actual GPU filters must implement HIP kernels similar to existing
  CUDA filters (scale_cuda, yadif_cuda, etc.).



>From 35ffe9f2a57b14cb029eb1c7c5d2931883e8f491 Mon Sep 17 00:00:00 2001
From: stevxiao <[email protected]>
Date: Mon, 26 Jan 2026 13:39:06 -0500
Subject: [PATCH] build: add amd hip sdk support for Windows

Add support for AMD's HIP (Heterogeneous-compute Interface for Portability)
SDK on Windows, enabling GPU-accelerated video processing on AMD GPUs.

This commit introduces the build infrastructure for HIP, similar to the
existing CUDA support. HIP allows writing portable GPU code that can run
on AMD GPUs with RDNA2 architecture or newer.

Build Instructions (MSYS2 MinGW):

1. Install MSYS2 from https://www.msys2.org/

2. Install build tools in MSYS2 MinGW 64-bit terminal:
   pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-nasm make diffutils 
pkg-config

3. Install AMD HIP SDK from:
   https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html

   Default installation path: C:\Program Files\AMD\ROCm\<version>\
   Ensure hipcc is available: C:\Program Files\AMD\ROCm\<version>\bin\hipcc.bat

4. Add HIP SDK to PATH (in MSYS2 terminal):
   export PATH="/c/Program Files/AMD/ROCm/6.4/bin:$PATH"

   (Adjust version number as needed)

5. Configure and Build:

   ./configure --enable-hip --enable-nonfree
   make -j$(nproc)

Note: The default target architecture is gfx900. To target a specific
GPU architecture, use --hipccflags="--offload-arch=<arch>" where <arch>
is your GPU's architecture code (e.g., gfx1030 for RDNA2, gfx1100 for RDNA3).

External Dependencies:

    Header files:
       - hip/hip_runtime.h (from HIP SDK, or compat/hip/hip_runtime.h)

    Tools:
       - hipcc (HIP compiler, from AMD HIP SDK)

    Runtime:
       - AMD GPU driver with ROCm/HIP support
       - amdhip64.dll (HIP runtime library)

Notes:

- HIP enables GPU-accelerated code that can run on AMD GPUs.

- The --enable-hip option enables the hipcc compiler for building
  GPU kernels (.hip files) into GPU code objects (.hsaco files).

- HIP support provides infrastructure for GPU-accelerated filters.
  Actual GPU filters must implement HIP kernels similar to existing
  CUDA filters (scale_cuda, yadif_cuda, etc.).

Signed-off-by: Xiao, Youneng <[email protected]>
---
 compat/hip/dynlink_loader.h | 307 ++++++++++++++++++++++++++++++++++++
 compat/hip/hip_runtime.h    | 187 ++++++++++++++++++++++
 configure                   |  76 +++++++++
 doc/build_with_hip_sdk.txt  |  91 +++++++++++
 doc/general_contents.texi   |  63 ++++++++
 ffbuild/common.mak          |  24 ++-
 6 files changed, 745 insertions(+), 3 deletions(-)
 create mode 100644 compat/hip/dynlink_loader.h
 create mode 100644 compat/hip/hip_runtime.h
 create mode 100644 doc/build_with_hip_sdk.txt

diff --git a/compat/hip/dynlink_loader.h b/compat/hip/dynlink_loader.h
new file mode 100644
index 0000000000..12d58bcd04
--- /dev/null
+++ b/compat/hip/dynlink_loader.h
@@ -0,0 +1,307 @@
+/*
+ * HIP dynamic linking loader header
+ *
+ * Copyright (c) 2024-2026 FFmpeg contributors
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef COMPAT_HIP_DYNLINK_LOADER_H
+#define COMPAT_HIP_DYNLINK_LOADER_H
+
+/**
+ * @file
+ * Dynamic linking support for AMD HIP runtime.
+ *
+ * This header provides function pointers and loading mechanisms for
+ * the HIP runtime library (amdhip64.dll on Windows).
+ */
+
+#ifdef _WIN32
+#include <windows.h>
+#define HIP_LIBNAME "amdhip64.dll"
+#define HIP_LOAD_LIBRARY(name) LoadLibraryA(name)
+#define HIP_GET_PROC(lib, name) GetProcAddress(lib, name)
+#define HIP_FREE_LIBRARY(lib) FreeLibrary(lib)
+typedef HMODULE hip_library_t;
+#else
+#include <dlfcn.h>
+#define HIP_LIBNAME "libamdhip64.so"
+#define HIP_LOAD_LIBRARY(name) dlopen(name, RTLD_LAZY)
+#define HIP_GET_PROC(lib, name) dlsym(lib, name)
+#define HIP_FREE_LIBRARY(lib) dlclose(lib)
+typedef void* hip_library_t;
+#endif
+
+// HIP error codes (subset matching common CUDA error codes)
+typedef enum hipError_t {
+    hipSuccess = 0,
+    hipErrorInvalidValue = 1,
+    hipErrorOutOfMemory = 2,
+    hipErrorNotInitialized = 3,
+    hipErrorDeinitialized = 4,
+    hipErrorProfilerDisabled = 5,
+    hipErrorProfilerNotInitialized = 6,
+    hipErrorProfilerAlreadyStarted = 7,
+    hipErrorProfilerAlreadyStopped = 8,
+    hipErrorInvalidConfiguration = 9,
+    hipErrorInvalidPitchValue = 12,
+    hipErrorInvalidSymbol = 13,
+    hipErrorInvalidDevicePointer = 17,
+    hipErrorInvalidMemcpyDirection = 21,
+    hipErrorInsufficientDriver = 35,
+    hipErrorMissingConfiguration = 52,
+    hipErrorPriorLaunchFailure = 53,
+    hipErrorInvalidDeviceFunction = 98,
+    hipErrorNoDevice = 100,
+    hipErrorInvalidDevice = 101,
+    hipErrorInvalidImage = 200,
+    hipErrorInvalidContext = 201,
+    hipErrorContextAlreadyCurrent = 202,
+    hipErrorMapFailed = 205,
+    hipErrorUnmapFailed = 206,
+    hipErrorArrayIsMapped = 207,
+    hipErrorAlreadyMapped = 208,
+    hipErrorNoBinaryForGpu = 209,
+    hipErrorAlreadyAcquired = 210,
+    hipErrorNotMapped = 211,
+    hipErrorNotMappedAsArray = 212,
+    hipErrorNotMappedAsPointer = 213,
+    hipErrorECCNotCorrectable = 214,
+    hipErrorUnsupportedLimit = 215,
+    hipErrorContextAlreadyInUse = 216,
+    hipErrorPeerAccessUnsupported = 217,
+    hipErrorInvalidKernelFile = 218,
+    hipErrorInvalidGraphicsContext = 219,
+    hipErrorInvalidSource = 300,
+    hipErrorFileNotFound = 301,
+    hipErrorSharedObjectSymbolNotFound = 302,
+    hipErrorSharedObjectInitFailed = 303,
+    hipErrorOperatingSystem = 304,
+    hipErrorInvalidHandle = 400,
+    hipErrorIllegalState = 401,
+    hipErrorNotFound = 500,
+    hipErrorNotReady = 600,
+    hipErrorIllegalAddress = 700,
+    hipErrorLaunchOutOfResources = 701,
+    hipErrorLaunchTimeOut = 702,
+    hipErrorPeerAccessAlreadyEnabled = 704,
+    hipErrorPeerAccessNotEnabled = 705,
+    hipErrorSetOnActiveProcess = 708,
+    hipErrorContextIsDestroyed = 709,
+    hipErrorAssert = 710,
+    hipErrorHostMemoryAlreadyRegistered = 712,
+    hipErrorHostMemoryNotRegistered = 713,
+    hipErrorLaunchFailure = 719,
+    hipErrorCooperativeLaunchTooLarge = 720,
+    hipErrorNotSupported = 801,
+    hipErrorStreamCaptureUnsupported = 900,
+    hipErrorStreamCaptureInvalidated = 901,
+    hipErrorStreamCaptureMerge = 902,
+    hipErrorStreamCaptureUnmatched = 903,
+    hipErrorStreamCaptureUnjoined = 904,
+    hipErrorStreamCaptureIsolation = 905,
+    hipErrorStreamCaptureImplicit = 906,
+    hipErrorCapturedEvent = 907,
+    hipErrorStreamCaptureWrongThread = 908,
+    hipErrorGraphExecUpdateFailure = 910,
+    hipErrorUnknown = 999,
+    hipErrorRuntimeMemory = 1052,
+    hipErrorRuntimeOther = 1053,
+    hipErrorTbd
+} hipError_t;
+
+// HIP memory copy types
+typedef enum hipMemcpyKind {
+    hipMemcpyHostToHost = 0,
+    hipMemcpyHostToDevice = 1,
+    hipMemcpyDeviceToHost = 2,
+    hipMemcpyDeviceToDevice = 3,
+    hipMemcpyDefault = 4
+} hipMemcpyKind;
+
+// Forward declarations of HIP types
+typedef struct ihipStream_t* hipStream_t;
+typedef struct ihipEvent_t* hipEvent_t;
+typedef struct ihipModule_t* hipModule_t;
+typedef struct ihipModuleSymbol_t* hipFunction_t;
+typedef struct ihipCtx_t* hipCtx_t;
+typedef int hipDevice_t;
+
+// Device properties structure
+typedef struct hipDeviceProp_t {
+    char name[256];
+    size_t totalGlobalMem;
+    size_t sharedMemPerBlock;
+    int regsPerBlock;
+    int warpSize;
+    size_t memPitch;
+    int maxThreadsPerBlock;
+    int maxThreadsDim[3];
+    int maxGridSize[3];
+    int clockRate;
+    size_t totalConstMem;
+    int major;
+    int minor;
+    size_t textureAlignment;
+    int deviceOverlap;
+    int multiProcessorCount;
+    int kernelExecTimeoutEnabled;
+    int integrated;
+    int canMapHostMemory;
+    int computeMode;
+    int maxTexture1D;
+    int maxTexture2D[2];
+    int maxTexture3D[3];
+    int concurrentKernels;
+    int pciDomainID;
+    int pciBusID;
+    int pciDeviceID;
+    size_t maxSharedMemoryPerMultiProcessor;
+    int isMultiGpuBoard;
+    int canUseHostPointerForRegisteredMem;
+    int cooperativeLaunch;
+    int cooperativeMultiDeviceLaunch;
+    int pageableMemoryAccessUsesHostPageTables;
+    int directManagedMemAccessFromHost;
+    int maxBlocksPerMultiProcessor;
+    int accessPolicyMaxWindowSize;
+    size_t reservedSharedMemPerBlock;
+    // Additional fields may be added in newer HIP versions
+} hipDeviceProp_t;
+
+// Function pointer types for dynamic loading
+typedef hipError_t (*hip_init_fn)(unsigned int flags);
+typedef hipError_t (*hip_get_device_count_fn)(int* count);
+typedef hipError_t (*hip_get_device_fn)(int* device);
+typedef hipError_t (*hip_set_device_fn)(int device);
+typedef hipError_t (*hip_get_device_properties_fn)(hipDeviceProp_t* props, int 
device);
+typedef hipError_t (*hip_malloc_fn)(void** ptr, size_t size);
+typedef hipError_t (*hip_free_fn)(void* ptr);
+typedef hipError_t (*hip_memcpy_fn)(void* dst, const void* src, size_t count, 
hipMemcpyKind kind);
+typedef hipError_t (*hip_memcpy_async_fn)(void* dst, const void* src, size_t 
count, hipMemcpyKind kind, hipStream_t stream);
+typedef hipError_t (*hip_memset_fn)(void* dst, int value, size_t count);
+typedef hipError_t (*hip_memset_async_fn)(void* dst, int value, size_t count, 
hipStream_t stream);
+typedef hipError_t (*hip_stream_create_fn)(hipStream_t* stream);
+typedef hipError_t (*hip_stream_destroy_fn)(hipStream_t stream);
+typedef hipError_t (*hip_stream_synchronize_fn)(hipStream_t stream);
+typedef hipError_t (*hip_device_synchronize_fn)(void);
+typedef hipError_t (*hip_module_load_fn)(hipModule_t* module, const char* 
fname);
+typedef hipError_t (*hip_module_load_data_fn)(hipModule_t* module, const void* 
image);
+typedef hipError_t (*hip_module_unload_fn)(hipModule_t module);
+typedef hipError_t (*hip_module_get_function_fn)(hipFunction_t* function, 
hipModule_t module, const char* name);
+typedef hipError_t (*hip_module_launch_kernel_fn)(hipFunction_t f, unsigned 
int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int 
blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int 
sharedMemBytes, hipStream_t stream, void** kernelParams, void** extra);
+typedef const char* (*hip_get_error_string_fn)(hipError_t error);
+typedef const char* (*hip_get_error_name_fn)(hipError_t error);
+typedef hipError_t (*hip_get_last_error_fn)(void);
+typedef hipError_t (*hip_peek_at_last_error_fn)(void);
+
+// HIP loader context structure
+typedef struct HIPLoaderContext {
+    hip_library_t lib;
+    int loaded;
+
+    // Core functions
+    hip_init_fn hipInit;
+    hip_get_device_count_fn hipGetDeviceCount;
+    hip_get_device_fn hipGetDevice;
+    hip_set_device_fn hipSetDevice;
+    hip_get_device_properties_fn hipGetDeviceProperties;
+
+    // Memory management
+    hip_malloc_fn hipMalloc;
+    hip_free_fn hipFree;
+    hip_memcpy_fn hipMemcpy;
+    hip_memcpy_async_fn hipMemcpyAsync;
+    hip_memset_fn hipMemset;
+    hip_memset_async_fn hipMemsetAsync;
+
+    // Streams
+    hip_stream_create_fn hipStreamCreate;
+    hip_stream_destroy_fn hipStreamDestroy;
+    hip_stream_synchronize_fn hipStreamSynchronize;
+    hip_device_synchronize_fn hipDeviceSynchronize;
+
+    // Module/kernel management
+    hip_module_load_fn hipModuleLoad;
+    hip_module_load_data_fn hipModuleLoadData;
+    hip_module_unload_fn hipModuleUnload;
+    hip_module_get_function_fn hipModuleGetFunction;
+    hip_module_launch_kernel_fn hipModuleLaunchKernel;
+
+    // Error handling
+    hip_get_error_string_fn hipGetErrorString;
+    hip_get_error_name_fn hipGetErrorName;
+    hip_get_last_error_fn hipGetLastError;
+    hip_peek_at_last_error_fn hipPeekAtLastError;
+} HIPLoaderContext;
+
+// Load HIP runtime library and initialize function pointers
+static inline int hip_load_library(HIPLoaderContext* ctx) {
+    if (ctx->loaded)
+        return 0;
+
+    ctx->lib = HIP_LOAD_LIBRARY(HIP_LIBNAME);
+    if (!ctx->lib)
+        return -1;
+
+#define LOAD_FUNC(name) \
+    ctx->name = (name##_fn)HIP_GET_PROC(ctx->lib, #name); \
+    if (!ctx->name) { HIP_FREE_LIBRARY(ctx->lib); ctx->lib = NULL; return -1; }
+
+    LOAD_FUNC(hipInit)
+    LOAD_FUNC(hipGetDeviceCount)
+    LOAD_FUNC(hipGetDevice)
+    LOAD_FUNC(hipSetDevice)
+    LOAD_FUNC(hipGetDeviceProperties)
+    LOAD_FUNC(hipMalloc)
+    LOAD_FUNC(hipFree)
+    LOAD_FUNC(hipMemcpy)
+    LOAD_FUNC(hipMemcpyAsync)
+    LOAD_FUNC(hipMemset)
+    LOAD_FUNC(hipMemsetAsync)
+    LOAD_FUNC(hipStreamCreate)
+    LOAD_FUNC(hipStreamDestroy)
+    LOAD_FUNC(hipStreamSynchronize)
+    LOAD_FUNC(hipDeviceSynchronize)
+    LOAD_FUNC(hipModuleLoad)
+    LOAD_FUNC(hipModuleLoadData)
+    LOAD_FUNC(hipModuleUnload)
+    LOAD_FUNC(hipModuleGetFunction)
+    LOAD_FUNC(hipModuleLaunchKernel)
+    LOAD_FUNC(hipGetErrorString)
+    LOAD_FUNC(hipGetErrorName)
+    LOAD_FUNC(hipGetLastError)
+    LOAD_FUNC(hipPeekAtLastError)
+
+#undef LOAD_FUNC
+
+    ctx->loaded = 1;
+    return 0;
+}
+
+// Unload HIP runtime library
+static inline void hip_unload_library(HIPLoaderContext* ctx) {
+    if (ctx->lib) {
+        HIP_FREE_LIBRARY(ctx->lib);
+        ctx->lib = NULL;
+    }
+    ctx->loaded = 0;
+}
+
+#endif /* COMPAT_HIP_DYNLINK_LOADER_H */
+
diff --git a/compat/hip/hip_runtime.h b/compat/hip/hip_runtime.h
new file mode 100644
index 0000000000..5ac7db53a6
--- /dev/null
+++ b/compat/hip/hip_runtime.h
@@ -0,0 +1,187 @@
+/*
+ * Minimum HIP compatibility definitions header for AMD GPUs
+ *
+ * Copyright (c) 2024-2026 FFmpeg contributors
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef COMPAT_HIP_HIP_RUNTIME_H
+#define COMPAT_HIP_HIP_RUNTIME_H
+
+/**
+ * @file
+ * AMD HIP SDK compatibility header for FFmpeg.
+ *
+ * This header provides minimal definitions needed to compile HIP kernels
+ * for AMD GPUs when using clang/hipcc as the compiler. It mirrors the
+ * structure of compat/cuda/cuda_runtime.h for NVIDIA GPUs.
+ *
+ * For full HIP functionality, install AMD HIP SDK from:
+ * https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
+ */
+
+// Common macros for HIP kernel attributes
+#define __global__ __attribute__((amdgpu_kernel))
+#define __device__ __attribute__((device))
+#define __host__ __attribute__((host))
+#define __shared__ __attribute__((shared))
+#define __constant__ __attribute__((constant))
+#define __align__(N) __attribute__((aligned(N)))
+#define __inline__ __inline__ __attribute__((always_inline))
+
+// Math helper macros
+#define max(a, b) ((a) > (b) ? (a) : (b))
+#define min(a, b) ((a) < (b) ? (a) : (b))
+#define abs(x) ((x) < 0 ? -(x) : (x))
+
+// Atomic operations
+#define atomicAdd(a, b) (__atomic_fetch_add(a, b, __ATOMIC_SEQ_CST))
+#define atomicSub(a, b) (__atomic_fetch_sub(a, b, __ATOMIC_SEQ_CST))
+#define atomicExch(a, b) (__atomic_exchange_n(a, b, __ATOMIC_SEQ_CST))
+#define atomicMin(a, b) (__atomic_fetch_min(a, b, __ATOMIC_SEQ_CST))
+#define atomicMax(a, b) (__atomic_fetch_max(a, b, __ATOMIC_SEQ_CST))
+#define atomicAnd(a, b) (__atomic_fetch_and(a, b, __ATOMIC_SEQ_CST))
+#define atomicOr(a, b) (__atomic_fetch_or(a, b, __ATOMIC_SEQ_CST))
+#define atomicXor(a, b) (__atomic_fetch_xor(a, b, __ATOMIC_SEQ_CST))
+
+// Basic typedefs - texture object handle
+typedef unsigned long long hipTextureObject_t;
+
+// Vector types with proper alignment
+typedef struct __align__(2) uchar2
+{
+    unsigned char x, y;
+} uchar2;
+
+typedef struct __align__(4) ushort2
+{
+    unsigned short x, y;
+} ushort2;
+
+typedef struct __align__(8) float2
+{
+    float x, y;
+} float2;
+
+typedef struct __align__(8) int2
+{
+    int x, y;
+} int2;
+
+typedef struct uint3
+{
+    unsigned int x, y, z;
+} uint3;
+
+typedef struct uint3 dim3;
+
+typedef struct __align__(4) uchar4
+{
+    unsigned char x, y, z, w;
+} uchar4;
+
+typedef struct __align__(8) ushort4
+{
+    unsigned short x, y, z, w;
+} ushort4;
+
+typedef struct __align__(16) int4
+{
+    int x, y, z, w;
+} int4;
+
+typedef struct __align__(16) float4
+{
+    float x, y, z, w;
+} float4;
+
+// Thread/block indexing - AMD GCN/RDNA architecture
+// These are provided by the HIP runtime when using hipcc
+#ifdef __HIP_DEVICE_COMPILE__
+extern "C" __device__ uint3 __ockl_get_local_id(void);
+extern "C" __device__ uint3 __ockl_get_group_id(void);
+extern "C" __device__ uint3 __ockl_get_local_size(void);
+
+#define threadIdx (__ockl_get_local_id())
+#define blockIdx (__ockl_get_group_id())
+#define blockDim (__ockl_get_local_size())
+#else
+// Host-side stubs for compilation
+static inline uint3 get_threadIdx(void) { uint3 r = {0,0,0}; return r; }
+static inline uint3 get_blockIdx(void) { uint3 r = {0,0,0}; return r; }
+static inline uint3 get_blockDim(void) { uint3 r = {0,0,0}; return r; }
+#define threadIdx (get_threadIdx())
+#define blockIdx (get_blockIdx())
+#define blockDim (get_blockDim())
+#endif
+
+// Vector initializers
+#define make_int2(a, b) ((int2){.x = a, .y = b})
+#define make_uchar2(a, b) ((uchar2){.x = a, .y = b})
+#define make_ushort2(a, b) ((ushort2){.x = a, .y = b})
+#define make_float2(a, b) ((float2){.x = a, .y = b})
+#define make_int4(a, b, c, d) ((int4){.x = a, .y = b, .z = c, .w = d})
+#define make_uchar4(a, b, c, d) ((uchar4){.x = a, .y = b, .z = c, .w = d})
+#define make_ushort4(a, b, c, d) ((ushort4){.x = a, .y = b, .z = c, .w = d})
+#define make_float4(a, b, c, d) ((float4){.x = a, .y = b, .z = c, .w = d})
+
+// Texture sampling - simplified version for basic texture operations
+// Full texture support requires the HIP SDK
+template<typename T>
+inline __device__ T tex2D(hipTextureObject_t texObject, float x, float y);
+
+// Math helper functions
+static inline __device__ float floorf(float a) { return __builtin_floorf(a); }
+static inline __device__ float floor(float a) { return __builtin_floorf(a); }
+static inline __device__ double floor(double a) { return __builtin_floor(a); }
+static inline __device__ float ceilf(float a) { return __builtin_ceilf(a); }
+static inline __device__ float ceil(float a) { return __builtin_ceilf(a); }
+static inline __device__ double ceil(double a) { return __builtin_ceil(a); }
+static inline __device__ float truncf(float a) { return __builtin_truncf(a); }
+static inline __device__ float trunc(float a) { return __builtin_truncf(a); }
+static inline __device__ double trunc(double a) { return __builtin_trunc(a); }
+static inline __device__ float fabsf(float a) { return __builtin_fabsf(a); }
+static inline __device__ float fabs(float a) { return __builtin_fabsf(a); }
+static inline __device__ double fabs(double a) { return __builtin_fabs(a); }
+static inline __device__ float sqrtf(float a) { return __builtin_sqrtf(a); }
+static inline __device__ float sqrt(float a) { return __builtin_sqrtf(a); }
+static inline __device__ double sqrt(double a) { return __builtin_sqrt(a); }
+static inline __device__ float rsqrtf(float a) { return 1.0f / 
__builtin_sqrtf(a); }
+static inline __device__ float sinf(float a) { return __builtin_sinf(a); }
+static inline __device__ float cosf(float a) { return __builtin_cosf(a); }
+static inline __device__ float expf(float a) { return __builtin_expf(a); }
+static inline __device__ float logf(float a) { return __builtin_logf(a); }
+static inline __device__ float powf(float a, float b) { return 
__builtin_powf(a, b); }
+
+// Saturate function (clamp to [0.0, 1.0])
+static inline __device__ float __saturatef(float a) {
+    return a < 0.0f ? 0.0f : (a > 1.0f ? 1.0f : a);
+}
+
+// Synchronization primitives
+#ifdef __HIP_DEVICE_COMPILE__
+extern "C" __device__ void __syncthreads(void);
+#else
+static inline void __syncthreads(void) {}
+#endif
+
+// Printf support for device code debugging
+extern "C" __device__ int printf(const char*, ...);
+
+#endif /* COMPAT_HIP_HIP_RUNTIME_H */
+
diff --git a/configure b/configure
index f2e0981a60..5454075218 100755
--- a/configure
+++ b/configure
@@ -351,6 +351,8 @@ External library support:
   --disable-audiotoolbox   disable Apple AudioToolbox code [autodetect]
   --enable-cuda-nvcc       enable Nvidia CUDA compiler [no]
   --disable-cuda-llvm      disable CUDA compilation using clang [autodetect]
+  --enable-hip             enable AMD HIP SDK support [no]
+  --disable-hip-llvm       disable HIP compilation using clang [autodetect]
   --disable-cuvid          disable Nvidia CUVID support [autodetect]
   --disable-d3d11va        disable Microsoft Direct3D 11 video acceleration 
code [autodetect]
   --disable-d3d12va        disable Microsoft Direct3D 12 video acceleration 
code [autodetect]
@@ -407,6 +409,7 @@ Toolchain options:
   --dep-cc=DEPCC           use dependency generator DEPCC [$cc_default]
   --glslc=GLSLC            use GLSL compiler GLSLC [$glslc_default]
   --nvcc=NVCC              use Nvidia CUDA compiler NVCC or clang 
[$nvcc_default]
+  --hipcc=HIPCC            use AMD HIP compiler HIPCC or clang 
[\$hipcc_default]
   --ld=LD                  use linker LD [$ld_default]
   --metalcc=METALCC        use metal compiler METALCC [$metalcc_default]
   --metallib=METALLIB      use metal linker METALLIB [$metallib_default]
@@ -432,6 +435,7 @@ Toolchain options:
   --optflags=OPTFLAGS      override optimization-related compiler flags
   --glslcflags=GLSLCFLAGS  extra glslc flags [$glslcflags_default]
   --nvccflags=NVCCFLAGS    override nvcc flags [$nvccflags_default]
+  --hipccflags=HIPCCFLAGS  override hipcc flags [\$hipccflags_default]
   --build-suffix=SUFFIX    library name suffix []
   --enable-pic             build position-independent code
   --enable-thumb           compile for Thumb instruction set
@@ -1084,6 +1088,10 @@ nvcc_o(){
     eval printf '%s\\n' $NVCC_O
 }
 
+hipcc_o(){
+    eval printf '%s\\n' $HIPCC_O
+}
+
 test_cc(){
     log test_cc "$@"
     cat > $TMPC
@@ -1148,6 +1156,29 @@ extern "C" {
 EOF
 }
 
+test_hipcc(){
+    log test_hipcc "$@"
+    cat > $TMPHIP
+    log_file $TMPHIP
+    tmphip_=$TMPHIP
+    tmpo_=$TMPO
+    [ -x "$(command -v cygpath)" ] && tmphip_=$(cygpath -m $tmphip_) && 
tmpo_=$(cygpath -m $tmpo_)
+    test_cmd $hipcc $hipccflags "$@" $HIPCC_C $(hipcc_o $tmpo_) $tmphip_
+}
+
+check_hipcc() {
+    log check_hipcc "$@"
+    name=$1
+    shift 1
+    disabled $name && return
+    disable $name
+    test_hipcc "$@" <<EOF && enable $name
+extern "C" {
+    __global__ void hello(unsigned char *data) {}
+}
+EOF
+}
+
 test_cpp(){
     log test_cpp "$@"
     cat > $TMPC
@@ -2091,6 +2122,7 @@ HWACCEL_AUTODETECT_LIBRARY_LIST="
     cuda
     cuda_llvm
     cuvid
+    hip_llvm
     d3d11va
     d3d12va
     dxva2
@@ -2114,6 +2146,7 @@ EXTRALIBS_LIST="
 HWACCEL_LIBRARY_NONFREE_LIST="
     cuda_nvcc
     cuda_sdk
+    hip
     libnpp
 "
 
@@ -2826,6 +2859,8 @@ CMDLINE_SET="
     optflags
     nvcc
     nvccflags
+    hipcc
+    hipccflags
     pkg_config
     pkg_config_flags
     progs_suffix
@@ -4408,6 +4443,8 @@ HOSTLD_O='-o $@'
 GLSLC_O='-o $@'
 NVCC_C='-c'
 NVCC_O='-o $@'
+HIPCC_C='-c'
+HIPCC_O='-o $@'
 
 host_extralibs='-lm'
 host_cflags_filter=echo
@@ -4928,6 +4965,19 @@ fi
 
 set_default nvcc
 
+# AMD HIP SDK configuration
+if enabled hip; then
+    hipcc_default="hipcc"
+    hipccflags_default="--offload-arch=gfx900 -O2"
+    HIPCC_C=""
+else
+    hipcc_default="clang"
+    hipccflags_default="--offload-arch=gfx900 -O2 -x hip"
+    HIPCC_C=""
+fi
+
+set_default hipcc
+
 if enabled cuda_nvcc; then
     if $nvcc $nvccflags_default 2>&1 | grep -qi unsupported; then
         nvccflags_default="-gencode arch=compute_60,code=sm_60 -O2"
@@ -5010,6 +5060,7 @@ tmpfile TMPE   $EXESUF
 tmpfile TMPH   .h
 tmpfile TMPM   .m
 tmpfile TMPCU  .cu
+tmpfile TMPHIP .hip
 tmpfile TMPGLSL .comp.glsl
 tmpfile TMPO   .o
 tmpfile TMPS   .S
@@ -7066,6 +7117,26 @@ else
     check_nvcc cuda_llvm
 fi
 
+# AMD HIP SDK flags configuration
+if [ -z "$hipccflags" ]; then
+    hipccflags=$hipccflags_default
+fi
+
+hipccflags="$hipccflags -std=c++17"
+
+if enabled x86_64 || enabled ppc64 || enabled aarch64; then
+    hipccflags="$hipccflags -m64"
+else
+    hipccflags="$hipccflags -m32"
+fi
+
+if enabled hip; then
+    hipccflags="$hipccflags --genco -nogpuinc -nogpulib -include 
${source_link}/compat/hip/hip_runtime.h"
+else
+    hipccflags="$hipccflags -S -x hip --hip-device-only -Wno-c++11-narrowing 
-include ${source_link}/compat/hip/hip_runtime.h"
+    check_hipcc hip_llvm
+fi
+
 if ! disabled ffnvcodec; then
     ffnv_hdr_list="ffnvcodec/nvEncodeAPI.h ffnvcodec/dynlink_cuda.h 
ffnvcodec/dynlink_cuviddec.h ffnvcodec/dynlink_nvcuvid.h"
     check_pkg_config ffnvcodec "ffnvcodec >= 12.1.14.0" "$ffnv_hdr_list" "" || 
\
@@ -7167,6 +7238,7 @@ enabled avisynth          && { require_headers 
"avisynth/avisynth_c.h avisynth/a
                                  die "ERROR: AviSynth+ header version must be 
>= 3.7.3"; } }
 enabled cairo             && require_pkg_config cairo cairo "cairo.h" 
cairo_create
 enabled cuda_nvcc         && { check_nvcc cuda_nvcc || die "ERROR: failed 
checking for nvcc."; }
+enabled hip               && { check_hipcc hip || die "ERROR: failed checking 
for hipcc (AMD HIP SDK)."; }
 enabled chromaprint       && { check_pkg_config chromaprint libchromaprint 
"chromaprint.h" chromaprint_get_version ||
                                require chromaprint chromaprint.h 
chromaprint_get_version -lchromaprint; }
 enabled decklink          && { require_headers DeckLinkAPI.h &&
@@ -8487,6 +8559,7 @@ RANLIB=$ranlib
 STRIP=$strip
 STRIPTYPE=$striptype
 NVCC=$nvcc
+HIPCC=$hipcc
 CP=cp -p
 LN_S=$ln_s
 CPPFLAGS=$CPPFLAGS
@@ -8496,6 +8569,7 @@ OBJCFLAGS=$OBJCFLAGS
 ASFLAGS=$ASFLAGS
 GLSLCFLAGS=$GLSLCFLAGS
 NVCCFLAGS=$nvccflags
+HIPCCFLAGS=$hipccflags
 AS_C=$AS_C
 AS_O=$AS_O
 OBJCC_C=$OBJCC_C
@@ -8509,6 +8583,8 @@ CXX_O=$CXX_O
 GLSLC_O=$GLSLC_O
 NVCC_C=$NVCC_C
 NVCC_O=$NVCC_O
+HIPCC_C=$HIPCC_C
+HIPCC_O=$HIPCC_O
 LD_O=$LD_O
 X86ASM_O=$X86ASM_O
 LD_LIB=$LD_LIB
diff --git a/doc/build_with_hip_sdk.txt b/doc/build_with_hip_sdk.txt
new file mode 100644
index 0000000000..4760b1ca95
--- /dev/null
+++ b/doc/build_with_hip_sdk.txt
@@ -0,0 +1,91 @@
+Building FFmpeg with AMD HIP SDK Support (Windows)
+===================================================
+
+This document describes how to build FFmpeg with AMD HIP SDK support
+for GPU-accelerated video processing on AMD GPUs.
+
+Requirements:
+-------------
+- Windows 10/11 or Windows Server 2022
+- AMD GPU with RDNA2, RDNA3, or newer architecture
+- AMD HIP SDK 5.x or 6.x
+- MSYS2 with MinGW64 toolchain
+
+Installation Steps:
+-------------------
+
+1. Install MSYS2 from https://www.msys2.org/
+
+2. Install build tools in MSYS2 MinGW 64-bit terminal:
+   pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-nasm make diffutils 
pkg-config
+
+3. Install AMD HIP SDK from:
+   https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
+
+   Default installation path: C:\Program Files\AMD\ROCm\<version>\
+   Ensure hipcc is available: C:\Program Files\AMD\ROCm\<version>\bin\hipcc.bat
+
+4. Add HIP SDK to PATH (in MSYS2 terminal):
+   export PATH="/c/Program Files/AMD/ROCm/6.4/bin:$PATH"
+
+   (Adjust version number as needed)
+
+Configure and Build:
+--------------------
+
+   ./configure --enable-hip --enable-nonfree
+   make -j$(nproc)
+
+Note: The default target architecture is gfx900. To target a specific
+GPU architecture, use --hipccflags="--offload-arch=<arch>" where <arch>
+is your GPU's architecture code (e.g., gfx1030 for RDNA2, gfx1100 for RDNA3).
+
+Verification:
+-------------
+
+Check if HIP is enabled in the build:
+
+   ./ffmpeg -buildconf | grep hip
+
+Expected output should include: --enable-hip
+
+Troubleshooting:
+----------------
+
+1. "hipcc: command not found"
+   - Ensure HIP SDK bin directory is in PATH
+   - Check: ls "/c/Program Files/AMD/ROCm/6.4/bin/hipcc.bat"
+
+2. "failed checking for hipcc"
+   - Verify HIP SDK installation is complete
+   - Check ffbuild/config.log for detailed error messages
+
+3. Incompatible object files after switching toolchains:
+   - Run: make clean
+   - Then rebuild: make -j$(nproc)
+
+Notes:
+------
+
+- HIP (Heterogeneous-compute Interface for Portability) enables
+  GPU-accelerated code that can run on AMD GPUs.
+
+- The --enable-hip option enables the hipcc compiler for building
+  GPU kernels (.hip files) into GPU code objects (.hsaco files).
+
+- HIP support provides infrastructure for GPU-accelerated filters.
+  Actual GPU filters must implement HIP kernels similar to existing
+  CUDA filters (scale_cuda, yadif_cuda, etc.).
+
+External Dependencies:
+----------------------
+
+Header files:
+   - hip/hip_runtime.h (from HIP SDK, or compat/hip/hip_runtime.h)
+
+Tools:
+   - hipcc (HIP compiler, from AMD HIP SDK)
+
+Runtime:
+   - AMD GPU driver with ROCm/HIP support
+   - amdhip64.dll (HIP runtime library)
diff --git a/doc/general_contents.texi b/doc/general_contents.texi
index 47ac1989f2..eed0f5cf9d 100644
--- a/doc/general_contents.texi
+++ b/doc/general_contents.texi
@@ -35,6 +35,69 @@ package(amdgru-pro contains, but does not install 
automatically) are required.
 
 This driver can be installed using amdgpu-pro-install script in official amd 
driver archive.
 
+@section AMD HIP SDK
+
+FFmpeg can use the AMD HIP (Heterogeneous-compute Interface for Portability) 
SDK
+for GPU-accelerated video processing filters on AMD GPUs with RDNA2 
architecture or newer.
+
+@subsection Requirements
+
+@itemize
+@item Windows 10/11 or Windows Server 2022
+@item AMD GPU with RDNA2, RDNA3, or newer architecture
+@item AMD HIP SDK 5.x or 6.x (available from 
@url{https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html})
+@item Visual Studio 2019 or 2022 (for Windows development)
+@end itemize
+
+@subsection Installation on Windows
+
+@enumerate
+@item Download the HIP SDK installer from AMD's ROCm Hub.
+@item Run the installer with administrator privileges.
+@item Select the components to install: HIP Runtime, HIP SDK (headers and 
libraries), and optionally Visual Studio integration.
+@item The SDK will be installed to @code{C:\Program Files\AMD\ROCm\<version>\} 
by default.
+@item Ensure the @code{hipcc} compiler is available in your PATH, typically at 
@code{C:\Program Files\AMD\ROCm\<version>\bin}.
+@item A system restart may be required after installation.
+@end enumerate
+
+@subsection Building FFmpeg with HIP support
+
+To enable HIP SDK support with the official hipcc compiler:
+@example
+./configure --enable-hip
+@end example
+
+To use clang for HIP compilation (autodetected if hipcc is not available):
+@example
+./configure --enable-hip-llvm
+@end example
+
+You can specify a custom hipcc path and flags:
+@example
+./configure --enable-hip --hipcc=/path/to/hipcc 
--hipccflags="--offload-arch=gfx1030 -O2"
+@end example
+
+@subsection Supported GPU Architectures
+
+The @code{--offload-arch} flag specifies the target GPU architecture:
+@itemize
+@item @code{gfx900} - Vega (Vega 56, Vega 64)
+@item @code{gfx906} - Vega 20 (Radeon VII)
+@item @code{gfx1030} - RDNA2 (RX 6800, RX 6900)
+@item @code{gfx1100} - RDNA3 (RX 7900)
+@item @code{gfx1101} - RDNA3 (RX 7700, RX 7800)
+@item @code{gfx1102} - RDNA3 (RX 7600)
+@end itemize
+
+@subsection Notes
+
+HIP is designed for portability between AMD and NVIDIA GPUs. Code written for
+HIP can often be compiled for CUDA targets as well. The HIP SDK includes tools
+like HIPIFY for converting existing CUDA code to HIP.
+
+For more information, see the official AMD HIP documentation at
+@url{https://rocm.docs.amd.com/projects/HIP/}.
+
 @section AviSynth
 
 FFmpeg can read AviSynth scripts as input. To enable support, pass
diff --git a/ffbuild/common.mak b/ffbuild/common.mak
index 06d6f39715..51a010b598 100644
--- a/ffbuild/common.mak
+++ b/ffbuild/common.mak
@@ -27,7 +27,7 @@ BIN2C = $(BIN2CEXE)
 ifndef V
 Q      = @
 ECHO   = printf "$(1)\t%s\n" $(2)
-BRIEF  = CC CXX OBJCC HOSTCC HOSTLD AS X86ASM AR LD LDXX STRIP CP WINDRES 
GLSLC NVCC BIN2C METALCC METALLIB
+BRIEF  = CC CXX OBJCC HOSTCC HOSTLD AS X86ASM AR LD LDXX STRIP CP WINDRES 
GLSLC NVCC HIPCC BIN2C METALCC METALLIB
 SILENT = DEPCC DEPCXX DEPHOSTCC DEPAS DEPX86ASM RANLIB RM
 
 MSG    = $@
@@ -70,6 +70,7 @@ COMPILE_X86ASM = $(call COMPILE,X86ASM)
 COMPILE_HOSTC = $(call COMPILE,HOSTCC)
 COMPILE_GLSLC = $(call COMPILE,GLSLC)
 COMPILE_NVCC = $(call COMPILE,NVCC)
+COMPILE_HIPCC = $(call COMPILE,HIPCC)
 COMPILE_MMI = $(call COMPILE,CC,MMIFLAGS)
 COMPILE_MSA = $(call COMPILE,CC,MSAFLAGS)
 COMPILE_LSX = $(call COMPILE,CC,LSXFLAGS)
@@ -157,6 +158,10 @@ endif
 %.ptx: %.cu $(SRC_PATH)/compat/cuda/cuda_runtime.h
        $(COMPILE_NVCC)
 
+# AMD HIP SDK compilation rules
+%.hsaco: %.hip $(SRC_PATH)/compat/hip/hip_runtime.h
+       $(COMPILE_HIPCC)
+
 ifdef CONFIG_SHADER_COMPRESSION
 %.ptx.gz: %.ptx
        $(RUN_GZIP)
@@ -168,6 +173,18 @@ else
        $(RUN_BIN2C)
 endif
 
+# AMD HIP HSACO binary rules
+ifdef CONFIG_HSACO_COMPRESSION
+%.hsaco.gz: %.hsaco
+       $(RUN_GZIP)
+
+%.hsaco.c: %.hsaco.gz $(BIN2CEXE)
+       $(RUN_BIN2C)
+else
+%.hsaco.c: %.hsaco $(BIN2CEXE)
+       $(RUN_BIN2C)
+endif
+
 %.css.min: %.css
        $(RUN_MINIFY)
 
@@ -245,9 +262,10 @@ SKIPHEADERS := $(SKIPHEADERS:%=$(SUBDIR)%)
 HOBJS        = $(filter-out $(SKIPHEADERS:.h=.h.o),$(ALLHEADERS:.h=.h.o))
 SPVOBJS      = $(filter %.spv.o,$(OBJS))
 PTXOBJS      = $(filter %.ptx.o,$(OBJS))
+HSACOOBJS    = $(filter %.hsaco.o,$(OBJS))
 $(HOBJS):     CCFLAGS += $(CFLAGS_HEADERS)
 checkheaders: $(HOBJS)
-.SECONDARY:   $(HOBJS:.o=.c) $(SPVOBJS:.o=.c) $(SPVOBJS:.o=.gz) $(SPVOBJS:.o=) 
$(PTXOBJS:.o=.c) $(PTXOBJS:.o=.gz) $(PTXOBJS:.o=)
+.SECONDARY:   $(HOBJS:.o=.c) $(SPVOBJS:.o=.c) $(SPVOBJS:.o=.gz) $(SPVOBJS:.o=) 
$(PTXOBJS:.o=.c) $(PTXOBJS:.o=.gz) $(PTXOBJS:.o=) $(HSACOOBJS:.o=.c) 
$(HSACOOBJS:.o=.gz) $(HSACOOBJS:.o=)
 alltools: $(TOOLS)
 
 $(HOSTOBJS): %.o: %.c
@@ -266,7 +284,7 @@ $(TOOLOBJS): | tools
 
 OUTDIRS := $(OUTDIRS) $(dir $(OBJS) $(HOBJS) $(HOSTOBJS) $(SHLIBOBJS) 
$(STLIBOBJS) $(TESTOBJS))
 
-CLEANSUFFIXES     = *.d *.gcda *.gcno *.h.c *.ho *.map *.o *.objs *.pc *.ptx 
*.ptx.gz *.ptx.c *.spv *.spv.gz *.spv.c *.ver *.version *.html.gz *.html.c 
*.css.min.gz *.css.min *.css.c  *$(DEFAULT_X86ASMD).asm *~ *.ilk *.pdb
+CLEANSUFFIXES     = *.d *.gcda *.gcno *.h.c *.ho *.map *.o *.objs *.pc *.ptx 
*.ptx.gz *.ptx.c *.hsaco *.hsaco.gz *.hsaco.c *.spv *.spv.gz *.spv.c *.ver 
*.version *.html.gz *.html.c *.css.min.gz *.css.min *.css.c  
*$(DEFAULT_X86ASMD).asm *~ *.ilk *.pdb
 LIBSUFFIXES       = *.a *.lib *.so *.so.* *.dylib *.dll *.def *.dll.a
 
 define RULES
-- 
2.52.0

_______________________________________________
ffmpeg-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]

[FFmpeg-devel] [PR] build: add amd hip sdk support for Windows (PR #21595)

Reply via email to