Re: [FFmpeg-devel] [PATCH 1/4] avutil: add generic side data for video coding info

Timothée Sun, 20 Jul 2025 11:25:04 -0700

On 18/07/2025 17:48, Michael Niedermayer wrote :

Hi


On Fri, Jul 18, 2025 at 12:30:52PM +0200, Timothée Regaud wrote:

From: Timothee Regaud<timothee.informati...@regaud-chapuy.fr>

Adds the generic data structures to libavutil. The design is recursive to 
support other codecs, even though the implementation is only for H.264 for now.

Signed-off-by: Timothee Regaud<timothee.informati...@regaud-chapuy.fr>
---
  libavutil/Makefile            |   1 +
  libavutil/frame.h             |   7 ++
  libavutil/side_data.c         |   1 +
  libavutil/video_coding_info.h | 163 ++++++++++++++++++++++++++++++++++
  4 files changed, 172 insertions(+)
  create mode 100644 libavutil/video_coding_info.h

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 94a56bb72f..44e51ab7ae 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -93,6 +93,7 @@ HEADERS = adler32.h                                                     \
            tree.h                                                        \
            twofish.h                                                     \
            uuid.h                                                        \
+          video_coding_info.h                                           \
            version.h                                                     \
            video_enc_params.h                                            \
            xtea.h                                                        \
diff --git a/libavutil/frame.h b/libavutil/frame.h
index c50cd263d9..f4404472a0 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -254,6 +254,13 @@ enum AVFrameSideDataType {
       * libavutil/tdrdi.h.
       */
      AV_FRAME_DATA_3D_REFERENCE_DISPLAYS,
+
+    /**
+     * Detailed block-level coding information. The data is an AVVideoCodingInfo
+     * structure. This is exported by video decoders and can be used by filters
+     * for analysis and visualization.
+     */
+    AV_FRAME_DATA_VIDEO_CODING_INFO,
  };

enum AVActiveFormatDescription {

diff --git a/libavutil/side_data.c b/libavutil/side_data.c
index fa2a2c2a13..b938ef6f52 100644
--- a/libavutil/side_data.c
+++ b/libavutil/side_data.c
@@ -56,6 +56,7 @@ static const AVSideDataDescriptor sd_props[] = {
      [AV_FRAME_DATA_SEI_UNREGISTERED]            = { "H.26[45] User Data Unregistered SEI message",  AV_SIDE_DATA_PROP_MULTI },
      [AV_FRAME_DATA_VIDEO_HINT]                  = { "Encoding video hint",                          AV_SIDE_DATA_PROP_SIZE_DEPENDENT },
      [AV_FRAME_DATA_3D_REFERENCE_DISPLAYS]       = { "3D Reference Displays Information",            AV_SIDE_DATA_PROP_GLOBAL },
+    [AV_FRAME_DATA_VIDEO_CODING_INFO]           = { "Video Coding Info",                            AV_SIDE_DATA_PROP_SIZE_DEPENDENT },
  };

const AVSideDataDescriptor *av_frame_side_data_desc(enum AVFrameSideDataType type)

diff --git a/libavutil/video_coding_info.h b/libavutil/video_coding_info.h
new file mode 100644
index 0000000000..17e9345892
--- /dev/null
+++ b/libavutil/video_coding_info.h
@@ -0,0 +1,163 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_VIDEO_CODING_INFO_H
+#define AVUTIL_VIDEO_CODING_INFO_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+/**
+ * @file
+ * @ingroup lavu_frame
+ * Structures for describing block-level video coding information.
+ */
+
+/**
+ * @defgroup lavu_video_coding_info Video Coding Info
+ * @ingroup lavu_frame
+ *
+ * @{
+ * Structures for describing block-level video coding information, to be
+ * attached to an AVFrame as side data.
+ *
+ * All pointer-like members in these structures are offsets relative to the
+ * start of the AVVideoCodingInfo struct to ensure the side data is
+ * self-contained and relocatable. This is critical as the underlying buffer
+ * may be moved in memory.
+ */
+
+/**
+ * Structure to hold inter-prediction information for a block.
+ */
+typedef struct AVBlockInterInfo {
+    /**
+     * Offsets to motion vectors for list 0 and list 1, relative to the
+     * start of the AVVideoCodingInfo struct.
+     * The data for each list is an array of [x, y] pairs of int16_t.
+     * The number of vectors is given by num_mv.
+     * An offset of 0 indicates this data is not present.
+     */
+    size_t mv_offset[2];

int16 is not enough, with growing picture sizes and growing precision of
motion vectors

You are right. I didn't anticipate high-resolution videos. I will change it to int32 in the v2 patch.

also the MV precision is needed somewhere somehow or they could not be
visualized by generic code


That's true. I will add something like `uint8_t mv_precision_log2;`
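
To make the two points above concrete, here is a rough sketch of how a generic consumer might resolve the relative offsets and scale a vector to pixels. Everything prefixed with `Example`/`example_` is hypothetical and not part of the patch; in particular, it assumes that v2 stores vectors as int32_t and that `mv_precision_log2 == 2` would mean quarter-pel units.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v2-style layout (NOT the actual patch): int32_t vectors plus
 * a precision field, as discussed in this thread. */
typedef struct ExampleInterInfo {
    size_t  mv_offset[2];       /* byte offsets from the buffer start, 0 = absent */
    uint8_t num_mv[2];          /* number of [x, y] pairs per list */
    uint8_t mv_precision_log2;  /* assumed meaning: 2 = quarter-pel units */
} ExampleInterInfo;

/* Resolve a relative offset into a pointer to [x, y] pairs.
 * Returns NULL when the list is absent. The producer is assumed to keep
 * the offsets suitably aligned for int32_t. */
static const int32_t *example_get_mvs(const uint8_t *base,
                                      const ExampleInterInfo *info, int list)
{
    if (!info->mv_offset[list])
        return NULL;
    return (const int32_t *)(base + info->mv_offset[list]);
}

/* Convert one MV component from fractional units to full pixels. */
static double example_mv_to_pixels(int32_t v, uint8_t precision_log2)
{
    return (double)v / (double)(1 << precision_log2);
}
```

Because only offsets are stored, the whole buffer can be copied or moved and the same lookup still works, and a visualization filter does not need to know which codec produced the vectors.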

+
+    /**
+     * Offsets to reference indices for list 0 and list 1, relative to the
+     * start of the AVVideoCodingInfo struct.
+     * The data is an array of int8_t. A value of -1 indicates the reference
+     * is not used for a specific partition.
+     * An offset of 0 indicates this data is not present.
+     */
+    size_t ref_idx_offset[2];
+    /**
+     * Number of motion vectors for list 0 and list 1.
+     */
+    uint8_t num_mv[2];
+} AVBlockInterInfo;

weighted bi pred needs the weights too

and for more than 1 MV, the question becomes what the other vectors
are, bipred ?, affine MC ?, ...

It was intended for L0 and L1 vectors as used in H.264, but I see now that this doesn't apply to every codec.

Also if you want to be really generic you need to allow blocks
that don't span across the luma and chroma planes but allow
different block structures (and motion vectors) per plane

I am not sure how generic we want to be and how useful that is.

But it seems you want your patch to be quite generic?

I think it's more important to allow this to be extensible
than supporting everything we can think of.

That is maybe store the size of the struct also somewhere so
that elements can be added to their end without breaking
anything. At least for the main block structure

I will add a size field to the main block structure.
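
As an illustration of why a size field helps, the following is a minimal sketch of a reader that tolerates both older and newer producers. The struct and function names are hypothetical and the field list is only illustrative; it assumes the size field comes first and counts the bytes the producer actually wrote for one block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block header as this reader knows it. The last field is
 * imagined to have been appended in a later revision. */
typedef struct ExampleBlock {
    size_t   size;                 /* bytes written by the producer */
    int16_t  x, y;
    uint8_t  w, h;
    uint32_t codec_specific_type;  /* newer field, appended at the end */
} ExampleBlock;

/* Copy one block out of the side data buffer. Fields the producer did not
 * write stay zero; trailing fields this reader does not know are ignored. */
static ExampleBlock example_read_block(const uint8_t *src)
{
    ExampleBlock out = { 0 };
    size_t written;

    memcpy(&written, src, sizeof(written));
    memcpy(&out, src, written < sizeof(out) ? written : sizeof(out));
    return out;
}

/* Step to the next block using the producer's size, not sizeof(). */
static const uint8_t *example_next_block(const uint8_t *src)
{
    size_t written;

    memcpy(&written, src, sizeof(written));
    return src + written;
}
```

The key point is that the stride between blocks comes from the stored size rather than from `sizeof()` in the reader, so appending fields does not break existing users.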

I mean a future codec might allow non-rectangular blocks but we
don't want to think about that today.

Maybe it's best to keep this as simple as possible but extensible

Yes, my goal is for the patch to be as generic as possible, but it is challenging since I have mostly worked on H.264 and do not know every codec. Apparently, I've missed a few details.


I will add the following for weighted prediction:

    typedef struct AVBlockWeightInfo {
        int16_t luma_weight[2];      // For L0 and L1
        int16_t luma_offset[2];      // For L0 and L1
        int16_t chroma_weight[2][2]; // For L0/L1 and Cb/Cr
        int16_t chroma_offset[2][2];
    } AVBlockWeightInfo;

And add `size_t weight_info_offset;` to AVBlockInterInfo.

This does not apply to every codec, but it will work for some, like H.264. We can always extend it in the future if other codecs need more parameters, which the new size field will allow.
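
For context on how such weights are consumed, here is a small sketch of H.264-style explicit weighted bi-prediction for one 8-bit luma sample. The names are hypothetical; note that the formula also needs the log2 weight denominator, which is passed here as a separate `log_wd` parameter because it is not part of the struct proposed above.

```c
#include <stdint.h>

/* Illustrative luma-only subset of the proposed weight info. */
typedef struct ExampleWeightInfo {
    int16_t luma_weight[2];  /* L0 and L1 */
    int16_t luma_offset[2];  /* L0 and L1 */
} ExampleWeightInfo;

static uint8_t example_clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Combine the L0 and L1 predictions p0 and p1 of one sample.
 * log_wd is the log2 weight denominator (an assumption of this sketch).
 * Relies on arithmetic right shift for negative intermediate values. */
static uint8_t example_weighted_bipred(uint8_t p0, uint8_t p1,
                                       const ExampleWeightInfo *w, int log_wd)
{
    int sum = p0 * w->luma_weight[0] + p1 * w->luma_weight[1] + (1 << log_wd);
    int off = (w->luma_offset[0] + w->luma_offset[1] + 1) >> 1;

    return example_clip_u8((sum >> (log_wd + 1)) + off);
}
```

With equal weights of 32 and `log_wd` of 5 this reduces to a plain average of the two predictions, so a visualization filter could use the weights to show which reference dominates a block.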

+
+/**
+ * Structure to hold intra-prediction information for a block.
+ */
+typedef struct AVBlockIntraInfo {
+    /**
+     * Offset to an array of intra prediction modes, relative to the
+     * start of the AVVideoCodingInfo struct.
+     * The number of modes is given by num_pred_modes.
+     */
+    size_t pred_mode_offset;
+
+    /**
+     * Number of intra prediction modes.
+     */
+    uint8_t num_pred_modes;
+
+    /**
+     * Chroma intra prediction mode.
+     */
+    uint8_t chroma_pred_mode;
+} AVBlockIntraInfo;

classifying the prediction into directional, DC, and non-directional, and
for directional the direction, could be useful.

I will look into that for v2.
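
One possible shape for such a classification, sketched here for H.264 only. The enum and function names are hypothetical and not part of the patch; the mode numbers follow the H.264 Intra4x4 and Intra16x16 prediction mode tables.

```c
/* Hypothetical codec-independent classes a filter could rely on. */
enum ExampleIntraClass {
    EXAMPLE_INTRA_UNKNOWN = 0,
    EXAMPLE_INTRA_DC,
    EXAMPLE_INTRA_DIRECTIONAL,
    EXAMPLE_INTRA_NON_DIRECTIONAL,
};

/* H.264 Intra4x4: modes 0..8, where 2 is DC and the rest are directional
 * (vertical, horizontal, and the diagonal variants). */
static enum ExampleIntraClass example_classify_h264_intra4x4(int mode)
{
    if (mode < 0 || mode > 8)
        return EXAMPLE_INTRA_UNKNOWN;
    return mode == 2 ? EXAMPLE_INTRA_DC : EXAMPLE_INTRA_DIRECTIONAL;
}

/* H.264 Intra16x16: 0 vertical, 1 horizontal, 2 DC, 3 plane. */
static enum ExampleIntraClass example_classify_h264_intra16x16(int mode)
{
    switch (mode) {
    case 0:
    case 1:
        return EXAMPLE_INTRA_DIRECTIONAL;
    case 2:
        return EXAMPLE_INTRA_DC;
    case 3:
        return EXAMPLE_INTRA_NON_DIRECTIONAL;
    default:
        return EXAMPLE_INTRA_UNKNOWN;
    }
}
```

The decoder would fill in the class (and, for directional modes, some representation of the direction) so that generic code does not need per-codec mode tables.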

Otherwise the prediction mode number requires codec specific knowledge
to interpret

Yes, that's why I added `uint32_t codec_specific_type` in AVVideoCodingInfoBlock.


Thanks,

Timothée
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
