Re: [FFmpeg-devel] [RFC] Shaping the AVTextFormat API Surface

Stefano Sabatini Wed, 07 May 2025 16:18:57 -0700

On date Monday 2025-05-05 16:32:08 +0200, Nicolas George wrote:
> Stefano Sabatini (HE12025-05-04):
> > I don't understand this claim. There is a root, and each section can
> > have several subsections, so it is a tree in my view, although we set
> > a maximum depth. Where am I wrong?
>
> Are we looking at the same thing? In ffprobe's output, we have sections
> “packets”, “streams”, “format”, etc., and in each section items, but
> that does not go deeper.

The -sections option will show the ffprobe data "schema".

$ ffprobe -sections -hide_banner
Sections:
W... = Section is a wrapper (contains other sections, no local entries)
.A.. = Section contains an array of elements of the same type
..V. = Section may contain a variable number of fields with variable keys
...T = Section contain a unique type
FLAGS NAME/UNIQUE_NAME
----
W...  root
.A..      chapters
....          chapter
..V.              tags/chapter_tags
....      format
..V.          tags/format_tags
.A..      frames
....          frame
..V.              tags/frame_tags
.A..              side_data_list/frame_side_data_list
..VT                  side_data/frame_side_data
.A..                      timecodes
....                          timecode
.A..                      components/frame_side_data_components
..VT                          component/frame_side_data_component
.A..                              pieces/frame_side_data_pieces
..VT                                  piece/frame_side_data_piece
.A..              logs
....                  log
....          subtitle
.A..      programs
....          program
..V.              tags/program_tags
.A..              streams/program_streams
....                  stream/program_stream
....                      disposition/program_stream_disposition
..V.                      tags/program_stream_tags
.A..      stream_groups
....          stream_group
..V.              tags/stream_group_tags
....              disposition/stream_group_disposition
.A..              components/stream_group_components
..VT                  component/stream_group_component
.A..                      subcomponents
..VT                          subcomponent
.A..                              pieces/stream_group_pieces
..VT                                  piece/stream_group_piece
.A..                                      subpieces
..VT                                          subpiece
.A..                                              blocks
..VT                                                  block
.A..              streams/stream_group_streams
....                  stream/stream_group_stream
....                      disposition/stream_group_stream_disposition
..V.                      tags/stream_group_stream_tags
.A..      streams
....          stream
....              disposition/stream_disposition
..V.              tags/stream_tags
.A..              side_data_list/stream_side_data_list
..VT                  side_data/stream_side_data
.A..      packets
....          packet
..V.              tags/packet_tags
.A..              side_data_list/packet_side_data_list
..VT                  side_data/packet_side_data
....      error
....      program_version
.A..      library_versions
....          library_version
.A..      pixel_formats
....          pixel_format
....              flags/pixel_format_flags
.A..              components/pixel_format_components
....                  component

So yes, this should be a tree, although we hardcode a maximum depth
and a maximum number of items per section - it might be possible to
remove these limitations with some effort. In particular, we might
benefit from employing a dictionary implementation.
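
To make the "tree with hardcoded limits" point concrete, here is a
minimal sketch of such a schema in C. It is only loosely inspired by
the ffprobe section table: every name in it (SkSection, sk_schema,
SK_MAX_CHILDREN, SK_MAX_DEPTH, and so on) is hypothetical, and it is
not the actual ffprobe or AVTextFormat code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SK_MAX_CHILDREN 4 /* hypothetical fixed limit of children per section */
#define SK_MAX_DEPTH    8 /* hypothetical fixed limit on nesting */

#define SK_FLAG_WRAPPER  1 /* contains only other sections */
#define SK_FLAG_ARRAY    2 /* array of elements of the same type */
#define SK_FLAG_VARIABLE 4 /* variable number of fields with variable keys */

typedef struct SkSection {
    const char *name;
    int flags;
    int children[SK_MAX_CHILDREN + 1]; /* section ids, terminated by -1 */
} SkSection;

enum { SK_ROOT, SK_STREAMS, SK_STREAM, SK_STREAM_TAGS,
       SK_FORMAT, SK_FORMAT_TAGS, SK_NB };

/* A small subset of a probe-like schema, expressed as a static table. */
static const SkSection sk_schema[SK_NB] = {
    [SK_ROOT]        = { "root",    SK_FLAG_WRAPPER,  { SK_STREAMS, SK_FORMAT, -1 } },
    [SK_STREAMS]     = { "streams", SK_FLAG_ARRAY,    { SK_STREAM, -1 } },
    [SK_STREAM]      = { "stream",  0,                { SK_STREAM_TAGS, -1 } },
    [SK_STREAM_TAGS] = { "tags",    SK_FLAG_VARIABLE, { -1 } },
    [SK_FORMAT]      = { "format",  0,                { SK_FORMAT_TAGS, -1 } },
    [SK_FORMAT_TAGS] = { "tags",    SK_FLAG_VARIABLE, { -1 } },
};

/* Number of levels in the subtree rooted at section id (1 for a leaf). */
int sk_depth(int id)
{
    int max_child = 0;

    assert(id >= 0 && id < SK_NB);
    for (int i = 0; sk_schema[id].children[i] >= 0; i++) {
        int d = sk_depth(sk_schema[id].children[i]);
        if (d > max_child)
            max_child = d;
    }
    return 1 + max_child;
}

/* Non-zero if the whole schema respects the hardcoded depth limit. */
int sk_schema_fits(void)
{
    return sk_depth(SK_ROOT) <= SK_MAX_DEPTH;
}

/* Print one line per section: flags, then the name indented by level. */
void sk_print(FILE *out, int id, int level)
{
    const SkSection *s = &sk_schema[id];

    fprintf(out, "%c%c%c  %*s%s\n",
            s->flags & SK_FLAG_WRAPPER  ? 'W' : '.',
            s->flags & SK_FLAG_ARRAY    ? 'A' : '.',
            s->flags & SK_FLAG_VARIABLE ? 'V' : '.',
            level * 4, "", s->name);
    for (int i = 0; s->children[i] >= 0; i++)
        sk_print(out, s->children[i], level + 1);
}
```

Calling sk_print(stdout, SK_ROOT, 0) walks the table recursively, much
like the -sections listing above. The fixed-size children array and
the depth constant are where the hardcoded limits live in this sketch;
a dictionary or dynamically sized list would be one way to lift them.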
 
> And in the source code of ffprobe, I see extremely ad-hoc code.

This is expected, since this is application-level logic - and we need
to instruct the code to convert the internal data to the
sections-based representation. The recently tagged AVTextFormat API
should be pretty generic. Note that in fact in AVTextFormat there is
no mention of "ffprobe" concepts (packets, frames, etc.) since those
are part of the data schema.

> > I agree with softworkz on this. The AVTextFormat functionality is not
> > about a specific format, it's supposed to be a generic way to
> > represent a data tree using different formats. Being able to provide
> > this generic representation is crucial, since we want a single entry
> > point to represent data in a way which can be parsed in various ways,
> > given a data schema.
> 
> Is this API meant to be a generic API for writing structured data, or is
> it meant to be totally specific to ffprobe and usable by one other use
> case that was designed to behave exactly like ffprobe.
> 
> An API that is not generic should not go into libavutil.
> 
> An API that cannot serve all, or at least most of, our currently
> existing use cases cannot be called generic.

One of the use cases I have in mind is to support structured data
coming from filters - there are several different approaches currently
employed, all of them falling somewhat short. For example, some filters
print to the log using a custom format, making the output unsuitable
for parsing; others print to a file, again employing a custom format.

Ideally I'd like to have such filters employ the AVTextFormat API (or
whatever we want to call it) so that we can generate outputs in any of
the supported formats with minimum effort. The data coming out of the
filters should be mostly shallow - two or three levels - so you define
the schema, select an output format, and finally write the logic to
convert the internal data to the structured output.

So this should be generic enough to support this case - we need to
define a data schema, an output format, and the custom conversion
logic. In fact the model is possibly even more powerful than needed,
mostly in order to support XML, since for the filter case a simple
dictionary/JSON representation would suffice.
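
As a rough illustration of the "schema, output format, conversion
logic" split for a filter, here is a minimal C sketch. Everything in
it is hypothetical: SkFormatter, SkBuf, SkVolumeStats and the two
writers are made-up names for illustration, not the AVTextFormat API
and not any existing filter's data.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size output buffer; data is always NUL-terminated. */
typedef struct SkBuf { char data[256]; size_t len; } SkBuf;

/* Append formatted text to the buffer, truncating when it is full. */
void sk_printf(SkBuf *b, const char *fmt, ...)
{
    size_t room = sizeof(b->data) - b->len;
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(b->data + b->len, room, fmt, ap);
    va_end(ap);
    if (n > 0)
        b->len += (size_t)n < room ? (size_t)n : room - 1;
}

/* Output format: a set of callbacks, one instance per supported format. */
typedef struct SkFormatter {
    const char *name;
    void (*section_begin)(SkBuf *b, const char *section);
    void (*entry_int)(SkBuf *b, const char *key, int value, int is_first);
    void (*section_end)(SkBuf *b, const char *section);
} SkFormatter;

void json_begin(SkBuf *b, const char *s) { sk_printf(b, "{\"%s\":{", s); }
void json_int(SkBuf *b, const char *k, int v, int first)
{
    sk_printf(b, "%s\"%s\":%d", first ? "" : ",", k, v);
}
void json_end(SkBuf *b, const char *s) { (void)s; sk_printf(b, "}}"); }

void flat_begin(SkBuf *b, const char *s) { sk_printf(b, "[%s]\n", s); }
void flat_int(SkBuf *b, const char *k, int v, int first)
{
    (void)first;
    sk_printf(b, "%s=%d\n", k, v);
}
void flat_end(SkBuf *b, const char *s) { sk_printf(b, "[/%s]\n", s); }

const SkFormatter sk_json = { "json", json_begin, json_int, json_end };
const SkFormatter sk_flat = { "flat", flat_begin, flat_int, flat_end };

/* Internal data of a hypothetical filter. */
typedef struct SkVolumeStats { int nb_samples; int peak; } SkVolumeStats;

/* Conversion logic: written once, independent of the output format. */
void sk_write_stats(const SkFormatter *f, SkBuf *b, const SkVolumeStats *st)
{
    f->section_begin(b, "volume");
    f->entry_int(b, "nb_samples", st->nb_samples, 1);
    f->entry_int(b, "peak", st->peak, 0);
    f->section_end(b, "volume");
}
```

The point of the sketch is only that sk_write_stats() never mentions a
concrete format: passing &sk_json or &sk_flat selects the output, which
is the property the filters would get from a shared API.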

> > If we want to add support for a specific format encoder (e.g. XML,
> > JSON), it might be *used* by the AVTextFormat API, not be
> > *implemented* by the AVTextFormat.
> 
> Which is exactly what I told softworkz should start with.

What I mean is that we might implement e.g. an XML encoder (with
functions such as av_xml_add_element(), av_xml_add_attribute(), etc.)
and this might be used in the codebase wherever the XML format is
used - including the AVTextFormat API. But providing such an encoder
is not within the scope of that API - it is mostly about providing the
means to generate a multi-format representation of a data tree.
Probably the name should reflect this - AVStructuredFormat/AVTreeFormat
API.

On the other hand I'm not convinced we would really benefit from an
XML encoder, given that the custom code is trivial to write while a
generic API is more difficult to design.
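
For a sense of how small the custom code can be, here is a sketch of
writing a single empty XML element with one escaped attribute into a
caller-provided buffer. The sk_xml_* names are hypothetical and exist
only for this example; no such functions exist in FFmpeg.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append src to dst (capacity cap >= 1, current length len < cap).
 * The result is always NUL-terminated; returns the new length. */
size_t sk_xml_append(char *dst, size_t len, size_t cap, const char *src)
{
    while (*src && len + 1 < cap)
        dst[len++] = *src++;
    dst[len] = '\0';
    return len;
}

/* Append text, replacing the five predefined XML entities.
 * Note: on truncation an entity may be cut in the middle. */
size_t sk_xml_append_escaped(char *dst, size_t len, size_t cap,
                             const char *text)
{
    for (; *text; text++) {
        switch (*text) {
        case '&':  len = sk_xml_append(dst, len, cap, "&amp;");  break;
        case '<':  len = sk_xml_append(dst, len, cap, "&lt;");   break;
        case '>':  len = sk_xml_append(dst, len, cap, "&gt;");   break;
        case '"':  len = sk_xml_append(dst, len, cap, "&quot;"); break;
        case '\'': len = sk_xml_append(dst, len, cap, "&apos;"); break;
        default: {
            char one[2] = { *text, '\0' };
            len = sk_xml_append(dst, len, cap, one);
            break;
        }
        }
    }
    return len;
}

/* Write <name key="value"/> into dst; returns the resulting length. */
size_t sk_xml_empty_element(char *dst, size_t cap, const char *name,
                            const char *key, const char *value)
{
    size_t len = 0;

    len = sk_xml_append(dst, len, cap, "<");
    len = sk_xml_append(dst, len, cap, name);
    len = sk_xml_append(dst, len, cap, " ");
    len = sk_xml_append(dst, len, cap, key);
    len = sk_xml_append(dst, len, cap, "=\"");
    len = sk_xml_append_escaped(dst, len, cap, value);
    len = sk_xml_append(dst, len, cap, "\"/>");
    return len;
}
```

This covers only the escaping and one element shape. A generic encoder
would also need nesting, namespaces, and error reporting, which is
where the design effort goes.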

> Making this API generic is not an easy task, but it is doable. We should
> not settle for an inferior API just because the person who proposed it
> wrote the code before designing it properly and now is in a hurry to get
> it applied.

The plan is to use fftools as the staging area, since there we don't
impact external/internal interfaces, and possibly let the API mature
to cover more use cases without impacting users before making it
public. We can also move it to libavutil and mark it as private before
making it public. But we need to start somewhere.