On 3/23/2025 6:47 PM, Hendrik Leppkes wrote:
> On Sun, Mar 23, 2025 at 9:35 PM James Almer <jamr...@gmail.com> wrote:
>> On 3/23/2025 4:33 PM, Massimo Eynard wrote:
>>> On 23/03/2025 20:01, James Almer wrote:
>>>> On 3/22/2025 2:49 PM, Massimo Eynard wrote:
>>>>> This patch adds support for decoding the fourth MLP substream, which contains the 16-channel presentation used for Atmos audio objects. By default, only the first three substreams are decoded unless the new extract_objects flag is enabled, as the resulting presentation contains audio object feeds instead of classic loudspeaker feeds.
>>>>>
>>>>> As this introduces interpolation of primitive matrices, precision has been increased to 2.18 fixed point. This therefore requires a DSP code upgrade, which has been done for the C and x86 implementations but not for the ARM implementation.
>>>>>
>>>>> Adds two FATE tests using the existing atmos.thd sample to reflect the changes.
>>>>>
>>>>> Signed-off-by: Massimo Eynard <eynard.mass...@gmail.com>
>>>>> ---
>>>>>  libavcodec/arm/mlpdsp_armv5te.S  |   2 +-
>>>>>  libavcodec/arm/mlpdsp_init_arm.c |   3 +-
>>>>>  libavcodec/mlp.h                 |  10 +-
>>>>>  libavcodec/mlp_parse.c           |  31 ++-
>>>>>  libavcodec/mlp_parse.h           |   1 +
>>>>>  libavcodec/mlp_parser.c          |  11 +-
>>>>>  libavcodec/mlpdec.c              | 389 +++++++++++++++++++++++++++----
>>>>>  libavcodec/mlpdsp.c              |  50 +++-
>>>>>  libavcodec/mlpdsp.h              |  25 ++
>>>>>  libavcodec/x86/mlpdsp.asm        |  19 +-
>>>>>  tests/fate/truehd.mak            |  10 +
>>>>>  11 files changed, 476 insertions(+), 75 deletions(-)
>>>>
>>>> With atmos.thd I get:
>>>>
>>>> [aist#0:0/truehd @ 00000209caf3ee00] Guessed Channel Layout: 7.1.4
>>>> Input #0, truehd, from '../samples/truehd/atmos.thd':
>>>>   Duration: N/A, start: 0.000000, bitrate: N/A
>>>>   Stream #0:0: Audio: truehd (Dolby TrueHD + Dolby Atmos), 48000 Hz, 7.1.4, s32 (24 bit)
>>>>
>>>> Which is unlikely to be correct.
>>>
>>> The file has 11 (or 12) objects, which are exported as 12 channels in an unspecified layout and automatically assumed to be a 7.1.4 fixed layout. This is caused by `guess_input_channel_layout` (in `ffmpeg_demux.c`), which tries to assume a layout.
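The patch description says that interpolating primitive matrices led to raising coefficient precision to 2.18 fixed point. A minimal sketch of what that format means, assuming "2.18" denotes 2 integer bits (sign included) and 18 fractional bits held in an `int32_t`, and assuming a plain linear ramp between two coefficient values. The helper names (`to_q18`, `from_q18`, `interp_q18`, `apply_q18`) are hypothetical; this is not the code in mlpdec.c or mlpdsp.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2.18 fixed point: a value x is stored as x * 2^18 in an int32_t.
 * With 2 integer bits (sign included) the representable range is [-2, 2). */
#define FRAC_BITS 18
#define ONE_Q18   (1 << FRAC_BITS)

/* Convert a double to 2.18, rounding to nearest. */
static int32_t to_q18(double x)
{
    assert(x >= -2.0 && x < 2.0);
    return (int32_t)(x * ONE_Q18 + (x >= 0 ? 0.5 : -0.5));
}

static double from_q18(int32_t q)
{
    return (double)q / ONE_Q18;
}

/* Linear ramp from `start` to `end` over `steps` steps; returns the value at
 * step `i`. The difference is computed in 64 bits to avoid overflow. */
static int32_t interp_q18(int32_t start, int32_t end, int i, int steps)
{
    assert(steps > 0 && i >= 0 && i <= steps);
    int64_t delta = (int64_t)end - start;
    return start + (int32_t)(delta * i / steps);
}

/* Apply a 2.18 coefficient to an integer sample: 64-bit multiply, then shift
 * out the fractional bits (arithmetic shift assumed for negative products). */
static int32_t apply_q18(int32_t sample, int32_t coeff)
{
    return (int32_t)(((int64_t)sample * coeff) >> FRAC_BITS);
}
```

For example, ramping from 0.5 to 1.0 in four steps gives 0.5, 0.625, 0.75, 0.875 and 1.0, and applying 0.625 to a sample of 1000 yields 625. The per-step increment is the coefficient difference divided by the step count, so small increments lose precision when fewer fractional bits are available, which is presumably why interpolation prompted the change.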
>>> Would using `AV_CHANNEL_ORDER_CUSTOM` with all channels set to `AV_CHAN_UNKNOWN` (for unknown position, except LFE if present) be a better solution?
>>
>> Possibly, but it may make the stream undecodable unless you remap the channels (probably with a filter in the filterchain). Is there no better representation for the output? What are these 12 channels the sample exports? 16 channels (as you say the MLP substream contains) would match 3rd-order Ambisonics, but I assume that doesn't apply here, unless you should also be outputting something else.
>
> It's object-based audio. Every extra "channel" represents an audio object at an arbitrary position in space, as defined by separate metadata, and you are then supposed to mix these together for your final speaker configuration. Typically, the "bed" channels (e.g. the base 7.1) will contain audio that doesn't require much localization information, such as music and background noise, while the objects will contain audio for which full spatial localization is more relevant. A mixer is then tasked, based on the spatial metadata and knowledge of the physical speaker configuration, with mixing the objects for ideal spatial representation.
>
> We don't have a channel layout that would identify this sort of setup as of yet, never mind a mixer that could actually deal with it, or even a way to export the metadata from the TrueHD stream, but baby steps I suppose.
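For a single block of samples, the mixing step described above reduces to a gain matrix applied to the bed and object feeds. A minimal sketch, assuming the per-speaker gains have already been derived from the object metadata and the speaker configuration (the hard part, omitted here along with smoothing the gains over time). `render_to_speakers` and its planar-float buffer convention are hypothetical and correspond neither to an existing FFmpeg API nor to Dolby's renderer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical renderer: mix nb_in input feeds (bed channels and/or objects)
 * into nb_out speaker feeds. gains is row-major [nb_out][nb_in]:
 * gains[o * nb_in + i] is the amount of input i sent to speaker o.
 * All buffers are planar float arrays of nb_samples samples. */
static void render_to_speakers(const float *const *in, int nb_in,
                               const float *gains,
                               float *const *out, int nb_out,
                               int nb_samples)
{
    assert(nb_in > 0 && nb_out > 0 && nb_samples >= 0);
    for (int o = 0; o < nb_out; o++) {
        for (int s = 0; s < nb_samples; s++) {
            float acc = 0.0f;
            /* Weighted sum of every input feed for this speaker. */
            for (int i = 0; i < nb_in; i++)
                acc += gains[(size_t)o * nb_in + i] * in[i][s];
            out[o][s] = acc;
        }
    }
}
```

With two feeds {1, 2, 3} and {10, 20, 30}, gains {1, 0.5} for the first speaker and {0, 0.5} for the second, the speakers receive {6, 12, 18} and {5, 10, 15}. A bed channel routed straight to its speaker is simply an input with gain 1 for that speaker and 0 elsewhere.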
So we'd need a new layout (or pseudo-channel) where you set arbitrary coordinates? Sort of like what Apple defined in https://developer.apple.com/documentation/coreaudiotypes/audio-channel-coordinates
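The closest existing option is the one raised earlier in the thread: an AVChannelLayout with order AV_CHANNEL_ORDER_CUSTOM whose u.map entries (AVChannelCustom) use AV_CHAN_UNKNOWN, or AV_CHAN_LOW_FREQUENCY for the LFE. AVChannelCustom carries an id, a name and an opaque pointer but, as far as we know, no coordinates. The sketch below models such a custom map extended with optional per-channel coordinates, loosely in the spirit of the Apple page linked above. Every type and function in it (`chan_desc`, `object_layout_init` and so on) is hypothetical and not part of libavutil.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ids standing in for AV_CHAN_UNKNOWN / AV_CHAN_LOW_FREQUENCY. */
enum chan_id { CHAN_UNKNOWN, CHAN_LOW_FREQUENCY };

/* Hypothetical per-channel description: an id plus optional coordinates. */
struct chan_desc {
    enum chan_id id;
    int has_position; /* 0 until metadata supplies a position */
    float azimuth;    /* degrees */
    float elevation;  /* degrees */
    float distance;   /* relative to the listener */
};

struct object_layout {
    int nb_channels;
    struct chan_desc *map;
};

/* Every channel starts with an unknown position, except the LFE at lfe_index
 * (-1 if there is none). Returns 0 on success, -1 on allocation failure. */
static int object_layout_init(struct object_layout *layout, int nb_channels, int lfe_index)
{
    assert(nb_channels > 0);
    assert(lfe_index >= -1 && lfe_index < nb_channels);
    layout->map = calloc((size_t)nb_channels, sizeof(*layout->map));
    if (!layout->map)
        return -1;
    layout->nb_channels = nb_channels;
    for (int i = 0; i < nb_channels; i++)
        layout->map[i].id = (i == lfe_index) ? CHAN_LOW_FREQUENCY : CHAN_UNKNOWN;
    return 0;
}

/* Attach a position to one channel, e.g. from per-frame object metadata. */
static void object_layout_set_position(struct object_layout *layout, int ch,
                                       float azimuth, float elevation, float distance)
{
    assert(ch >= 0 && ch < layout->nb_channels);
    layout->map[ch].has_position = 1;
    layout->map[ch].azimuth = azimuth;
    layout->map[ch].elevation = elevation;
    layout->map[ch].distance = distance;
}

static void object_layout_uninit(struct object_layout *layout)
{
    free(layout->map);
    layout->map = NULL;
    layout->nb_channels = 0;
}
```

Because object positions in this kind of content come from metadata that can change over time, a per-channel position here would likely need updating per frame rather than being fixed once when the layout is set up.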
> FWIW, taking all this into account, I fully agree that it should output the 7.1 representation by default, since everyone can actually process it, whereas the bed+objects representation is rather unexpected and cannot be handled at this time.
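The default agreed on here can be expressed as a small selection rule. A minimal sketch, assuming (as the patch description states) that the first three substreams carry loudspeaker presentations and the fourth carries the 16-channel object presentation. `select_output_substream` is hypothetical; its `extract_objects` parameter only mirrors the flag the patch introduces and is not its actual implementation in mlpdec.c.

```c
#include <assert.h>

/* Hypothetical: return the 0-based index of the substream whose presentation
 * should be output. Substreams 0..2 are assumed to carry loudspeaker
 * presentations of increasing completeness; substream 3 is assumed to carry
 * the object presentation. */
static int select_output_substream(int num_substreams, int extract_objects)
{
    assert(num_substreams >= 1 && num_substreams <= 4);
    /* Objects only when explicitly requested and actually present. */
    if (extract_objects && num_substreams == 4)
        return 3;
    /* Otherwise the most complete loudspeaker presentation available. */
    return (num_substreams < 3 ? num_substreams : 3) - 1;
}
```

A stream carrying all four substreams therefore yields the third substream's presentation (the 7.1 one in the case discussed here) unless objects are explicitly requested, and a stream without an object substream simply ignores the request.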
Agree.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".