On 09/29/2020 09:20 AM, Devin Heitmueller wrote:
Hi Mark,
Hi Devin. Thanks much!
Your response came in while I was composing my previous message. I see (below) that performance is a
major issue. That absolutely makes sense because, after accuracy, speed is the next most important
objective (and for some use cases, may actually be more important).
I imagine that format-to-format conversion is probably the most optimized code in ffmpeg. Is there a
function library dedicated solely to format conversion? I ask so that, in what I write, I can assure
users that the issues are known and addressed.
For my modest purposes, a sketch of planar v. packed is probably all that's needed. I think you've
made "planar" clear. Thank you for that. I can imagine that the structure of packed is
multitudinous. Why is it called "packed"? How is it packed? Are the luma & chroma mixed in one
buffer (analogous to blocks in macroblocks) or split into discrete buffers? How are they spatially
structured? Are there any special substructures (analogous to macroblocks in slices)? Are the
substructures, if any, format dependent?
So when you talk about the decoded frames, there is no concept of
macroblocks. There are simple video frames with Y, Cb, Cr samples.
How those samples are organized and their sizes are determined by the
AVFrame format.
"Packed" and "planar", eh? What evidence do you have? ...Share the candy!
Now, I'm not talking about streams. I'm talking about after decoding. I'm
talking about the buffers.
I would think that a single, consistent format would be used.
When dealing with typical consumer MPEG-2 or H.264 video, the decoded
frames will typically have what's referred to as "4:2:0 planar"
format. This means that the Y, Cb, and Cr samples of a given pixel are
not stored next to each other. If you look at the underlying data that makes up the
frame as an array, you will typically have W*H Y values, followed by
W*H/4 Cb values, and then there will be W*H/4 Cr values. Note that I
say "values" and not "bytes", as the size of each value may vary
depending on the pixel format.
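To make the layout above concrete, here is a minimal sketch that computes where each plane starts in such a buffer. It assumes a single tightly packed buffer with even dimensions and no row padding; the function name `yuv420p_layout` and the `bytes_per_sample` parameter are made up for illustration and are not part of ffmpeg. In a real AVFrame each plane has its own pointer and row stride (`data[]` and `linesize[]`), and rows may be padded, so treat this only as a picture of the arithmetic.

```python
def yuv420p_layout(width, height, bytes_per_sample=1):
    """Return {plane: (byte_offset, byte_size)} for a tightly packed
    4:2:0 planar buffer. Hypothetical helper, not an ffmpeg API."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even width and height")
    # One luma sample per pixel.
    y_size = width * height * bytes_per_sample
    # Chroma is halved in both directions, so each plane has W*H/4 samples.
    c_size = (width // 2) * (height // 2) * bytes_per_sample
    return {
        "Y": (0, y_size),
        "Cb": (y_size, c_size),
        "Cr": (y_size + c_size, c_size),
    }


layout = yuv420p_layout(720, 480)
for plane, (offset, size) in layout.items():
    print(f"{plane}: offset={offset} size={size}")
```

With `bytes_per_sample=2` the same sample counts apply but every byte size doubles, which is the "values, not bytes" distinction made above.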
Unfortunately there is no "single, consistent format" because of the
variety of different ways in which the video can be encoded, as well
as performance concerns. Normalizing it to a single format can have a
huge performance cost, in particular if the original video is in a
different colorspace (e.g. the video is YUV and you want RGB).
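As a rough illustration of why a colorspace change is not free, the sketch below converts a single Y/Cb/Cr sample triple to RGB. It assumes 8-bit full-range values and the BT.601 coefficients used by JFIF; broadcast video is usually limited-range, and HD material usually uses BT.709, so the constants here are an assumption rather than what ffmpeg would necessarily apply. The function name `ycbcr_to_rgb` is hypothetical.

```python
def _clamp8(value):
    """Round to the nearest integer and clamp into the 8-bit range."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit full-range BT.601 (JFIF) Y/Cb/Cr triple to RGB.
    Illustrative only: the coefficients depend on the actual colorspace."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clamp8(r), _clamp8(g), _clamp8(b)


print(ycbcr_to_rgb(128, 128, 128))  # neutral chroma gives a gray pixel
```

Every output pixel needs several multiplies, a rounding step, and a clamp, and for 4:2:0 input the chroma planes must also be upsampled first. Repeated for every pixel of every frame, that work adds up, and the rounding and clamping are one place where small quality losses creep in.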
Generally speaking you can set up the pipeline to always deliver you a
single format, and ffmpeg will automatically perform any
transformations necessary to achieve that (e.g. convert from packed to
planar, RGB to YUV, 8-bit to 10-bit, 4:2:2 to 4:2:0, etc.). However,
this can have a severe performance cost and can result in quality loss
depending on the transforms required.
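For a sense of what a packed-to-planar conversion involves, here is a minimal sketch that splits one packed 4:2:2 layout into three planes. It assumes the YUYV byte order (Y0 Cb Y1 Cr, repeating), 8-bit samples, and no row padding; the function name `yuyv422_to_planar` is hypothetical. This is not how ffmpeg implements the conversion, only an illustration of the data movement.

```python
def yuyv422_to_planar(packed, width, height):
    """Split packed YUYV 4:2:2 bytes (Y0 Cb Y1 Cr ...) into separate
    Y, Cb, and Cr planes. Hypothetical helper for illustration."""
    if width % 2:
        raise ValueError("YUYV needs an even width")
    if len(packed) != width * height * 2:
        raise ValueError("buffer size does not match width * height * 2")
    y = bytes(packed[0::2])   # every second byte is a luma sample
    cb = bytes(packed[1::4])  # one Cb per pair of pixels
    cr = bytes(packed[3::4])  # one Cr per pair of pixels
    return y, cb, cr


# A 4x1 frame: two Y0-Cb-Y1-Cr groups.
packed = bytes([10, 100, 11, 200, 12, 101, 13, 201])
y, cb, cr = yuyv422_to_planar(packed, 4, 1)
print(list(y), list(cb), list(cr))
```

This particular transform only rearranges bytes, so it loses nothing. Quality loss comes from conversions that discard or resample information, such as 4:2:2 to 4:2:0 or a change of bit depth or colorspace.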
The codec will typically specify its output format, largely dependent
on the nature of the encoding, and then announce AVFrames that conform
to that format. Since you're largely dealing with MPEG-2 and H.264
video, it's almost always going to be YUV 4:2:0 planar. The filter
pipeline can then do conversion if needed, either because you told it
to convert or because a filter in the pipeline you specified doesn't
support the format it was being given.
See libavutil/pixfmt.h for a list of all the possible formats in which
AVFrames can be announced by a codec.
Devin