RFC MPEG encoding and decoding V4L2 API additions
Version 0.1
This RFC adds new functionality to the V4L2 API in order to properly
support MPEG hardware encoders and decoders. This is mostly driven by
the work to get the ivtv driver (www.ivtvdriver.org) into the kernel,
but it can also benefit other hardware encoders and decoders, which is
why this RFC is cross-posted to the dxr3-devel mailing list as well.
A general note: while MPEG-1/2/4 is currently the codec most often
found, this RFC should also work for other compressed-stream formats,
possibly with some later additions.
This RFC only deals with the encoding and decoding part. The cx23415
also supports an On-Screen Display (OSD). Another RFC will appear for
that later; I need to do some more research on it before I can issue
that one.
This RFC is divided into several sections. The first section describes a
few additional MPEG compression controls. It is followed by a
description of the new Program Index functionality. Then a description
is given of the actual MPEG encoding commands (start, stop, pause,
resume) and the timing query ioctl.
This is followed by a description of new MPEG decompression controls and
a description of the MPEG decoding commands and timing query ioctls.
Finally there is a section on the rationale for some of the decisions
taken in this RFC.
Part I: MPEG encoding
=====================
MPEG compression controls
-------------------------
V4L2_CID_MPEG_VIDEO_MUTE
Type: integer
Description: Mutes the video to a fixed color when capturing. This is
useful for testing as it creates a fixed and reproducible video
bitstream.
The supplied 32-bit integer has the following layout:
  Bits    Meaning
  0       0 = video not muted
          1 = video muted, creates frames with the YUV color
              defined below
  1:7     Unused, set to 0.
  8:15    V chrominance information
  16:23   U chrominance information
  24:31   Y luminance information
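As an illustration of the bit layout above, the control value could be
packed as in the following minimal sketch. The helper name
video_mute_value() is hypothetical and not part of this RFC; only the
bit positions come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the RFC): pack the value for
 * V4L2_CID_MPEG_VIDEO_MUTE according to the bit layout described above. */
static uint32_t video_mute_value(int muted, uint8_t y, uint8_t u, uint8_t v)
{
    return (muted ? 1u : 0u)      /* bit 0: 1 = video muted */
         | ((uint32_t)v << 8)     /* bits 8:15: V chrominance */
         | ((uint32_t)u << 16)    /* bits 16:23: U chrominance */
         | ((uint32_t)y << 24);   /* bits 24:31: Y luminance */
}
```

For example, video_mute_value(1, 0x10, 0x80, 0x80) yields 0x10808001.
Bits 1:7 are left at 0 as required.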
V4L2_CID_MPEG_AUDIO_MUTE
Type: bool
Description: Mutes the audio when capturing. This is not done by muting
audio hardware, which can still produce a slight hiss, but in the
encoder itself, guaranteeing a fixed and reproducible audio bitstream.
0 = unmuted, 1 = muted.
V4L2_CID_MPEG_CX2341X_STREAM_INSERT_NAV_PACKETS
Type: bool
Description: this control is specific to the CX23415/6. If set, then it
enables navigation pack insertion for DVD. To be precise: it adds 0xbf
(private stream 2) packets to the MPEG. The size of these packets is
2048 bytes (including the 6-byte header). The payload is zeroed and it
is up to the application to fill them in. These packets are inserted
every four frames.
0 = do not insert, 1 = insert DVD navigation packets.
MPEG Program Index
------------------
#define V4L2_PGMIDX_FRAME_P 0
#define V4L2_PGMIDX_FRAME_I 1
#define V4L2_PGMIDX_FRAME_B 2
#define V4L2_PGMIDX_FRAME_MASK 3
struct v4l2_pgmidx_entry {
u64 offset;
u64 pts;
u32 length;
u32 flags;
u32 reserved[2];
};
#define V4L2_PGMIDX_ENTRIES (64)
struct v4l2_pgmidx {
u32 entries;
u32 entries_cap;
u32 reserved[4];
struct v4l2_pgmidx_entry entry[V4L2_PGMIDX_ENTRIES];
};
#define VIDIOC_G_ENCODER_PGMIDX _IOR('V', 64, struct v4l2_pgmidx)
Returns the program index. Each entry states that a frame (P, I or B
according to the flags) starts at the given offset, with the given PTS
(Presentation Time Stamp) and length. The offset may never exceed the
number of bytes actually read, i.e. the index must never report
'future events'.
'entries' is the number of entries filled in the entry
array. 'entries_cap' is the capacity of the index in the driver. This
may be larger or smaller than V4L2_PGMIDX_ENTRIES. 'entries' will
always be less than or equal to min(entries_cap, V4L2_PGMIDX_ENTRIES).
If this ioctl is called when no capture is in progress, then 'entries'
is 0 and 'entries_cap' should be set to the capacity. This way
applications can check beforehand how frequently the index should be
obtained.
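A minimal sketch of how an application might walk a returned index,
for example to find the most recent I-frame to seek to. The structs are
repeated here with <stdint.h> types so that the example is
self-contained. The helper last_iframe_offset() is hypothetical, and the
sketch assumes that entries are stored in stream order, which this RFC
does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define V4L2_PGMIDX_FRAME_P    0
#define V4L2_PGMIDX_FRAME_I    1
#define V4L2_PGMIDX_FRAME_B    2
#define V4L2_PGMIDX_FRAME_MASK 3
#define V4L2_PGMIDX_ENTRIES    64

struct v4l2_pgmidx_entry {
    uint64_t offset;
    uint64_t pts;
    uint32_t length;
    uint32_t flags;
    uint32_t reserved[2];
};

struct v4l2_pgmidx {
    uint32_t entries;
    uint32_t entries_cap;
    uint32_t reserved[4];
    struct v4l2_pgmidx_entry entry[V4L2_PGMIDX_ENTRIES];
};

/* Hypothetical helper: byte offset of the last I-frame in the index,
 * or -1 if the index holds no I-frame. Assumes stream order. */
static int64_t last_iframe_offset(const struct v4l2_pgmidx *idx)
{
    uint32_t n = idx->entries;

    if (n > V4L2_PGMIDX_ENTRIES)   /* defensive clamp to the array size */
        n = V4L2_PGMIDX_ENTRIES;
    for (uint32_t i = n; i > 0; i--) {
        const struct v4l2_pgmidx_entry *e = &idx->entry[i - 1];

        if ((e->flags & V4L2_PGMIDX_FRAME_MASK) == V4L2_PGMIDX_FRAME_I)
            return (int64_t)e->offset;
    }
    return -1;
}
```

In real use the struct would be filled by VIDIOC_G_ENCODER_PGMIDX rather
than by hand.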
MPEG Encoding commands
----------------------
#define V4L2_ENC_CMD_START (0)
#define V4L2_ENC_CMD_STOP (1)
#define V4L2_ENC_CMD_PAUSE (2)
#define V4L2_ENC_CMD_RESUME (3)
/* Flags for V4L2_ENC_CMD_STOP */
#define V4L2_ENC_CMD_STOP_AT_GOP_END (1 << 0)
struct v4l2_encoder_cmd {
__u32 cmd;
__u32 flags;
union {
struct {
__u32 data[16];
} raw;
};
};
#define VIDIOC_ENCODER_CMD _IOWR('V', 69, struct v4l2_encoder_cmd)
#define VIDIOC_TRY_ENCODER_CMD _IOWR('V', 69, struct v4l2_encoder_cmd)
Before calling this ioctl the unused fields of v4l2_encoder_cmd must be
zeroed.
'cmd' is set by the user and is the command for the encoder.
'flags' is currently only used by the STOP command and contains one bit:
If V4L2_ENC_CMD_STOP_AT_GOP_END is set, then the capture continues
until the end of the GOP, otherwise it stops immediately.
These ioctls will check whether the command is supported (-EINVAL is
returned if not) and modify any arguments if needed to make it a valid
call for the available hardware. The modified arguments are returned.
The VIDIOC_TRY_ENCODER_CMD is identical to VIDIOC_ENCODER_CMD, except
that the TRY ioctl does not actually execute the command.
Note that a read() from a stopped encoder implies a V4L2_ENC_CMD_START.
A close() of an encoder that is currently encoding implies an immediate
V4L2_ENC_CMD_STOP. Once a STOP has been issued and the encoder has no
more pending data, read() returns 0 to indicate that the encoder has
stopped. The next read() will start the encoder again.
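A minimal sketch of preparing a STOP command as described above. The
struct is repeated with <stdint.h> types so that the example is
self-contained, and the helper name enc_cmd_stop() is hypothetical. The
actual ioctl call is left out because it needs a device node and the
ioctl numbers proposed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define V4L2_ENC_CMD_STOP            (1)
#define V4L2_ENC_CMD_STOP_AT_GOP_END (1 << 0)

struct v4l2_encoder_cmd {
    uint32_t cmd;
    uint32_t flags;
    union {
        struct {
            uint32_t data[16];
        } raw;
    };
};

/* Hypothetical helper: fill in a STOP command. All unused fields are
 * zeroed first, as this RFC requires before calling the ioctl. */
static void enc_cmd_stop(struct v4l2_encoder_cmd *c, int at_gop_end)
{
    memset(c, 0, sizeof(*c));
    c->cmd = V4L2_ENC_CMD_STOP;
    if (at_gop_end)
        c->flags |= V4L2_ENC_CMD_STOP_AT_GOP_END;
}
```

An application would typically pass the result to
VIDIOC_TRY_ENCODER_CMD first and, if that does not return -EINVAL, to
VIDIOC_ENCODER_CMD.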
MPEG Timing query
-----------------
struct v4l2_stream_timing {
u32 frame; // frame counter from start of capture/playback.
// starts at 1. 0 = unknown
u64 pts; // MPEG presentation time stamp. 33 bits, 0 = unknown
u64 clock_ref; // MPEG system clock reference. 42 bits, 0 = unknown.
u32 reserved[8];
};
#define VIDIOC_G_ENCODER_TIMING _IOR('V', 70, struct v4l2_stream_timing)
Return the timing information of the last read frame.
The unit of the PTS is 1/90000 second.
The clock_ref (also known as a SCR for an MPEG Program Stream or PCR for
an MPEG Transport Stream) consists of two parts: bits 9-41 are the
reference base in units of 1/90000 second. Bits 0-8 form the reference
extension with units of 1/27000000 second. The range of the reference
extension is 0-299. If unknown, then the reference extension must be
set to 0.
These units come from the MPEG standard. Room is reserved in the timing
struct for other timing information should that be required.
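The arithmetic behind these units can be sketched as follows. The helper
names are hypothetical. Since 27000000 / 90000 = 300, one tick of the
reference base equals 300 ticks of the extension, which is why the
extension only ranges over 0-299.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 9-41 of clock_ref: reference base in 1/90000 second units. */
static uint64_t clock_ref_base(uint64_t clock_ref)
{
    return (clock_ref >> 9) & ((1ULL << 33) - 1);
}

/* Bits 0-8 of clock_ref: reference extension in 1/27000000 second units. */
static uint32_t clock_ref_ext(uint64_t clock_ref)
{
    return (uint32_t)(clock_ref & 0x1ff);
}

/* Full clock reference expressed in 27 MHz ticks. */
static uint64_t clock_ref_to_27mhz(uint64_t clock_ref)
{
    return clock_ref_base(clock_ref) * 300 + clock_ref_ext(clock_ref);
}

/* PTS (1/90000 second units) converted to whole milliseconds. */
static uint64_t pts_to_ms(uint64_t pts)
{
    return pts * 1000 / 90000;
}
```

For example, a PTS of 90000 corresponds to 1000 ms, and a clock_ref with
base 90000 and extension 150 corresponds to 27000150 ticks of the 27 MHz
clock.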
Part II: MPEG decoding
======================
MPEG decompression controls
---------------------------
The MPEG decompression controls all belong to the MPEG decompression
class:
/* MPEG-decompression controls */
#define V4L2_CTRL_CLASS_MPEG_DEC 0x009a0000
enum v4l2_mpeg_dec_audmode {
V4L2_DECODER_AUDMODE_STEREO = 0,
V4L2_DECODER_AUDMODE_LEFT = 1,
V4L2_DECODER_AUDMODE_RIGHT = 2,
V4L2_DECODER_AUDMODE_MONO = 3,
V4L2_DECODER_AUDMODE_SWAP = 4,
};
V4L2_CID_MPEG_DEC_AUDMODE_STEREO
Type: v4l2_mpeg_dec_audmode enum
Description: Select how an MPEG stereo audio stream should be decoded.
V4L2_CID_MPEG_DEC_AUDMODE_BILINGUAL
Type: v4l2_mpeg_dec_audmode enum
Description: Select how an MPEG bilingual audio stream should be
decoded.
Background information: the ivtv driver detects when the capture source
has bilingual audio and sets the MPEG stream marker that tells the
decoder that the content of the stream contains bilingual audio. The
decoder detects this marker as well and automatically selects the
stereo or bilingual audio mode.
V4L2_CID_MPEG_DEC_STREAM_PID_AUDIO
Type: integer
Description: Select which audio Transport Stream Packet ID should be
used for playback. Default = 256.
V4L2_CID_MPEG_DEC_STREAM_PID_VIDEO
Type: integer
Description: Select which video Transport Stream Packet ID should be
used for playback. Default = 260.
MPEG Decoding commands
----------------------
#define V4L2_DEC_CMD_START (0)
#define V4L2_DEC_CMD_STOP (1)
#define V4L2_DEC_CMD_PAUSE (2)
#define V4L2_DEC_CMD_RESUME (3)
#define V4L2_DEC_CMD_SPEED (4)
#define V4L2_DEC_CMD_REVERSE_SPEED (5)
#define V4L2_DEC_CMD_STEP (6)
#define V4L2_DEC_CMD_REVERSE_STEP (7)
#define V4L2_DEC_CMD_PASSTHROUGH_START (8)
#define V4L2_DEC_CMD_PASSTHROUGH_STOP (9)
/* Flags for V4L2_DEC_CMD_PAUSE */
#define V4L2_DEC_CMD_PAUSE_TO_BLACK (1 << 0)
/* Flags for V4L2_DEC_CMD_STOP */
#define V4L2_DEC_CMD_STOP_TO_BLACK (1 << 0)
#define V4L2_DEC_CMD_STOP_WAIT_FOR_END (1 << 1)
/* Flags for V4L2_DEC_CMD_SPEED/REVERSE_SPEED */
#define V4L2_DEC_CMD_SPEED_MUTE_AUDIO (1 << 0)
/* Speed input formats: */
/* The decoder has no special format requirements */
#define V4L2_DEC_SPEED_FMT_NONE (0)
/* The decoder requires full GOPs */
#define V4L2_DEC_SPEED_FMT_GOP (1)
struct v4l2_decoder_cmd {
__u32 cmd;
__u32 flags;
union {
struct {
__u64 pts;
} stop;
struct {
struct v4l2_fract factor;
__u32 format;
} speed;
struct {
__u32 data[16];
} raw;
};
};
#define VIDIOC_DECODER_CMD _IOWR('V', 69, struct v4l2_decoder_cmd)
#define VIDIOC_TRY_DECODER_CMD _IOWR('V', 69, struct v4l2_decoder_cmd)
Before calling this ioctl the unused fields of v4l2_decoder_cmd must
be zeroed.
'cmd' is set by the user and is the command for the decoder.
The PASSTHROUGH commands are probably fairly specific to the cx23415:
if passthrough mode is started then the video/audio input is routed
straight to the video/audio output. This is done internally on the
cx23415. While PASSTHROUGH is on, it is still possible to record from
the input at the same time. It's basically live TV functionality. The
other commands are self-explanatory.
'flags' is used by several commands:
PAUSE and STOP can either leave the last frame or clear the output to
black at the end.
STOP can also wait for the command to finish, so the ioctl doesn't
return until the decoder has stopped decoding. Useful for waiting until
all buffers are decoded. -EINTR is returned if a signal interrupted
this ioctl. It is also possible to specify a PTS to stop at. If pts ==
0, then the decoder stops accepting new data immediately.
The SPEED commands can mute the audio.
The speed is set using a fraction where 1 is normal speed. The driver
will map this fraction to the next valid speed that is supported by
hardware.
The format is set to the input requirements of the decoder in order to
handle the given speed. Either there are no requirements, or it
requires that full GOPs are passed to the decoder at a time. That is
for example how reverse playback is implemented: a full Group Of
Pictures is passed to the decoder, followed by the previous GOP, and
so on. In the future additional formats might be added, such as
I-frames only.
If you want faster playback than is supported by the hardware, then you
need to do so in software by skipping GOPs.
STEP/REVERSE_STEP will step through the MPEG stream frame-by-frame.
These ioctls will check whether the command is supported (-EINVAL is
returned if not) and modify any arguments if needed to make it a valid
call for the available hardware. The modified arguments are returned.
The VIDIOC_TRY_DECODER_CMD is identical to VIDIOC_DECODER_CMD, except
that the TRY ioctl does not actually execute the command.
Note that a write() to a stopped decoder implies a V4L2_DEC_CMD_START. A
close() of a decoder that is currently decoding implies an immediate
V4L2_DEC_CMD_STOP. When the decoder stops accepting data after issuing
a STOP the write() call will return 0 to indicate that the decoder has
stopped and accepts no more data. The next write will start the decoder
again.
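A minimal sketch of preparing SPEED and STOP commands for the decoder.
The structs are repeated with <stdint.h> types so that the example is
self-contained, and the helper names are hypothetical. The 'format'
field is left at 0 in the request on the assumption that the driver
fills it in with its input requirements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define V4L2_DEC_CMD_STOP              (1)
#define V4L2_DEC_CMD_SPEED             (4)
#define V4L2_DEC_CMD_STOP_TO_BLACK     (1 << 0)
#define V4L2_DEC_CMD_STOP_WAIT_FOR_END (1 << 1)
#define V4L2_DEC_CMD_SPEED_MUTE_AUDIO  (1 << 0)

struct v4l2_fract {
    uint32_t numerator;
    uint32_t denominator;
};

struct v4l2_decoder_cmd {
    uint32_t cmd;
    uint32_t flags;
    union {
        struct {
            uint64_t pts;
        } stop;
        struct {
            struct v4l2_fract factor;
            uint32_t format;
        } speed;
        struct {
            uint32_t data[16];
        } raw;
    };
};

/* Hypothetical helper: request playback at num/den times normal speed. */
static void dec_cmd_speed(struct v4l2_decoder_cmd *c,
                          uint32_t num, uint32_t den, int mute_audio)
{
    memset(c, 0, sizeof(*c));          /* unused fields must be zeroed */
    c->cmd = V4L2_DEC_CMD_SPEED;
    if (mute_audio)
        c->flags |= V4L2_DEC_CMD_SPEED_MUTE_AUDIO;
    c->speed.factor.numerator = num;
    c->speed.factor.denominator = den;
}

/* Hypothetical helper: stop at the given PTS (0 = stop immediately). */
static void dec_cmd_stop_at(struct v4l2_decoder_cmd *c,
                            uint64_t pts, uint32_t flags)
{
    memset(c, 0, sizeof(*c));
    c->cmd = V4L2_DEC_CMD_STOP;
    c->flags = flags;
    c->stop.pts = pts;
}
```

For example, dec_cmd_speed(&c, 2, 1, 1) requests double-speed playback
with the audio muted.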
MPEG Timing query
-----------------
#define VIDIOC_G_DECODER_TIMING _IOR('V', 70, struct v4l2_stream_timing)
Return the timing information of the last frame played back.
#define VIDIOC_G_DECODER_TIMING_SYNC _IOR('V', 70, struct v4l2_stream_timing)
Wait for the next frame to be displayed and return the timing
information of that frame.
The unit of the PTS is 1/90000 second.
The clock_ref (also known as a SCR for an MPEG Program Stream or PCR for
an MPEG Transport Stream) consists of two parts: bits 9-41 are the
reference base in units of 1/90000 second. Bits 0-8 form the reference
extension with units of 1/27000000 second. The range of the reference
extension is 0-299. If unknown, then the reference extension must be
set to 0.
These units come from the MPEG standard. Room is reserved in the timing
struct for other timing information should that be required.
Part III: Rationale of Encoder/Decoder Commands
===============================================
Encoder/Decoder commands are simple commands to the encoder or decoder:
start, stop, pause, resume, fast forward, etc. Basically the commands
you have on a DVD/CD player.
Not all hardware supports all actions, so the programmer needs to be
able to query somehow what is supported. Just checking whether e.g.
PAUSE returns -EINVAL is not an option: you need to be able to check
the presence of an action without actually executing it.
Each action can have flags and other arguments. E.g. PAUSE has a flag to
say whether the TV-OUT should go to black or if the last frame should
remain. STOP has an optional PTS to postpone stopping until that pts is
reached. The speed settings for FWD/REW are more complicated since it
depends on the hardware what speeds are supported natively, so you need
to be able to query whether a certain speed is supported and if not,
what the closest matching speed is.
There are several options for implementing this:
1) Each action has its own struct and ioctl. One CAP ioctl returns
   a bitmask listing the supported actions.
   + simple
   - no way to check which flags/values are supported, esp. for possible
     speed settings
   - lots of ioctls
   - new actions -> new ioctls + struct
2) One action ioctl, receiving a struct containing an action enum
   and a union where each action has its own struct. One CAP ioctl
   returns a bitmask listing the supported actions.
   + simple
   + easily extendable with new actions (although limited by the CAP
     bitmask width, but that's unlikely to be a problem)
   - cannot check speed ratio this way
3) One action ioctl as 2) and a corresponding TRY ioctl
   + easily extendable with new actions
   + able to check/modify action arguments.
   - the TRY is more complicated
4) One action ioctl as 2) and a CAP ioctl with its own struct
   containing an action enum and a union with capability settings
   for each type of action. For the speed check it allows you to
   specify a speed and it will return the closest supported one.
   + simple
   + easily extendable with new actions
   + speed can be checked
   - CAP is contrived. Would need its own union to return action
     specific capabilities.
Option 3 has IMHO the best balance between extendability and ease of
use. It also matches existing usage of 'TRY' ioctls.
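To illustrate what 'modify action arguments' could mean for the speed
case, here is a minimal sketch of the kind of adjustment a TRY call
might perform. Everything in it is an assumption made for illustration:
the function name, the table of supported speeds, and the choice of
'closest' as the mapping rule are not defined by this RFC or taken from
any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct v4l2_fract {
    uint32_t numerator;
    uint32_t denominator;
};

static uint64_t absdiff(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}

/* Hypothetical: map a requested speed to the closest entry in a
 * made-up table of hardware speeds (1/2x, 1x, 1.5x, 2x). On a tie the
 * lower speed wins. */
static struct v4l2_fract try_speed_factor(struct v4l2_fract want)
{
    static const struct v4l2_fract hw[] = {
        { 1, 2 }, { 1, 1 }, { 3, 2 }, { 2, 1 },
    };
    const struct v4l2_fract normal = { 1, 1 };
    size_t best = 0;

    if (want.denominator == 0)     /* invalid fraction: use normal speed */
        return normal;
    for (size_t i = 1; i < sizeof(hw) / sizeof(hw[0]); i++) {
        /* |want - hw[i]| = di / (want.den * hw[i].den), likewise for
         * db, so cross-multiply instead of using floating point. */
        uint64_t di = absdiff((uint64_t)want.numerator * hw[i].denominator,
                              (uint64_t)hw[i].numerator * want.denominator);
        uint64_t db = absdiff((uint64_t)want.numerator * hw[best].denominator,
                              (uint64_t)hw[best].numerator * want.denominator);

        if (di * hw[best].denominator < db * hw[i].denominator)
            best = i;
    }
    return hw[best];
}
```

With this table a request for 5/1 would come back as 2/1, telling the
application that anything faster has to be done in software.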
=============================================================
This concludes this RFC. Comments are welcome!
Regards,
Hans Verkuil