Hello all,

As discussed at the Asterisk Developer Conference, I am attaching a proposal for IAX video packetization. While video is the primary focus of this document, it can be easily extended to cover other types of large, realtime media.

I am really looking forward to hearing your comments and suggestions.

Cheers,
Mihai


                                                                   Mihai Balea
                                                       <mihai AT hates DOT ms>

                        IAX video packetization 

0. Introduction

This proposal pertains to video packetization over IAX, but it could be easily
expanded to transport other kinds of large, realtime media.  

1. Problem statement

Sending video over IAX frames presents a number of unique issues:
- Frames can be larger than the standard MTU.  At a resolution of 320x240,
key frames regularly exceed the MTU, and even predicted frames (p-frames)
exceed it at times.  As a result, a video-enabled IAX implementation must be
able to split a video frame over multiple IAX frames (called slices), and
the receiver must be able to reassemble the original video frame before
passing it to the video decoder.
- Some codecs (H.264) have built-in packet-loss compensation.  Other codecs
(Theora) do not have any such mechanism.  For such codecs, it is imperative
that video slices be assembled in the correct order and that the beginning
and end of a video frame be properly signaled.
- Some applications switch video sources on the fly (conferencing, video on
hold, etc.).  Codecs that do not use a fixed code-book (Theora) need to know
when this happens in order to load the appropriate code-book.  Even for codecs
that use fixed code-books, when a video source change occurs it is desirable
to wait until the next key frame is received before resuming video display.
- Some applications can benefit from knowing the type of each frame (key
frame, p-frame, etc.).

Some of these issues are present when sending other types of media, for 
example images.  A solution should be flexible enough to allow for different
types of media.
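To make the fragmentation requirement concrete, here is a minimal sketch (not part of the proposal itself) of splitting one encoded video frame into MTU-sized slices, tagging each slice with the FT values that section 2.2 below assigns (0 = whole frame, 1 = first slice, 2 = middle slice, 3 = last slice).  The function name and payload limit are illustrative assumptions.

```python
# Illustrative sketch: fragment an encoded video frame into slices
# small enough to fit in one IAX frame.  FT values follow section 2.2.

FT_WHOLE, FT_FIRST, FT_MIDDLE, FT_LAST = 0, 1, 2, 3

def slice_frame(frame: bytes, max_payload: int):
    """Yield (ft, chunk) pairs for one encoded video frame."""
    if len(frame) <= max_payload:
        # Frame fits in a single IAX frame: no fragmentation needed.
        yield FT_WHOLE, frame
        return
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)]
    for i, chunk in enumerate(chunks):
        if i == 0:
            ft = FT_FIRST
        elif i == len(chunks) - 1:
            ft = FT_LAST
        else:
            ft = FT_MIDDLE
        yield ft, chunk
```

The receiver can then reassemble the original frame by concatenating payloads from the first slice through the last, in sequence-number order.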

For reference, I am including the current structure of a video meta-frame 
(from http://www.ietf.org/internet-drafts/draft-guy-iax-03.txt, section
8.1.3.1):

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|         Meta Indicator      |V|      Source Call Number     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|?|          time-stamp         |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                         Data                  |
:                                                               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


2. Proposed Media Frame Structure

We separate the header into a generic section and a codec/media-specific
extended section.  All media types/codecs share the generic section with
identical semantics.  Each media type/codec can define a specific section with
different fields/semantics.

Since interoperation with RTP media streams is desirable, we should try to
adopt a model that follows the RTP standards.  One benefit would be that it
allows for simple payload transfers between IAX and RTP streams.


2.1 Generic Section
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|         Meta Indicator      |V|      Source Call Number     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Time-Stamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source/Stream ID                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Sequence Number        | Payload Type  |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                            Data                               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field description:

- The first 32 bits (F, Meta Indicator, V, Source Call Number) have the same 
semantics as in section 8.1.3.1 of the current IAX2 draft.
- Time Stamp: The peer's full 32 bit time stamp.
- Source/Stream ID: 32 bit stream identifier that has similar semantics as 
the SSRC field in RTP.
- Sequence Number:  16 bit sequence number.  Starts at 0 when a media
stream is initialized and is incremented for each slice sent.  Each stream
has its own sequence number space.
- Payload Type: codec/media format.  Negotiated during NEW or RENEW
transactions.  Semantics should be similar to the RTP payload type field.
- Flags: Each codec/media type may define specific flags here.  If a
codec/media type defines no flags, the field should be set to zero.
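As a sanity check on the layout, here is a sketch (an illustration, not a reference implementation) of packing the proposed generic section in network byte order.  The F and V bit semantics are taken from the existing meta-frame header; the function name and argument order are assumptions.

```python
import struct

# Illustrative sketch: serialize the generic section of the proposed
# media frame.  Word 0 carries F (1 bit) + Meta Indicator (15 bits);
# word 1 carries V (1 bit) + Source Call Number (15 bits), as in the
# existing meta-frame header.  All fields are in network byte order.

def pack_media_frame(f, meta_indicator, v, call_number, timestamp,
                     stream_id, seq, payload_type, flags, data):
    word0 = ((1 << 15) if f else 0) | (meta_indicator & 0x7FFF)
    word1 = ((1 << 15) if v else 0) | (call_number & 0x7FFF)
    header = struct.pack("!HHIIHBB",
                         word0, word1,
                         timestamp & 0xFFFFFFFF,   # full 32-bit time stamp
                         stream_id & 0xFFFFFFFF,   # Source/Stream ID
                         seq & 0xFFFF,             # Sequence Number
                         payload_type & 0xFF,      # Payload Type
                         flags & 0xFF)             # Flags
    return header + data
```

The fixed portion comes to 16 octets (2 + 2 + 4 + 4 + 2 + 1 + 1) before the payload.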

2.2 Video Specific Extensions

The generic header carries enough information to successfully encapsulate
many video codecs.  A few flags need to be defined to cover fragmentation of
video frames over multiple IAX frames, as well as key-frame signaling.  For
this purpose, the flags field should be defined as follows:

xxxx xKFT

- K : 1 bit: set to 1 if the data in this IAX frame belongs to a video key
frame, 0 otherwise
- FT: 2 bits: 
    - 0: This IAX frame contains an entire video frame
    - 1: This IAX frame contains the first slice in the current video frame
    - 2: This IAX frame contains a slice that is not the first nor the last in
         the current video frame
    - 3: This IAX frame contains the last slice in the current video frame
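The K/FT layout above can be sketched as follows, together with a minimal receiver-side reassembler driven by the FT value.  This is an illustration of the flag semantics, not a proposed implementation; names are assumptions, and real code would also check sequence numbers for loss and reordering.

```python
# Illustrative sketch of the flags byte (xxxx xKFT): K is bit 2,
# FT occupies bits 1-0.

FT_WHOLE, FT_FIRST, FT_MIDDLE, FT_LAST = 0, 1, 2, 3

def make_flags(key_frame: bool, ft: int) -> int:
    return ((1 if key_frame else 0) << 2) | (ft & 0x3)

def parse_flags(flags: int):
    """Return (key_frame, ft)."""
    return bool(flags & 0x4), flags & 0x3

class Reassembler:
    """Accumulate slices until a complete video frame is available."""

    def __init__(self):
        self._parts = []

    def push(self, flags: int, payload: bytes):
        """Return a complete video frame, or None if more slices are needed."""
        _, ft = parse_flags(flags)
        if ft == FT_WHOLE:
            self._parts = []
            return payload
        if ft == FT_FIRST:
            self._parts = [payload]        # start of a new video frame
            return None
        self._parts.append(payload)
        if ft == FT_LAST:
            frame = b"".join(self._parts)  # video frame is now complete
            self._parts = []
            return frame
        return None                        # middle slice: keep waiting
```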

2.3  Issues still TBD:

- Timestamp conundrum: RTP uses a 90 kHz clock for video and the timestamp
is the presentation time.  IAX uses millisecond resolution and the timestamp
is the transmission time.  How do we reconcile the two?
- Should we expand the K flag to multiple bits so we can differentiate between 
p-frames and b-frames?
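On the clock-rate half of the timestamp question, the arithmetic itself is simple, since 90 kHz is exactly 90 ticks per millisecond.  The sketch below (an assumption, not part of the proposal) converts between the two units; note it does nothing about the harder presentation-time versus transmission-time mismatch.

```python
# Illustrative sketch: convert between IAX millisecond timestamps and
# an RTP-style 90 kHz clock.  90 kHz = 90 ticks per millisecond, so
# ms -> ticks is exact and ticks -> ms rounds to the nearest ms.

def ms_to_rtp90k(ms: int) -> int:
    return ms * 90

def rtp90k_to_ms(ticks: int) -> int:
    return (ticks + 45) // 90   # round to nearest millisecond
```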
