Re: [Libav-user] Syncing a seperate audio and video stream

Ludwig Ertl Wed, 21 Mar 2012 09:45:41 -0700

Hi,

Thanks for your reply!


Nicolas George wrote:
> Le primidi 1er germinal, an CCXX, Ludwig Ertl a écrit :
>> I want to use ffmpeg to decode Audio and video streams by a
>> Trendnet TV-IP512P IP Camera in order to feed it to ffserver for
>> streaming. I found a document on the Internet which is describing
...
>> variable or some ugly hack like that), I don't have a clue how to
>> sync them. I suspect that it may have something to do with PTS and
>> DTS timestamps, but I don't know how they are used for audio and
>> video sync in seperate streams.
> The PTS of a frame is the timestamp at which this frame
> should start to be
> played or displayed. That is exactly what you need to sync
> audio and video.
What I still don't understand, however, is how the PTS values of two
separate streams relate to each other when the streams need to be synced.
To be able to test different PTS configurations, I updated my sources
(see attachment) so that the algorithm can be switched via the TIMER
#define.

My idea was to compensate for the PTS offset by saving the first timestamp
(for testing, in whole seconds only) when the stream is first read, and
then using it, multiplied by the sample rate, as the starting offset for
the sequence number, so that the PTS of each stream starts at the camera's
clock time.
This can be tested by enabling the USE_TIMESTAMP #define in the attached
code. I expected the streams to sync now that each one has its own starting
offset in PTS, but this doesn't seem to work either.
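
To compare PTS values from two streams, each value first has to be
converted from its own stream's time base to a common clock. The following
is a minimal sketch of that conversion, not code from the attached
demuxers: the helper names ticks_to_us and stream_offset_us are made up for
illustration, and only time bases of the form 1/den are handled. Inside
libav, av_rescale_q() performs this kind of conversion in general.

```c
#include <stdint.h>

/* Convert a PTS counted in ticks of 1/den seconds to microseconds.
 * Splitting into whole seconds and a remainder avoids the 64-bit overflow
 * that pts * 1000000 would hit with Unix-time-sized values at 8000 Hz. */
static inline int64_t ticks_to_us(int64_t pts, int64_t den)
{
    return (pts / den) * 1000000 + (pts % den) * 1000000 / den;
}

/* Offset of stream b relative to stream a on the common microsecond
 * clock. A positive result means that stream b starts later. */
static inline int64_t stream_offset_us(int64_t pts_a, int64_t den_a,
                                       int64_t pts_b, int64_t den_b)
{
    return ticks_to_us(pts_b, den_b) - ticks_to_us(pts_a, den_a);
}
```

As an illustration, assume the first video PTS is 1201105593 * 30 + 1 in
ticks of 1/30 s and the first audio PTS is 1201105595 * 8000 in ticks of
1/8000 s (values chosen to match the start fields in the log further down,
not read from the streams). stream_offset_us() then returns 1966667, so
the audio would start about 1.97 s after the video.
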
I have uploaded samples so that you can test sync algorithms yourself, if
you want (video and audio simply captured via wget):
http://www.csp.at/ACS/ACVS.cgi
http://www.csp.at/ACS/ACAS.cgi

Using TIMER_SRT for the audio demuxer and TIMER_SEQ for the video demuxer
works fine. When the stream comes directly from the camera, I can adjust
the offset manually with the -ss parameter (ffmpeg just drops a few video
frames at the beginning), but that is not the goal, as I want the streams
to sync automatically.
The provided samples are useful for testing the USE_TIMESTAMP #define:

./ffmpeg -i ACVS.cgi -i ACAS.cgi -v 4 test.mpg

[acsv @ 0x8de5a20] Estimating duration from bitrate, this may be inaccurate
Seems stream 0 codec frame rate differs from container frame rate: 30000.00 (30000/1) -> 30.00 (30/1)
Input #0, acsv, from 'ACVS.cgi':
  Duration: N/A, start: 1201105593.033333, bitrate: N/A
    Stream #0.0: Video: mpeg4, yuv420p, 640x480 [PAR 1:1 DAR 4:3], 30 fps, 30 tbr, 30 tbn, 30k tbc
[acs @ 0x8e43120] Estimating duration from bitrate, this may be inaccurate
Input #1, acs, from 'ACAS.cgi':
  Duration: 00:00:15.16, start: 1201105595.000000, bitrate: 128 kb/s
    Stream #1.0: Audio: pcm_s16le, 8000 Hz, 1 channels, s16, 128 kb/s

As you can see, start has different values now, but this doesn't seem to
have any effect on the synchronisation of both streams either.

> In libavdevice, a lot of demuxers (ALSA, V4L on certain kernels, JACK)
> already return a Unix timestamps at microsecond precision, so
> this is a good
> choice. Also, the other possibilities you tried all return a
> redundant information.
I know, that is where I took the timestamp code quoted below from.

> With the ffmpeg command-line tool, sync issues of that kind
> can be solved
> with the -async option, but I do not find it very elegant. In
> custom code,
> or possibly in a future version of ffmpeg, other algorithms could be
> considered.
I already played with -async, but it didn't have any effect for me.

>
>> What I have already tried was using the clock as PTS for both audio
>> and video:
>>
>> av_set_pts_info(st, 64, 1, 1000000);  /* 64 bits pts in us */
>> pkt->pts = ac->hdr.ulTimeSec * 1000000LL + ac->hdr.ulTimeUSec;
>
> That looks right (except the camelCase), and the best option.
>
>> But this just resulted in a totally garbled video stream.
>
> This is rather strange. Can you show your command line and
> console output in
> that case, and describe what kind of "garbled" you get?
I set the TIMER #define to TIMER_TIM in order to use this method.

../ffmpeg -i ACVS.cgi -i ACAS.cgi -v 4 test.mpg

[acsv @ 0x95cca20] Estimating duration from bitrate, this may be inaccurate
Input #0, acsv, from 'ACVS.cgi':
  Duration: N/A, start: 1201105593.757814, bitrate: N/A
    Stream #0.0: Video: mpeg4, yuv420p, 640x480 [PAR 1:1 DAR 4:3], 30k tbr, 1000k tbn, 30k tbc
[acs @ 0x96105a0] Estimating duration from bitrate, this may be inaccurate
Input #1, acs, from 'ACAS.cgi':
  Duration: 00:00:15.16, start: 1201105595.773438, bitrate: 128 kb/s
    Stream #1.0: Audio: pcm_s16le, 8000 Hz, 1 channels, s16, 128 kb/s
File 'test.mpg' already exists. Overwrite ? [y/N] y
[buffer @ 0x95fad20] w:640 h:480 pixfmt:yuv420p
[NULL @ 0x95ce340] Requested sampling rate unsupported using closest supported (16000)
[mpeg @ 0x95cda80] VBV buffer size not set, muxing may fail
Output #0, mpeg, to 'test.mpg':
  Metadata:
    encoder         : Lavf53.3.0
    Stream #0.0: Video: mpeg1video, yuv420p, 640x480 [PAR 1:1 DAR 4:3], q=2-31, 200 kb/s, 90k tbn, 60 tbc
    Stream #0.1: Audio: mp2, 16000 Hz, 1 channels, s16, 64 kb/s
Stream mapping:
  Stream #0.0 -> #0.0
  Stream #1.0 -> #0.1
Press ctrl-c to stop encoding
*** 1 dup!
*** 1 dup!
*** 1 dup!
*** 1 dup!
*** 1 dup!
...

So this results in a lot of dups. With -vsync 2 there are no more dups,
but the video is still garbled.
The problem doesn't seem to be related to the audio track, as it also
happens when decoding only the video.
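
For reference, the TIMER_TIM computation discussed above can be written as
a standalone helper. This is only a sketch: the name acs_time_to_us is
hypothetical, and the two arguments stand for the ulTimeSec and ulTimeUSec
header fields, taken here as 32-bit values because that is how the
demuxers read them (avio_rl32).

```c
#include <stdint.h>

/* Combine the seconds and microseconds fields of a block header into one
 * PTS in microseconds (time base 1/1000000). The cast makes the
 * multiplication happen in 64 bits. */
static inline int64_t acs_time_to_us(uint32_t time_sec, uint32_t time_usec)
{
    return (int64_t)time_sec * 1000000 + time_usec;
}
```

Fed with the two start values from the console output above
(1201105593.757814 for video and 1201105595.773438 for audio), the helper
gives PTS values that are 2015624 microseconds apart, which is the offset
a player or muxer would have to honour for the streams to line up.
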


>>  * ACS (Advanced ip-Camera Stream) demuxer
>>  * Copyright (c) 2012 DI(FH) Ludwig Ertl / CSP GmbH
> This looks promising. But I believe you may have much less
> work if you try
> to merge both files in a single demuxer that can automatically detect
> whether it is audio or video.
As the code and structs differ between the two demuxers, and the streams
do not arrive in a combined format either, I thought it would be cleaner
to separate them into two files.

Regards,
Ludwig

/*
 * ACS (Advanced ip-Camera Stream) demuxer
 * Copyright (c) 2012 DI(FH) Ludwig Ertl / CSP GmbH
 *
 * This file is part of Libav.
 *
 * Libav is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * Libav is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with Libav; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#include "avformat.h"
#include "avio_internal.h"
#include "pcm.h"
#include "riff.h"

/* Various timing options for PTS, don't know what is best, some of them are 
probably nonsense */
#define TIMER_NON       0       // No timing, let ffmpeg estimate...
#define TIMER_SRT       1       // Timing via sample rate, probably most compatible
#define TIMER_SEQ       2       // Timing via SequenceNumber in header
#define TIMER_TIM       3       // Timing via timestamp

#define TIMER   TIMER_SRT

/* Activate this, if you want to start pts at the Unixtime offset from camera 
clock
   Requires TIMER to be TIMER_SRT or TIMER_SEQ
 */
//#define USE_TIMESTAMP

/* The ffmpeg codecs we support, and the IDs they have in the file */
static const AVCodecTag codec_acs_tags[] = {
    { CODEC_ID_ADPCM_MS, 0 },
    { CODEC_ID_PCM_MULAW, 1 },
    { CODEC_ID_PCM_ALAW, 2 },
    { CODEC_ID_ADPCM_IMA_ISS, 4},       // Not quite sure if this maps correctly to OSS AFMT_IMA_ADPC
    { CODEC_ID_PCM_U8, 8 },
    { CODEC_ID_PCM_S16LE, 0x10 },
    { CODEC_ID_PCM_S16BE, 0x20 },
    { CODEC_ID_PCM_S8, 0x40 },
    { CODEC_ID_PCM_U16LE, 0x80 },
    { CODEC_ID_PCM_U16BE, 0x100 },
    { CODEC_ID_MP2, 0x200 },
    { CODEC_ID_AC3, 0x400 },
    { CODEC_ID_AMR_NB, 0x800 }          // Not quite sure if it's narrow or wide band (=AFMT_AMR)
};


typedef struct _ACS_AudioHeader
{
    unsigned long ulHdrID; //Header ID
    unsigned long ulHdrLength;
    unsigned long ulDataLength;
    unsigned long ulSequenceNumber;
    unsigned long ulTimeSec;
    unsigned long ulTimeUSec;
    unsigned long ulDataCheckSum;
    unsigned short usFormat;
    unsigned short usChannels;
    unsigned short usSampleRate;
    unsigned short usSampleBits;
    unsigned long ulReserved;
} ACS_AudioHeader, *PACS_AudioHeader;

typedef struct {
    ACS_AudioHeader hdr;
#if TIMER==TIMER_SEQ
    unsigned long ulFirstFrame;
#elif TIMER==TIMER_SRT
    int64_t llFrame;
#endif
#ifdef USE_TIMESTAMP
    unsigned long ulStartTime;
#endif
    int bReadHeader;
} ACSContext;


static int acs_probe(AVProbeData *p)
{
    /* check file header */
    if (p->buf[0] == 0x00 && p->buf[1] == 0x00 &&
        p->buf[2] == 0x01 && p->buf[3] == 0xF6)
        return AVPROBE_SCORE_MAX;
    else
        return 0;
}

static int acs_read_block_header(AVFormatContext *ctx, AVIOContext *pb)
{
    ACSContext *ac = ctx->priv_data;

    ac->hdr.ulHdrID = avio_rl32(pb);
    if (ac->hdr.ulHdrID != 0xF6010000)
    {
        av_log(ctx, AV_LOG_ERROR, "Incorrect header: %08lX\n", ac->hdr.ulHdrID);
        return -1;
    }
    ac->hdr.ulHdrLength = avio_rl32(pb); /* header size */
    ac->hdr.ulDataLength = avio_rl32(pb); /* data size */

    ac->hdr.ulSequenceNumber = avio_rl32(pb);
    ac->hdr.ulTimeSec = avio_rl32(pb);
    ac->hdr.ulTimeUSec = avio_rl32(pb);
    ac->hdr.ulDataCheckSum = avio_rl32(pb);

    ac->hdr.usFormat = avio_rl16(pb);
    ac->hdr.usChannels = avio_rl16(pb);
    ac->hdr.usSampleRate = avio_rl16(pb);
    ac->hdr.usSampleBits = avio_rl16(pb);
    ac->hdr.ulReserved = avio_rl32(pb);

#if TIMER==TIMER_SEQ
    if (!ac->ulFirstFrame && ac->hdr.ulSequenceNumber)
        ac->ulFirstFrame = ac->hdr.ulSequenceNumber;
#endif
#ifdef USE_TIMESTAMP
    if (!ac->ulStartTime && ac->hdr.ulTimeSec)
        ac->ulStartTime = ac->hdr.ulTimeSec;
#endif
    return 0;
}

static int acs_read_header(AVFormatContext *s,
                          AVFormatParameters *ap)
{
    AVIOContext *pb = s->pb;
    ACSContext *ac = s->priv_data;
    enum CodecID codec;
    AVStream *st;

    if(acs_read_block_header(s, pb) < 0)
        return -1;
    codec = ff_codec_get_id(codec_acs_tags, ac->hdr.usFormat);

    /* now we are ready: build format streams */
    st = av_new_stream(s, 0);
    if (!st)
        return -1;
    st->codec->codec_type = AVMEDIA_TYPE_AUDIO;
    st->codec->codec_tag = ac->hdr.usFormat;
    st->codec->codec_id = codec;
    st->codec->channels = ac->hdr.usChannels;
    st->codec->sample_rate = ac->hdr.usSampleRate;
    st->codec->bits_per_coded_sample = ac->hdr.usSampleBits;
    st->codec->bit_rate = st->codec->sample_rate *
                          st->codec->bits_per_coded_sample *
                          st->codec->channels;
    ac->bReadHeader = 1;

#if TIMER==TIMER_SRT || TIMER==TIMER_NON
    av_set_pts_info(st, 64, 1, ac->hdr.usSampleRate);
#elif TIMER==TIMER_TIM
    av_set_pts_info(st, 64, 1, 1000000);  /* 64 bits pts in us */
#elif TIMER==TIMER_SEQ
    /* FIXME: Hardcoded 16, calc it:
       (ac->hdr.usSampleRate * (ac->hdr.usSampleBits >> 3) / ac->hdr.ulDataLength) */
    av_set_pts_info(st, 64, 1, 16);
#endif
    return 0;
}

static int acs_read_packet(AVFormatContext *s,
                          AVPacket *pkt)
{
    ACSContext *ac = s->priv_data;
    int ret, chunklen =  ac->hdr.ulHdrLength +  ac->hdr.ulDataLength;
    int64_t remain;

    if (chunklen && (remain = (avio_tell(s->pb) % chunklen)))
    {
        // Seek may have occurred, so we are not aligned properly.
        // So don't read header, just read remaining packet
        // av_log(s, AV_LOG_ERROR, "within a packet, skipping over %lld bytes\n",
        //        chunklen - remain);

        ret = av_get_packet(s->pb, pkt, chunklen - remain);
        ac->bReadHeader=0;
    }
    else
    {
        if (ac->bReadHeader) ac->bReadHeader=0;
        else if ((ret = acs_read_block_header (s, s->pb))<0) return ret;
        ret= av_get_packet(s->pb, pkt, ac->hdr.ulDataLength);
    }

    if (ret < 0)
        return ret;
    pkt->stream_index = 0;

#if TIMER!=TIMER_NON
    pkt->pts = 
#if defined(USE_TIMESTAMP) && TIMER!=TIMER_TIM
        (int64_t)ac->ulStartTime * s->streams[0]->time_base.den + 
#endif
#if TIMER==TIMER_SRT
        ac->llFrame;
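    /* Advance by the number of samples in this packet; skip malformed
       headers so that we never divide by zero. */
    if (ac->hdr.usChannels && (ac->hdr.usSampleBits >> 3))
        ac->llFrame += ac->hdr.ulDataLength / ac->hdr.usChannels /
                       (ac->hdr.usSampleBits >> 3);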
#elif TIMER==TIMER_TIM
        ac->hdr.ulTimeSec * 1000000LL + ac->hdr.ulTimeUSec;
#elif TIMER==TIMER_SEQ
        ac->hdr.ulSequenceNumber - ac->ulFirstFrame;
#endif
#endif

    /* note: we need to modify the packet size here to handle the last
       packet */
    pkt->size = ret;
    return 0;
}

AVInputFormat ff_acs_demuxer = {
    "acs",
    NULL_IF_CONFIG_SMALL("Advanced ip-Camera Stream(ACS) Audio"),
    sizeof(ACSContext),
    acs_probe,
    acs_read_header,
    acs_read_packet,
    NULL,
    pcm_read_seek,
    .codec_tag= (const AVCodecTag* const []){codec_acs_tags, 0},
};
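
The FIXME in acs_read_header (the hardcoded 16 for TIMER_SEQ) and the
per-packet increment used for TIMER_SRT both follow from the same header
fields. The sketch below shows one way to derive them. The helper names
are hypothetical, and unlike the formula in the FIXME it also divides by
the channel count, which is an assumption about how multi-channel data is
packed.

```c
#include <stdint.h>

/* Samples carried by one packet: payload bytes / channels / bytes per
 * sample. Returns 0 for a malformed header instead of dividing by zero. */
static inline uint32_t acs_samples_per_packet(uint32_t data_length,
                                              uint16_t channels,
                                              uint16_t sample_bits)
{
    uint32_t bytes_per_sample = sample_bits >> 3;

    if (!channels || !bytes_per_sample)
        return 0;
    return data_length / channels / bytes_per_sample;
}

/* Packets per second, i.e. the tick rate of a PTS that counts sequence
 * numbers. Integer division truncates if the packet size does not divide
 * the sample rate evenly. Returns 0 if it cannot be computed. */
static inline uint32_t acs_packets_per_second(uint32_t sample_rate,
                                              uint32_t data_length,
                                              uint16_t channels,
                                              uint16_t sample_bits)
{
    uint32_t samples = acs_samples_per_packet(data_length, channels,
                                              sample_bits);

    return samples ? sample_rate / samples : 0;
}
```

For an 8000 Hz, 16-bit, mono stream and an assumed payload of 1000 bytes
per packet, this gives 500 samples per packet and 16 packets per second,
which would be consistent with the hardcoded value. The payload size is a
guess made for the example; it is not taken from the samples.
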

/*
 * ACS (Advanced ip-Camera Stream) demuxer
 * Copyright (c) 2012 DI(FH) Ludwig Ertl / CSP GmbH
 *
 * This file is part of Libav.
 *
 * Libav is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * Libav is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with Libav; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#include "avformat.h"

#define TIMER_NON       0       // No timing, let ffmpeg estimate...
#define TIMER_SRT       1       // Timing via FPS rate (increment by frame)
#define TIMER_SEQ       2       // Timing via SequenceNumber in header, works fine
#define TIMER_TIM       3       // Timing via timestamp, somehow this results in garbled stream??

#define TIMER   TIMER_SEQ

/* Activate this, if you want to start pts at the Unixtime offset from camera 
clock
   Requires TIMER to be TIMER_SRT or TIMER_SEQ
 */
//#define USE_TIMESTAMP

typedef struct _ACS_VideoHeader
{
    unsigned long ulHdrID; //Header ID
    unsigned long ulHdrLength;
    unsigned long ulDataLength;
    unsigned long ulSequenceNumber;
    unsigned long ulTimeSec;
    unsigned long ulTimeUSec;
    unsigned long ulDataCheckSum;
    unsigned short usCodingType;
    unsigned short usFrameRate;
    unsigned short usWidth;
    unsigned short usHeight;
    unsigned char ucMDBitmap;
    unsigned char ucMDPowers[3];
} ACS_VideoHeader, *PACS_VideoHeader;

typedef struct {
    ACS_VideoHeader hdr;
#if TIMER==TIMER_SEQ
    unsigned long ulFirstFrame;
#elif TIMER==TIMER_SRT
    int64_t llFrame;
#endif
#ifdef USE_TIMESTAMP
    unsigned long ulStartTime;
#endif
    int bReadHeader;
} ACSVContext;

static int acsv_probe(AVProbeData *p)
{
    /* check file header */
    if (p->buf[0] == 0x00 && p->buf[1] == 0x00 &&
        p->buf[2] == 0x01 && p->buf[3] == 0xF5)
        return AVPROBE_SCORE_MAX;
    else
        return 0;
}

static int acsv_read_block_header(AVFormatContext *ctx, AVIOContext *pb)
{
    ACSVContext *ac = ctx->priv_data;
    unsigned long ulReserved;

    ac->hdr.ulHdrID = avio_rl32(pb);
    if (ac->hdr.ulHdrID != 0xF5010000)
    {
        // May be HTTP/1.1 error, inform user about it, but now we are
        // basically doomed...
        if (ac->hdr.ulHdrID == 0x50545448)
        {
            int j = 0;
            char szBuf[1024]={0};

            do
            {
                while ((szBuf[j++] = (char)avio_r8(pb))!='\n' && 
j<sizeof(szBuf)-1);
            } while (j<sizeof(szBuf)-1);
            av_log(ctx, AV_LOG_ERROR,
                   "HTTP header in between transaction:\nHTTP%s\n", szBuf);
            return -1;
        }
        av_log(ctx, AV_LOG_ERROR, "Incorrect header: %08lX\n", ac->hdr.ulHdrID);
        return -1;
    }
    ac->hdr.ulHdrLength = avio_rl32(pb); /* header size */
    ac->hdr.ulDataLength = avio_rl32(pb); /* data size */

    ac->hdr.ulSequenceNumber = avio_rl32(pb);
    ac->hdr.ulTimeSec = avio_rl32(pb);
    ac->hdr.ulTimeUSec = avio_rl32(pb);
    ac->hdr.ulDataCheckSum = avio_rl32(pb);

    ac->hdr.usCodingType = avio_rl16(pb);
    ac->hdr.usFrameRate = avio_rl16(pb);
    ac->hdr.usWidth = avio_rl16(pb);
    ac->hdr.usHeight = avio_rl16(pb);
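    ulReserved = avio_rl32(pb);
    /* Unpack the four bytes explicitly: memcpy with sizeof(unsigned long)
       would write 8 bytes on LP64 platforms and overrun the header. */
    ac->hdr.ucMDBitmap    =  ulReserved        & 0xFF;
    ac->hdr.ucMDPowers[0] = (ulReserved >>  8) & 0xFF;
    ac->hdr.ucMDPowers[1] = (ulReserved >> 16) & 0xFF;
    ac->hdr.ucMDPowers[2] = (ulReserved >> 24) & 0xFF;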
#if TIMER==TIMER_SEQ
    if (!ac->ulFirstFrame && ac->hdr.ulSequenceNumber)
        ac->ulFirstFrame = ac->hdr.ulSequenceNumber;
#endif
#ifdef USE_TIMESTAMP
    if (!ac->ulStartTime && ac->hdr.ulTimeSec)
        ac->ulStartTime = ac->hdr.ulTimeSec;
#endif
    return 0;
}

static int acsv_read_header(AVFormatContext *s, AVFormatParameters *ap)
{
    AVIOContext *pb = s->pb;
    ACSVContext *ac = s->priv_data;
    AVStream *st;

    if(acsv_read_block_header(s, pb) < 0)
        return -1;

    st = av_new_stream(s, 0);
    if (!st)
        return AVERROR(ENOMEM);

    st->codec->codec_type = AVMEDIA_TYPE_VIDEO;
    st->codec->codec_tag  = ac->hdr.usCodingType;
    st->codec->codec_id   = ac->hdr.usCodingType == 5 ? CODEC_ID_MJPEG
                                                      : CODEC_ID_MPEG4;
    st->codec->width      = ac->hdr.usWidth;
    st->codec->height     = ac->hdr.usHeight;
    st->need_parsing = AVSTREAM_PARSE_FULL;


#if TIMER==TIMER_SRT || TIMER==TIMER_SEQ || TIMER==TIMER_NON
    av_set_pts_info(st, 64, 1, ac->hdr.usFrameRate);    // 30 fps
#elif TIMER==TIMER_TIM
    av_set_pts_info(st, 64, 1, 1000000);  /* 64 bits pts in us */
#endif
    ac->bReadHeader = 1;
    return 0;
}

static int acsv_read_packet(AVFormatContext *s, AVPacket *pkt)
{
    ACSVContext *ac = s->priv_data;
    int ret;

    if (ac->bReadHeader) ac->bReadHeader=0;
    else if ((ret = acsv_read_block_header (s, s->pb))<0) return ret;
    ret= av_get_packet(s->pb, pkt, ac->hdr.ulDataLength);
    if (ret < 0)
        return ret;
    pkt->stream_index = 0;

    if (ac->hdr.ulTimeSec)
    {
#if TIMER!=TIMER_NON
        pkt->pts = 
#if defined(USE_TIMESTAMP) && TIMER!=TIMER_TIM
            (int64_t)ac->ulStartTime * (int64_t)ac->hdr.usFrameRate + 
#endif
#if TIMER==TIMER_SRT
            ac->llFrame;
        ac->llFrame++;
#elif TIMER==TIMER_TIM
            ac->hdr.ulTimeSec * 1000000LL + ac->hdr.ulTimeUSec;
#elif TIMER==TIMER_SEQ
            ac->hdr.ulSequenceNumber - ac->ulFirstFrame;
#endif
#endif
    }

    //pkt->pos-=16;
    return ret;
}

AVInputFormat ff_acsv_demuxer = {
    "acsv",
    NULL_IF_CONFIG_SMALL("Advanced ip-Camera Stream(ACS) Video"),
    sizeof(ACSVContext),
    acsv_probe,
    acsv_read_header,
    acsv_read_packet,
    .flags= AVFMT_GENERIC_INDEX,
    .value = CODEC_ID_MPEG4,
};
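
Both demuxers read the same seven 32-bit little-endian fields at the start
of every block before the audio- or video-specific part. As a sketch of
that shared layout, the fields can also be parsed from a plain byte buffer
with fixed-width types, which sidesteps the platform-dependent size of
unsigned long. The names AcsCommonHeader, rl32 and acs_parse_common_header
are invented for this example and are not part of libavformat.

```c
#include <stddef.h>
#include <stdint.h>

/* The seven fields shared by the audio and video block headers. */
typedef struct {
    uint32_t hdr_id;
    uint32_t hdr_length;
    uint32_t data_length;
    uint32_t sequence_number;
    uint32_t time_sec;
    uint32_t time_usec;
    uint32_t data_checksum;
} AcsCommonHeader;

/* Read one little-endian 32-bit value from a byte buffer. */
static inline uint32_t rl32(const uint8_t *p)
{
    return (uint32_t)p[0]       | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse the common part of a block header (7 * 4 = 28 bytes).
 * Returns 0 on success, -1 if the buffer is too short. */
static inline int acs_parse_common_header(const uint8_t *buf, size_t len,
                                          AcsCommonHeader *h)
{
    if (len < 28)
        return -1;
    h->hdr_id          = rl32(buf);
    h->hdr_length      = rl32(buf + 4);
    h->data_length     = rl32(buf + 8);
    h->sequence_number = rl32(buf + 12);
    h->time_sec        = rl32(buf + 16);
    h->time_usec       = rl32(buf + 20);
    h->data_checksum   = rl32(buf + 24);
    return 0;
}
```

A block that starts with the bytes 00 00 01 F6 yields hdr_id ==
0xF6010000, which is the value the audio demuxer checks for; the bytes
00 00 01 F5 yield the video ID 0xF5010000 in the same way.
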

