On Tue, Apr 19, 2016 at 10:04 AM, Aman Gupta <a...@tmm1.net> wrote:

> This is a tricky one.. I tried your sample in VLC, and it has the same
> issue as the latest version of ffmpeg.
>
> The previous behavior of ffmpeg can be restored with the following patch:
>
> diff --git a/libavcodec/ccaption_dec.c b/libavcodec/ccaption_dec.c
> index 3b15149..9eff843 100644
> --- a/libavcodec/ccaption_dec.c
> +++ b/libavcodec/ccaption_dec.c
> @@ -712,7 +712,6 @@ static void process_cc608(CCaptionSubContext *ctx, int64_t pts, uint8_t hi, uint
>      } else if (hi >= 0x20) {
>          /* Standard characters (always in pairs) */
>          handle_char(ctx, hi, lo, pts);
> -        ctx->prev_cmd[0] = ctx->prev_cmd[1] = 0;
>      } else {
>          /* Ignoring all other non data code */
>          ff_dlog(ctx, "Unknown command 0x%hhx 0x%hhx\n", hi, lo);
>
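
The effect of the restore patch quoted above can be illustrated with a small standalone model. This is a hypothetical sketch, not ffmpeg code: `render_restored()` and its pair-array input are made up for illustration, parity bits are assumed to be stripped already, `lo == 0` is taken to mean a single character, and control codes only update the remembered pair instead of being acted on. Without the reset, the previous pair is remembered after characters too, so any pair identical to the one before it is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the rule the restore patch brings back: the last
 * byte pair is remembered for EVERY pair (characters included), and a pair
 * identical to the previous one is dropped. Parity bits are assumed to be
 * stripped already; lo == 0 means a single character. Control codes
 * (hi < 0x20) only update the remembered pair here; they are not rendered.
 */
static size_t render_restored(const unsigned char pairs[][2], size_t n,
                              char *out, size_t cap)
{
    unsigned char prev_hi = 0, prev_lo = 0;
    size_t len = 0;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        unsigned char hi = pairs[i][0], lo = pairs[i][1];

        if (hi == prev_hi && lo == prev_lo)
            continue;                 /* redundant pair: ignored */
        prev_hi = hi;                 /* no reset after characters */
        prev_lo = lo;

        if (hi < 0x20)
            continue;                 /* control code: nothing to print */
        if (len + 1 < cap)
            out[len++] = (char)hi;
        if (lo && len + 1 < cap)
            out[len++] = (char)lo;
    }
    out[len] = '\0';
    return len;
}
```

Fed three copies of the pair `AR`, this model emits `AR` once, which is what the tripled stream in Thierry's sample needs. Fed two legitimate `..` pairs, it also emits only `..`, which is the spacing problem Aman raises in the same message.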

Thanks for looking into it. Do you think it would make sense to have a flag
to turn this feature on/off? If so, I'm willing to do the patch.

>
> However, I've encountered a number of samples where the same character is
> legitimately repeated and is not supposed to be skipped.
>
> For instance, the samples on
> http://hackipedia.org/ATSC/EIA-608%20samples/EIA-608%20character%20set%20test/
> (https://www.youtube.com/watch?v=8TZLxPdC3hk) repeat the character "."
> multiple times to show correct spacing.
>
> Similarly, there are an endless number of words in the English language
> with repeated characters, such as "ss" in endless and "rr" in correct.
>
> It is unclear to me how the decoder is supposed to distinguish between
> characters that are meant to be displayed twice, vs streams that repeat
> every ASCII character unconditionally.
>
> Further, in your sample it appears that every command is repeated not
> twice (as is common in many streams for special character sets and other
> commands, see
> http://hackipedia.org/ATSC/EIA-608%20samples/EIA-608%20character%20set%20test/README.TXT),
> but three times.
>
> Aman
>
> On Mon, Apr 18, 2016 at 1:01 PM, Aman Gupta <a...@tmm1.net> wrote:
>
>> Please send me the sample and I will try to fix the issue.
>>
>> Aman
>>
>> On Mon, Apr 18, 2016 at 1:22 PM Thierry Foucu <tfo...@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> On Sun, Feb 14, 2016 at 6:11 PM, Aman Gupta <ffm...@tmm1.net> wrote:
>>>
>>>> From: Aman Gupta <a...@tmm1.net>
>>>>
>>>> control codes in a cc stream can be repeated, and must be ignored.
>>>> however, repeated characters must not be ignored. the code attempted to
>>>> wipe prev_cmd in handle_char to allow repeated characters to be
>>>> processed, but prev_cmd would previously get reset _after_ handle_char()
>>>>
>>>> i also moved the prev_cmd reset out from handle_char() so it can be
>>>> re-used for special character sets, which _must_ be ignored when
>>>> repeated.
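
The rule stated in this commit message (repeated control codes are ignored, repeated standard characters are kept) can be modeled together with the on/off switch Thierry proposes. This is a hypothetical sketch: `struct cc_model`, `cc_model_pair()` and the `dedup_chars` field are illustrative names, not ffmpeg API; parity is assumed to be stripped; and every pair with `hi` in 0x10..0x1f is lumped together as a control or special-character code, which is coarser than the ranges `process_cc608()` actually tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified decoder state; all names are hypothetical. */
struct cc_model {
    unsigned char prev_hi, prev_lo; /* last pair remembered for dedup        */
    bool dedup_chars;               /* hypothetical switch: dedup characters */
    size_t controls;                /* control/special pairs acted on        */
    char text[64];                  /* rendered standard characters          */
    size_t len;
};

static void cc_model_pair(struct cc_model *m, unsigned char hi, unsigned char lo)
{
    assert(m != NULL);
    if (hi == m->prev_hi && lo == m->prev_lo)
        return;                          /* redundant pair: ignored */

    m->prev_hi = hi;                     /* set before dispatch, as in the commit */
    m->prev_lo = lo;

    if (hi >= 0x10 && hi <= 0x1f) {
        m->controls++;                   /* PAC, mid-row, special char, command */
    } else if (hi >= 0x20) {
        if (m->len + 1 < sizeof m->text)
            m->text[m->len++] = (char)hi;
        if (lo && m->len + 1 < sizeof m->text)
            m->text[m->len++] = (char)lo;
        m->text[m->len] = '\0';
        if (!m->dedup_chars)
            m->prev_hi = m->prev_lo = 0; /* commit: characters may repeat */
    }
}
```

With `dedup_chars` false, the model is meant to follow the commit: a doubled control code (for example 0x14 0x2c, erase displayed memory) is acted on once, and two `..` pairs print `....`. With it true, the second `..` pair is dropped, so the switch trades one failure mode for the other rather than resolving the ambiguity Aman points out.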
>>>> ---
>>>>  libavcodec/ccaption_dec.c | 19 ++++++++++---------
>>>>  1 file changed, 10 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/libavcodec/ccaption_dec.c b/libavcodec/ccaption_dec.c
>>>> index 790f071..5fb2ec6 100644
>>>> --- a/libavcodec/ccaption_dec.c
>>>> +++ b/libavcodec/ccaption_dec.c
>>>> @@ -484,9 +484,6 @@ static void handle_char(CCaptionSubContext *ctx, char hi, char lo, int64_t pts)
>>>>      if (ctx->mode != CCMODE_POPON)
>>>>          ctx->screen_touched = 1;
>>>>
>>>> -    /* reset prev command since character can repeat */
>>>> -    ctx->prev_cmd[0] = 0;
>>>> -    ctx->prev_cmd[1] = 0;
>>>>      if (lo)
>>>>          ff_dlog(ctx, "(%c,%c)\n", hi, lo);
>>>>      else
>>>> @@ -497,8 +494,15 @@ static void process_cc608(CCaptionSubContext *ctx, int64_t pts, uint8_t hi, uint
>>>>  {
>>>>      if (hi == ctx->prev_cmd[0] && lo == ctx->prev_cmd[1]) {
>>>>          /* ignore redundant command */
>>>> -    } else if ( (hi == 0x10 && (lo >= 0x40 && lo <= 0x5f)) ||
>>>> -              ( (hi >= 0x11 && hi <= 0x17) && (lo >= 0x40 && lo <= 0x7f) ) ) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /* set prev command */
>>>> +    ctx->prev_cmd[0] = hi;
>>>> +    ctx->prev_cmd[1] = lo;
>>>> +
>>>> +    if ( (hi == 0x10 && (lo >= 0x40 && lo <= 0x5f)) ||
>>>> +         ( (hi >= 0x11 && hi <= 0x17) && (lo >= 0x40 && lo <= 0x7f) ) ) {
>>>>          handle_pac(ctx, hi, lo);
>>>>      } else if ( ( hi == 0x11 && lo >= 0x20 && lo <= 0x2f ) ||
>>>>                  ( hi == 0x17 && lo >= 0x2e && lo <= 0x2f) ) {
>>>> @@ -559,14 +563,11 @@ static void process_cc608(CCaptionSubContext *ctx, int64_t pts, uint8_t hi, uint
>>>>      } else if (hi >= 0x20) {
>>>>          /* Standard characters (always in pairs) */
>>>>          handle_char(ctx, hi, lo, pts);
>>>> +        ctx->prev_cmd[0] = ctx->prev_cmd[1] = 0;
>>>>      } else {
>>>>          /* Ignoring all other non data code */
>>>>          ff_dlog(ctx, "Unknown command 0x%hhx 0x%hhx\n", hi, lo);
>>>>      }
>>>> -
>>>> -    /* set prev command */
>>>> -    ctx->prev_cmd[0] = hi;
>>>> -    ctx->prev_cmd[1] = lo;
>>>>  }
>>>>
>>>>  static int decode(AVCodecContext *avctx, void *data, int *got_sub, AVPacket *avpkt)
>>>> --
>>>> 2.5.3
>>>>
>>>>
>>> This commit seems to break some US broadcast CC decoding. (I can provide
>>> a 15MB sample file if needed)
>>>
>>> Before this commit:
>>> ffmpeg -f lavfi -i "movie=IhedxzUUxNo.ts[out0+subcc]" -map s "ts.srt"
>>> cat ts.srt
>>> 1
>>> 00:00:01,035 --> 00:00:02,035
>>> <font face="Monospace">FAR-RANGING IMPACT PLACES IN THE</font>
>>>
>>> 2
>>> 00:00:02,036 --> 00:00:04,466
>>> <font face="Monospace">FAR-RANGING IMPACT PLACES IN THE
>>> CHURCH AND AT THE LEVEL OF</font>
>>>
>>> 3
>>> 00:00:04,471 --> 00:00:06,811
>>> <font face="Monospace">CHURCH AND AT THE LEVEL OF
>>> POLICY ALL ACROSS THE GLOBE.</font>
>>>
>>> 4
>>> 00:00:06,807 --> 00:00:08,537
>>> <font face="Monospace">POLICY ALL ACROSS THE GLOBE.
>>> CERTAINLY IN THE AREA OF</font>
>>>
>>> 5
>>> 00:00:08,542 --> 00:00:10,382
>>> <font face="Monospace">CERTAINLY IN THE AREA OF
>>> PASTORAL CARE, I HOPE THAT IT</font>
>>>
>>> 6
>>> 00:00:10,377 --> 00:00:14,677
>>> <font face="Monospace">PASTORAL CARE, I HOPE THAT IT
>>> WILL LEAD TO LESS DOGMATICTI</font>
>>>
>>> 7
>>> 00:00:14,682 --> 00:00:16,352
>>> <font face="Monospace">WILL LEAD TO LESS DOGMATICTI
>>> INTERACTION WITH PEOPLE ACROSS A</font>
>>>
>>> 8
>>> 00:00:16,350 --> 00:00:16,920
>>> <font face="Monospace">INTERACTION WITH PEOPLE ACROSS A
>>> THE BOARD.</font>
>>>
>>> 9
>>> 00:00:16,917 --> 00:00:18,287
>>> <font face="Monospace">THE BOARD.
>>> I HOPE THAT, YOU KNOW, THERE'S</font>
>>>
>>> 10
>>> 00:00:18,285 --> 00:00:20,785
>>> <font face="Monospace">I HOPE THAT, YOU KNOW, THERE'S
>>> MORE OF A SENSE THAT THE CHURCH</font>
>>>
>>>
>>>
>>> After that commit,
>>> cat ts.srt
>>> 1
>>> 00:00:01,035 --> 00:00:02,035
>>> <font face="Monospace">FFFARARAR-R-R-RANANANGIGIGINGNGN</font>
>>>
>>> 2
>>> 00:00:02,036 --> 00:00:04,466
>>> <font face="Monospace">FFFARARAR-R-R-RANANANGIGIGINGNGN
>>> CCCHUHUHURCRCRCHHH A A ANNND D D</font>
>>>
>>> 3
>>> 00:00:04,471 --> 00:00:06,811
>>> <font face="Monospace">CCCHUHUHURCRCRCHHH A A ANNND D D
>>> POPOPOLILILICYCYCY A A ALLLL L L</font>
>>>
>>> 4
>>> 00:00:06,807 --> 00:00:08,537
>>> <font face="Monospace">POPOPOLILILICYCYCY A A ALLLL L L
>>> CCCERERERTATATAINININLYLYLY I I </font>
>>>
>>> 5
>>> 00:00:08,542 --> 00:00:10,382
>>> <font face="Monospace">CCCERERERTATATAINININLYLYLY I I
>>> PAPAPASSSTOTOTORARARALLL C C CAA</font>
>>>
>>> 6
>>> 00:00:10,377 --> 00:00:14,677
>>> <font face="Monospace">PAPAPASSSTOTOTORARARALLL C C CAA
>>> WIWIWILLLLLL LELELEADADAD TO</font>
>>>
>>> 7
>>> 00:00:14,682 --> 00:00:16,352
>>> <font face="Monospace">WIWIWILLLLLL LELELEADADAD TO
>>> INININTETETERARARACTCTCTIOIOION </font>
>>>
>>> 8
>>> 00:00:16,350 --> 00:00:16,920
>>> <font face="Monospace">INININTETETERARARACTCTCTIOIOION
>>> THTHTHE E E BOBOBOAAARDRDRD...</font>
>>>
>>> 9
>>> 00:00:16,917 --> 00:00:18,287
>>> <font face="Monospace">THTHTHE E E BOBOBOAAARDRDRD...
>>> I I I HHHOPOPOPEEE T T THHHATATA</font>
>>>
>>> 10
>>> 00:00:18,285 --> 00:00:20,785
>>> <font face="Monospace">I I I HHHOPOPOPEEE T T THHHATATA
>>> MOMOMORRRE E E OFOFOF A A A SE</font>
>>>
>>>
>>>
>>>> _______________________________________________
>>>> ffmpeg-devel mailing list
>>>> ffmpeg-devel@ffmpeg.org
>>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>>>
>>>
>>>
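
The garbled rows above are consistent with a stream that sends every character pair three times, decoded with the post-commit rule and clipped to the 32-column width of an EIA-608 caption row. The sketch below checks that reading on the first row. The pair split of "FAR-RANGING" (`F`, `AR`, `-R`, `AN`, `GI`, `NG`) is inferred from the garbled output, not read from the sample file, and `render_row()` is a hypothetical helper, not part of ffmpeg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ROW_COLUMNS 32  /* width of one EIA-608 caption row */

/*
 * Hypothetical helper: render character pairs (1- or 2-char strings), each
 * transmitted `copies` times, into a single caption row. With
 * dedup_chars == false a repeated character pair is printed again (the
 * post-commit rule); with true it is dropped (the pre-commit/restored rule).
 * Characters past column 32 are discarded. `row` needs ROW_COLUMNS + 1 bytes.
 */
static void render_row(const char *const pairs[], size_t n, int copies,
                       bool dedup_chars, char *row)
{
    const char *prev = NULL;  /* last pair actually processed */
    size_t col = 0;

    assert(copies > 0);
    for (size_t i = 0; i < n; i++) {
        for (int c = 0; c < copies; c++) {
            if (dedup_chars && prev != NULL && strcmp(prev, pairs[i]) == 0)
                continue;     /* identical to the previous pair: skipped */
            prev = pairs[i];
            for (const char *p = pairs[i]; *p != '\0' && col < ROW_COLUMNS; p++)
                row[col++] = *p;
        }
    }
    row[col] = '\0';
}
```

With deduplication off and three copies per pair, this produces `FFFARARAR-R-R-RANANANGIGIGINGNGN`, the first post-commit row exactly, including the final `G` lost at column 32. With deduplication on, it produces `FAR-RANGING`, the start of the pre-commit row.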