On Sat, Aug 08, 2015 at 09:24:03PM -0700, Yayoi Ukai wrote: > On Sat, Aug 8, 2015 at 8:23 AM, Clément Bœsch <u...@pkh.me> wrote: > > On Fri, Aug 07, 2015 at 11:03:29PM -0700, Yayoi wrote: > >> --- > >> libavcodec/samidec.c | 59 > >> +++++++++++++++++++++++++++++-------------------- > >> tests/ref/fate/sub-sami | 18 +++++++-------- > >> 2 files changed, 44 insertions(+), 33 deletions(-) > >> > >> diff --git a/libavcodec/samidec.c b/libavcodec/samidec.c > >> index 47850e2..41826a7 100644 > >> --- a/libavcodec/samidec.c > >> +++ b/libavcodec/samidec.c > >> @@ -27,6 +27,7 @@ > >> #include "ass.h" > >> #include "libavutil/avstring.h" > >> #include "libavutil/bprint.h" > >> +#include "htmlsubtitles.h" > >> > >> typedef struct { > >> AVBPrint source; > >> @@ -41,11 +42,12 @@ static int sami_paragraph_to_ass(AVCodecContext > >> *avctx, const char *src) > >> char *tag = NULL; > >> char *dupsrc = av_strdup(src); > >> char *p = dupsrc; > >> + char *pcopy = NULL; > >> + int index = 0; > >> + int second_paragraph = 0; > >> > >> - av_bprint_clear(&sami->content); > >> for (;;) { > >> char *saveptr = NULL; > >> - int prev_chr_is_space = 0; > >> AVBPrint *dst = &sami->content; > >> > >> /* parse & extract paragraph tag */ > >> @@ -77,37 +79,46 @@ static int sami_paragraph_to_ass(AVCodecContext > >> *avctx, const char *src) > >> goto end; > >> } > >> > > > >> - /* extract the text, stripping most of the tags */ > >> + /* check for the second paragrph */ > > > > Why change the comment? What does "check" mean here? What is the "second > > paragraph"? > > > I answer it below with the other questions you have because you are > basically asking the same things. > And I will document better too. > > > > >> + pcopy = av_strdup(p); > >> while (*p) { > >> if (*p == '<') { > >> - if (!av_strncasecmp(p, "<P", 2) && (p[2] == '>' || > >> av_isspace(p[2]))) > >> + if (!av_strncasecmp(p, "<p", 2) && (p[2] == '>' || > >> av_isspace(p[2]))) { > >> + second_paragraph = 1; > >> break; > >> - if (!av_strncasecmp(p, "<BR", 3)) > >> - av_bprintf(dst, "\\N"); > >> - p++; > >> - while (*p && *p != '>') > >> - p++; > >> - if (!*p) > >> - break; > >> - if (*p == '>') > >> - p++; > >> - continue; > >> + } > >> } > >> - if (!av_isspace(*p)) > >> - av_bprint_chars(dst, *p, 1); > >> - else if (!prev_chr_is_space) > >> - av_bprint_chars(dst, ' ', 1); > >> - prev_chr_is_space = av_isspace(*p); > >> p++; > >> + index++; > >> + } > >> + p = p - index; > >> + if (second_paragraph) { > >> + p[index] = 0; > >> } > >> - } > >> > >> - av_bprint_clear(&sami->full); > >> - if (sami->source.len) > >> - av_bprintf(&sami->full, "{\\i1}%s{\\i0}\\N", sami->source.str); > >> - av_bprintf(&sami->full, "%s", sami->content.str); > >> + ff_htmlmarkup_to_ass(avctx, dst, p); > >> + > >> + /* add the source if there are any. */ > >> + av_bprint_clear(&sami->full); > >> + if (sami->source.len) { > >> + av_bprintf(&sami->full, "{\\i1}%s{\\i0}\\N", > >> sami->source.str); > >> + av_bprintf(&sami->full, "%s", sami->content.str); > >> + if (second_paragraph) { > >> + second_paragraph = 0; > >> + p = pcopy; > >> + p += index; > >> + index = 0; > >> + continue; > >> + } > >> + } else { > >> + av_bprintf(&sami->full, "%s", sami->content.str); > >> + } > >> + av_bprint_clear(&sami->content); > >> + > >> + } > >> > > > > This looks clumsy at best: you are finalizing the subtitle event inside > > the paragraph loop when it should be outside. It also seems there is a > > duplicating "second paragraph" logic even though the loop is supposed to > > handle one paragraph at a time. > > I know. It is clumsy.. I explain it below as well. > > > > > If you are uncomfortable with the current logic or believe it's badly > > designed for the logic you're trying to mangle, feel free to rewrite it; > > it's better than trying to inhibit the old behaviour with hacks. > > It's not that your code was badly designed to begin with. I feel like > it is the nature > of this format. > > So let me explain what was going on your old code and what I had changed. > Here is the example: (This one is from current fate sample.) > > <SYNC Start=73000> > <P Class=ENUSCC ID=Source>End of: > <P Class=ENUSCC>President John F. Kennedy Speech > > So this is one subtitle event and there are two paragraphs. One > paragraph has a new source and > the other paragraph has its content. > Since it is decided that this code uses html > parser(ff_htmlmarkup_to_ass) and use it to parse the contents of the > paragraph. Each paragraph content needs to go to to html parser, > regardless whether it is source or subtitle content. > > So I need to tell html_parser, ff_htmlmarkup_to_ass that where the > destination is.. (whether it is source or content). > I can't no longer specify the destination which I was able to do in > your old code. I mean maybe I can change the function header and add > additional flag to tell our parser where to put in the dst but I don't > want to because it may require to change srtdec and I do not want to > do that.. > > Anyways, so that's the basic idea. I honestly think it is good that > subtitle event handle is also in loop because one loop iteration per > one subtitle event and subtitle event does not guarantee that it > contains only one paragraph to begin with. It was a possibility either > adding additional loop or having goto to honor your original logic but > after I noticed that subtitle event contains more than one paragraph, > I gave up on the idea to keep your logic. The comment says "check > second paragraph" but it should be able to handle multiple paragraphs > more than 2 in one subtitle event. (I definitely tested while I was > working on this part.. I can modify sample or attach test file if that > makes you feel happier. And I make a better comment then.) > > So I want to leave it as it is if you do not see major flaws or you do > not have very very strong opinion about how it needs to be modified. > Well, given the information, if you have some ideas, you can tell me > too. Besides, I can't cross out the possibility if you or me or > someone wants to add more enhancement (maybe one day you decided to > add support for img tag? who knows..) , and it may need to change code > structure again. And as it is now, it is still enhancement that works. > ;-). >
I think it's breakable one way or another without that much effort. Here is a suggestion based on the original code: - make sure you have 5 bprint buffers: source_sami, content_sami, source_encoded, content_encoded, full (you already have 3 of them in the context) - remove the current markup processing from the loop and make sure the loop just adds the content of each paragraph where it belongs (that is, in either source_sami or content_sami). The code already does that, so there is probably no change to do (code already concats in each bprint buffer if more than 2 paragraphs happen). - after the loop ends, you now have the SAMI data of your packet split into content_sami and eventually in source_sami, without the SAMI crap (such as <P ...>). So what you need to do is just call the markup on content_sami (and eventually source_sami if there is data) to create content_encoded (and eventually source_encoded). - the existing code will then finish things up by concatenating source & content as it is already doing This is probably way more solid than trying to interlace another partial sub-loop in a loop of the same purpose like you proposed. If you do not want to do that, I will have to push your version and do the above or similar before pushing the whole set. [...] -- Clément B.
pgpjbmfOb3VV5.pgp
Description: PGP signature
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel