This was intended for ml...

---------- Forwarded message ----------
From: Dan Dennedy <d...@dennedy.org>
Date: Sun, Jan 22, 2012 at 3:25 PM
Subject: Re: [Mlt-devel] Xml output is currenty broken
To: Brian Matherly <pez4br...@yahoo.com>


On Sun, Jan 22, 2012 at 2:22 PM, Dan Dennedy <d...@dennedy.org> wrote:
> On Sun, Jan 22, 2012 at 1:48 PM, Dan Dennedy <d...@dennedy.org> wrote:
>> On Sun, Jan 22, 2012 at 1:36 PM, Brian Matherly <pez4br...@yahoo.com> wrote:
>>> JB,
>>>
>>>
>>>
>>>> Creating xml files from the avformat producer is currently broken with
>>>> FFmpeg's recent git. The problem comes from avformat's
>>>> "handler_name"
>>>> metadata, whose value contains invalid characters.
>>>>
>>>> The problem only appears when creating xml files, not when outputting to
>>>> stdout.
>>>>
>>>> For example:
>>>>
>>>> melt test.mov -consumer xml
>>>
>>> You might have to provide your "test.mov" file. I tried to recreate the 
>>> problem with some of my media files, but none of them have the 
>>> "handler_name" metadata. And if I could find one with "handler_name", I 
>>> don't know that it would necessarily have an invalid character. I don't 
>>> know if Dan has a file lying around that recreates the problem. I guess you 
>>> could wait for him to check before you upload something.
>>>
>>> I'm guessing that the best solution might be to fix the xml producer to 
>>> handle the xml with the invalid character. In that case, we might be able 
>>> to make some progress if you provide your "broken" XML file.
>>>
>>
>> I have some files that reproduce the problem. I have a flagged email
>> to follow up on a bug report related to xml char encoding. Currently,
>> the xml consumer assumes the string data you provide it is already
>> UTF-8 encoded, but some people are providing, for example, KOI8 file
>> names in Russian locales. I can get the environment's locale info to
>> determine that. Now, I need to figure out what the character encoding
>> rules or API are for libav and its version history.
>
> OK, av_dict (and its predecessor av_metadata from avformat.h) has
> neither rules nor an API for character encoding. Demuxers quite often
> simply pass up whatever appears in the file. And the offending
> character in this example is a 0x0b vertical tab. So, I think we need
> some string filter function. I am looking around for a good practice
> regarding string filtering. Then, the next question is whether to
> filter the output of av_dict or filter the input to xmlNewTextChild()
> and xmlNewProp().
>

We may need to add a UTF-8 filter in some places, and we can use iconv
for that. However, XML has a more restricted set of characters:

Char       ::=      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
| [#x10000-#x10FFFF]  /* any Unicode character, excluding the
surrogate blocks, FFFE, and FFFF. */

So, we need something specific to XML here instead of a UTF-8 filter
on the av_dict output. I put in a quick fix. I will come back to the
wchar solution soon.
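
To make the idea concrete, here is a minimal sketch of an XML-specific
filter over a UTF-8 string. It is not the quick fix that went into MLT;
the names xml_is_valid_char(), utf8_decode(), and xml_filter_string()
are hypothetical, and replacing each rejected character with '?' is
just one possible choice (dropping it would work too). It checks each
decoded code point against the Char production above and treats
malformed UTF-8 bytes as invalid:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if cp is allowed by the XML 1.0 Char production, else 0. */
static int xml_is_valid_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/*
 * Decodes one UTF-8 sequence starting at s (NUL-terminated input).
 * Stores the code point in *cp and returns the number of bytes used,
 * or 0 if the sequence is malformed, truncated, or overlong.
 */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; *cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; min = 0x10000; }
    else return 0; /* stray continuation byte or invalid lead byte */

    for (i = 1; i < len; i++) {
        /* A NUL terminator fails this test, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return *cp < min ? 0 : len;
}

/*
 * Filters str in place: every malformed byte and every code point that
 * XML does not allow is replaced by a single '?'. The result is never
 * longer than the input, so no extra buffer is needed.
 */
static void xml_filter_string(char *str)
{
    unsigned char *in = (unsigned char *) str;
    unsigned char *out = in;

    while (*in) {
        uint32_t cp;
        size_t n = utf8_decode(in, &cp);

        if (n == 0) {
            *out++ = '?';
            in++;
        } else if (!xml_is_valid_char(cp)) {
            *out++ = '?';
            in += n;
        } else {
            memmove(out, in, n);
            out += n;
            in += n;
        }
    }
    *out = '\0';
}
```

With this, a handler_name value containing the 0x0b vertical tab would
come out with a '?' in its place, while tabs, newlines, and ordinary
multi-byte UTF-8 text pass through unchanged. A filter like this would
sit in front of xmlNewTextChild() and xmlNewProp() rather than on the
av_dict side.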

I thought of a new policy to add to docs/policies.txt and somewhere in
doxygen comments:

The standard for strings in MLT is UTF-8. Applications must provide
valid UTF-8. That means melt would be responsible for converting from
the environment locale's encoding to UTF-8. For dependent libraries, if
their API or documentation discloses the character encoding, we need to
convert it to UTF-8 (filtering it with iconv along the way), and if it
is unknown (e.g. av_dict), then we should assume UTF-8 and filter it.
Comments accepted.
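
For the conversion half of that policy, a rough sketch using iconv
could look like the following. The helper name to_utf8() and its
signature are hypothetical, not existing MLT API, and it only handles
stateless source encodings (a stateful one would need an extra flush
call):

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: converts the NUL-terminated string `in` from
 * `from_charset` to UTF-8, writing a NUL-terminated result into `out`.
 * Returns 0 on success, -1 if the charset is unknown, the input is
 * invalid in that charset, or `out` is too small.
 */
static int to_utf8(const char *from_charset, const char *in,
                   char *out, size_t out_size)
{
    iconv_t cd;
    char *src = (char *) in;   /* iconv() takes a non-const pointer */
    char *dst = out;
    size_t src_left = strlen(in);
    size_t dst_left;
    size_t rc;

    if (out_size == 0)
        return -1;
    dst_left = out_size - 1;   /* reserve room for the terminator */

    cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t) -1)
        return -1;

    rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t) -1)
        return -1;

    *dst = '\0';
    return 0;
}
```

In melt, from_charset could come from nl_langinfo(CODESET) after
calling setlocale(LC_ALL, ""), so a KOI8-R file name in a Russian
locale would be converted before it reaches the framework. On some
platforms iconv lives in a separate library and needs -liconv at link
time.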

--
+-DRD-+


