This was intended for ml...

---------- Forwarded message ----------
From: Dan Dennedy <d...@dennedy.org>
Date: Sun, Jan 22, 2012 at 3:25 PM
Subject: Re: [Mlt-devel] Xml output is currenty broken
To: Brian Matherly <pez4br...@yahoo.com>

On Sun, Jan 22, 2012 at 2:22 PM, Dan Dennedy <d...@dennedy.org> wrote:
> On Sun, Jan 22, 2012 at 1:48 PM, Dan Dennedy <d...@dennedy.org> wrote:
>> On Sun, Jan 22, 2012 at 1:36 PM, Brian Matherly <pez4br...@yahoo.com> wrote:
>>> JB,
>>>
>>>> Creating xml files from the avformat producer is currently broken with
>>>> FFmpeg's recent git. The problem comes from avformat's "handler_name"
>>>> metadata, whose value contains invalid characters.
>>>>
>>>> The problem only appears when creating xml files, not when outputting to
>>>> stdout.
>>>>
>>>> For example:
>>>>
>>>> melt test.mov -consumer xml
>>>
>>> You might have to provide your "test.mov" file. I tried to recreate the
>>> problem with some of my media files, but none of them have the
>>> "handler_name" metadata. And if I could find one with "handler_name", I
>>> don't know that it would necessarily have an invalid character. I don't
>>> know if Dan has a file lying around that recreates the problem. I guess you
>>> could wait for him to check before you upload something.
>>>
>>> I'm guessing that the best solution might be to fix the xml producer to
>>> handle the xml with the invalid character. In that case, we might be able
>>> to make some progress if you provide your "broken" XML file.
>>>
>>
>> I have some files that reproduce the problem. I have a flagged email
>> to followup on a bug report related to xml char encoding. Currently,
>> the xml consumer assumes the string data you provide it is already
>> UTF-8 encoded, but some people are providing, for example, KOI8 file
>> names in Russian locales. I can get the environment's locale info to
>> determine that. Now, I need to figure out what the character encoding
>> rules or API are for libav and its version history.
>
> OK, av_dict (and its predecessor av_metadata from avformat.h) has
> neither rules or API for character encoding. Demuxers quite often
> simply pass up whatever appears in the file.
> And the offending
> character in this example is a 0x0b vertical tab. So, I think we need
> some string filter function. I am looking around for a good practice
> regarding string filtering. Then, the next question is whether to
> filter the output of av_dict or filter the input to xmlNewTextChild()
> and xmlNewProp().
>

We may need to add a UTF-8 filter in some places, and we can use iconv
for that. However, XML has a more restricted set of characters:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

So, we need something specific to XML here instead of a UTF-8 filter on
the av_dict output. I put in a quick fix. I will come back to the wchar
solution soon.

I thought of a new policy to add to docs/policies.txt and somewhere in
doxygen comments:

The standard for strings in MLT is UTF-8. Applications must provide valid UTF-8.

That means melt would be responsible for converting from the environment
locale's encoding to UTF-8. For dependent libraries, if their API or
documentation discloses the character encoding, we need to convert it to
UTF-8 (filtering it with iconv along the way), and if it is unknown (e.g.
av_dict), then we should assume UTF-8 and filter it.

Comments accepted.

-- 
+-DRD-+
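
For illustration only, here is a minimal sketch of what an XML-specific string filter along these lines could look like. It is not the quick fix that was committed, and the function names (xml_char_is_valid, utf8_decode, xml_filter_utf8) are hypothetical. It assumes the input is meant to be UTF-8, drops malformed byte sequences, and drops any code point outside the XML 1.0 Char production quoted above, such as the 0x0b vertical tab.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if cp matches the XML 1.0 "Char" production, else 0. */
int xml_char_is_valid(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/*
 * Decode one UTF-8 sequence starting at s, with n bytes available.
 * On success, store the code point in *cp and return the sequence
 * length (1 to 4). Return 0 if the sequence is malformed, truncated,
 * or an overlong encoding.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; *cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; min = 0x10000; }
    else return 0; /* stray continuation byte or invalid lead byte */

    if (len > n) return 0; /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    if (*cp < min) return 0; /* overlong encoding */
    return len;
}

/*
 * Filter a NUL-terminated string in place, keeping only well-formed
 * UTF-8 sequences that encode characters allowed in XML 1.0.
 * Returns the length of the filtered string in bytes.
 */
size_t xml_filter_utf8(char *str)
{
    unsigned char *in = (unsigned char *) str;
    unsigned char *out = in;
    size_t n = strlen(str);

    while (n > 0) {
        uint32_t cp = 0;
        size_t len = utf8_decode(in, n, &cp);
        if (len == 0) { /* malformed: skip one byte and resynchronize */
            in++;
            n--;
            continue;
        }
        if (xml_char_is_valid(cp)) {
            memmove(out, in, len);
            out += len;
        }
        in += len;
        n -= len;
    }
    *out = '\0';
    return (size_t) (out - (unsigned char *) str);
}
```

Under this sketch, a copy of each metadata value would be passed through xml_filter_utf8() before it is handed to xmlNewTextChild() or xmlNewProp(). That corresponds to filtering the input to libxml2 rather than the output of av_dict, and it is only one of the two options raised above.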