Yes, if the function were to be somewhat more generic, it would need to handle the various ways of declaring an encoding in (X)HTML. I have just rewritten what was originally a workaround for HTMLParser being unable to deal with non-ASCII content. The get_xml_encoding function seems to be used only on files generated by asciidoc, and it defaults to utf-8, which seemed like a safe enough way to do it. But since we are patching/cleaning up, I don't mind making it a bit more generic.
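For illustration, a more generic lookup could be sketched roughly as follows. This is only a sketch under my own assumptions, not the code in the patch: detect_declared_encoding and _DECLARATION_PATTERNS are hypothetical names, it is written for Python 3 and works on raw bytes, and it simply returns the first declaration it finds (XML declaration first, then a <meta> charset, whether given as a charset attribute or inside an http-equiv content value), falling back to utf-8 the way get_xml_encoding does.

```python
import re

# Patterns are tried in order. They operate on bytes because the
# encoding of the file is not known yet at this point.
_DECLARATION_PATTERNS = [
    # <?xml version="1.0" encoding="XXX"?>
    re.compile(
        rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._:-]+)["\']',
        re.IGNORECASE,
    ),
    # <meta charset="XXX"> and
    # <meta http-equiv="Content-type" content="text/html;charset=XXX">
    re.compile(
        rb'<meta\s[^>]*?\bcharset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
        re.IGNORECASE,
    ),
]


def detect_declared_encoding(data, default="utf-8"):
    """Return the first encoding declared in `data` (bytes), lowercased.

    Hypothetical helper: contradictory declarations are not detected,
    the first match wins. Returns `default` when nothing is declared.
    """
    for pattern in _DECLARATION_PATTERNS:
        match = pattern.search(data)
        if match:
            # Encoding names are ASCII by construction of the pattern.
            return match.group(1).decode("ascii").lower()
    return default
```

The returned name could then be passed to bytes.decode() so that only unicode text reaches HTMLParser.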
According to http://www.w3.org/International/questions/qa-html-encoding-declarations
I should care about the following declarations:

- <?xml version="1.0" encoding="XXX"?> (already done)
- <meta charset="XXX">
- <meta http-equiv="Content-type" content="text/html;charset=XXX">

Finding any one of those declarations will be good enough. I'm not going to assume there will be contradictory declarations either. Anything I am missing?

On Wednesday, June 5, 2013 1:52:43 AM UTC+2, Lex Trotman wrote:
>
> Hi,
>
> You should look for the HTML charset= as well as the XML encoding markup.
> HTML doesn't have to have the xml one, and in fact asciidoc generated html
> does not.
>
> Cheers
> Lex
>
> On 4 June 2013 21:04, Stanislav Ochotnický <[email protected]> wrote:
>
>> When handling epub manifest with UTF-8 characters, a2x would crash with
>> UnicodeEncodeError. This is because a2x would try to read the manifest with
>> default encoding (usually ASCII) and fail on unicode characters.
>>
>> This patch changes behaviour so that during reading/writing we work with
>> encodings and produce UTF-8 encoded files by default. When handling HTML we
>> first look at encoding specified and decode contents before always passing
>> unicode to HTMLParser.
>>
>> For reproducer see: https://bugzilla.redhat.com/show_bug.cgi?id=968308
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "asciidoc" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/asciidoc?hl=en.
>> For more options, visit https://groups.google.com/groups/opt_out.
