Yes, if the function were to be more generic, it would need to handle the 
various ways of declaring the encoding in (X)HTML. I have just rewritten what 
was originally a workaround for HTMLParser being unable to deal with 
non-ASCII content. The get_xml_encoding function seems to be used only on 
files generated by asciidoc, and it defaults to utf-8, which seemed like a 
safe enough way to do it. But since we are patching/cleaning up, I don't mind 
making it a bit more generic. 

According to 
http://www.w3.org/International/questions/qa-html-encoding-declarations 
I should care about the following declarations:

   - <?xml version="1.0" encoding="XXX"?> (already done)
   - <meta charset="XXX">
   - <meta http-equiv="Content-type" content="text/html;charset=XXX">

Finding any one of those declarations will be good enough. I'm also going to 
assume that the declarations never contradict each other. Anything I am 
missing? 
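
Something along these lines is what I have in mind. This is only a rough 
sketch, not the actual a2x code: the name detect_declared_encoding, the two 
regular expressions and the 1024-byte window are assumptions made for 
illustration. It returns the first declaration it finds and falls back to 
utf-8 otherwise.

```python
import re

# Patterns for the three declaration forms listed above. They run on bytes
# because the encoding is not known yet; the declarations themselves are ASCII.
# Both names are hypothetical helpers, not part of a2x.
_XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', re.I)
# Covers both <meta charset="XXX"> and the http-equiv form, where charset=XXX
# sits inside the content attribute.
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.I)


def detect_declared_encoding(data, default='utf-8'):
    """Return the first encoding declared in data (bytes), else default."""
    # Assumption: the declaration appears near the top of the file.
    head = data[:1024]
    for pattern in (_XML_DECL, _META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode('ascii').lower()
    return default


if __name__ == '__main__':
    sample = b'<html><head><meta charset="ISO-8859-2"></head></html>'
    encoding = detect_declared_encoding(sample)
    # Decode first, so that only unicode text is handed to HTMLParser.
    text = sample.decode(encoding)
    print(encoding, len(text))
```

The XML declaration is tried first simply because that check already exists; 
since contradictory declarations are assumed not to occur, the order should 
not matter in practice.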


On Wednesday, June 5, 2013 1:52:43 AM UTC+2, Lex Trotman wrote:
>
> Hi,
>
> You should look for the HTML charset= as well as the XML encoding markup. 
>  HTML doesn't have to have the xml one, and in fact asciidoc generated html 
> does not.
>
> Cheers
> Lex
>
>
>
>
> On 4 June 2013 21:04, Stanislav Ochotnický <[email protected]> wrote:
>
>> When handling epub manifest with UTF-8 characters, a2x would crash with
>> UnicodeEncodeError. This is because a2x would try to read the manifest with
>> default encoding (usually ASCII) and fail on unicode characters.
>>
>> This patch changes behaviour so that during reading/writing we work with
>> encodings and produce UTF-8 encoded files by default. When handling HTML we
>> first look at encoding specified and decode contents before always passing
>> unicode to HTMLParser.
>>
>>
>> For reproducer see: https://bugzilla.redhat.com/show_bug.cgi?id=968308
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "asciidoc" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/asciidoc?hl=en.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>
