Bugs item #21658, was opened at 2008-08-24 11:01
You can respond by visiting: 
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21658&group_id=494

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 3
Submitted By: Nobody (None)
>Assigned to: Charlie Savage (cfis)
Summary: failure to parse and obey encoding when creating document

Initial Comment:
The following appeared on comp.lang.ruby:

===== quoted material follows

I have an XML request,
using the following code as an example:

require "rubygems"
require "xml/libxml"

movie = "sin+city"
search_url = 'http://www.movie-xml.com/interfaces/getmovie.php?moviename='
url = search_url+movie
doc = XML::Document.file(url)

Here's the response I get:

Input is not proper UTF-8, indicate encoding !

The source XML has an encoding declared as such:

<?xml version="1.0" encoding="ISO-8859-1"?>

===== end quoted material

Tested and confirmed, plus I tried the same operation with REXML and there was 
no problem. It looks like we are not examining the encoding attribute up front 
and obeying it when parsing the body of the doc.
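
As a rough illustration of what "examining the encoding attribute up front" 
could mean, here is a minimal sketch in plain Ruby (1.9 or later, no 
libxml-ruby involved). The helper names declared_xml_encoding and xml_to_utf8 
are hypothetical and exist only for this example; they are not part of any 
library, and this is not how libxml itself detects encodings.

```ruby
# Hypothetical helper: pull the encoding name out of the XML declaration.
# Falls back to UTF-8, the default when no encoding is declared.
def declared_xml_encoding(raw_bytes)
  head = raw_bytes.b[0, 200] # the declaration must come first in the document
  match = head.match(/\A<\?xml[^>]*?encoding=["']([A-Za-z][A-Za-z0-9._-]*)["']/n)
  match ? match[1] : "UTF-8"
end

# Hypothetical helper: reinterpret the raw bytes in the declared encoding,
# then transcode them to UTF-8. Raises if Ruby does not know the encoding.
def xml_to_utf8(raw_bytes)
  name = declared_xml_encoding(raw_bytes)
  raw_bytes.dup.force_encoding(name).encode(Encoding::UTF_8)
end

# Made-up sample document; 0xFC is "u" with umlaut in ISO-8859-1.
sample = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><t>verg\xFCenza</t>".b
puts declared_xml_encoding(sample)
puts xml_to_utf8(sample).valid_encoding?
```

Note that the transcoded text still carries the original encoding 
declaration, so it is suitable for inspection but should not be handed back 
to a parser without adjusting that declaration.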

----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-11-24 12:12

Message:
No response - closing the issue.

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-15 17:47

Message:
This url is no longer valid.  Do you have another test case?

----------------------------------------------------------------------

Comment By: Eric Ivancich (ivancich)
Date: 2008-08-24 17:29

Message:
Twice in the XML data retrieved from the URL generated in the detailed 
description, the word "vergüenza" appears, where the "ü" is the single byte 
0xFC, which encodes a lowercase "u" with umlaut in ISO-8859-1.  The byte 0xFC 
can never appear in valid UTF-8 data (RFC 3629).

So that adds further evidence that it's trying to parse the file as UTF-8 
rather than ISO-8859-1.
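
The byte-level claim can be checked without any XML library. The following 
is a small sketch in plain Ruby (2.0 or later, for String#b); the sample 
string is made up for illustration and is not taken from the original 
response.

```ruby
# Made-up sample: "verg<0xFC>enza" as raw bytes, the way an ISO-8859-1
# document would carry a "u" with umlaut.
latin1_bytes = "verg\xFCenza".b

# Interpreted as UTF-8, the byte 0xFC makes the string invalid.
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?

# Interpreted as ISO-8859-1, the same bytes are valid and can be
# transcoded to UTF-8, where the character becomes the pair 0xC3 0xBC.
as_latin1 = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1)
converted = as_latin1.encode(Encoding::UTF_8)
puts as_latin1.valid_encoding?
puts converted.bytes[4, 2].map { |b| format("0x%02X", b) }.join(" ")
```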

----------------------------------------------------------------------

Comment By: Erik Hollensbe (erikh)
Date: 2008-08-24 14:24

Message:
From this thread on ruby-talk: http://www.ruby-forum.com/topic/163524

----------------------------------------------------------------------

_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel