I'm trying to retrieve an RSS feed using CFHTTP. The feed is French, so
it contains accented characters, and those characters aren't returned
correctly in the cfhttp.fileContent variable unless I set the charset
attribute to "iso-8859-1". That is the encoding declared in the feed's
XML declaration, but the server doesn't send it in the Content-Type
response header: the response charset is an empty string.
Even specifying UTF-8 as the charset (I know it's the default, but it
was worth trying explicitly) does not return the characters properly.
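Assuming CFHTTP's charset attribute tells it how to decode the incoming
bytes (which is what the symptoms suggest), the failure is easy to
reproduce outside ColdFusion. This Python sketch uses a made-up French
string rather than the actual feed. It shows that UTF-8 can represent
"é" perfectly well, but bytes that were written as ISO-8859-1 can't be
read back as UTF-8:

```python
# Hypothetical sample text, encoded the way the feed is served:
# in ISO-8859-1 (Latin-1) every accented letter is a single byte.
original = "élection présidentielle"
raw = original.encode("iso-8859-1")
print(raw[:1])  # b'\xe9'  ("é" is the byte 0xE9)

# Decoding Latin-1 bytes as UTF-8 breaks each accented letter: 0xE9 announces
# a three-byte UTF-8 sequence, but the next byte is plain ASCII, so the decoder
# substitutes U+FFFD (the replacement character).
mangled = raw.decode("utf-8", errors="replace")
print(ascii(mangled))  # '\ufffdlection pr\ufffdsidentielle'

# Decoding with the charset the bytes were actually written in recovers the text.
correct = raw.decode("iso-8859-1")
print(ascii(correct))  # '\xe9lection pr\xe9sidentielle'

# The decoded string is plain Unicode, so it can be re-encoded as UTF-8,
# where "é" becomes the two bytes 0xC3 0xA9.
as_utf8 = correct.encode("utf-8")
print(as_utf8[:2])  # b'\xc3\xa9'
```

In other words, the charset setting describes the bytes coming in, not
the encoding you want to work with afterwards.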
My code:
<cfhttp url="#form.feedURL#"
        method="GET"
        throwonerror="yes"
        charset="utf-8">
</cfhttp>
I've tried three variants of the charset attribute: leaving it out
entirely, setting it explicitly to "utf-8" (as above), and setting it
to "iso-8859-1".
The feed I'm trying to retrieve is
http://www.lemonde.fr/rss/sequence/0,2-3208,1-0,0.xml. I'd really
prefer to use UTF-8 as the charset because it gives me the most
flexibility. What I'm wondering is:
1. Why doesn't UTF-8 return the characters properly? I thought UTF-8
could represent the vast majority of characters, certainly French
accented letters such as "é".
2. What options, if any, do I have for retrieving these characters
correctly?
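One option, sketched in Python rather than CFML and under stated
assumptions: fetch the body as raw bytes, then choose the decoding
charset from the Content-Type header if it has one, otherwise from the
encoding in the XML declaration, falling back to UTF-8. The helpers
pick_charset and decode_feed and the sample feed body are hypothetical.
In ColdFusion the equivalent would presumably be requesting the
response as binary (CFHTTP's getAsBinary attribute) and converting it
with CharsetEncode once the charset is known; check that your
ColdFusion version supports both.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="iso-8859-1"?>
XML_DECL_ENCODING = re.compile(
    rb"^\s*<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']"
)


def pick_charset(content_type: str, body: bytes) -> str:
    """Choose the charset for decoding an XML response body (hypothetical helper).

    Order: charset parameter of the Content-Type header, then the XML
    declaration's encoding, then UTF-8.
    """
    # 1. charset=... parameter in the Content-Type header, if the server sent one.
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    # 2. encoding="..." in the XML declaration at the start of the body.
    match = XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Fall back to UTF-8, the XML default when nothing is declared.
    return "utf-8"


def decode_feed(content_type: str, body: bytes) -> str:
    """Decode the raw response bytes with the charset picked above."""
    return body.decode(pick_charset(content_type, body))


# Usage with a made-up body whose header carries no charset:
sample = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<rss><title>\u00c9lys\u00e9e</title></rss>"
).encode("iso-8859-1")
print(pick_charset("text/xml", sample))  # iso-8859-1
```

Once the bytes are decoded with the right charset, the result is an
ordinary Unicode string, so it can still be written out as UTF-8
afterwards.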
I'm familiar with character encoding, but hardly an expert. Any
guidance would be appreciated.
Thanks.
--
Rob Wilkerson