Yep, cheers Jeremy, you're absolutely correct.

The result is in UTF-8 and I can see where I was going wrong.

To readers of this thread:

Google does return results in UTF-8, so please don't let my original
post confuse you!

I hope this clears it up a bit:

"%C2%BB" does in fact decode to the single character "»", which UTF-8
encodes as the two bytes 0xC2 0xBB. My mistake was not recognising "»"
as a multi-byte character and treating the 0xC2 byte as a character of
its own (hence the "Â»").
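For anyone who wants to reproduce this, here is a minimal Python sketch. It is only an illustration: Python's `urllib.parse.unquote` stands in for the JavaScript and server-side decoding discussed in the thread, and `decode_title` is a hypothetical helper. It decodes the title from the thread once as UTF-8, which gives the single character "»", and once as ISO-8859-1, which turns 0xC2 into "Â" and reproduces the stray "A" from the original post.

```python
from urllib.parse import unquote

# URL-encoded title field, as quoted in the original post.
ENCODED_TITLE = "Ajaxian%20%C2%BB%20Multi-threaded%20JavaScript%3F"


def decode_title(encoded: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: percent-decode `encoded`, reading the bytes as `charset`."""
    # Assumption: unquote stands in for decodeURIComponent / server-side decoding.
    return unquote(encoded, encoding=charset)


# Correct: %C2%BB is the two-byte UTF-8 sequence for U+00BB ("»").
utf8_title = decode_title(ENCODED_TITLE)

# Wrong: reading the same bytes as ISO-8859-1 maps 0xC2 -> "Â" and 0xBB -> "»".
latin1_title = decode_title(ENCODED_TITLE, "iso-8859-1")

# "»" is one character that needs two bytes in UTF-8.
chevron_bytes = "\u00bb".encode("utf-8")

# Text that was already mis-decoded as ISO-8859-1 can be repaired by
# re-encoding it as ISO-8859-1 and decoding the bytes as UTF-8.
repaired_title = latin1_title.encode("iso-8859-1").decode("utf-8")

if __name__ == "__main__":
    print(utf8_title)           # Ajaxian » Multi-threaded JavaScript?
    print(latin1_title)         # Ajaxian Â» Multi-threaded JavaScript?
    print(chevron_bytes.hex())  # c2bb
    print(repaired_title == utf8_title)  # True
```

The repair step at the end only works when the text went through exactly one wrong ISO-8859-1 decode. Decoding as UTF-8 in the first place is the real fix.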

Apologies!
Tristen

On Aug 22, 9:08 pm, Jeremy Geerdes <[email protected]> wrote:
> Everything should be in UTF-8.
>
> Jeremy R. Geerdes
> Effective website design & development
> Des Moines, IA
>
> For more information or a project quote: http://jgeerdes.home.mchsi.com
> [email protected]
>
> If you're in the Des Moines, IA, area, check out Debra Heights Wesleyan 
> Church!
>
> On Aug 21, 2010, at 8:33 PM, tristen wrote:
>
>
>
> > I've noticed some of the results contain funny characters which seem
> > to be related to encoding.
>
> > For instance, searching from the Google search page:
>
> > http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=ajaxian+javas...
>
> > The first page of that search contains a few results with those funny
> > right-bracket characters (chevrons, I think). They appear fine on the
> > Google site, i.e.:
>
> > Ajaxian » Multi-threaded JavaScript?
>
> > But if I run the same search through the Search API and inspect the
> > title field, I can see it has been URL-encoded as follows:
>
> > Ajaxian%20%C2%BB%20Multi-threaded%20JavaScript%3F
>
> > If I decode that using JavaScript's decodeURIComponent, it appears
> > like this:
>
> > Ajaxian Â» Multi-threaded JavaScript?
>
> > Note the funny "A" character appearing before the chevron. If I
> > decode it server-side using the UTF-8 character set, it appears
> > the same, i.e.:
>
> > Ajaxian Â» Multi-threaded JavaScript?
>
> > Clearly, in this case, %C2 is that funny "A"; it's listed on this
> > page:
>
> > http://www.w3schools.com/TAGS/ref_urlencode.asp
>
> > I haven't tried any other character sets, mainly because I searched
> > the API reference for a field indicating which character set an
> > individual result is encoded in. I couldn't find one, but maybe I've
> > missed something?
>
> > I can happily use iconv to decode if I know which character set to
> > use. Am I wrong to assume everything returned by the Search API is in
> > UTF-8? Should I instead assume everything returned by the Google
> > Search API is in a different character set (e.g. ISO-8859-1)?
>
> > Finally, may I apologise in advance for any naivety on my part.
>
> > I'd really appreciate some advice on how I should deal with this
> > issue!
>
> > Regards
> > Tristen
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "Google AJAX APIs" group.
> > To post to this group, send email to 
> > [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group 
> > at http://groups.google.com/group/google-ajax-search-api?hl=en.
