Yep, cheers Jeremy, you're absolutely correct. The result is in UTF-8 and I can see where I was going wrong.
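For anyone else who runs into this, here's a minimal sketch of what's going on. It's in Python purely as an illustration; the thread doesn't say which server-side language was involved, and `decode_title` is just a hypothetical helper name. It percent-decodes the title from the original question twice: once reading the bytes as UTF-8, and once as ISO-8859-1, which reproduces the stray "Â".

```python
from urllib.parse import unquote

# Title field as it appeared in the Search API result quoted in this thread.
ENCODED_TITLE = "Ajaxian%20%C2%BB%20Multi-threaded%20JavaScript%3F"


def decode_title(encoded: str, charset: str = "utf-8") -> str:
    """Percent-decode `encoded`, then interpret the resulting bytes as `charset`.

    Hypothetical helper for illustration only; unquote() does the byte-level work.
    """
    return unquote(encoded, encoding=charset)


# %C2%BB is the two-byte UTF-8 encoding of U+00BB (the "»" chevron),
# so reading the bytes as UTF-8 yields a single character.
correct = decode_title(ENCODED_TITLE)  # 'Ajaxian » Multi-threaded JavaScript?'

# Reading the same bytes as ISO-8859-1 maps each byte to its own character:
# 0xC2 -> 'Â' and 0xBB -> '»', which is the "funny A" from the question.
mojibake = decode_title(ENCODED_TITLE, "iso-8859-1")  # 'Ajaxian Â» Multi-threaded JavaScript?'

if __name__ == "__main__":
    print(correct)
    print(mojibake)
    # One character, but two bytes once encoded as UTF-8.
    print(len("»"), len("»".encode("utf-8")))
```

The last line prints `1 2`, which is the whole point: the chevron is one character that happens to take two bytes in UTF-8, so the bytes have to be decoded together with the right character set (iconv, `bytes.decode`, or similar).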
To readers of this thread: Google does return results in UTF-8, so please don't let my original post confuse you! I hope this clears it up a bit: "%C2%BB" does in fact decode to the single multi-byte UTF-8 character "»" (U+00BB). My mistake was treating %C2 and %BB as two separate characters rather than as the two bytes of one UTF-8 character (hence the stray "Â" in "Â»"). Apologies!

Tristen

On Aug 22, 9:08 pm, Jeremy Geerdes <[email protected]> wrote:
> Everything should be in UTF-8.
>
> Jeremy R. Geerdes
> Effective website design & development
> Des Moines, IA
>
> For more information or a project quote: http://jgeerdes.home.mchsi.com
> [email protected]
>
> If you're in the Des Moines, IA, area, check out Debra Heights Wesleyan
> Church!
>
> On Aug 21, 2010, at 8:33 PM, tristen wrote:
>
> > I've noticed some of the results contain funny characters which seem
> > to be related to encoding.
>
> > For instance, to search from the google search page:
>
> > http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=ajaxian+javas...
>
> > The first result of that search returns a few results with those funny
> > right-bracket characters (chevrons I think). They appear fine on the
> > google site, ie:
>
> > Ajaxian » Multi-threaded JavaScript?
>
> > But if I do a search using the same term using the search API and I
> > inspect the title field - I can see it's been URL encoded as follows:
>
> > Ajaxian%20%C2%BB%20Multi-threaded%20JavaScript%3F
>
> > If I decode that using javascript decodeURIComponent then it appears
> > like this:
>
> > Ajaxian Â» Multi-threaded JavaScript?
>
> > With that funny "A" character appearing before the chevron. If I
> > decode it server-side using the UTF-8 character set then it appears
> > the same, ie:
>
> > Ajaxian Â» Multi-threaded JavaScript?
>
> > Clearly, in this case, the %C2 character is that funny "A"; it's
> > listed here on this page:
>
> > http://www.w3schools.com/TAGS/ref_urlencode.asp
>
> > I haven't tried any other character sets mainly because I searched the
> > API reference to see if there was a field that indicated which
> > character set an individual result was encoded in. I couldn't see one,
> > but maybe I've missed something?
>
> > I can happily use iconv to decode if I know which character set to
> > use. Am I wrong to assume everything returned by the Search API is in
> > UTF-8? Should I instead assume everything returned by the Google
> > Search API is in a different character set (e.g. ISO-8859-1)?
>
> > Firstly: may I apologise in advance for any naivete on my part?
>
> > I'd really appreciate some advice on how I should deal with this
> > issue!
>
> > Regards
> > Tristen
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google AJAX APIs" group.
> > To post to this group, send email to
> > [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group
> > at http://groups.google.com/group/google-ajax-search-api?hl=en.
