I've noticed some of the results contain funny characters which seem to be related to encoding.
For instance, take this search from the Google search page:

http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=ajaxian+javascript+multi

That search returns a few results whose titles contain those funny right-bracket characters (chevrons, I think). They appear fine on the Google site, e.g.:

Ajaxian » Multi-threaded JavaScript?

But if I run the same search through the Search API and inspect the title field, I can see it has been URL-encoded as follows:

Ajaxian%20%C2%BB%20Multi-threaded%20JavaScript%3F

If I decode that using JavaScript's decodeURIComponent, it appears like this:

Ajaxian Â» Multi-threaded JavaScript?

with that funny "A" character appearing before the chevron. If I decode it server-side using the UTF-8 character set, it appears the same, i.e.:

Ajaxian Â» Multi-threaded JavaScript?

Clearly, in this case, the %C2 character is that funny "A"; it's listed on this page: http://www.w3schools.com/TAGS/ref_urlencode.asp

I haven't tried any other character sets, mainly because I first searched the API reference for a field indicating which character set an individual result is encoded in. I couldn't see one, but maybe I've missed something? I can happily use iconv to convert if I know which character set to use.

Am I wrong to assume everything returned by the Search API is in UTF-8? Should I instead assume everything returned by the Google Search API is in a different character set (e.g. ISO-8859-1)?

May I apologise in advance for any naivete on my part? I'd really appreciate some advice on how I should deal with this issue!

Regards
Tristen
