Hello,

I'm still not sure why urllib2 was able to retrieve URLs containing parentheses (the "braces" from my earlier posts), but I have since discovered that the issue lies not with Google App Engine but with dbpedia. I misdiagnosed the problem and thought I'd update this thread to say so.
If anyone is interested, the root cause seems to be a redirect (303 See Other) whose target URL is not percent-encoded. A direct request to the redirect URL, with the encoding applied, retrieves the intended document.

Matthew

On May 5, 11:07 pm, Matt Trinneer <[email protected]> wrote:
> Having some luck... By using urllib2 instead of urlfetch I am able to
> load the same URLs on the production server without any issue. Not
> really a solution per se but it gets the job done. Appreciate
> everyone's feedback.
>
> On May 5, 10:29 pm, Matt Trinneer <[email protected]> wrote:
> > Hi George,
> >
> > Thanks for the response. I've done some additional testing and am not
> > getting much further. Unfortunately in this case I do not have
> > control of the endpoint and am stuck with braces in the URL.
> >
> > Some additional notes which may be of use to anyone who happens upon
> > this:
> >
> > 1. The URLs being requested in this example return xml/rdf
> > documents.
> > 2. In the case of requesting a resource without braces in its URL a
> > response similar to the following is received (truncated for brevity)
> >
> > <?xml version="1.0" encoding="utf-8" ?>
> > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
> > <rdf:Description rdf:about="http://dbpedia.org/resource/Companion_%28manga%29">.....</rdf:Description>
> > </rdf:RDF>
> >
> > 3. On the GAE production environment the response to a request for a
> > URL with braces is not an error, but rather an empty rdf document.
> >
> > <?xml version="1.0" encoding="utf-8" ?>
> > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
> > </rdf:RDF>
> >
> > 4. This led me to speculate that the request being received by the
> > remote host was not for the same resource as I believe I am making a
> > request for.
> > So, with the help of another non-GAE endpoint I have
> > been logging requests generated via urlfetch and am not able to see
> > any appreciable difference between those sent by the development
> > version, where these requests work, and the production version, where
> > they don't.
> >
> > Continuing to investigate....
> >
> > On May 5, 5:31 am, George <[email protected]> wrote:
> > > Ivan, your problem looks like a common encoding problem. The default
> > > encoding used on the GAE server is ASCII, but it is something else, such as
> > > UTF-8, on your computer. So the code works in your development
> > > environment but not on the Google server.
> > >
> > > To deal with this problem you need to declare the encoding in the file
> > > header and decode your string to unicode with the proper charset
> > > before using it. If you don't do this, the Python interpreter will
> > > do it for you with the system default one. I agree this is a
> > > little confusing. Python should do it more elegantly.
> > >
> > > For Matthew's problem, sorry, I also have no idea about it. urlfetch is
> > > a mystery in the GAE libs. I found several examples that work fine locally
> > > but throw errors on the server. So I can only suggest you avoid touching
> > > dangerous zones like braces in the URL. :-)
> > >
> > > --
> > > George
> > >
> > > App Engine Unit Test Framework http://code.google.com/p/gaeunit/
> > >
> > > On May 4, 5:35 pm, Ivan Maslov <[email protected]> wrote:
> > > > I have a similar problem. On the development server the urlencode function works
> > > > correctly with unicode strings. In production an error occurs:
> > > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 2: ordinal not in range(128).
> > > > It occurs when I pass Russian strings as a
> > > > parameter.
> > > >
> > > > 2009/5/4 Matt Trinneer <[email protected]>
> > > > > To further that post...
> > > > >
> > > > > It seems to me that URLs containing characters such as ( and ) are not
> > > > > being fetched properly on the production environment. I've attempted
> > > > > escaping the characters, as per RFC 3986. However the escaped url
> > > > > (http://dbpedia.org/resource/Companion_%28manga%29) doesn't fare any
> > > > > better.
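
P.S. For anyone who lands on this thread with the same problem, here is a rough sketch of the workaround described at the top of this message: stop the client from following the 303 automatically, percent-encode the path of the Location target yourself, and request that URL directly. It is only an illustration, not what I actually deployed. It is written against Python 3's urllib (on the Python 2 runtime discussed in this thread the equivalent calls live in urllib, urllib2 and urlparse). The function names, the Accept header value and the example.org URL are my own assumptions, not anything dbpedia or GAE documents.

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


def encode_location(location):
    """Percent-encode the path of a redirect target (hypothetical helper)."""
    parts = urlsplit(location)
    # '/' stays safe so the path structure survives; '%' stays safe so
    # sequences that are already escaped (e.g. %28) are not encoded twice.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the caller can inspect Location."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def fetch_with_encoded_redirect(url, accept="application/rdf+xml"):
    """Fetch url; on a 303, retry against the percent-encoded target.

    The Accept value is an assumption based on the xml/rdf responses
    mentioned in this thread.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers={"Accept": accept})
    try:
        return opener.open(request).read()
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if err.code != 303 or not location:
            raise
        # Location may be relative, so resolve it against the original URL.
        target = encode_location(urljoin(url, location))
        retry = urllib.request.Request(target, headers={"Accept": accept})
        return urllib.request.urlopen(retry).read()
```

With a made-up redirect target, encode_location("http://example.org/data/Companion_(manga)") gives "http://example.org/data/Companion_%28manga%29", and running it again on that result leaves it unchanged.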
