Hi

Have a look at WebOb:
http://pythonpaste.org/webob/reference.html#unicode-variables
Also note that if you're running as an API server, your API consumers
should probably be specifying the encoding in their headers.

Also, it's difficult to blindly transcode something to UTF-8. Unless you
know the source encoding, what they send may not convert cleanly without
stripping or translating some values into something else entirely (for
instance, if someone sends you UCS-4). That is something you will have
to deal with on a case-by-case basis for different encodings.
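
To make the case-by-case point concrete, here is a rough sketch (Python 3
assumed) of trying a declared encoding first, then a short list of guesses,
and only then falling back to placeholder characters. The helper name
to_utf8 and the candidate list are my own assumptions, not anything WebOb
or App Engine provides:

```python
# Hypothetical helper: the candidate list and its order are assumptions,
# not something prescribed by WebOb or App Engine.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")


def to_utf8(raw, declared=None):
    """Return (utf8_bytes, encoding_used) for the raw request bytes.

    encoding_used is None when nothing decoded cleanly and placeholder
    characters had to be substituted.
    """
    candidates = ([declared] if declared else []) + list(CANDIDATE_ENCODINGS)
    for name in candidates:
        try:
            text = raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name; try the next one
        return text.encode("utf-8"), name
    # Nothing decoded cleanly: keep what we can, mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace").encode("utf-8"), None
```

A guess that decodes without error can still be the wrong encoding, so the
second return value is worth logging when you chase the sporadic failures.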

You could then document which encoding schemes you directly support in
your API, and then get the consumers to set the Content-Type charset
correctly.
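
As a sketch of what that check might look like (Python 3, standard library
only), the following pulls the charset parameter out of a Content-Type
header value and compares it against a documented list. SUPPORTED_CHARSETS
and both function names are made up for illustration; your real list would
be whatever your API docs promise:

```python
from email.message import Message

# Assumption: the encodings this hypothetical API documents as supported.
SUPPORTED_CHARSETS = {"utf-8", "iso-8859-1"}


def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


def is_supported(content_type):
    """True only when a charset is declared and is in the documented list."""
    return declared_charset(content_type) in SUPPORTED_CHARSETS
```

Requests that fail the check could be rejected with an error that points
the consumer at the documented list, rather than guessed at.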

Just my 2c worth

Rgds

T


On Apr 1, 5:36 am, Brian <[email protected]> wrote:
> The problem with that is that our system is an API server, so we can't
> assume that submitters are actually sending UTF. They usually are, but
> sometimes not.
>
> On Mar 29, 4:07 pm, Joshua Smith <[email protected]> wrote:
>
>
>
> > If you specify UTF-8 on the form page with a meta tag, you should only get 
> > UTF-8 in the input you receive.  At least that's been my experience.
>
> > On Mar 29, 2010, at 5:40 PM, Brian wrote:
>
> > > Hello,
>
> > > I am looking for a library or function that does the following (my one
> > > complaint about Python /GAE is that it does not provide an easy way to
> > > sanitize and transcode input to UTF). I have a function that does this
> > > pretty reliably, except when it breaks, and was wondering who else has
> > > dealt with this issue.
>
> > > HINT TO FRIENDLY GOOGLE PEOPLE: it would be really nice if you offered
> > > an option to sanitize incoming form data so your app does not need to
> > > worry about encodings. You'd just assume you're being given properly
> > > decoded utf-8, with placeholder characters where decoding failed.
> > > Failing that, it'd be nice to have a sanitizer function you can call
> > > that knows how to test for and transcode from the most common
> > > encodings into utf-8. I know Python supports a lot of different
> > > encodings, but it can be very time consuming to track this type of bug
> > > because it tends to happen sporadically when an unusual string shows up
> > > in a request.
>
> > > Thanks,
>
> > > Brian McConnell
>
> > > --
> > > You received this message because you are subscribed to the Google Groups 
> > > "Google App Engine" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to 
> > > [email protected].
> > > For more options, visit this group at
> > > http://groups.google.com/group/google-appengine?hl=en.
