On 10/24/06, Ian Bicking <[EMAIL PROTECTED]> wrote:
> Jack Tihon wrote:
> > Hi,
> >
> > I've had some issues which initially seemed to be related to FormBuild
> > but now seem to fall squarely into paste/request.py.
> >
> > parse_formvars() from file paste/request.py is trying to add to the
> > formvars MultiDict in the following loop:
> > ...
> > if isinstance(fs.value, list):
> >     for name in fs.keys():
> >         values = fs[name]
> >         if not isinstance(values, list):
> >             values = [values]
> >         for value in values:
> >             if not value.filename:
> >                 # Plain field: decode the raw bytes (the submitted CGI was UTF-8)
> >                 value = value.value.decode('utf-8')
> >             # File uploads keep their FieldStorage object, undecoded
> >             formvars.add(name, value)
> >             print "formvars.add invoked on (name, value)", name, value
> > ...
> >
> > My proposed modification above decodes the CGI form value as UTF-8. From
> > my reading of the code, file uploads are handled differently, so that
> > won't be an issue. What _is_ an issue, however, is that I'm assuming
> > UTF-8 input. Is there a cleaner way to do this? From my searching I've
> > learned that the 'accept-charset' attribute of the FORM tag can be used
> > to specify the allowed input charset. Is there a way to programmatically
> > get that value and decode appropriately?
>
> Sorry, I missed this before. The encoding of forms can be a little
> tricky. A form is generally encoded in the same character set as the
> page it is on. Since the form may live on another site, served with a
> different character set, the encoding of a submission is hard to determine.
>
> Is there something in the request that shows the encoding?
> (CONTENT_TYPE == 'application/x-www-form-urlencoded; charset=utf-8'?)
>
> I haven't done much testing around this, so I don't know. This could
> potentially also be done with a wrapper around MultiDict that lazily
> decodes the values.
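
A minimal Python 3 sketch of the two ideas above: read the charset from CONTENT_TYPE when it is present, otherwise fall back to a site-wide default, and wrap the raw values so they are decoded only when read. `charset_from_content_type` and `LazyDecodingMultiDict` are hypothetical names, not part of Paste; the wrapper holds a plain list of (name, value) pairs instead of a real MultiDict so it runs on its own, and the `utf-8` default is an assumption standing in for whatever charset your pages are served in.

```python
import codecs


def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset parameter of a Content-Type value, or `default`.

    Browsers frequently omit the parameter on form posts, so the default
    (assumed to match the charset the form page was served in) often wins.
    """
    for param in (content_type or '').split(';')[1:]:
        key, sep, value = param.partition('=')
        if sep and key.strip().lower() == 'charset':
            charset = value.strip().strip('"').lower()
            try:
                codecs.lookup(charset)  # reject names Python cannot decode
            except LookupError:
                return default
            return charset
    return default


class LazyDecodingMultiDict:
    """Hypothetical read-only multi-dict over raw (name, value) pairs.

    Byte values are decoded only when read; anything else (for example
    an uploaded-file object) is returned unchanged.
    """

    def __init__(self, items, charset='utf-8', errors='strict'):
        self._items = list(items)
        self.charset = charset
        self.errors = errors

    def _decode(self, value):
        if isinstance(value, bytes):
            return value.decode(self.charset, self.errors)
        return value

    def __getitem__(self, name):
        # Return the first value submitted under `name`
        for key, value in self._items:
            if key == name:
                return self._decode(value)
        raise KeyError(name)

    def __contains__(self, name):
        return any(key == name for key, _ in self._items)

    def getall(self, name):
        # Every value submitted under `name`, in submission order
        return [self._decode(v) for key, v in self._items if key == name]
```

Because decoding is deferred, a wrong charset guess only fails when a value is actually read, and `charset` can still be changed on the wrapper before that; passing `errors='replace'` trades those failures for U+FFFD replacement characters.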
This can be a frustrating subject. Did you know that if you set
accept-charset="US-ASCII" on the form and the user tries to submit
Japanese characters, Firefox will send them as numeric character
references like &#1234;, whereas IE will ignore you and send UTF-8
(assuming your page was originally in UTF-8)? Bleh!
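
If you do end up receiving references like &#1234; from Firefox, a server can try to turn them back into characters. A minimal Python 3 sketch, assuming the references arrive as &#NNNN; or &#xHHHH; inside an already-decoded string; `decode_numeric_char_refs` is a hypothetical helper, not a Paste function. It cannot tell a browser-generated &#1234; from a user who literally typed those characters, so treat it as a heuristic.

```python
import re

# Matches decimal (&#1234;) and hexadecimal (&#x4D2;) numeric character references
_NCR = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')


def decode_numeric_char_refs(text):
    """Replace numeric character references with the characters they name."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(codepoint)
        except (ValueError, OverflowError):
            return match.group(0)  # leave out-of-range references untouched
    return _NCR.sub(repl, text)
```

It only expands numeric references, so text such as &amp; passes through unchanged.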
-jj
--
The one who gets the last laugh isn't the one who did the laughing,
but rather the one who did the writing.
_______________________________________________
Paste-users mailing list
[email protected]
http://webwareforpython.org/cgi-bin/mailman/listinfo/paste-users