I'd be interested in having some sort of flag on the request that indicates
whether the incoming query was bad. I can't do a die here for legacy reasons.
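Here is a rough sketch of what such a flag could look like. It is kept outside Catalyst so it runs standalone. decode_query_with_flag, query_is_bad and bad_params are hypothetical names, not existing Catalyst API. Each value is first decoded strictly. If that fails, the lenient U+FFFD result is kept, so legacy code sees the same data it sees today, and the parameter is recorded as bad.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical "flag instead of die": decode every query value, keep the
# lenient (U+FFFD-substituted) result for invalid input so legacy code is
# unaffected, but record which parameters were not valid UTF-8.
sub decode_query_with_flag {
    my (%raw) = @_;                          # name => octet string
    my (%decoded, @bad);
    for my $name (sort keys %raw) {
        my $octets = $raw{$name};
        my $strict = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
        if (defined $strict) {
            $decoded{$name} = $strict;
        } else {
            push @bad, $name;
            $decoded{$name} = decode('UTF-8', $octets);   # lenient: U+FFFD substitution
        }
    }
    return {
        params       => \%decoded,
        query_is_bad => scalar(@bad) ? 1 : 0,
        bad_params   => \@bad,
    };
}

my $result = decode_query_with_flag(param => "\xD6lso\xDFe", ok => "abc");
print "query_is_bad=$result->{query_is_bad} (@{ $result->{bad_params} })\n";
```

With this approach nothing dies. Code that cares can check the flag and return a 400, and older code keeps working on the substituted values.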
jnap 


     On Sunday, August 2, 2015 9:39 AM, Bill Moseley <mose...@hank.org> wrote:
   

 BTW -- I wonder about the Catalyst behavior here.

On Sat, Aug 1, 2015 at 10:36 PM, Bill Moseley <mose...@hank.org> wrote:



On Sat, Aug 1, 2015 at 6:31 AM, Stefan <maill...@s.profanter.me> wrote:

Hi, if a URL parameter contains a Unicode character (e.g.
www.example.com/?param=%D6lso%DFe, which stands for param=Ölsoße), the parameter
is not correctly parsed as Unicode.


One note here -- data over the wire must be encoded into octets. So all
Unicode characters must be encoded before sending and decoded when received.
(You can't send "Unicode characters" as such.) UTF-8 is the encoding used now;
RFC 3986 (section 2.5) recommends encoding text as UTF-8 octets before
percent-encoding: http://tools.ietf.org/html/rfc3986.
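You are specifying %D6 -- although the Unicode character is U+00D6, its UTF-8
octet sequence is 0xC3 0x96, so in a UTF-8 query string it is sent as %C3%96.
(0xD6 is the ISO-8859-1/Latin-1 encoding of "Ö", not the UTF-8 one.) See:
http://www.fileformat.info/info/unicode/char/00D6/index.htm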
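The next sketch shows the difference concretely. It percent-encodes the example text the way RFC 3986 recommends: encode to UTF-8 first, then write %XX for each octet outside the unreserved set. The helper name percent_encode_utf8 is made up for illustration. In real code, a CPAN module such as URI::Escape (uri_escape_utf8) would normally do this job.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Percent-encode text as UTF-8 octets, escaping every octet outside the
# RFC 3986 "unreserved" set (A-Z a-z 0-9 - . _ ~).
sub percent_encode_utf8 {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);    # Perl characters -> UTF-8 octets
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $octets;
}

my $text  = "\x{D6}lso\x{DF}e";             # "Ölsoße": U+00D6, l, s, o, U+00DF, e
my $param = percent_encode_utf8($text);
print "?param=$param\n";                    # prints ?param=%C3%96lso%C3%9Fe
```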
Unless otherwise instructed, Catalyst uses UTF-8 as the encoding for decoding 
query parameters -- query parameters are decoded from UTF-8 octets to Perl 
characters.
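In rough terms, that decoding step turns percent-escaped octets back into Perl characters. The following standalone sketch is not Catalyst's actual code, and percent_decode is a hypothetical helper. It shows the two stages for a correctly encoded value.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Undo percent-encoding, yielding raw octets (a byte string).
sub percent_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                                   # form encoding: '+' is a space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;
    return $s;
}

my $octets = percent_decode('%C3%96lso%C3%9Fe');     # stage 1: octets
my $chars  = decode('UTF-8', $octets);               # stage 2: Perl characters
printf "%d octets -> %d characters\n", length($octets), length($chars);
# prints "8 octets -> 6 characters"
```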
As your example showed, if the query contains invalid UTF-8 sequences,
Encode::decode() as used by Catalyst replaces them with the U+FFFD replacement
character "�".
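This short sketch reproduces that substitution with Encode directly, using the octets from the original URL. With no CHECK argument, decode() falls back to inserting U+FFFD instead of failing.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The octets from the original URL: 0xD6 and 0xDF are Latin-1 bytes, not UTF-8.
my $octets  = "\xD6lso\xDFe";

# No CHECK argument: each malformed sequence becomes U+FFFD, nothing dies.
my $decoded = decode('UTF-8', $octets);

# Count the substituted characters.
my $bad = () = $decoded =~ /\x{FFFD}/g;
print "replacement characters: $bad\n";
```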
This may or may not be what you want. Personally, I don't think it's correct to
silently modify user input. You intended to pass "Ölsoße" but ended up with
"�lso�e" -- is that really the data you would want to process or store for the
request? Seems unlikely.
If "param" is supposed to be passed as textual, UTF-8-encoded octets, and it
isn't, then returning a 400 Bad Request may be a better way of handling that.
That probably would have helped you see what was wrong in this case.
That is, use "eval { decode( $default_query_encoding, $str, FB_CROAK | LEAVE_SRC ); }"
to catch invalid data, and return to the client the "$str" that failed and the
reason it failed.
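Below, that one-liner is expanded into a small, self-contained sketch. strict_decode_param is a hypothetical helper name. The Catalyst part ($c->res->status(400) plus a body naming the bad value) appears only as a comment, so the example runs without a Catalyst app.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: strictly decode one query value.
# Returns ($chars, undef) on success or (undef, $error) on invalid input.
sub strict_decode_param {
    my ($encoding, $octets) = @_;
    my $chars = eval { decode($encoding, $octets, FB_CROAK | LEAVE_SRC) };
    return ($chars, undef) if defined $chars;
    (my $err = $@ || 'unknown decode error') =~ s/\s+\z//;
    return (undef, $err);
}

for my $raw ("\xC3\x96lso\xC3\x9Fe", "\xD6lso\xDFe") {
    my ($value, $error) = strict_decode_param('UTF-8', $raw);
    if (defined $error) {
        # In a Catalyst action, this is where one might call $c->res->status(400)
        # and put the offending value and $error into the response body.
        print "400 Bad Request: $error\n";
    } else {
        print "ok: ", length($value), " characters\n";
    }
}
```

LEAVE_SRC matters here. Without it, decode() with a CHECK argument may consume the source string in place, which would clobber the value you want to echo back to the client.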
Of course, it is also possible that you have some query parameters that you
want decoded as UTF-8 and some that represent something else (a raw sequence
of bytes), and you want more manual control. In that case
$c->config->{do_not_decode_query} can be used to bypass the decoding. But then
you must call decode() yourself.
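Here is a minimal sketch of that manual step. It assumes that, with do_not_decode_query enabled, the query values arrive as undecoded octets. %TEXT_PARAMS and decode_selected are hypothetical names. Parameters not listed are left as raw bytes. Text parameters are decoded with FB_CROAK, so bad input dies and can be caught with eval as described above.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical list of parameters that carry text; everything else is treated
# as an opaque byte sequence and left undecoded.
my %TEXT_PARAMS = map { $_ => 1 } qw(param);

sub decode_selected {
    my ($raw) = @_;                                   # hashref: name => octets
    my %out;
    for my $name (keys %$raw) {
        $out{$name} = $TEXT_PARAMS{$name}
            ? decode('UTF-8', $raw->{$name}, FB_CROAK | LEAVE_SRC)   # dies on bad UTF-8
            : $raw->{$name};                                          # raw bytes as-is
    }
    return \%out;
}

my $params = decode_selected({ param => "\xC3\x96lso\xC3\x9Fe", token => "\xD6\xDF\x00" });
printf "param: %d chars, token: %d bytes\n", length($params->{param}), length($params->{token});
# prints "param: 6 chars, token: 3 bytes"
```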
-- 
Bill Moseley
mose...@hank.org
_______________________________________________
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/
