Hi Peter,

   I don't think uri-generic silently mangles input when it receives UTF-8;
it just returns #f. Still, I think raising an exception instead would not
be a bad idea.
I have not yet had the chance to test the UTF-8 mapping constructor
thoroughly, but I will try to do so over the weekend.
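To make the two behaviours discussed in this thread concrete, here is a rough sketch in Python (not Scheme, and not the uri-generic API). The hypothetical `check_ascii` raises an error when it sees octets with the high bit set, and the message points to a conversion routine. The hypothetical `iri_to_uri` percent-encodes every non-ASCII octet of UTF-8 input, which for valid UTF-8 is roughly what step 2 of the mapping in RFC 3987 Section 3.1 amounts to. The sketch does not check for characters that RFC 3987 disallows in IRIs.

```python
# Hypothetical sketch, not the uri-generic API. Input is treated as raw
# octets (bytes), matching the discussion of "octets with the high bit set".


def check_ascii(octets: bytes) -> bytes:
    """Raise instead of returning a false value when non-ASCII octets appear."""
    for pos, octet in enumerate(octets):
        if octet & 0x80:  # high bit set: outside the scope of RFC 3986
            raise ValueError(
                f"non-ASCII octet 0x{octet:02X} at position {pos}; "
                "convert UTF-8 input with iri_to_uri() first"
            )
    return octets


def iri_to_uri(octets: bytes) -> str:
    """Percent-encode each non-ASCII octet of UTF-8 input (cf. RFC 3987, 3.1).

    ASCII octets, including reserved characters such as '/', '?' and '#',
    are left untouched. Characters that RFC 3987 does not allow in IRIs
    are not checked for in this sketch.
    """
    octets.decode("utf-8")  # raises UnicodeDecodeError on malformed UTF-8
    return "".join(f"%{o:02X}" if o & 0x80 else chr(o) for o in octets)


# Usage example
iri = "http://example.org/résumé".encode("utf-8")
print(iri_to_uri(iri))  # http://example.org/r%C3%A9sum%C3%A9
print(check_ascii(b"http://example.org/a?b=c"))
try:
    check_ascii(iri)
except ValueError as err:
    print(err)
```

Because the conversion only touches octets with the high bit set, the ASCII part of the input passes through unchanged, and the user opts into UTF-8 by calling the conversion explicitly rather than having it as the default.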

    Ivan


On Thu, Jan 17, 2013 at 5:45 PM, Peter Bex <[email protected]> wrote:

> On Thu, Jan 17, 2013 at 09:35:36AM +0900, Ivan Raikov wrote:
> > Hi Peter,
> >
> >     I think that allowing raw UTF-8 sequences in uri-generic breaks
> > compatibility with RFC 3986. In other words, if you construct a URI with a
> > UTF-8 sequence that happens to include reserved ASCII characters, those
> > ASCII characters will not get escaped, and you could potentially be sending
> > an invalid URI to a legacy system that does not understand UTF-8.
>
> Hi Ivan,
>
> I agree with your assessment, but the way it currently silently mangles
> input isn't ideal.  I think it would be good if all constructors raised
> an exception when receiving octets with the high bit set (such octets are
> non-ASCII, so they fall outside the scope of RFC 3986 and raising an
> exception is acceptable).  What are your thoughts on this?
> If we do this, the error message should of course include a pointer to
> the new UTF conversion API so people know what to do.
>
> >   My proposed solution is to add a UTF-8-aware constructor to
> > uri-generic and prevent percent-decoding of UTF-8 sequences. I believe that
> > this solution is compatible with the IRI-to-URI mapping scheme described in
> > Section 3.1 of RFC 3987, but I do need to extend the uri-generic test
> > suite with more UTF-8 examples to ensure that nothing is broken. I think
> > that any solution will have to give the user the choice between ASCII and
> > UTF-8, rather than just defaulting to UTF-8.
>
> This seems like a good compromise.  Unfortunately, it means the API will
> grow quite a bit and become harder to use.  I'll need to consider
> what to do with http-client's "implicit" URI conversion, though
> (it accepts either strings or URI objects).  I guess for now I'll keep
> it the way it is.  If people need UTF-8, they can use the new conversion
> procedures.  Maybe I can change it later; this should not cause any
> breakage (except when talking to legacy systems, but those don't accept
> UTF-8 anyway, so if you have UTF-8 input there's already a problem).
>
> Cheers,
> Peter
> --
> http://sjamaan.ath.cx
>
> _______________________________________________
> Chicken-users mailing list
> chicken-users@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/chicken-users
>