Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

Alex Shinn Tue, 15 Jan 2013 02:30:28 -0800

On Tue, Jan 15, 2013 at 6:23 PM, Peter Bex <[email protected]> wrote:


> On Tue, Jan 15, 2013 at 06:07:06PM +0900, Alex Shinn wrote:
> > On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov <[email protected]> wrote:
> >
> > >
> > > Percent-encoded sequences of more than one octet will not get touched by
> > > pct-decode in the current implementation, so you will not get double
> > > escaping. Percent-encoded sequences of one octet will get decoded if they
> > > fall in the "unstructured" char-set, as per RFC 3986.
> > >
> >
> > OK, now I'm thoroughly confused.  The percent-encoding is context sensitive?
> > How can this not be broken?
> >
> > We need to make the design clear:
> >
> >   * What can be constructed directly with make-uri.
> >   * What can be parsed, and how this is passed to make-uri.
> >   * How URIs are represented internally.
> >   * How URIs are encoded on output.
> >
> > It sounds like uri-common and uri-generic are doing different things here.
>
> uri-generic is agnostic about specific encodings and types.
> uri-common is designed to make life simpler in the case of "common" URIs
> like HTTP where we know what types of characters are to be decoded.
>
> RFC3986 "special characters" cannot be decoded unless we know they have
> no special meaning.  uri-common just decodes everything fully because
> there is generally no deeper nested encoding involved.  It's also smart
> enough to know that port 80 belongs to http, so it can be omitted,
> whereas uri-generic can't make such assumptions.
>
> uri-common also makes the assumption that query args are
> x-www-form-urlencoded.  This is the main reason to prefer it for web
> programming; uri-generic doesn't know about form-encoding because that
> is really only used in the context of HTML (strictly speaking, it's not
> even an HTTP thing), so this messy stuff should stay out of the generic
> URI library.
>
> Yes, the web is evil and must die.
>
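
To make the form-encoding point above concrete, here is a minimal sketch in
Python (not Chicken, and not the uri-common or uri-generic API) using only
the standard urllib.parse module.  The helper names are hypothetical.  It
shows how the same query value decodes differently depending on whether the
decoder assumes x-www-form-urlencoded:

```python
from urllib.parse import unquote, unquote_plus

def decode_generic(value: str) -> str:
    """Plain percent-decoding: '+' is an ordinary character."""
    return unquote(value)

def decode_form(value: str) -> str:
    """Form decoding: '+' means space, then percent-decode the rest."""
    return unquote_plus(value)

raw = "a+b%2Bc"
generic_result = decode_generic(raw)  # '+' is kept, %2B becomes '+'
form_result = decode_form(raw)        # '+' becomes ' ', %2B becomes '+'
```

A generic library cannot pick between the two readings without knowing that
the query string came from an HTML form.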

Right, I'm familiar with the evil standards :)  I'm also hoping that we can
have some basic compatibility between Chicken's uri module and Chibi's
(and whatever R7RS WG2 comes up with).

It seems to me the sane thing to do is represent URIs unencoded
internally, which can be generated directly with make-uri or decoded
on parsing.  The decoding might be scheme-specific, although
really the only difference is the space-to-+ and query args encoding.

Then, on output we would encode as needed.
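
As a rough illustration of that design (decoded internally, encoded only on
output), here is a minimal Python sketch.  The Uri record and the parse and
render helpers are hypothetical names for this example, not anything from
uri-common, uri-generic, or Chibi, and the query handling assumes form
encoding:

```python
from dataclasses import dataclass, field
from urllib.parse import quote, unquote, urlencode, parse_qsl, urlsplit

@dataclass
class Uri:
    """Hypothetical record: every component is stored fully decoded."""
    scheme: str
    host: str
    path: str
    query: list = field(default_factory=list)  # list of (key, value) pairs

def parse(text: str) -> Uri:
    """Decode once, at parse time."""
    parts = urlsplit(text)
    return Uri(parts.scheme, parts.hostname or "",
               unquote(parts.path), parse_qsl(parts.query))

def render(uri: Uri) -> str:
    """Encode once, at output time."""
    out = f"{uri.scheme}://{uri.host}{quote(uri.path)}"
    if uri.query:
        out += "?" + urlencode(uri.query)  # form encoding: space becomes '+'
    return out

u = Uri("http", "example.com", "/caf\u00e9 menu", [("q", "a b")])
text = render(u)
```

Note that this sketch decodes the path as a whole, so an encoded reserved
character such as %2F inside a path segment would not survive the round
trip; that is exactly the case where decoding needs to know the context.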

I was confused because the uri-generic change Ivan suggests
seems to be putting encoded characters directly in the representation,
whereas uri-common is encoding only on output.

[It also looks like the uri-common encoding is broken - why were bytes
getting lost?]

Finally, regarding parsing I still don't understand why %AB is decoded
into the corresponding octet but %AB%CD is not?
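
For comparison, a minimal Python sketch of what I would expect at the octet
level (this says nothing about what pct-decode actually does; the helper
names are hypothetical).  Percent-decoding to octets is uniform, and only
the later octets-to-characters step can fail or depend on the encoding:

```python
from urllib.parse import unquote_to_bytes

def decode_octets(text: str) -> bytes:
    """Percent-decode to raw octets; every %XX is treated the same way."""
    return unquote_to_bytes(text)

def try_utf8(octets: bytes):
    """Interpret octets as UTF-8, returning None if they are not valid."""
    try:
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        return None

one = decode_octets("%AB")        # a single octet
two = decode_octets("%AB%CD")     # two octets, decoded the same way
valid = try_utf8(decode_octets("%C3%A9"))  # a well-formed UTF-8 sequence
invalid = try_utf8(one)           # 0xAB alone is not valid UTF-8
```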

-- 
Alex

_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
