On Fri, Nov 30, 2007 at 10:11:31AM +0100, Jan Kundrát wrote:
> > - See also RFC1738: 'Within the <path> and <searchpart> components, "/",
> >   ";", "?" are reserved.'
> My copy of RFC1738 says (end of section 2.2):
...
> I wasn't able to find your quote in that file.
My quote was from the first sentence of RFC1738, sec 3.3 (HTTP), para 4.

> What is source of your definition of "valid query argument separator"?
<searchpart> is also better defined in RFC2396, section 3.4:
   Within a query component, the characters ";", "/", "?", ":", "@",
   "&", "=", "+", ",", and "$" are reserved.
Reserved because they have special meanings.
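
As a small illustration of what that means in practice, here is a sketch in
Python (not the site's code) that percent-encodes a value so that none of the
reserved characters listed above survive in it. The function name
escape_query_value and the sample input are made up for the example.

```python
from urllib.parse import quote

# Characters RFC2396 sec 3.4 lists as reserved within a query component.
RESERVED_IN_QUERY = ';/?:@&=+,$'


def escape_query_value(value: str) -> str:
    """Percent-encode a value for use inside a query component.

    safe='' tells quote() to escape every character except letters,
    digits and "_.-~", so all reserved characters are encoded.
    """
    return quote(value, safe='')


if __name__ == '__main__':
    # Hypothetical input containing a space and several reserved characters.
    print(escape_query_value('x86 stable;a=b'))
```

A reserved character that is meant as data has to be escaped like this;
left bare, it is interpreted as a delimiter.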

> > - Having a single valid URL for a given resource greatly improves cache
> >   hit rates (and we do use caching heavily on the new site, 60% hit rate
> >   at the moment, see further down as well).
> Redirecting clients to new URLs would give you perfect caching as well.
That's why I say I'm willing to do redirection at the cache level.
I do NOT want lots of users with old links to hit the actual web application
if it's just going to redirect all of them to a page that is already in the
cache.

> > - The old parsing and variable usage code was the source of multiple
> >   bugs as well as the security issue that shuttered the site.
> Only because it passed the raw, unescaped values directly to shell,
> which is of course badly broken.
Have a look at the recent discussion about HTML5 issues
(http://www.crockford.com/html/), which also applies to web applications:
"HTML 5 is strict in the formulation of HTML entities. In the past, some
browsers have been too forgiving of malformed entities, exposing users to
security exploits. Browsers should not perform heroics to try to make bad
content displayable. Such heroics result in security vulnerabilities."

> > - I _want_ old sites to change to using the new form, which I do
> >   advertise as being permanent resource URLs (as well as being much
> >   easier to construct, take any "[CAT/]PN[-PF]" and slap it onto the
> >   base URL, and you are done).
> Which isn't a reason for breaking old links, IMHO.
Visitors to the old /ebuilds/ or /packages/ links get a redirect to the
frontpage. While that isn't the content they were after, it's fine for helping
them find it.

> > That said, if somebody wants to point me to something decent so that
> > Squid can rewrite the URLs WITH the query parameters (the built-in squid
> > stuff seems to ignore them) and hit the cache, and that can add a big
> > warning at the top of the page, I'd be happy to use it for a transition
> > period, just like the RSS URLs (which are redirected until January 2008,
> > but only because they are automated, and not browsed by humans).
> Now that's something that sound reasonable. Why limit the period and
> don't provide it forever?
Time limited to force everybody to move over, and to not have to support
the redirections for the old version of the site forever, when they
weren't advertised as permanent URLs.
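
To make the request concrete, the following is a rough sketch in Python of the
kind of rewrite helper that would do the job. It is an illustration under
several assumptions, not something that is deployed: the old parameter names
"category" and "name", the NEW_BASE address, and the function names are all
hypothetical, and the "301:" reply prefix is how I understand the Squid 2.x
redirector interface (newer Squid releases use a different reply format).

```python
import re
from urllib.parse import unquote, urlsplit

# Hypothetical base for the new permanent URLs; not the real site address.
NEW_BASE = "http://packages.example.org/package/"

# Only plain package-style tokens are accepted; anything else is ignored.
TOKEN = re.compile(r"[A-Za-z0-9+_.-]+")


def rewrite(url: str) -> str:
    """Return '301:<new url>' for a recognised old URL, else '' (no change)."""
    parts = urlsplit(url)
    if not parts.path.startswith(("/packages", "/ebuilds")):
        return ""
    params = {}
    # Old links may separate arguments with ';' or '&', so accept both.
    for pair in re.split("[;&]", parts.query):
        key, sep, value = pair.partition("=")
        if sep:
            params[key] = unquote(value)
    name = params.get("name")            # assumed old parameter name
    category = params.get("category")    # assumed old parameter name
    if not name or not TOKEN.fullmatch(name):
        return ""
    if category and not TOKEN.fullmatch(category):
        return ""
    target = f"{category}/{name}" if category else name
    return "301:" + NEW_BASE + target


def serve(stdin, stdout) -> None:
    """Answer one request per input line; the URL is the first field."""
    for line in stdin:
        fields = line.split()
        stdout.write(rewrite(fields[0] if fields else "") + "\n")
        stdout.flush()
```

Run as a helper, it would call serve(sys.stdin, sys.stdout). Values that do not
look like a plain category or package name are left alone rather than passed
on, so nothing unescaped ever ends up in the generated URL.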

I did a quick hack up of some statistics, and I see that only 6.7% (5001 out of
(69434+5001)) of the overall visitors were arriving at the old locations and
not receiving the content they were originally interested in.

Based on these stats, I'd say we are doing well in getting users to
update their links for the new site already, since it's been up for 2
weeks now.

Successful page loads (2xx, 304), by section, for November 29th.
     60 /verbump
    114 /newpackage
    167 /faq
    645 /robots.txt
    779 /categories
   1037 /arch
   2348 /category
   3329 /favicon.ico
   9084 /
   9292 /media
  20491 /package
  35354 /feed
-----------------------------
  69434 Total of data pages (no robots, css, images, favicon)
  13266 Total of robots, images, favicon.

Failed page loads (4xx, 5xx, 3xx excluding 304), by section and code, for
November 29th. The slew of 404 codes for PHP exploit attempts is excluded, and
the entries are grouped by how each request was handled:
- Specific redirect for usage of an old RSS path:
     25 /feed 301
     91 /archs 301
- Redirected because requested object not found (invalid package, etc):
     25 /arch 302 
     30 /category 302
     44 /feed 406
    164 /feed 302
    632 /package 302
- Error or general redirect for an old URL:
     11 /similar 404
     22 /main 404
     24 ///x86%20stable 404
     44 /daily 404
    222 /search 404
    347 /images 404 (excluded from total)
   2096 /ebuilds 302
   2582 /packages 302
-----------------------------
   5001 Total for old URLs (last group only, no images)
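
For anyone who wants to check the arithmetic, here is a short Python sketch
that re-derives the totals and the 6.7% figure from the counts listed above.
The variable names are mine; the numbers are copied from the tables.

```python
# Successful loads of data pages (robots.txt, favicon.ico and /media excluded).
ok_data = {
    "/verbump": 60, "/newpackage": 114, "/faq": 167, "/categories": 779,
    "/arch": 1037, "/category": 2348, "/": 9084, "/package": 20491,
    "/feed": 35354,
}

# Successful loads of robots.txt, favicon.ico and media files.
ok_static = {"/robots.txt": 645, "/favicon.ico": 3329, "/media": 9292}

# Errors or general redirects for old URLs (/images 404 excluded).
old_urls = {
    "/similar": 11, "/main": 22, "///x86%20stable": 24, "/daily": 44,
    "/search": 222, "/ebuilds": 2096, "/packages": 2582,
}

total_ok = sum(ok_data.values())
total_static = sum(ok_static.values())
total_old = sum(old_urls.values())

# Share of visitors who arrived at an old location.
share_old = total_old / (total_ok + total_old)

print(total_ok, total_static, total_old)
print(f"{share_old:.1%}")
```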

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : [EMAIL PROTECTED]
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85
