On Fri, Nov 30, 2007 at 10:11:31AM +0100, Jan Kundrát wrote:
> > - See also RFC1738: 'Within the <path> and <searchpart> components, "/",
> > ";", "?" are reserved.'
> My copy of RFC1738 says (end of section 2.2):
...
> I wasn't able to find your quote in that file.
My quote was from the first sentence of RFC1738, sec 3.3 (HTTP), para 4.
> What is source of your definition of "valid query argument separator"?
<searchpart> is also better defined in RFC2396, section 3.4:
  Within a query component, the characters ";", "/", "?", ":", "@",
  "&", "=", "+", ",", and "$" are reserved.
Reserved because they have special meanings.

> > - Having a single valid URL for a given resource greatly improves cache
> > hit rates (and we do use caching heavily on the new site, 60% hit rate
> > at the moment, see further down as well).
> Redirecting clients to new URLs would give you perfect caching as well.
That's why I say I'm willing to do redirection at the cache level. I do
NOT want lots of users with old links to hit the actual web application
if it's just going to redirect all of them to a page that is already in
the cache.

> > - The old parsing and variable usage code was the source of multiple
> > bugs as well as the security issue that shuttered the site.
> Only because it passed the raw, unescaped values directly to shell,
> which is of course badly broken.
Have a look at the recent discussion about HTML5 issues
(http://www.crockford.com/html/), which also applies to web
applications:
  "HTML 5 is strict in the formulation of HTML entities. In the past,
  some browsers have been too forgiving of malformed entities, exposing
  users to security exploits. Browsers should not perform heroics to
  try to make bad content displayable. Such heroics result in security
  vulnerabilities."

> > - I _want_ old sites to change to using the new form, which I do
> > advertise as being permanent resource URLs (as well as being much
> > easier to construct, take any "[CAT/]PN[-PF]" and slap it onto the
> > base URL, and you are done).
> Which isn't a reason for breaking old links, IMHO.
Visitors to the old /ebuilds/ or /packages/ links get a redirect to the
frontpage. While that isn't the content they were after, it's fine to
help them find it.
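
To make the "single valid URL" point concrete, here is a rough Python
sketch of collapsing an old-style link with query parameters into the
one [CAT/]PN form. Everything specific in it is an assumption for
illustration only: the old query layout (category and name keys,
separated by ";" or "&"), the base URL, and both function names. None
of it is taken from the site's actual code, and the optional -PF suffix
is not handled.

```python
import re
from urllib.parse import quote, unquote_plus, urlsplit

# Hypothetical base URL for the new site; not taken from this thread.
BASE_URL = "http://packages.example.org/package/"


def parse_old_query(query):
    """Split a query string on '&' or ';' and decode each key/value pair."""
    params = {}
    for pair in re.split(r"[&;]", query):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params[unquote_plus(key)] = unquote_plus(value)
    return params


def canonical_url(old_url):
    """Map an assumed old-style URL to a single canonical [CAT/]PN URL.

    Returns None when the query carries no package name.
    """
    params = parse_old_query(urlsplit(old_url).query)
    name = params.get("name")
    if not name:
        return None
    category = params.get("category")
    path = f"{category}/{name}" if category else name
    # quote() leaves "/" alone by default and percent-encodes unsafe characters.
    return BASE_URL + quote(path)
```

Both separator styles end up at the same URL, which is the property
that matters for the cache: every variant of an old link resolves to
one cacheable resource instead of several.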
> > That said, if somebody wants to point me to something decent so that
> > Squid can rewrite the URLs WITH the query parameters (the built-in squid
> > stuff seems to ignore them) and hit the cache, and that can add a big
> > warning at the top of the page, I'd be happy to use it for a transition
> > period, just like the RSS URLs (which are redirected until January 2008,
> > but only because they are automated, and not browsed by humans).
> Now that's something that sound reasonable. Why limit the period and
> don't provide it forever?
Time limited to force everybody to move over, and to not have to support
the redirections for the old version of the site forever, when they
weren't advertised as permanent URLs.

I did a quick hack up of some statistics, and I see that only 6.7% (5001
out of (69434+5001)) of the overall visitors were arriving at the old
locations and not receiving the content they were originally interested
in. Based on these stats, I'd say we are doing well in getting users to
update their links for the new site already, since it's been up for 2
weeks now.

Successful page loads (2xx, 304), by section, for November 29th.
   60 /verbump
  114 /newpackage
  167 /faq
  645 /robots.txt
  779 /categories
 1037 /arch
 2348 /category
 3329 /favicon.ico
 9084 /
 9292 /media
20491 /package
35354 /feed
-----------------------------
69434 Total of data pages (no robots, css, images, favicon)
13266 Total of robots, images, favicon.

Failed page loads (4xx, 5xx, 3xx excluding 304), by section and code,
for November 29th.
Slew of 404 codes for PHP exploits excluded, and grouped by how it was
handled:
- Specific redirect for usage of an old RSS path:
   25 /feed 301
   91 /archs 301
- Redirected because requested object not found (invalid package, etc):
   25 /arch 302
   30 /category 302
   44 /feed 406
  164 /feed 302
  632 /package 302
- Error or general redirect for an old URL:
   11 /similar 404
   22 /main 404
   24 ///x86%20stable 404
   44 /daily 404
  222 /search 404
  347 /images 404 (excluded from total)
 2096 /ebuilds 302
 2582 /packages 302
-----------------------------
 5001 Total (no images)

--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail : [EMAIL PROTECTED]
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85