2011/12/5 Chí-Thanh Christopher Nguyễn <[email protected]>:
> Alec Warner wrote:
>>> Seriously, what do we gain from crawlers accessing sources.gentoo.org? I
>>> can't really remember seeing it once in a google query result...
>>
>> We want the site searchable.
>
>>>> The majority of the expensive requests are related to package.mask and
>>>> use.local.desc queries by crawlers. Like crawling the entire 13000 rev
>>>> history for package.mask (or similar.)
>
> Would it be feasible to use mod_rewrite to direct the most expensive
> requests to a static copy, which is re-generated every
> ${REASONABLE_TIMEFRAME}?
For now, user agents that look like bots are sent to sources2.gentoo.org
(via an HTTP 302, not a permanent redirect), and humans stay on
sources.gentoo.org. Assuming the crawlers and indexing systems follow the
spec, our search results should hopefully not get rewritten to
sources2.gentoo.org (that would surprise me greatly...wait, no it wouldn't ;p)

Robin added a caching layer for some segments of the application; I am
looking at cProfile dumps and discussing pain points with upstream.

-A

>
>
> Best regards,
> Chí-Thanh Christopher Nguyễn
>
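
For anyone following along, the user-agent redirect described in the reply above could look roughly like the sketch below. This is only an illustration, not the configuration actually deployed: the thread does not say how the redirect is implemented (it may well live in the web server rather than in Python), and the bot pattern, the `http://` scheme, and the names `BOT_PATTERN`, `looks_like_bot`, and `bot_redirect_middleware` are all assumptions made for the example.

```python
import re

# Hypothetical pattern: the real list of bot signatures is not given in the thread.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

# Host named in the message; the URL scheme used below is an assumption.
MIRROR_HOST = "sources2.gentoo.org"


def looks_like_bot(user_agent):
    """Return True if the User-Agent string matches the assumed bot pattern."""
    return bool(user_agent) and BOT_PATTERN.search(user_agent) is not None


def bot_redirect_middleware(app):
    """Wrap a WSGI app so that bot-like clients get a temporary redirect."""
    def wrapper(environ, start_response):
        if looks_like_bot(environ.get("HTTP_USER_AGENT", "")):
            # Rebuild the same path and query string on the mirror host.
            location = "http://" + MIRROR_HOST + environ.get("PATH_INFO", "/")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            # 302 is a temporary redirect, so a well-behaved indexer
            # should keep the original URL in its results.
            start_response("302 Found",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        # Everyone else is served by the wrapped application as usual.
        return app(environ, start_response)
    return wrapper
```

Matching on the User-Agent header is only a heuristic: a crawler that does not identify itself would still reach the main host, so a check like this reduces load rather than guaranteeing separation.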
