Hi Jeremy,

On Friday, December 13, 2002, at 06:23 AM, Jeremy Tapp wrote:
(c) I did wonder if the spiders just aren't getting to our content pages?
I know they have difficulty with database driven sites, but I had heard
Google could now trawl ASP sites, and I have noticed them indexing some of
our pages. How does this URL rewriting scheme you mention work? Should I
invest time getting our guys to build a spider page which links in simple
HTML to all our article pages and submit this to the engines?
The way our system works (and it looks like many others work the same way) is that a single file -- the "content server" -- handles all requests. In our system this file is named 'index' (no extension), but we tell Apache to treat it as a PHP file (or a JSP, or a CGI).
When a visitor requests a page, it's always /index/something, which looks like an ordinary directory path, even though it's not. The "something" part of the URL is customizable, so you can say "the first directory maps to the variable 'page', but others map as key/value pairs". For example:
/index/news/section.sports
Maps to:
page => news
section => sports
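
A rough sketch of that mapping, in Python for illustration only (this is not our actual code; the function name `parse_path` and its default arguments are made up, and I'm assuming the first dot in a segment separates key from value):

```python
def parse_path(path, prefix="/index", first_key="page"):
    """Map a path such as /index/news/section.sports to request variables."""
    # Only handle paths that live under the content server's prefix.
    if path != prefix and not path.startswith(prefix + "/"):
        raise ValueError("path is not under " + prefix)

    # Drop the prefix and ignore empty segments (e.g. from a trailing slash).
    segments = [s for s in path[len(prefix):].split("/") if s]

    params = {}
    if segments:
        # The first "directory" maps to a fixed variable name.
        params[first_key] = segments[0]
    for segment in segments[1:]:
        # Every other segment is read as key.value, split on the first dot.
        key, dot, value = segment.partition(".")
        if dot:
            params[key] = value
    return params


print(parse_path("/index/news/section.sports"))
# {'page': 'news', 'section': 'sports'}
```

Segments without a dot are simply skipped here; a real system would probably want to decide what those mean.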
This eliminates the ?, the &, the =, and the .php/.asp/.jsp/etc. from the URL, which can fool the search spider into thinking the page is okay to index. Some systems even add a fake "file.html" to the end of the URL. Ours can do that as well if so configured, which would make the previous URL look like this:
/index/news/section.sports/index.html
How's Alta Vista to know if the page is dynamic or not? It's left to assume it's not at this point.
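
On the server side the fake filename has to be thrown away again before the path is interpreted; otherwise a parser that splits segments on a dot would read "index.html" as index => html. A minimal sketch in Python, assuming the fake name is always a fixed final segment (`strip_fake_filename` and `fake_name` are hypothetical names, not part of our system):

```python
def strip_fake_filename(path, fake_name="index.html"):
    """Remove a decorative trailing filename from a spider-friendly URL path."""
    segments = path.split("/")
    # Only the last segment is checked, so a real value elsewhere is untouched.
    if segments[-1] == fake_name:
        segments.pop()
    return "/".join(segments)


print(strip_fake_filename("/index/news/section.sports/index.html"))
# /index/news/section.sports
```

Paths that don't end in the fake name pass through unchanged.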
Some systems accomplish the same thing with Apache's mod_rewrite, but even though I'm quite good with regular expressions, I've always found mod_rewrite's syntax confusing. And if it's confusing for me, then good luck to any site using the system that wants to change the behaviour! :) Plus, you can only express so many possibilities in regular expressions, as powerful as they are.
As for Google, I've seen them indexing dynamic pages for a while now, so I don't think this stuff will help much on their site. Maybe you should try getting slashdotted a few times. ;)
Tim's advice in his last message is excellent stuff. That alone (more so than any URL spoofing scheme) should help you get some extra visitors off Google et al.
(d) Are we missing any obvious tricks? Has anyone got any advice please?
Cheers and good luck!
Lux
I'll include links to a couple of our
articles below so people can see our page HTML / URL structure etc...
http://www.bikemagic.com/
An example article "Setting up handlebars" (for those who are interested ;-)
for those who aren't, there are plenty more sites on the same CMS linked at
the bottom of the page)
http://www.bikemagic.com/news/article.asp?SP=&v=1&UAN=3057
Many thanks once again. I wish I knew enough about CMS's to be able to
contribute in return! Some beers in the pub at the next London meet
perhaps!
Jeremy
--
http://cms-list.org/
trim your replies for good karma.
--
John Luxford
President and Chief Developer
______________________________
SIMIAN systems
Driving Web Content Management
______________________________
web   : http://www.simian.ca/
email : [EMAIL PROTECTED]
phone : 204.946.5955
