On Sat, 11 Mar 2000, Jeremy C. Reed wrote:

> On Fri, 10 Mar 2000, Marc Slemko wrote:
>
> > Suppose I have a site.  A fairly large and popular site that is some sort
> > of message board type site, with several million or so unique and
> > legitimate messages.
> >
> > Suppose the URLs for the messages are all in the form
> > http://site/foo/showme.foo?msgid=6666 where 6666 identifies the message.
> >
> > Suppose I want common robots to index it, since the messages do
> > contain useful content, so it is to their advantage because it
> > gives them real and useful content that gives them better results
> > than other engines that don't index them, and to my advantage to
> > have it indexed since it brings people to the site.
>
> I am guessing that these generated-on-the-fly webpages do not have
> changing content. So why don't you have some programmer write a quick
> script to locally fetch every one of these pages and resave them as plain
> old static html web pages? Your webserver will have a lot less work to do
> and your pages can all be indexed by the major search engines.

Yes, they do have lots of dynamic content that changes based on who views
them, and many of them include other information that changes from second
to second for most of the page views.

> If the "msgid" (in your example) has a standard format, this should be
> quite trivial. Someone could write a routine to do this in just a few
> minutes. (Of course, it may take hours to run and to verify the results.)
> Plus you could automate the routine to run weekly to convert new dynamic
> webpages to static.

Except it would also require the changing of a whole bunch of other pages
on the site to use the new form of URLs, and there are a good thousand or
so other pages.  I'm quite familiar with all the options, and for a lot of
reasons, making all the older message pages into static HTML pages just
isn't an option.

As I said before, there is nothing technically stopping me from changing
the URLs to not have a "?" in them without changing the underlying way
they are generated.  It is simply a very significant effort, and it is a
bit archaic that there is no way to tell a search engine to include such
pages.
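
For illustration, here is a minimal sketch of the kind of mapping that would be involved, assuming a hypothetical "?"-free form such as /foo/msg/6666. The path layout and the function names are made up for this example and are not what the site actually uses; only the showme.foo?msgid= form comes from the description above.

```python
import re
from typing import Optional

# Hypothetical crawler-friendly path form: /foo/msg/<digits>
_MSG_PATH = re.compile(r"/foo/msg/([0-9]+)")


def path_to_query(path: str) -> Optional[str]:
    """Map /foo/msg/6666 back to the existing /foo/showme.foo?msgid=6666 form."""
    match = _MSG_PATH.fullmatch(path)
    if match is None:
        return None  # not a message URL; leave it to normal handling
    return "/foo/showme.foo?msgid=" + match.group(1)


def msgid_to_path(msgid: int) -> str:
    """Build the "?"-free link that every other page would have to emit."""
    return "/foo/msg/%d" % msgid
```

The mapping itself is small. The effort lies in changing every page that links to the old form so that it emits the new one.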
