On Mon, 28 Jan 2002, Neal Richter wrote:

> The more user-side interactive the page is, generally the worse off you
> will be.

This essentially sums up the problem. A spider cannot emulate a human
user.

Even if you assume that a spider might be able to parse and/or run the
JavaScript or whatever, that doesn't mean it can actually use it for
navigation. How is it supposed to know that one drop-down menu holds
relative URLs and another holds something else?

> After reading the Retriever & HTML parsing code, htdig pretty much treats
> web pages as documents-to-parse, and not programs-to-run.  So without good
> default behavior, it may not be very successful on pages with dynamic
> content for navigation purposes.
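To make the "documents-to-parse" point concrete, here is a minimal sketch of a link extractor that only reads markup, built on Python's standard html.parser. It is not htdig's Retriever; the class name AnchorCollector and the sample page are hypothetical. The option values in the drop-down only act as URLs because the onchange script treats them that way, so a parser that does not run scripts never sees them as links.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical parse-only spider: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names; attrs is a list of
        # (name, value) pairs and value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


# Hypothetical page: one plain link plus a JavaScript-driven drop-down menu.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<select onchange="location = this.value">
  <option value="/products.html">Products</option>
  <option value="/support.html">Support</option>
</select>
</body></html>
"""

collector = AnchorCollector()
collector.feed(PAGE)
print(collector.urls)  # ['/about.html']
```

Only the ordinary anchor is found; /products.html and /support.html stay invisible unless the spider guesses what the script does with those values.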

Beyond everything that's been discussed so far, I'll interject that there
are perfectly good ways of pointing a spider at URLs even if you use some
sort of dynamic navigation. For example, the <LINK> tag:
<http://www.w3.org/TR/html4/struct/links.html#h-12.3>

In particular, the HTML specifications have sections about "Links and
search engines." :-)
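As a rough sketch of the LINK approach, a parse-only spider could harvest URLs from <link> elements in the page head. This assumes the spider looks at LINK elements at all (whether htdig does is not claimed here). The class name LinkElementCollector, the sample page, and the choice to skip only rel="stylesheet" are illustrative assumptions.

```python
from html.parser import HTMLParser

# Assumption: rel values that point at resources rather than pages to index.
NON_NAVIGATIONAL = {"stylesheet"}


class LinkElementCollector(HTMLParser):
    """Hypothetical spider helper: collects navigational <link href> URLs."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attr = dict(attrs)
            href = attr.get("href")
            # rel may hold several space-separated link types.
            rels = set((attr.get("rel") or "").lower().split())
            if href and not rels & NON_NAVIGATIONAL:
                self.urls.append(href)


# Hypothetical page whose body navigation is script-driven, but whose head
# still advertises related pages through LINK elements.
PAGE = """
<html><head>
<title>Products</title>
<link rel="next" href="/support.html">
<link rel="contents" href="/sitemap.html">
<link rel="stylesheet" href="/style.css">
</head><body>...</body></html>
"""

collector = LinkElementCollector()
collector.feed(PAGE)
print(collector.urls)  # ['/support.html', '/sitemap.html']
```

The page author keeps the dynamic menu for human visitors, while the LINK elements give a spider plain URLs it can follow without running any script.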

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html
