Colin Viebrock wrote:
>
> Thus spake Geoff Hutchison (at 01:37 PM 9/17/98 -0400) ...
> >I guess the "problem" is this: ht://Dig interprets JavaScript in HTML
> >files as text. So if we can take the code Muffin uses to strip JavaScript
> >and add it to a "remove JavaScript" pass over the HTML files before
> >ht://Dig begins the real indexing, we'd be set.
>
> What about the "problem" of people using JS to pop up windows and other
> URLs and such? If you simply strip all the JS code from a document, you'll
> lose these links (and the info in them).
And your problem with this is... :-) (Did I mention I don't like
JavaScript?)
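For what it's worth, the "remove JavaScript" pass Geoff describes
doesn't need to be fancy to be useful. A minimal sketch in Python
(this is not ht://Dig or Muffin code, just an illustration, and it
deliberately ignores onClick-style attributes and comment tricks):

    import re

    # Crude pre-pass: drop <script>...</script> blocks so the
    # indexer doesn't treat the code as ordinary page text.
    script_re = re.compile(r'<script\b[^>]*>.*?</script\s*>',
                           re.IGNORECASE | re.DOTALL)

    def strip_javascript(html):
        return script_re.sub(' ', html)

Of course, that throws away whatever links the script would have
generated, which is exactly Colin's objection.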
> And I haven't even mentioned JS that creates URL references on the fly, or
> based on other variables. Good luck coding a parser for that!
Exactly. This is definitely non-trivial.
For this reason, not a single search engine that I know of will find
any pages at http://www.htmlguru.com/ except the front page...
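To make that concrete, here is the flavor of page Colin means (a
made-up example; the real htmlguru.com pages are hairier):

    import re

    # Hypothetical page: the link target is glued together at
    # runtime, so it never appears whole in the HTML source.
    page = """<script language="JavaScript">
    var dir = (screen.width > 640) ? "hires" : "lores";
    document.write('<a href="/' + dir + '/page2.html">next</a>');
    </script>"""

    # A static href scanner finds only the unevaluated fragment:
    print(re.findall(r'href="([^"]*)"', page))
    # -> ["/' + dir + '/page2.html"]  (not a crawlable URL)

Short of actually executing the script, no parser will recover
/hires/page2.html or /lores/page2.html from that source.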
> The only complete solution I can see is to write a program that emulates a
> browser and follows every possible link, button, image map, etc. possible
> from that page.
There is that GPL'd JavaScript interpreter... Believe me, I've thought
about it...
> [or do the digging on the server side ... but then what URL do you present
> to the user?]
Yup.
Just say "no" to javascript. :-)
P.S.: The best part of all this JavaScript stuff is that marketing
normally wants all the fancy stuff on their web pages, but they *also*
want all their pages to be found by all the search engines. Try
explaining that to them. (What? Me bitter? Ha!)
--
Andrew Scherpbier <[EMAIL PROTECTED]>
Contigo Software <http://www.contigo.com/>