On Mon, 28 Jan 2002, Tod Thomas wrote:

> Can someone please set the record straight on this?
> 
> Everything I have come across says that web spiders cannot index
> javascript since they cannot parse and interpret it.  My impression is
> this is particularly true for sites that use JavaScript almost
> exclusively for things like navigation, drop down menus, and the like.
> 
> Is this true, or is a solution in place that I don't know about?  Can
> anybody point me to documentation that discusses Javascript and how
> search indexing is affected?  Are there commercial solutions available?

        That would depend on what exactly you are doing.  As far as the
javascript code itself goes, that won't get indexed, since javascript
code is traditionally encapsulated in HTML comment tags... assuming the
web-spider's HTML parsing code isn't brain-dead.
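A minimal sketch of that point (the page markup and parser here are invented for illustration): a parser that strips HTML comments before extracting links never even sees script wrapped in the old <!-- ... //--> convention.

```javascript
// Hypothetical page fragment: the only link on the page is generated by
// javascript that sits inside an HTML comment.
const page = [
  '<script type="text/javascript">',
  '<!--',
  'document.write(\'<a href="/products.html">Products</a>\');',
  '//-->',
  '</script>',
].join('\n');

// Strip HTML comments first, as a simple parser would, then look for links:
const withoutComments = page.replace(/<!--[\s\S]*?-->/g, '');
const links = [...withoutComments.matchAll(/href="([^"]+)"/g)].map(m => m[1]);
console.log(links);  // [] -- the only link lived inside the comment
```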

        If you are talking about a fully dynamic page with lots of
parameters and dynamic functionality it becomes very hard.  The spider
would need lots of web-browser functionality incorporated into it to
load up a javascript environment and treat the page as a program-to-run
rather than a file-to-parse.  The spider would also have to understand
http GET/POST params and how those parameters influence the page.
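To make the parameter problem concrete, here is a hedged sketch (the base URL and parameter values are made up): a parameter-driven page exposes no plain links, so a spider would have to enumerate GET parameter combinations itself, and the number of combinations grows multiplicatively.

```javascript
// Hypothetical parameter-driven page: no <a href> links exist for these
// views, so a spider must generate the candidate URLs on its own.
const base = 'http://example.com/catalog.cgi';
const params = { cat: ['books', 'music'], page: ['1', '2'] };

// Build every combination of GET parameters as a query string.
function expand(base, params) {
  let queries = [''];
  for (const [key, values] of Object.entries(params)) {
    queries = queries.flatMap(q =>
      values.map(v => q + (q ? '&' : '?') + key + '=' + v));
  }
  return queries.map(q => base + q);
}

const urls = expand(base, params);
console.log(urls.length);  // 4 -- and real apps have far more parameters
```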

        This is the DHTML analog of spidering classical CGI apps; most
spiders can't 'go through' the forms.  There was a company recently that
actually got some momentary notoriety by announcing that it had developed
spidering technology that would fill in web forms and submit them to
spider the pages 'behind' the forms.
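A rough sketch of what 'going through' a form involves (the form markup and the guessed default value are invented, and a real implementation would also need to handle POST): pull the action and field names out of the markup, invent values for the empty fields, and build the submission URL.

```javascript
// Hypothetical form a spider might encounter:
const form = [
  '<form action="/search.cgi" method="get">',
  '  <input type="text" name="q" value="">',
  '  <input type="hidden" name="lang" value="en">',
  '</form>',
].join('\n');

// Extract the action and the name/value pairs with simple regexes:
const action = form.match(/action="([^"]+)"/)[1];
const fields = [...form.matchAll(/name="([^"]+)"\s+value="([^"]*)"/g)]
  .map(([, name, value]) => [name, value || 'test']);  // guessed default

// Assemble the GET submission URL the spider would fetch:
const query = fields.map(([k, v]) => k + '=' + encodeURIComponent(v)).join('&');
console.log(action + '?' + query);  // "/search.cgi?q=test&lang=en"
```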

        Javascript-dependent navigation would also need that kind of
browser-like spider that treats the page as a program-to-run.

        Dynamic PHP pages are a bit easier, especially if all the
parameters are in the URL and no forms are used.  A spider sees them as
standard URLs and follows them, assuming the spider's URL parser doesn't
kill off the parameters before it fetches the page.
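The URL-parser caveat in a nutshell (the example link is invented): a spider that keeps the query string fetches the intended page, while one that kills off the parameters fetches a different page entirely.

```javascript
// Hypothetical link a spider finds on a PHP site:
const href = '/item.php?id=7&view=full';

const kept = href;                    // parser that preserves parameters
const stripped = href.split('?')[0];  // parser that drops them

console.log(kept);      // "/item.php?id=7&view=full"
console.log(stripped);  // "/item.php" -- the wrong, parameterless page
```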

        Of course nothing stops you from using PHP to supply javascript
menus... using PHP won't help spidering in that situation.

Summary:
        CGI with forms: 
                NO (for the most part)

        DHTML with Javascript:  
                Depends on the level of usage, the more
                dynamic (dependent on parameters and javascript code
                output) the page is, the less successful spidering will
                be.

        PHP (or other server-side scripting lang) w/o hidden parameters:
                Spiders can do pretty well here.

The more user-side interactive the page is, generally the worse off you
will be.

Of course a lot of these drawbacks can be somewhat compensated for by
designing a web-site/web-app with spidering in mind.  That involves lots
of testing and good default behavior... a lot like making your pages
visible/usable on now-ancient web-browsers like Mozilla 1.0, early
versions of Netscape, etc.

After reading the Retriever & HTML parsing code, I'd say htdig pretty
much treats web pages as documents-to-parse, not programs-to-run.  So
without good default behavior, it may not be too successful on pages
that use dynamic content for navigation purposes.

Feel free to correct me if I missed something. ;-)

Thanks.

-- 
Neal Richter 
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site



_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
