According to "Augeri, Jim (NM75)":

> (I hope it is well understood just how ColdFusion's <CFINCLUDE>
> tags assemble individual chunks of HTML and ultimately deliver a
> complete HTML page back to the browser. The page that the viewer sees
> within his or her browser does not actually exist anywhere on the web
> server itself!)
Yes, that is well understood. What doesn't seem to be well understood is that htdig, as a web _client_, sees what the browser sees, not what exists on the server. So htdig will fetch and parse the same HTML page that your browser would display. While parsing, it will pick up and queue any links it finds *in the HTML*.

> Since most of the links to "other stuff" within the site are navigated
> to via the cascading JavaScript menus (contained in the "header.cfm"
> file), I tried making it one of the URLs in the "htdig.conf" file.
> This seems to make no difference whatsoever in the outcome.

That's because htdig never has, and in all likelihood never will, parse JavaScript menus. See http://www.htdig.org/FAQ.html#q5.18

Indeed, Geoff and I have both made the rather bold claim that no indexing spider in existence will parse JavaScript looking for links, and we have yet to see anyone come forward on this list to prove us wrong. So why assume htdig is any different?

> Under the BEFORE config, when "rundig" was run, it would take about
> 15-30 minutes to index the entire site. There are about 20K documents
> on the site in question. Now, indexing takes all of about 5-10 seconds!

Yes, htdig can be quite speedy when it doesn't find many parsable links.

> If I run "rundig" with the "-v" switch, I of course get some additional
> diagnostics about the dig itself, but not much that is worthwhile.
> And there is nothing in the output that would give me a clue as to
> why it isn't doing the same job it did before.

But with -vvv, you'd see information on all the links it does find. See http://www.htdig.org/FAQ.html#q5.27

The complete absence, in htdig -vvv's output, of any mention of links buried deep inside chunks of JavaScript code might suggest that htdig doesn't actually have an embedded JavaScript parser, prompting one to explore that further in the FAQ and documentation.
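To make the JavaScript point concrete, here is a minimal sketch of how an HTML-only link extractor behaves. This is illustrative Python, not htdig's actual parser, but the behaviour is the same in principle: anchors in the HTML markup yield links, while a URL sitting inside a JavaScript string is just opaque text.

```python
# Minimal sketch: HTML-only link extraction, the way a spider like
# htdig works.  Illustrative code, not htdig's real (C++) parser.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only real HTML anchors produce links; <script> content is
        # treated as raw character data and never inspected for URLs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = """
<html><body>
<a href="/about.html">About</a>
<script>
  // A JavaScript menu entry: the URL below is just characters
  // inside a string literal, invisible to an HTML parser.
  menu.add('Products', '/products.html');
</script>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # -> ['/about.html']; '/products.html' is never seen
```

The same asymmetry is why the cascading-menu site indexes in seconds: with the navigation moved into JavaScript, there are simply very few HTML anchors left for the spider to follow.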
> I have started going back to reread some of the documentation, but
> haven't seen anything thus far that is too encouraging. Within the
> config file, there is something about being able to create a file with
> a list of URLs that should be indexed, but I don't see anything about
> the format such a file would take.

See http://www.htdig.org/FAQ.html#q5.25

> Would I enter one URL per line? Would I want to reference each "top
> level" page of my site in order to pick up all the links contained
> therein?

One per line is easiest, but any whitespace characters can act as separators. Include any URL that you want indexed that can't be reached by spidering through HTML links.

> If it hasn't been obvious up to now, this is something of an urgent
> plea for help, as it now appears that with our newly formatted,
> ColdFusion-based site, HT-Dig is dead in the water! I will be
> most appreciative of anyone who can provide the "magic wand" to get
> this all working again.

Well, the closest thing we have to a magic wand is the FAQ.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]>
with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

