Mike,

Thanks a lot for getting back to me and providing some useful info here. You are right in assuming I am working on an intranet site.

For (i), there doesn't seem to be an easy way of inserting robots tags, as I think these are the files Apache generates automatically to expose the directory structure that htdig can follow. If I stop htdig looking at these files, will I also lose the indexing for the directories they represent? Is there a way of telling Apache to insert a relevant tag into the file as it is generated, so that htdig does not display it as part of the results?

With regard to (ii), I think I may pursue the non-htdig options you suggest. However, giving users just one interface to check for files may be an advantage, which is why I would still like to try an htdig approach if it is easily possible. Unfortunately, too few documents share a consistent string that could be used as a search term.

Cheers,
Richard

At 12:37 2006-01-16 +0000, you wrote:
Richard,
I think that both of these are solvable.
(i) If you have access to modify these indexes, then try adding a
<meta name="robots" content="noindex,follow"> tag,
or try wrapping the listing in <!--htdig_noindex--> and
<!--/htdig_noindex--> comments, provided you have links elsewhere.
I believe that such indexes are served as a redirect to:
whatever/folder/index.html  in which case you should be able to set up
an exclude for them, again assuming that you don't need to index their
links.

(ii) The first thing that comes to mind is to implement this without
htdig. Perl, PHP or Java would all be able to produce what is
effectively just a directory listing, and if carefully written should be
more efficient than an htdig search.
Second idea would be to try and find a common term, such as your site
name, that appears on every page. If this is a public site then that
should normally be added to the bad-words list, but this sounds like an
Intranet?
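
As a rough illustration of the non-htdig route in (ii), here is a minimal
sketch in Python rather than the Perl, PHP or Java mentioned above. It is
only an assumption about how such a tool might look: the function name
find_files and its parameters are hypothetical, and it matches on file
names only, not on document contents.

```python
import os


def find_files(root, term):
    """Return sorted paths (relative to root) of files whose name contains term.

    Hypothetical helper: matching is a case-insensitive substring test on the
    file name only; file contents are never opened.
    """
    term = term.lower()
    matches = []
    # os.walk visits root and every directory beneath it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if term in name.lower():
                full_path = os.path.join(dirpath, name)
                matches.append(os.path.relpath(full_path, root))
    return sorted(matches)
```

Calling find_files with the document root of the intranet share and a
fragment of a file name would return the matching relative paths, which a
small CGI or web script could then render as links. It walks the whole tree
on every call, so a large share would probably want a cached listing.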


Good luck,
Mike



_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general