Mike,
Thanks a lot for getting back to me and providing some useful info
here. You are right in assuming I am working on an intranet site.
For (i), there doesn't seem to be an easy way of inserting robots tags, as I
think these are files generated automatically by Apache to give the
directory structure that htdig can follow. If I stop htdig looking at the
files, will I also lose the indexing for the directories they
represent? Is there a way of telling Apache to insert a relevant tag in
the file as it is generated, so that htdig does not display it as part of
the results?
With regard to (ii), I think I may pursue the options you suggest with a
non-htdig solution. However, giving users just one interface to check for
files may be an advantage, which is why I would still like to try
an htdig approach if it is easily possible. Unfortunately, too few
documents share a consistent string that could be searched for.
Cheers,
Richard
At 12:37 2006-01-16 +0000, you wrote:
Richard,
I think that both of these are solvable.
(i) If you have access to modify these indexes, then try adding a
<meta name="robots" content="noindex,follow"> tag to the page head,
or try wrapping the listing in <!--htdig_noindex--> and
<!--/htdig_noindex--> comments, provided you have links elsewhere.
I believe that such indexes are served as a redirect to:
whatever/folder/index.html in which case you should be able to set up
an exclude for them, again assuming that you don't need to index their
links.
(ii) The first thing that comes to mind is to implement this without
htdig. Perl, PHP or Java would all be able to produce what is
effectively just a directory listing, and a carefully written script
should be more efficient than an htdig search.
A second idea would be to find a common term, such as your site
name, that appears on every page. On a public site that term
should normally be added to the bad-words list, but this sounds like an
intranet?
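As a rough illustration of the first idea in (ii), here is a minimal
sketch of a file-name search that walks a directory tree without htdig.
It is written in Python rather than Perl, PHP or Java purely for brevity.
The function name find_files and its two parameters are made up for this
example, and it assumes the script can read the document tree directly
from disk.

```python
import os


def find_files(root, term):
    """Return sorted paths, relative to root, whose file name contains term.

    Matching is case-insensitive and looks at file names only, not at
    file contents. Both the name and the behaviour are illustrative.
    """
    needle = term.lower()
    matches = []
    # os.walk visits root and every directory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if needle in name.lower():
                full_path = os.path.join(dirpath, name)
                # Report paths relative to root so they can be turned into URLs.
                matches.append(os.path.relpath(full_path, root))
    return sorted(matches)
```

Calling find_files("/path/to/docs", "report") would return every file
under that (hypothetical) directory with "report" in its name. Wrapping
this in a small CGI or web form is left out of the sketch.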
Good luck,
Mike
_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general