On 6 Sep, Jeff Breidenbach wrote:
>
>
> Hi htdig folks,
>
> I'm having a bit of a problem getting what I want from the htdig
> configuration options. Lots of people, myself included, use htdig in
> conjunction with MHonArc. In the current release version of MHonArc
> (2.4.3, which I recently upgraded to) attachments may be stored in
> subdirectories as follows:
>
> The first URL is the message, while the second is the attachment.
> No need to follow the links, just look at their structure.
>
> http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174.html
>
> http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174/The_state_i_am_in.txt
>
> My question is, using the current stable version of htdig, how
> can I configure it to ONLY index messages, and not index attachments?
> If I could say "Ignore everything that does not end in .html" or
> "only index URLs with a certain regexp" that would do the trick.
> But with the current configuration options, I just don't see how to do
> this.
>
> Thanks in advance for enlightenment.
>
> Jeff
Just at a quick glance, assuming you _only_ want to dig your MHonArc
stuff, or are happy to dig it separately from the rest of your site, it
appears that you might be able to use a judicious mixture of start_url
and limit_urls_to (or perhaps exclude_urls, if you know the extensions
of all your attachment files).

Set your start_url to the top level of the MHonArc structure, and set
either

    limit_urls_to: .html

or

    exclude_urls: .txt

That should keep most of them out. Of course, as far as I know both
settings are plain substring matches, so an attachment whose URL
happens to contain the string in limit_urls_to will still get picked up.
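
To make that caveat concrete, here is a rough Python sketch of substring
filtering applied to your two example URLs. It is only my model of how the
two attributes behave, not htdig's actual code: the is_indexed() helper and
the third "tricky" URL are made up for illustration.

```python
def is_indexed(url, limit_urls_to=(), exclude_urls=()):
    """Hypothetical model of substring-based URL filtering.

    A URL is kept if it contains at least one limit_urls_to pattern
    (when any are given) and contains none of the exclude_urls patterns.
    """
    if limit_urls_to and not any(p in url for p in limit_urls_to):
        return False
    if any(p in url for p in exclude_urls):
        return False
    return True


BASE = "http://mail-archive.com/sinister%40majordomo.net/1997-month-08/"
message = BASE + "msg00174.html"
attachment = BASE + "msg00174/The_state_i_am_in.txt"
# Invented attachment name that happens to contain ".html".
tricky = BASE + "msg00174/notes.html.txt"

for url in (message, attachment, tricky):
    kept = is_indexed(url, limit_urls_to=[".html"])
    print("index" if kept else "skip ", url)
```

With only limit_urls_to set, the invented notes.html.txt attachment slips
through in this model; adding ".txt" to exclude_urls as well would drop it.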
And with luck Geoff or Gilles will know a better way :-)
--
David Robley
WEBMASTER | Phone +61 8 8374 0970
RESEARCH CENTRE FOR INJURY STUDIES | http://www.nisu.flinders.edu.au/
AusEinet | http://auseinet.flinders.edu.au/
Flinders University, ADELAIDE, SOUTH AUSTRALIA
Visit the PHP mirror at http://au.php.net:81/