Ah, ok, that's different.

I once had a somewhat similar problem with a big customer.
Most of the files in his directory structure (several thousand)
were not referred to by any other file or href. I wrote a little
shell script that walks all directories recursively and builds one
big index.html file containing nothing but empty links and no other
content whatsoever:
<html><body>
<a href="/dir1/dir2/file1.html"></a>
<a href="/dir3/file2.html"></a>
<a href="/dir4/dir5/dir6/file3.html"></a>
</body></html>
Basically, the script pipes the output of a find command
through a simple regex that rewrites the file paths into URLs.
Then I configured htdig to use this file as the starting point for indexing.
The index file itself does not show up in any htsearch results, as it has
no content, but htdig _does_ index every file it links to.
The script of course needs to be run right before every htdig run.
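
For illustration, the find-and-rewrite step described above could look
roughly like this. It is a minimal Python sketch of the same idea, not the
original shell script (which is not shown here). The function name
build_link_index and the exclude parameter are made up for this example,
and it assumes the web server's document root maps directly onto the
directory being walked.

```python
import html
import os
from urllib.parse import quote


def build_link_index(doc_root, exclude=()):
    """Return an HTML page of empty links to every file below doc_root.

    doc_root is assumed to be the directory the web server exports as "/".
    exclude lists file paths to leave out, e.g. the generated index itself.
    """
    skip = {os.path.abspath(path) for path in exclude}
    links = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # sort in place so the walk order is deterministic
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            if os.path.abspath(full_path) in skip:
                continue
            # Rewrite the filesystem path into a URL path below the server root.
            rel_path = os.path.relpath(full_path, doc_root)
            url = "/" + quote(rel_path.replace(os.sep, "/"))
            links.append('<a href="{}"></a>'.format(html.escape(url, quote=True)))
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>\n"
```

Writing the returned string to a file under the document root and pointing
htdig's start_url at that file's URL should then give the setup described
above. Passing the output file's path in exclude keeps the index from
linking to itself on later runs.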

Hope it helps.
Marcel


--On Tuesday, 18 June 2002 17:26 +0200 "Albl, Thomas"
<[EMAIL PROTECTED]> wrote:

> Hi Marcel,
>
> In short, I want the dig to crawl over a mounted Novell share (our file
> system for the docs) but exclude the pages Apache autogenerates for
> directory listings (the "Index of" pages listing all files and dirs
> inside). Ah, and the Novell share reaches the dig through an Apache web
> server whose document root is at the top of the exported Novell volume.
>
> But when the dig crawls through the directory tree, it treats the
> autogenerated "Index of" pages as "real" documents, with the file and
> directory names as match words. For now we don't want to index the dirs,
> only the files :-| Seems tricky to solve, I think... If that doesn't
> work, we want to exclude the displayed dir names and file names and
> index all files by full text only...
>
> --
> Kind regards
> Thomas Albl
> Deutscher Städtetag
> Tel. : 0221/3771-210
> FAX  : 0221/3771-128
> eMail: mailto:[EMAIL PROTECTED]
> Web  : http://www.staedtetag.de
>
>
>
>> -----Original Message-----
>> From: Marcel Hicking [mailto:[EMAIL PROTECTED]]
>> Sent: Tuesday, 18 June 2002 16:50
>> To: [EMAIL PROTECTED]
>> Cc: Albl, Thomas
>> Subject: Re: AW: [htdig] How to set the sorting order of a
>> webserver-export of a plain filesystem?
>>
>>
>> Not sure exactly what you're trying to do (I missed the prior posting),
>> but maybe PHP's "auto_prepend_file" and "auto_append_file" settings
>> under Apache might help. They include a static or PHP file at the
>> top/bottom of other files. They can be configured through the Apache
>> conf, IIRC also within <Directory> or <Files> etc. sections.
>>
>> HIH,
>> Marcel
>>
>>
>> --On Tuesday, 18 June 2002 16:36 +0200 "Albl, Thomas"
>> <[EMAIL PROTECTED]> wrote:
>>
>> > Dear Geoff,
>> >
>> > thanks for your help. It solves my problem halfway, but anyway, one
>> > should look forward :*)
>> >
>> > Since the filesystem is a mounted Novell share holding all our
>> > company's docs, we can't put a file in each directory (there is a
>> > massive number of directories), so transforming the HTTP header is
>> > the first step.
>> >
>> > Is it possible to use *one* htrobots file for *all* directories? My
>> > Apache docs say that the HeaderName directive can be used even in
>> > virtual-host sections and that the file should be placed relative to
>> > each directory. But I haven't managed to access one central file
>> > with /htrobots.html
>> >
>> > I think the key to solving my problem is to get the line
>> > <META NAME="robots" CONTENT="noindex, follow"> into the header
>> > somehow - but how, without hundreds of htrobots files?
>> >
>> > :) I'll try a few more experiments and meanwhile hope for help :)
>> >
>> > Thanks a lot for your help so far!
>> >
>> > --
>> > Kind regards
>> > Thomas Albl
>> > Deutscher Städtetag
>> > Tel. : 0221/3771-210
>> > FAX  : 0221/3771-128
>> > eMail: mailto:[EMAIL PROTECTED]
>> > Web  : http://www.staedtetag.de
>> >
>> >
>> >
>> >> -----Original Message-----
>> >> From: Geoff Hutchison [mailto:[EMAIL PROTECTED]]
>> >> Sent: Tuesday, 18 June 2002 15:40
>> >> To: Albl, Thomas
>> >> Cc: [EMAIL PROTECTED]
>> >> Subject: Re: [htdig] How to set the sorting order of a
>> >> webserver-export of a plain filesystem?
>> >>
>> >>
>> >> > If the dig crawls through this exported filesys it finds often used
>> >> > searchwords in these generated pages from filenames or dirnames. Though
>> >>
>> >> See the FAQ:
>> >> <http://www.htdig.org/FAQ.html#q4.23>
>> >>
>> >> Regards,
>> >>
>> >> --
>> >> -Geoff Hutchison
>> >> Williams Students Online
>> >> http://wso.williams.edu/
>> >>
>> >
>> >
>> --------------------------------------------------------------
>> -----------
>> > ---                    Bringing you mounds of caffeinated joy
>> >                       >>>     http://thinkgeek.com/sf    <<<
>> >
>> > _______________________________________________
>> > htdig-general mailing list <[EMAIL PROTECTED]>
>> > To unsubscribe, send a message to
>> > <[EMAIL PROTECTED]> with a subject of
>> > unsubscribe FAQ: http://htdig.sourceforge.net/FAQ.html
>>
>>
>>
>>
>



-- 
 Marcel Hicking

 VIA NET.WORKS Deutschland GmbH
 www.vianetworks.de
 Bismarckstrasse 120,47057 Duisburg

 -----------------------------------------------------------------
 Management: Matt Nydell, HRB 7672
 All offers are non-binding. Orders are carried out under our
 general terms and conditions.

