You have to use only the server name and directory in Allow, not the full URL of the start page. That means
Server http://www.foo.com/bar/index.htm
allow http://www.foo.com/bar
disallow .*
would be the correct solution.
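The rule of thumb above (Allow gets the directory, not the start page) could be automated with a small helper. This is only a sketch: the name allow_prefix is made up for illustration, and treating a last path segment that contains a dot as a file name is a heuristic assumption, not something ASPseek itself does.

```perl
use strict;
use warnings;

# Hypothetical helper: derive an Allow prefix from a Server start URL.
# Heuristic (assumption): a last path segment containing a dot is a file name.
sub allow_prefix {
    my ($url) = @_;
    my ($host, $path) = $url =~ m{^(http://[^/]+)(/.*)?$}
        or return;                    # not an http URL: no prefix
    $path = '' unless defined $path;  # bare host, e.g. http://www.foo.com
    $path =~ s{/[^/]*\.[^/]*$}{};     # drop a trailing "index.htm"-style segment
    return $host . $path;
}

print "Allow ", allow_prefix('http://www.foo.com/bar/index.htm'), "\n";
```

A URL that already ends in a directory, such as http://www.foo.com/bar, is returned unchanged.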
The Server command only sets the starting URL from which indexing begins. In this version of ASPseek it will also index all other pages on that server, even when they are outside the bar directory.
With Allow you can set a URL filter that is applied while indexing: the indexer first tries each URL against the Allow rules and checks whether one of them permits indexing it. If you set Allow http://www.foo.com/bar/index.htm, every page has to match that rule, so http://www.foo.com/bar/index.htm/123 will match (and be indexed), but http://www.foo.com/bar/menue.htm does not match (because of the index.htm part).
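To see why the narrow rule rejects menue.htm, the filtering can be modelled in a few lines of Perl. This is a simplified model and not ASPseek's real code: it assumes the rules are regular expressions tried in order, that the first matching rule decides, and that a URL matching no rule is accepted. The names is_allowed, @by_directory and @by_page are invented for this sketch.

```perl
use strict;
use warnings;

# Assumed model: rules are tried in order, the first match decides.
my @by_directory = (
    [ allow    => qr{^http://www\.foo\.com/bar} ],
    [ disallow => qr{.*} ],
);
my @by_page = (
    [ allow    => qr{^http://www\.foo\.com/bar/index\.htm} ],
    [ disallow => qr{.*} ],
);

sub is_allowed {
    my ($url, @rules) = @_;
    for my $rule (@rules) {
        my ($action, $pattern) = @$rule;
        return $action eq 'allow' if $url =~ $pattern;
    }
    return 1;    # assumption: a URL that matches no rule is accepted
}

for my $url ('http://www.foo.com/bar/index.htm/123',
             'http://www.foo.com/bar/menue.htm') {
    printf "%-40s directory rule: %-3s page rule: %s\n", $url,
        is_allowed($url, @by_directory) ? 'yes' : 'no',
        is_allowed($url, @by_page)      ? 'yes' : 'no';
}
```

Under the directory rule both URLs pass. Under the page rule only the URL that starts with .../bar/index.htm passes, which matches the behaviour described above.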
I just noticed that I also have to check my own rules... ;-)
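At the moment I use a small Perl script:
use strict;
use warnings;
# Split "Server" lines into whole-site entries and path-restricted entries.
open(my $site, '>', 'url_site.url') or die "cannot open url_site.url: $!";
open(my $part, '>', 'url_part.url') or die "cannot open url_part.url: $!";
while (my $line = <>) {
    # Skip anything that is not a "Server http://host[/path]" line, so that
    # captures from an earlier line are never reused by accident.
    next unless $line =~ m!^Server\s+http://([^/\s]+)(?:/(\S*))?!;
    my $path = $2;
    if (defined $path && length $path) {
        print {$part} $line;    # URL has a path: only part of the site
    } else {
        print {$site} $line;    # bare host: the whole site
    }
}
close($site);
close($part);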
That generates two lists:
- url_site.url => contains the sites that should be indexed completely (e.g. http://www.bar.com)
- url_part.url => contains the sites where only a part should be indexed (e.g. http://www.foo.com/bar)
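A list such as url_part.url could then be turned into the "one Allow per Server, Disallow everything else" block. The following is a minimal sketch under the assumption that the list holds "Server http://..." lines as written by the script above; build_rules is a hypothetical name, and the output still has to be pasted or included into aspseek.conf by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: for every "Server <url>" line emit the Server line
# and a matching "Allow <url>", then finish with a catch-all "Disallow .*".
sub build_rules {
    my (@server_lines) = @_;
    my (@servers, @allows);
    for my $line (@server_lines) {
        next unless $line =~ m{^Server\s+(http://\S+)};   # ignore other lines
        push @servers, "Server $1";
        push @allows,  "Allow $1";
    }
    return join("\n", @servers, @allows, 'Disallow .*') . "\n";
}

print build_rules(
    'Server http://www.foo.com/bar',
    'Server http://www.domain.com',
);
```

Keeping Disallow .* as the last line matters in this scheme, because it is meant to catch only what none of the Allow rules accepted.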
Markus Rietzler
* kommunikation & online service
* RZF NRW
* Tel: 0211.4572-130
-----Original Message-----
From: Fabrice VALERE [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 3 July 2001 16:43
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; Kir Kolyshkin
Cc: [EMAIL PROTECTED]
Subject: Re: AW: SERVER command in aspseek.conf
Hi,
when I use your solution with an HTML page as the starting point,
it does not follow the links on this page as I want.
It seems to work with a directory.
Server http://www.foo.com/bar/index.htm
allow http://www.foo.com/bar/index.htm
disallow .*
It indexes only index.htm, but I want to index all the links from index.htm that
are under http://www.foo.com/bar/
fabrice
In reply to [EMAIL PROTECTED]:
> The problem is: the Server command only means "start from the given URL and
> index all pages on this server".
> It does not mean "index only those URLs on that server that include the
> path" (bar in our example).
>
> i had the same problem, and my solution was
>
> Server http://www.foo.com/bar
> Server http://www.domain.com
> (...)
>
> Allow http://www.foo.com/bar
> Allow http://www.domain.com
> (...)
>
> Disallow .*
>
> That means for every Server command I have an Allow command, and at the end
> I disallow everything else.
> After a few tries I have not found the right way to set DisallowNoMatch,
> but as DisallowNoMatch is (nearly) the same as Allow, I used it this way.
>
> Markus Rietzler
> * kommunikation & online service
> * RZF NRW
> * Tel: 0211.4572-130
>
>
>
> -----Original Message-----
> From: Fabrice VALERE [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, 28 June 2001 12:41
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: SERVER command in aspseek.conf
>
> hi!
>
> I have the same problem: "I find URLs in my results that I did not ask to
> index".
>
> I am not sure I understand Kir Kolyshkin's solution.
> Please correct me:
>
> If I want to index only this directory, http://www.foo.com/bar/ (not
> http://www.bar.com/everything/):
>
> Server http://www.foo.server.com/bar/
> DisallowNoMatch http://www.foo.server.com/bar/
>
>
> In my case there are a lot of URLs to index, so I use a file that consists of
> all the URLs, like this:
>
> http://www.jeu.ru/
> http://www.jza.org/
> http://www.alode.com/
> http://www.frok.net/
>
> The second part of my question is the exact significance of the word 'Server'
> before http://www.foo.com/
>
> If I have understood Kir's solution correctly, where do I write
>
> DisallowNoMatch http://www.foo.server.com/bar/
>
> for all the URLs?
>
> In the same file ("url_file.list")?
>
>
>
> Fabrice
> .~.
> /V\ L I N U X
> // \\ >Fear the Penguin<
> /( )\
> ^^-^^
>
TIRED OF THROWING YOUR MONEY OUT OF THE WINDOW$(tm)(c)(r)?
SWITCH TO LINUX!
.~.
/V\ L I N U X
// \\ >Fear the Penguin<
/( )\
^^-^^
