There was a mail or forum entry that described the difference between a
(web)space and a subset: webspaces span different servers, while subsets can be
used to limit the search to parts of one server. The problem here is that
subsets only accept a URL-matching filter such as www.domain.de/german/% or
www.domain.de/english/%. There is no logical filter that can restrict the search
to pages that were linked from a particular starting URL...
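
To illustrate why a pure URL filter cannot follow link structure, here is a minimal Python sketch. It assumes that % in a subset pattern behaves like a SQL LIKE wildcard (any run of characters); the helper names and the third URL are made up for the example and are not part of ASPseek.

```python
import re

def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a pattern in which '%' matches any run of characters into a regex.
    Assumption: '%' works like the SQL LIKE wildcard."""
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$")

def in_subset(url: str, pattern: str) -> bool:
    """True if the URL text itself matches the subset pattern."""
    return like_to_regex(pattern).match(url) is not None

# Hypothetical URLs; the last one is imagined to be linked from the German start page.
urls = [
    "www.domain.de/german/start.htm",
    "www.domain.de/english/start.htm",
    "www.domain.de/shared/contact.htm",
]

german = [u for u in urls if in_subset(u, "www.domain.de/german/%")]
print(german)
```

Only the URL string is tested, so a page outside /german/ is never part of the subset, no matter which page links to it.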
You're right: Server only sets the URL to start from, but index will crawl all
pages that can be reached by following links from that Server URL.
It does not mean "index only pages whose URL contains the pathname
of the given Server entry" (like /german for www.domain.de/german/start.htm), so it
will index all pages on that server. I found the problem over the weekend while trying
to index ~3000 URLs. About 800 of them contain that important path part (like
www.geocities.com/xxxx/yyyy), and I do not want to index all of geocities... ;-)
I solved it by adding
Server www.geocities.com/xxx/yyyy
Server ...
Allow www.geocities.com/xxx/yyyy
Allow ...
Disallow ...
Disallow .*
to aspseek.conf.
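
The effect of those Allow/Disallow lines can be sketched in Python. This is only a model under stated assumptions, not ASPseek's actual code: it assumes the rules are regular expressions checked in the order they appear, that the first matching rule decides, and that a URL matching no rule is allowed. The function name is_allowed and the test URLs are hypothetical.

```python
import re
from typing import List, Tuple

# (action, regex) pairs in config-file order; only the two concrete rules
# from the mail are modelled, the elided "..." entries are left out.
RULES: List[Tuple[str, str]] = [
    ("Allow", r"www\.geocities\.com/xxx/yyyy"),
    ("Disallow", r".*"),
]

def is_allowed(url: str, rules: List[Tuple[str, str]] = RULES) -> bool:
    """Return the verdict of the first rule whose regex matches the URL.
    Assumption: first match wins; no match at all means the URL is allowed."""
    for action, pattern in rules:
        if re.search(pattern, url):
            return action == "Allow"
    return True

print(is_allowed("http://www.geocities.com/xxx/yyyy/index.html"))
print(is_allowed("http://www.geocities.com/other/page.html"))
```

Under these assumptions the catch-all Disallow .* has to come last, otherwise it would shadow every Allow line.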
Markus Rietzler
* kommunikation & online service
* RZF NRW
* Tel: 0211.4572-130
-----Original Message-----
From: Thomas 'Balu' Walter [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 27 June 2001 15:14
To: ASPseek
Subject: [aseek-users] finding a link & Sites
Is there a way to determine why a specific page gets added to the index?
(or - which page links to that page?)
I am asking because it looks like there is a backlink to my index page,
where you are able to choose the language, so all pages get indexed
and not only the ones in a single language...
In addition, it looks like the "sites" that are needed for webspaces are just
the machine name, so I cannot use webspaces to distinguish between the
structures below
http://roadrunner.bswp.de/gerstel/de_mainframe.html and
http://roadrunner.bswp.de/gerstel/en_mainframe.html
correct?
The problem is that the content-management system those people use does
not put the different languages in different subdirs or the like :(
How could I manage that - any ideas?
Balu
