Title: AW: [aseek-users] finding a link & Sites

there was a mail or forum entry that described the difference between
(web)spaces and subsets: a webspace groups different servers, while a subset
limits the search to parts of one server. the problem here is that a subset
only accepts a url-matching filter like www.domain.de/german/% or
www.domain.de/english/%. there is no logical filter that could select the
pages linked from a particular starting url...
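
to make the url-matching point concrete, here is a small python sketch of how a %-style subset mask could be checked against urls. the function name matches_subset is hypothetical, and treating % as "any run of characters" (like a SQL LIKE wildcard) is an assumption for illustration, not aspseek's actual code.

```python
import re


def matches_subset(mask: str, url: str) -> bool:
    """Return True if url matches a mask in which % stands for any run of characters."""
    # Escape the literal parts, then join them with ".*" where each % was.
    pattern = ".*".join(re.escape(part) for part in mask.split("%"))
    return re.fullmatch(pattern, url) is not None


urls = [
    "www.domain.de/german/start.htm",
    "www.domain.de/english/start.htm",
    "www.domain.de/index.htm",
]

# Only the url itself is inspected; how a page was reached plays no role.
german = [u for u in urls if matches_subset("www.domain.de/german/%", u)]
print(german)
```

a mask like this can only look at the url string, which is why it cannot express "everything linked from this start page".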

you're right: Server only sets the url to start from. index will then crawl every page
that can be reached by following links from that Server url.
it does not mean: only index pages whose url contains the path
of the given Server entry (like /german for www.domain.de/german/start.htm). so it will index
all pages on that server. i ran into this over the weekend while trying to index
~3000 urls. ~800 of them contain such an important path part (like www.geocities.com/xxxx/yyyy),
and i do not want to index all of geocities... ;-)

i solved it by adding

Server www.geocities.com/xxx/yyyy
Server ...

Allow www.geocities.com/xxx/yyyy
Allow ...

Disallow ...

Disallow .*

in the aspseek.conf
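
a rough python sketch of why this works, assuming the Allow/Disallow rules are checked top to bottom and the first matching rule decides (that ordering, the RULES list and the is_allowed helper are assumptions for illustration, not taken from the aspseek source):

```python
import re

# Hypothetical rule list mirroring the config above: (action, regex), checked top to bottom.
RULES = [
    ("Allow", r"www\.geocities\.com/xxx/yyyy"),
    ("Disallow", r".*"),
]


def is_allowed(url: str, rules=RULES) -> bool:
    """Return the verdict of the first rule whose pattern is found in the url."""
    for action, pattern in rules:
        if re.search(pattern, url):
            return action == "Allow"
    # No rule matched: this sketch lets the url through (an assumption).
    return True


for url in [
    "www.geocities.com/xxx/yyyy/index.html",
    "www.geocities.com/other/page.html",
]:
    print(url, is_allowed(url))
```

the final catch-all Disallow is what keeps the rest of the server out; the Allow lines before it act as the whitelist.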


Markus Rietzler
* kommunikation & online service
* RZF NRW
* Tel: 0211.4572-130



-----Original Message-----
From: Thomas 'Balu' Walter [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 27 June 2001 15:14
To: ASPseek
Subject: [aseek-users] finding a link & Sites

Is there a way to determine why a specific page gets added to the index?
(or - which page links to that page?)

I am asking because it looks like there is a backlink to my index-page
where you are able to choose the language - so all pages get indexed,
and not only the language-one...

In addition it looks like the "sites" that are needed for webspaces are just
the machine name, so with webspaces I cannot distinguish between the
structures below

        http://roadrunner.bswp.de/gerstel/de_mainframe.html and
        http://roadrunner.bswp.de/gerstel/en_mainframe.html

correct?

The problem is that the content-management system those people use does
not put the different languages in different subdirs or the like :(

How could I manage that - any ideas?

        Balu
