thomas --> you lucky one,
kir --> i am still trying to figure out how to set the correct options in aspseek.conf to limit the search to parts of servers,
i have tried to use:
Server http://www.wuppertal-forum.de/wuth/
# Server http://www.sparkasse-wuppertal.de/
# Server http://www.wuppertal-navigator.de/
Server http://www.wuppertal-navigator.de/suche-wuppertal/
Period 30d
# Include url.lst
# stadtplan (insb. www.wuppertal.de/applikationen/stadtplan)
Disallow /stadtplan/
# Include excl.url
DisallowNoMatch www.wuppertal-navigator.de/suche-wuppertal/
Allow \/$|\.htm$|\.html$
but this results in only 3 pages being found (although there should be several more...)
i think that my DisallowNoMatch will
Charset [iso88591] already loaded skipped
Loading configuration from /opt/aspseek/etc/charsets.conf
Loading configuration from /opt/aspseek/etc/stopwords.conf
Loading configuration from /opt/aspseek/etc/aspseek.conf
Adding URL: http://www.wuppertal-navigator.de/robots.txt
Adding URL: http://www.wuppertal-forum.de/robots.txt
Adding URL: http://www.wuppertal-navigator.de/suche-wuppertal/
Adding URL: http://www.wuppertal-forum.de/wuth/
Unsupported protocol in mailto:[EMAIL PROTECTED]
Adding URL: http://www.wuppertal-navigator.de/suche-wuppertal/index-inhalt.html
Unsupported protocol in mailto:[EMAIL PROTECTED]
Adding URL: http://www.wuppertal-navigator.de/suche-wuppertal/addurl_meta.html
Saving real-time database ... done.
Saving delta files [..................................................] done.
Saving real-time ... done
Saving redirects ... done
Splitting href delta file ... done
Saving href delta files ... done
Saving direct href delta files ... done
Calculating ranks [............................] done.
Saving lastmods ... done
Generating word site ... done
index process finished.
what i want to do with aspseek:
we run an internet portal called
http://www.wuppertal-navigator.de
this site collects information about the city of wuppertal and tries to link to (all) websites that are related
to the subject "Wuppertal" (sites created by people living in wuppertal, business sites of companies
located in wuppertal, and so on).
to do this we have a search engine, "Suche Wuppertal", with about 2800 different URLs to search. i now want
to set up aspseek to index those sites. several of them have their own FQDN (like www.xxxx.de), but on the other
hand - and this is the problem - there are also sites where we only want to index part of the server, e.g. members.aol.com/yyy
or geocities or tripod and the like. so my question: how do i have to set up aspseek.conf to do this?
is this possible with aspseek at all?
i have tried different options but so far i couldn't get it to work...
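To make the goal concrete, here is a minimal Python sketch of the scope check being asked for: a URL is crawled only if it starts with one of a list of allowed prefixes, where a prefix can be a whole host or just a sub-directory such as members.aol.com/yyy. This is not aspseek code and says nothing about how aspseek actually evaluates Server, Allow or DisallowNoMatch; the prefix list and the `in_scope` function are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowed scopes: sub-trees of hosts, as described in the
# message (a directory on a forum host, a member directory on a shared host).
ALLOWED_PREFIXES = [
    "http://www.wuppertal-forum.de/wuth/",
    "http://www.wuppertal-navigator.de/suche-wuppertal/",
    "http://members.aol.com/yyy/",
]


def in_scope(url: str, prefixes=ALLOWED_PREFIXES) -> bool:
    """Return True if url falls under one of the allowed prefixes."""
    parts = urlsplit(url)
    # Lower-case scheme and host (they are case-insensitive); keep the
    # path as-is, and drop query/fragment since they do not affect scope.
    normalised = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"
    return any(normalised.startswith(p) for p in prefixes)


print(in_scope("http://members.aol.com/yyy/page.html"))  # True
print(in_scope("http://members.aol.com/zzz/page.html"))  # False
```

Keeping the trailing slash on each prefix stops members.aol.com/yyy/ from also matching a sibling directory such as members.aol.com/yyyz/.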
Markus Rietzler
* kommunikation & online service
* RZF NRW
* Tel: 0211.4572-130
-----Original Message-----
From: Thomas 'Balu' Walter [mailto:[EMAIL PROTECTED]]
Sent: Monday, 25 June 2001 16:16
To: [EMAIL PROTECTED]
Subject: Re: [aseek-users] SERVER command in aspseek.conf
+-Kir Kolyshkin-([EMAIL PROTECTED])-[25.06.01 16:11]:
> Thomas 'Balu' Walter wrote:
> >
> > But it will start crawling on a specified file e.g.:
> > Server http://dom.ain/index_en.html
> > and if there are no links into other parts of the site it will not
> > index e.g. the german part too, correct?
>
> Right, but usually this is not the case.
Actually it is :) *phew*
The pages are generated by a content-management-system - there is only
one index.html that points to the two languages:
index_de.html and
index_en.html
each with its own structure. There is no link back, and no cross-link between them.
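The behaviour described here is plain link reachability: a crawler seeded with one page only finds what is linked from it. A minimal Python sketch, using a hypothetical link graph in which index.html points to both language trees and nothing links back or across (the pages under en/ and de/ are made up for illustration), shows that starting from index_en.html never reaches the German pages, while starting from index.html reaches everything.

```python
from collections import deque

# Hypothetical link graph for the CMS-generated site: index.html links to
# both language entry pages, and neither tree links back or across.
LINKS = {
    "index.html": ["index_de.html", "index_en.html"],
    "index_en.html": ["en/page1.html"],
    "index_de.html": ["de/seite1.html"],
    "en/page1.html": [],
    "de/seite1.html": [],
}


def reachable(start, links=LINKS):
    """Breadth-first search: every page a crawler can reach from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(reachable("index_en.html")))  # ['en/page1.html', 'index_en.html']
print(sorted(reachable("index.html")))
```

Seeding the crawl at the common entry page (or listing each language entry page separately) is what makes both trees visible.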
Balu
