Well, the setup is actually pretty simple: it's a standard database like
any other search engine's (based on the Google structure), but I've added
some extra status tables. Each page has a status from 0 to 4:
0 = not retrieved
1 = retrieved, not explored for links
2 = retrieved, links retrieved, not indexed
3 = indexed
4 = finalized

8 servers take pages from 0 to 1
5 servers take pages from 1 to 2
5 servers take pages from 2 to 3
1 server takes pages from 3 to 4 after all pages have been processed to
status 3
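
As a rough illustration of the status scheme above, here is a minimal Python
sketch. It is not the actual PHP code: the `pages` table, the SQLite backend
and the function names are all assumptions made up for the example, and the
real fetching, link extraction and indexing work is left out.

```python
import sqlite3
from enum import IntEnum


class Status(IntEnum):
    NOT_RETRIEVED = 0    # not retrieved
    RETRIEVED = 1        # retrieved, not explored for links
    LINKS_EXTRACTED = 2  # retrieved, links retrieved, not indexed
    INDEXED = 3          # indexed
    FINALIZED = 4        # finalized


def make_db(urls):
    """Create a hypothetical in-memory status table with every page at status 0."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, status INTEGER NOT NULL)")
    db.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [(u, int(Status.NOT_RETRIEVED)) for u in urls],
    )
    db.commit()
    return db


def advance(db, from_status, batch=100):
    """Move up to `batch` pages from `from_status` to the next status.

    Returns the number of pages picked up.
    """
    rows = db.execute(
        "SELECT url FROM pages WHERE status = ? LIMIT ?",
        (int(from_status), batch),
    ).fetchall()
    for (url,) in rows:
        # The real work for this stage (fetch, parse links, index) would go here.
        db.execute(
            "UPDATE pages SET status = ? WHERE url = ? AND status = ?",
            (int(from_status) + 1, url, int(from_status)),
        )
    db.commit()
    return len(rows)


def finalize(db):
    """Move pages from 3 to 4, but only once no page is below status 3."""
    pending = db.execute(
        "SELECT COUNT(*) FROM pages WHERE status < ?", (int(Status.INDEXED),)
    ).fetchone()[0]
    if pending:
        return 0
    cur = db.execute(
        "UPDATE pages SET status = ? WHERE status = ?",
        (int(Status.FINALIZED), int(Status.INDEXED)),
    )
    db.commit()
    return cur.rowcount
```

In this sketch each group of servers would simply call `advance()` in a loop
for its own stage, while the single finalizing server calls `finalize()`,
which does nothing until every page has reached status 3.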

The script is completely custom-built and currently not available for
download. We haven't decided yet whether it will be GPL'ed... since I'm not
using any GPL'ed code, there is no real need to do it right away.

The site will appear at www.explore.be but the whole system is still under
development.

Greetings,

Wim



Paul Stewart wrote:

> Would you be willing to share some more details on this PHP setup?  What
> is the backend comprised of for spidering etc?
>
> How many machines and is there a URL that we could check things out at?
>
> Just curious...:)
>
> Paul
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of Wim Godden
> Sent: Monday, August 12, 2002 9:05 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [aseek-users] More then 1 index server?
>
> Kir Kolyshkin wrote:
>
> > > I think he needs it for the same reason many of us would like that
> > > feature : one indexer is way too slow. If you want to index a whole
> > > ccTLD, it'll take you several months with aspseek.
> >
> > Hmm have you tried higher number after -N together with upgrading your
> > server to have more RAM and higher disk I/O throughput? Also, moving
> > MySQL to separate box, and searchd to another separate box helps a
> > lot. Actually s.cgi can be put on "yet another" box (I'm not sure if
> > this will help), so you will end up with four machines.
>
> Well, even that won't do... I tried running up to 500 threads, but that
> simply slowed everything down even more. MySQL is on a separate box and
> s.cgi is not required yet, because I'm still indexing and not providing
> search access yet.
>
> > Also, PageRanks will be computed separately
> > for two indexes, which is not a good thing.
>
> Indeed... might as well tell your visitors the search results aren't
> good.
>
> Anyway, not a problem for me anymore, since I've stopped using aspseek
> and built my own system in PHP, so I can spread the load over our
> webserver farm... works like a charm !
>
> Greetings,
>
> Wim
> --
> ------
> 11 EURO (incl. VAT) for a .be domain! Temporary offer until
> 1 September! Go to http://domain.firstlinknetworks.com now!
> --
> Adverteren.be - 100% Dutch-language advertising on high-quality
> sites!


