Well, the setup is pretty simple actually... it's a standard database like any other search engine (based on the Google structure), but I've added some extra status tables. Each page is in a certain status (0 to 4):

0 = not retrieved
1 = retrieved, not explored for links
2 = retrieved, links retrieved, not indexed
3 = indexed
4 = finalized
8 servers take pages from 0 to 1
5 servers take pages from 1 to 2
5 servers take pages from 2 to 3
1 server takes pages from 3 to 4, after all pages have been processed to status 3

The script is completely custom-built and currently not available for download. We haven't decided yet whether it will be GPL'ed... since I'm not using any GPL'ed code, there is no real need to do it right away.

The site will appear at www.explore.be, but the whole system is still under development.

Greetings,

Wim

Paul Stewart wrote:
> Would you be willing to share some more details on this PHP setup? What
> is the backend comprised of for spidering etc?
>
> How many machines and is there a URL that we could check things out at?
>
> Just curious...:)
>
> Paul
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of Wim Godden
> Sent: Monday, August 12, 2002 9:05 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [aseek-users] More then 1 index server?
>
> Kir Kolyshkin wrote:
>
> > > I think he needs it for the same reason many of us would like that
> > > feature: one indexer is way too slow. If you want to index a whole
> > > ccTLD, it'll take you several months with aspseek.
> >
> > Hmm, have you tried a higher number after -N together with upgrading
> > your server to have more RAM and higher disk I/O throughput? Also,
> > moving MySQL to a separate box, and searchd to another separate box,
> > helps a lot. Actually s.cgi can be put on "yet another" box (I'm not
> > sure if this will help), so you will end up with four machines.
>
> Well, even that won't do... I tried running up to 500 threads, but that
> simply slowed everything down even more. MySQL is on a separate box and
> s.cgi is not required yet, because I'm still indexing and not providing
> search access yet.
>
> > Also, PageRanks will be computed separately
> > for two indexes, which is not a good thing.
>
> Indeed... might as well tell your visitors the search results aren't
> good.
>
> Anyway, not a problem for me anymore, since I've stopped using aspseek
> and built my own system in PHP, so I can spread the load over our
> webserver farm... works like a charm!
>
> Greetings,
>
> Wim
> --
> ------
> 11 EURO (incl. VAT) for a .be domain! Limited-time offer until 1
> September! Go to http://domain.firstlinknetworks.com now!
> --
> Adverteren.be - 100% Dutch-language advertising on high-quality
> sites!
