Yes, WebSphinx is not very scalable: it keeps data about each page in memory, and it even keeps the parent-child relationships between pages in memory. Why not use the web crawler in the Lucene Sandbox? The link is on Lucene's home page.
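To illustrate the memory point, here is a rough sketch (not WebSphinx or Lucene Sandbox code) of a crawl loop that keeps only URL strings in memory and lets each page be garbage-collected as soon as it has been handled. All names here (`FetchedPage`, `PageHandler`, `StreamingCrawler`, the `fetcher` function) are hypothetical and written in modern Java for brevity; a real handler would presumably add a document to a Lucene index instead of just receiving the page.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical fetched page: its URL, its text, and the URLs it links to. */
final class FetchedPage {
    final String url;
    final String text;
    final List<String> links;

    FetchedPage(String url, String text, List<String> links) {
        this.url = url;
        this.text = text;
        this.links = links;
    }
}

/** Hypothetical callback; an indexing handler would go here. */
interface PageHandler {
    void handle(FetchedPage page);
}

final class StreamingCrawler {
    /**
     * Breadth-first crawl from a seed URL. Only URL strings are retained
     * (frontier queue plus seen set); no page objects and no parent-child
     * page graph are kept. Returns the number of pages handled.
     */
    static int crawl(String seed,
                     Function<String, FetchedPage> fetcher,
                     PageHandler handler,
                     int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        frontier.add(seed);
        seen.add(seed);

        int handled = 0;
        while (!frontier.isEmpty() && handled < maxPages) {
            String url = frontier.poll();
            FetchedPage page = fetcher.apply(url);
            if (page == null) {
                continue; // fetch failed or URL unknown; skip it
            }
            handler.handle(page);
            handled++;
            for (String link : page.links) {
                // Set.add returns false for URLs already queued or visited.
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
            // 'page' is unreachable after this iteration and can be collected.
        }
        return handled;
    }
}
```

The seen set still grows with the number of distinct URLs, so a very large crawl would eventually need it moved to disk as well, but it is far smaller than retaining every page.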
Otis

--- Mike Tinnes <[EMAIL PROTECTED]> wrote:
> I've been using WebSphinx with Lucene by simply extending the Crawler
> class and placing my Lucene code in the overridden 'visit' method. It
> seems to work, but I've encountered 'OutOfMemory' errors when crawling
> large sites with 512 MB and also using the -Xmx VM args. The Sphinx FAQ
> mentions the problem, but the recommended fixes don't seem to help.
> Alas, I've resorted to implementing a custom crawler.
>
> ----- Original Message -----
> From: "A Rambocus" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, June 27, 2002 7:55 AM
> Subject: SPIDER /CRAWLERS /ROBOTS with lucene
>
> > Hello all, does anyone know how to integrate WebSphinx with Lucene?
> > The code previously distributed on this list does not work!
> >
> > I am currently trying spindle.
> >
> > Also, does anyone know if Lucene could be used to support image
> > indexing? This would be very helpful!
> >
> > Cheers
> >
> > Ajay R

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
