RE: distributed deployment

2005-05-18 Thread Rajendra Patil
Yes. We are using the same version. ~Rajendra -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 17, 2005 11:49 PM To: nutch-user@incubator.apache.org Subject: Re: distributed deployment Rajendra Patil wrote: I am trying to deploy nutch on multiple

Charset encoding

2005-05-18 Thread k-team
Hi guys, we have indexed some pages and noticed that the search results are not rendered correctly by our browser. The encoding in search.jsp is UTF-8 and the browser is set to UTF-8 encoding, but we get strange characters. We have also set parser.character.encoding.default

Re: Charset encoding

2005-05-18 Thread Andy Liu
Sometimes web pages do not identify the encoding the page is in. In these cases, the client has to guess the encoding. Nutch currently does not have a guessing algorithm, so if it encounters one of these pages, it just decodes the page using the parser.character.encoding.default parameter.
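
A minimal sketch of the fallback described above, not Nutch's actual code. The function name `decode_page` and the `windows-1252` value standing in for parser.character.encoding.default are assumptions made for illustration. It shows why a UTF-8 page that does not declare its encoding ends up as strange characters when the default is a different charset.

```python
from typing import Optional

# Assumed stand-in for the parser.character.encoding.default setting.
DEFAULT_ENCODING = "windows-1252"


def decode_page(raw: bytes, declared: Optional[str],
                default: str = DEFAULT_ENCODING) -> str:
    """Decode page bytes with the declared charset, else the default.

    No guessing is attempted: when the page declares nothing, the
    configured default is used as-is, even if it is the wrong charset.
    """
    encoding = declared or default
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: fall back to the
        # default and replace anything that still cannot be decoded.
        return raw.decode(default, errors="replace")


utf8_bytes = "é".encode("utf-8")

# The page declares its encoding: decoded correctly.
print(decode_page(utf8_bytes, "utf-8"))

# The page declares nothing: the default is applied and the two UTF-8
# bytes are read as two separate windows-1252 characters.
print(decode_page(utf8_bytes, None))
```

If most of the crawled pages are UTF-8 but do not say so, setting the default to UTF-8 in this sketch (the `default` argument) makes the second call decode correctly, at the cost of mis-decoding undeclared pages in other charsets.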

Re: Distributed installation

2005-05-18 Thread [EMAIL PROTECTED]
Dear Users! First, sorry for my bad English. I read Stefan's great documentation at http://wiki.media-style.com/display/nutchDocu/. I made a frontend (P4, 3 GByte RAM, Tomcat 5.5.7, java 1.4.08) with 3 backends holding 12 million pages (4 million per backend; AMD64, 4 GByte RAM, 64 bit linux with jdk

Re: [Nutch-general] Re: Pre MapReduce Nutch release?

2005-05-18 Thread ogjunk-nutch
It all sounds reasonable and makes sense, thanks. Otis --- Doug Cutting [EMAIL PROTECTED] wrote: Otis Gospodnetic wrote: Would it be good to make one last release of Nutch before starting the MapReduce effort? This would give people the chance to grab this last, pre-MapReduce version

crawling PDF file with page links?

2005-05-18 Thread Jason Manfield
Can Nutch (with its out-of-the-box PDFBox plugin) crawl PDF files where each page is a link (e.g. the URL appends PGN=pageNumber to go to the specific page)? In the browser, each page of the PDF file is loaded on demand. However, when the content is fetched from the URL (from the code), it

How to fit index database in ram?

2005-05-18 Thread smith learner
Is there a way to fit the whole index database in RAM, if the RAM is big enough? If yes, how to do it? Regards, smith

Re: [Nutch-general] Re: Pre MapReduce Nutch release?

2005-05-18 Thread Byron Miller
Can't wait to try out the MapReduce stuff. Good luck in getting that branch up and running :) -Original Message- From: [EMAIL PROTECTED] To: nutch-user@incubator.apache.org Date: Wed, 18 May 2005 09:31:03 -0700 (PDT) Subject: Re: [Nutch-general] Re: Pre MapReduce Nutch release? It all

Re: Distributed installation

2005-05-18 Thread Stefan Groschupf
I notice similar behavior. I guess the backend servers are not answering fast enough. I was thinking about having multiple search server groups that hold identical content and then querying the groups in round-robin style. What do people think about this idea? It is already easy to set up multiple
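
A rough sketch of the round-robin idea, only to make the proposal concrete. This is not existing Nutch code: the class name `RoundRobinGroups` and the host names are hypothetical, and how a group is actually queried is left out. Each group is assumed to hold an identical copy of the index, so any one group can answer a query on its own.

```python
import threading
from itertools import cycle
from typing import List, Sequence


class RoundRobinGroups:
    """Hand out search server groups in rotation, one group per query."""

    def __init__(self, groups: Sequence[List[str]]):
        if not groups:
            raise ValueError("at least one server group is required")
        self._cycle = cycle(groups)
        # The frontend serves many requests at once, so guard the iterator.
        self._lock = threading.Lock()

    def next_group(self) -> List[str]:
        """Return the group that should answer the next query."""
        with self._lock:
            return next(self._cycle)


# Hypothetical host names; every group holds the same content.
groups = [
    ["backend-a1:9999", "backend-a2:9999", "backend-a3:9999"],
    ["backend-b1:9999", "backend-b2:9999", "backend-b3:9999"],
]
picker = RoundRobinGroups(groups)
for _ in range(3):
    print(picker.next_group())
```

Plain rotation spreads the load evenly but does not notice a slow or dead group; skipping groups that recently timed out would be a natural next step.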