Problem about method Query#query()

2005-11-10 Thread Game Now
Hi all, I pass some strings to the org.apache.nutch.searcher.Query#parse() method, but I get different results, as below: parameter string: area:XX, returnedQuery.toString() is: area:XX. parameter string: subarea:YY, returnedQuery.toString() is: subarea YY. (Note: the ':' disappeared.) parameter

Do nutch help me?

2005-11-10 Thread Arun Kumar Sharma
Hi All, I want to know how Nutch fits my requirements and how best I can exploit its features. Requirement: Nutch is designed to crawl information systems on the Internet and intranets. My requirement is that it crawl information present anywhere. Is Nutch suitable for me? What I

Re: Lucene or Nutch

2005-11-10 Thread Jérôme Charron
I would be disappointed by this move - language identifier is an important component in Nutch. Now the mere fact that it's bundled with Nutch encourages its proper maintenance. If there is enough drive in terms of willingness and long-term commitment it would make sense to move it to a

Re: Lucene or Nutch

2005-11-10 Thread Andrzej Bialecki
Jérôme Charron wrote: I would be disappointed by this move - language identifier is an important component in Nutch. Now the mere fact that it's bundled with Nutch encourages its proper maintenance. If there is enough drive in terms of willingness and long-term commitment it would make

Re: What is suitable environment?

2005-11-10 Thread Stefan Groschupf
Hi, see http://wiki.apache.org/nutch/GettingNutchRunningWithWindows HTH Stefan On 10.11.2005 at 06:44, KAAS INFOTECH wrote: Hi All, I am new to Nutch. I have downloaded the latest nutch-0.7.1. I have Microsoft Windows installed on my PC with Java home set. I came to know that Cygwin is require

Re: Problem about method Query#query()

2005-11-10 Thread Stefan Groschupf
Hi, Do you have any query filter installed? Stefan On 10.11.2005 at 09:37, Game Now wrote: Hi all, I pass some strings to the org.apache.nutch.searcher.Query#parse() method, but I get different results, as below: parameter string: area:XX, returnedQuery.toString() is: area:XX. parameter

Re: Do nutch help me?

2005-11-10 Thread Arun Kaundal
Hi, I want to crawl local files and Internet/intranet documents/files. Do you think Nutch can help me in this case? Do I need some additions/extensions to the functionality of Nutch? On 11/10/05, Stefan Groschupf [EMAIL PROTECTED] wrote: Yes, Nutch can crawl web pages and you can somehow limit the

Max Per Host and topN

2005-11-10 Thread Rod Taylor
It seems maxPerHost could cause us not to fill each segment to topN even when there are more than enough URLs for this job. We should only count URLs we keep instead of all URLs considered. There were also two variables named count which is probably bad form (not a Java person, but it certainly
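The difference Rod describes, counting every URL considered versus counting only the URLs kept, can be illustrated with a small single-process sketch. This is not the Generator.java code or the attached patch (the real Generator runs as a MapReduce job); the class and method names are hypothetical, and the input is assumed to be already sorted by descending score.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopNSelector {

    /** Counts only URLs actually kept toward topN (the behaviour Rod proposes). */
    static List<String> select(List<String> sortedUrls, int topN, int maxPerHost) {
        List<String> selected = new ArrayList<>();
        Map<String, Integer> keptPerHost = new HashMap<>();
        for (String url : sortedUrls) {
            if (selected.size() >= topN) {
                break; // segment is full
            }
            String host = URI.create(url).getHost();
            int kept = keptPerHost.getOrDefault(host, 0);
            if (maxPerHost > 0 && kept >= maxPerHost) {
                continue; // a skipped URL does not use up a topN slot
            }
            keptPerHost.put(host, kept + 1);
            selected.add(url);
        }
        return selected;
    }

    /** Problematic variant: every URL considered uses up a topN slot. */
    static List<String> selectCountingConsidered(List<String> sortedUrls, int topN, int maxPerHost) {
        List<String> selected = new ArrayList<>();
        Map<String, Integer> keptPerHost = new HashMap<>();
        int considered = 0;
        for (String url : sortedUrls) {
            if (considered++ >= topN) {
                break; // stops early even though few URLs were kept
            }
            String host = URI.create(url).getHost();
            int kept = keptPerHost.getOrDefault(host, 0);
            if (maxPerHost > 0 && kept >= maxPerHost) {
                continue;
            }
            keptPerHost.put(host, kept + 1);
            selected.add(url);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://a.example/1", "http://a.example/2", "http://a.example/3",
            "http://b.example/1", "http://c.example/1");
        System.out.println(select(urls, 3, 1));                   // 3 URLs, one per host
        System.out.println(selectCountingConsidered(urls, 3, 1)); // only http://a.example/1
    }
}
```

With topN = 3 and maxPerHost = 1, the second variant spends its budget on the skipped a.example URLs and returns a single URL, even though two other hosts were available.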

Re: Max Per Host and topN

2005-11-10 Thread Stefan Groschupf
+1 On 10.11.2005 at 19:03, Rod Taylor wrote: Generator.java.patch --- company:http://www.media-style.com forum:http://www.text-mining.org blog:http://www.find23.net

Re: [Nutch Wiki] Update of PluginCentral by JakeVanderdray

2005-11-10 Thread Stefan Groschupf
Hi Jake, take a look here: http://wiki.media-style.com/display/nutchDocu/Why+nutch+has+a+plugin+system This short text already explains why Nutch has a plugin system :) Stefan On 10.11.2005 at 20:04, Apache Wiki wrote: Dear Wiki user, You have subscribed to a wiki page or wiki category on

Re: Do nutch help me?

2005-11-10 Thread Paul Baclace
Arun Kaundal wrote: Hi, I want to crawl local files and Internet/intranet documents/files. Do you think Nutch can help me in this case? Although the tutorial describes these separately, conf/crawl-urlfilter.txt can allow any combination of Internet, intranet, and local filesystem crawling.
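To show how one filter file can admit both local files and web pages, here is a minimal sketch of the "+regex / -regex" rule semantics, assuming (as in the regex URL filter plugin) that rules are tried top to bottom, the first matching rule decides, and a URL matching no rule is rejected. The class name and the /home/arun/docs/ path are hypothetical. The usual default file skips local files with a line like `-^(file|ftp|mailto):`; removing `file` from that line and adding a `+^file:///...` rule is what admits them. Fetching file: URLs also requires the protocol-file plugin in plugin.includes (check nutch-default.xml for your version).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {

    /** One "+regex" (accept) or "-regex" (reject) line. */
    record Rule(boolean accept, Pattern pattern) {}

    private final List<Rule> rules = new ArrayList<>();

    UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments are ignored
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                continue; // this sketch ignores any other kind of line
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** First matching rule wins; a URL that matches no rule is rejected. */
    boolean accepts(String url) {
        for (Rule r : rules) {
            if (r.pattern().matcher(url).find()) {
                return r.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(List.of(
            "# skip ftp: and mailto: urls (file: is no longer skipped)",
            "-^(ftp|mailto):",
            "# accept local files under one (hypothetical) directory",
            "+^file:///home/arun/docs/",
            "# accept one web domain",
            "+^http://([a-z0-9]*\\.)*example.com/",
            "# skip everything else",
            "-."));
        System.out.println(filter.accepts("file:///home/arun/docs/report.txt")); // true
        System.out.println(filter.accepts("file:///etc/passwd"));                // false
        System.out.println(filter.accepts("http://www.example.com/index.html")); // true
    }
}
```

Because the first match wins, rule order matters: a catch-all `-.` must come last, or it would reject everything.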

Re: problem with inject url on mapred

2005-11-10 Thread Paul Baclace
[regarding mapred ver 0.8] Anton Potehin wrote: I tried to launch mapred on 2 machines: 192.168.0.250 and 192.168.0.111. 051123 053136 task_m_xaynqo -14885.741% /user/root/seeds/urls:31+31 Please help me find out what the problem is and what I did wrong. Is the problem the negative

Re: Max Per Host and topN

2005-11-10 Thread Doug Cutting
Rod Taylor wrote: It seems maxPerHost could cause us not to fill each segment to topN even when there are more than enough URLs for this job. We should only count URLs we keep instead of all URLs considered. There were also two variables named count which is probably bad form (not a Java

Re: Lucene or Nutch

2005-11-10 Thread Sami Siren
Jérôme Charron wrote: jar. A short-term solution could be to move the core classes (which have no dependencies on Nutch) to a new lib-plugin (lib-lang, for instance) and add a dependency on this plugin in language-identifier, so that this code could be used as a standalone lib. Are you

[jira] Commented: (NUTCH-99) ports are hardcoded or random

2005-11-10 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357291 ] Doug Cutting commented on NUTCH-99: --- I cannot get patch on linux to accept this. The absolute DOS paths seem to cause problems. Can you please regenerate this with relative

Re: Lucene or Nutch

2005-11-10 Thread Doug Cutting
Andrzej Bialecki wrote: I would be disappointed by this move - language identifier is an important component in Nutch. Now the mere fact that it's bundled with Nutch encourages its proper maintenance. If there is enough drive in terms of willingness and long-term commitment it would make sense

[jira] Commented: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-11-10 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=comments#action_12357300 ] [EMAIL PROTECTED] commented on NUTCH-110: - Scrub NUTCH-110-version2.patch. This patch double-encodes certain entities (first by the new toValidXmlText method, second by
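One way to avoid the double encoding described in this comment is to have the cleanup step only remove characters that XML 1.0 forbids outright, and leave `&`, `<` and friends for the XML serializer to escape exactly once. The sketch below does that using the XML 1.0 Char production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). It is not the patch's toValidXmlText; the class and method names are hypothetical.

```java
public class XmlText {

    /**
     * Drops code points not allowed by the XML 1.0 Char production.
     * Deliberately does NOT escape '&' or '<', so a serializer that
     * escapes text afterwards will not encode anything twice.
     */
    static String stripInvalidXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i); // a lone surrogate comes back as itself
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** XML 1.0 Char production; unpaired surrogates fall outside every range. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // NUL and U+0001 are removed; '&' is left for the serializer to escape.
        System.out.println(stripInvalidXmlChars("a\0b\u0001c&d")); // abc&d
    }
}
```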

Re: Fetch not finishing everything in its list?

2005-11-10 Thread Rod Taylor
On Thu, 2005-11-10 at 21:58 -0500, Rod Taylor wrote: As you can see from the output below, the percentage complete is very low until it suddenly jumps to fully complete. This started happening with some segments about a week ago. Others go through their full list of ~10,000 URLs. It appears to occur

Re: Do nutch help me?

2005-11-10 Thread Arun Kaundal
Could you provide me a link to that tutorial? How can I modify the conf/crawl-urlfilter.txt file to allow local filesystem crawling? On 11/11/05, Paul Baclace [EMAIL PROTECTED] wrote: Arun Kaundal wrote: Hi, I want to crawl local files and Internet/intranet documents/files. Do you

Re: Problem about method Query#query()

2005-11-10 Thread Game Now
Oh yes, I have not created any query filters for subarea and publishdate. Thank you, Stefan! On 11/10/05, Stefan Groschupf [EMAIL PROTECTED] wrote: Hi, Do you have any query filter installed? Stefan On 10.11.2005 at 09:37, Game Now wrote: Hi all, I pass some strings to
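The behaviour in this thread (area:XX survives, subarea:YY becomes subarea YY) fits a parser that keeps `field:value` only when the field is known to some query filter and otherwise treats both sides as plain terms. The standalone sketch below reproduces that observed output. It is not Nutch's actual query parser; the class, the `knownFields` set and the tokenizing rules are hypothetical simplifications. In Nutch itself, the fix the thread points to is installing a query filter for subarea and publishdate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FieldAwareParser {

    /** A parsed clause; field is null for a plain term. */
    record Clause(String field, String value) {
        @Override
        public String toString() {
            return field == null ? value : field + ":" + value;
        }
    }

    /**
     * Splits the query on whitespace. "name:value" stays a field clause only
     * if "name" is in knownFields; otherwise it is split into plain terms.
     */
    static List<Clause> parse(String query, Set<String> knownFields) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty query
            }
            int colon = token.indexOf(':');
            if (colon > 0 && colon < token.length() - 1
                    && knownFields.contains(token.substring(0, colon))) {
                clauses.add(new Clause(token.substring(0, colon), token.substring(colon + 1)));
            } else {
                for (String part : token.split(":")) {
                    if (!part.isEmpty()) {
                        clauses.add(new Clause(null, part)); // unknown field: plain term
                    }
                }
            }
        }
        return clauses;
    }

    /** Joins clauses with spaces, like the toString() output quoted in the thread. */
    static String render(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("area"); // only "area" has a filter in this example
        System.out.println(render(parse("area:XX", known)));    // area:XX
        System.out.println(render(parse("subarea:YY", known))); // subarea YY
    }
}
```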