Hi all,
I pass some strings to the org.apache.nutch.searcher.Query#parse() method,
but I got different results, like below:
parameter string: area:XX, returnedQuery.toString() is: area:XX.
parameter string: subarea:YY, returnedQuery.toString() is: subarea YY.
(Note: the ':' disappeared)
parameter
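The behavior described above matches Nutch dropping the ':' for a field that has no registered query filter. A minimal sketch of that behavior, using a hypothetical simplified parser (not Nutch's actual implementation; the registered-field set here is an assumption for illustration):

```java
import java.util.Set;

// Hypothetical, simplified illustration -- NOT Nutch's actual parser.
// The idea: "field:term" is only kept as a field clause when a query
// filter is registered for that field; otherwise the colon is dropped
// and the token is treated as plain terms.
public class QueryFieldSketch {

    // Assumption for this sketch: only "area" has a query filter installed.
    static final Set<String> REGISTERED_FIELDS = Set.of("area");

    static String parse(String input) {
        int colon = input.indexOf(':');
        if (colon > 0) {
            String field = input.substring(0, colon);
            String term = input.substring(colon + 1);
            if (REGISTERED_FIELDS.contains(field)) {
                return field + ":" + term;   // kept as a field clause
            }
            return field + " " + term;       // colon silently dropped
        }
        return input;
    }
}
```

With this sketch, `parse("area:XX")` keeps the colon while `parse("subarea:YY")` loses it, which mirrors the two results reported above.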
Hi All,
I want to know how Nutch fits my requirements and how best I can exploit
its features.
Requirement:
Nutch is designed to crawl information systems on the internet and on intranets.
My requirement is that it crawl information present anywhere. Is Nutch suitable
for me? What I
I would be disappointed by this move - language identifier is an
important component in Nutch. Now the mere fact that it's bundled with
Nutch encourages its proper maintenance. If there is enough drive in
terms of willingness and long-term commitment it would make sense to
move it to a
Jérôme Charron wrote:
I would be disappointed by this move - language identifier is an
important component in Nutch. Now the mere fact that it's bundled
with Nutch encourages its proper maintenance. If there is enough
drive in terms of willingness and long-term commitment it would
make
Hi,
see
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows
HTH
Stefan
On 10.11.2005 at 06:44, KAAS INFOTECH wrote:
Hi All,
I am new to Nutch. I have downloaded the latest nutch-0.7.1. I have
Microsoft Windows installed on my PC with JAVA_HOME set. I came to know that
Cygwin is required
Hi,
Do you have any query filter installed?
Stefan
On 10.11.2005 at 09:37, Game Now wrote:
Hi all,
I pass some strings to the org.apache.nutch.searcher.Query#parse() method,
but I got different results, like below:
parameter string: area:XX, returnedQuery.toString() is: area:XX.
parameter
Hi
I want to crawl local files and internet/intranet documents/files. Do you think
Nutch can help me in this case?
Do I need some additions/extension in the functionality of nutch?
On 11/10/05, Stefan Groschupf [EMAIL PROTECTED] wrote:
Yes, Nutch can crawl webpages and you can somehow limit the
It seems maxPerHost could cause us not to fill each segment to topN even
when there are more than enough URLs for this job.
We should only count URLs we keep instead of all URLs considered.
There were also two variables named count, which is probably bad form
(not a Java person, but it certainly
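The fix being proposed can be sketched as follows. This is an illustrative, self-contained loop, not the actual Generator.java code; the method and variable names (select, topN, maxPerHost) are taken from the discussion, everything else is an assumption:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Generator fix discussed above: count only
// the URLs actually KEPT toward topN, so a per-host cap cannot stop a
// segment from reaching topN while eligible URLs remain.
public class GeneratorCountSketch {

    static List<String> select(List<String> urls, int topN, int maxPerHost) {
        Map<String, Integer> perHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (kept.size() >= topN) break;      // count kept, not considered
            String host = url.split("/")[0];     // crude host extraction
            int n = perHost.getOrDefault(host, 0);
            if (n >= maxPerHost) continue;       // over the cap: skip, don't count
            perHost.put(host, n + 1);
            kept.add(url);
        }
        return kept;
    }
}
```

The buggy variant counts every URL considered, so skipped over-cap URLs still consume topN slots and the segment comes up short even when other hosts have plenty of eligible URLs.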
+1
On 10.11.2005 at 19:03, Rod Taylor wrote:
Generator.java.patch
---
company:http://www.media-style.com
forum:http://www.text-mining.org
blog:http://www.find23.net
Hi Jake,
take a look here
http://wiki.media-style.com/display/nutchDocu/Why+nutch+has+a+plugin+system
This short text already explains why Nutch has a plugin system :)
Stefan
On 10.11.2005 at 20:04, Apache Wiki wrote:
Dear Wiki user,
You have subscribed to a wiki page or wiki category on
Arun Kaundal wrote:
Hi
I want to crawl local files and internet/intranet documents/files. Do you think
Nutch can help me in this case?
Although the tutorial describes these separately,
conf/crawl-urlfilter.txt can allow any combination of
Internet, Intranet, and local filesystem crawling.
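For example, a conf/crawl-urlfilter.txt along these lines could mix all three. The hosts and paths below are placeholders, and the exact default rules shipped with your Nutch version may differ:

```
# Illustrative conf/crawl-urlfilter.txt fragment (placeholder patterns):

# skip ftp: and mailto: urls; note that file: is NOT excluded here,
# which is what permits local filesystem crawling
-^(ftp|mailto):

# accept a specific intranet host
+^http://intranet\.example\.com/

# accept internet pages from a chosen domain
+^http://([a-z0-9]*\.)*example\.org/

# accept local files under a given directory
+^file:/home/user/docs/

# skip everything else
-.
```

Rules are applied top-down: the first matching pattern decides, `+` accepts and `-` rejects.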
[regarding mapred ver 0.8]
Anton Potehin wrote:
I tried to launch mapred on 2 machines: 192.168.0.250 and 192.168.0.111.
051123 053136 task_m_xaynqo -14885.741% /user/root/seeds/urls:31+31
Please help me find out what the problem is, and what I did wrong.
Is the problem the negative
Rod Taylor wrote:
It seems maxPerHost could cause us not to fill each segment to topN even
when there are more than enough URLs for this job.
We should only count URLs we keep instead of all URLs considered.
There were also two variables named count, which is probably bad form
(not a Java
Jérôme Charron wrote:
jar. A short-term solution could be to move the core classes (which have no
dependencies on Nutch) to a new lib-plugin (lib-lang for instance), adding a
dependency on this plugin in the language-identifier, so that this code could
be used as a standalone lib.
Are you
[
http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357291 ]
Doug Cutting commented on NUTCH-99:
---
I cannot get patch on Linux to accept this. The absolute DOS paths seem to
cause problems. Can you please regenerate this with relative
Andrzej Bialecki wrote:
I would be disappointed by this move - language identifier is an
important component in Nutch. Now the mere fact that it's bundled with
Nutch encourages its proper maintenance. If there is enough drive in
terms of willingness and long-term commitment it would make sense
[
http://issues.apache.org/jira/browse/NUTCH-110?page=comments#action_12357300 ]
[EMAIL PROTECTED] commented on NUTCH-110:
-
Scrub NUTCH-110-version2.patch. This patch double-encodes certain entities
(first by the new toValidXmlText method, second by
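The double-encoding problem described above can be shown with a tiny stand-in escaper (this is an illustrative method, not Nutch's actual toValidXmlText): if text is escaped once and then escaped again by the XML writer, the `&` produced by the first pass gets escaped a second time.

```java
// Hypothetical stand-in for an XML text escaper, to illustrate the
// double-encoding bug: applying it twice turns "&lt;" into "&amp;lt;".
public class DoubleEncodeSketch {

    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first, or it re-escapes
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}
```

Escaping `<tag>` once yields `&lt;tag&gt;`; escaping that result again yields `&amp;lt;tag&amp;gt;`, which is what a reader then sees as literal `&lt;tag&gt;` text.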
On Thu, 2005-11-10 at 21:58 -0500, Rod Taylor wrote:
As you can see from the below, the percentage complete is very low until all
of a sudden it jumps to fully complete. This started happening with some
segments about a week ago. Others go through their full list of ~10 000
urls. It appears to occur
Is it possible for you to provide me a link to that tutorial? How can I modify
the conf/crawl-urlfilter.txt file to allow local filesystem crawling?
On 11/11/05, Paul Baclace [EMAIL PROTECTED] wrote:
Arun Kaundal wrote:
Hi
I want to crawl local files and internet/intranet documents/files. Do you
Oh yes, I do not create any query filter for subarea and publishdate.
Thank you, Stefan!
On 11/10/05, Stefan Groschupf [EMAIL PROTECTED] wrote:
Hi,
Do you have any query filter installed?
Stefan
On 10.11.2005 at 09:37, Game Now wrote:
Hi all,
I pass some strings to