robots.txt redirect (NUTCH-124)

2009-03-20 Thread Mathijs Homminga
Hi everybody, Can someone shed some light on NUTCH-124: RobotRulesParser.java doesn't follow redirects when requesting the robots.txt file. Doug patched this, but the patch didn't make it into trunk. What is the desired behavior here? For example, when requesting the following url:

[jira] Commented: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum

2009-03-20 Thread Edwin Chu (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683772#action_12683772 ] Edwin Chu commented on NUTCH-702: - I have encountered OutOfMemoryError in CrawlDBReducer

Re: [DISCUSS] contents of nutch release artifact

2009-03-20 Thread Doğacan Güney
On Thu, Mar 19, 2009 at 23:46, Sami Siren ssi...@gmail.com wrote: Sami Siren wrote: Andrzej Bialecki wrote: How about the following: we build just 2 packages: * binary: this includes only base hadoop libs in lib/ (enough to start a local job, no optional filesystems etc), the *.job and

[jira] Commented: (NUTCH-728) Improve nutch release packaging

2009-03-20 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683813#action_12683813 ] Doğacan Güney commented on NUTCH-728: - Is there a particular reason that repository is

[jira] Commented: (NUTCH-728) Improve nutch release packaging

2009-03-20 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683814#action_12683814 ] Sami Siren commented on NUTCH-728: -- not really, it just happens to be the mirror I use.

[jira] Commented: (NUTCH-728) Improve nutch release packaging

2009-03-20 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683831#action_12683831 ] Doğacan Güney commented on NUTCH-728: - OK. I tested it, it works fine. +1 Improve

Nutch on Eclipse How To?

2009-03-20 Thread Sherjeel Niazi
Hi there, I want to configure nutch on Eclipse. Can you plz help me that how can I do so? From where can I download the code, jar files etc. Thanks, Sherjeel.

Re: Nutch on Eclipse How To?

2009-03-20 Thread Bartosz Gadzimski
Sherjeel Niazi wrote: Hi there, I want to configure nutch on Eclipse. Can you plz help me that how can I do so? From where can I download the code, jar files etc. Thanks, Sherjeel. Windows or Linux?

Re: Nutch on Eclipse How To?

2009-03-20 Thread Sherjeel Niazi
I am working on Windows.

Re: Nutch on Eclipse How To?

2009-03-20 Thread Bartosz Gadzimski
Sherjeel Niazi wrote: I am working on Windows. OK, so you have to download: cygwin: http://www.cygwin.com/setup.exe nutch (from trunk): http://hudson.zones.apache.org/hudson/job/Nutch-trunk/758/artifact/trunk/build/nutch-2009-03-20_04-01-47.tar.gz Install cygwin and set the PATH variable for it.

[Nutch Wiki] Update of RunNutchInEclipse0.9 by BartoszGadzimski

2009-03-20 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The following page has been changed by BartoszGadzimski: http://wiki.apache.org/nutch/RunNutchInEclipse0%2e9 The comment on the change is: added description for Windows users

Problems compiling Nutch in Eclipse

2009-03-20 Thread Rodrigo Reyes C.
Hi I have configured my eclipse project as stated here http://wiki.apache.org/nutch/RunNutchInEclipse0.9 Still, I am getting the following errors: - The return type is incompatible with Parser.getParse(Content) RTFParseFactory.java

Re: Problems compiling Nutch in Eclipse

2009-03-20 Thread Ninad Raut
Check out my blog: http://j2eewebsearch.blogspot.com/ Check out the third point... Let me know if you get it all right. Your comments will be appreciated. Regards, Ninad On Sat, Mar 21, 2009 at 6:32 AM, Rodrigo Reyes C. rre...@corbitecso.com wrote: Hi I have configured my eclipse