Re: [VOTE] Move 2.0 out of trunk

2011-09-19 Thread Alexis
to implement search. They use HBase which is, by the way, Nutch 2.0 compatible. Take a look: http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22 (sorry, I don't think any video of the summit is available yet, not sure why) Alexis On Mon, Sep 19, 2011 at 1:05 AM, Julien Nioche

Re: InvocationTargetException with Nutch 2.0 Gora 0.2 and Cassandra 0.8.4

2011-08-30 Thread Alexis
Hi Tom, I'm having the same issue. The two jars missing from nutch-2.0-dev.job, cassandra-all-0.8.0.jar and hector-core-0.8.0-1.jar, were uploaded manually into the gora-cassandra/lib-ext SVN directory so that the Gora build would work, because for some reason I did not get them downloaded through
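
A quick way to confirm whether the two jars actually made it into the job file is to list its entries. The helper below is hypothetical and not part of Nutch; it assumes the bundled dependencies sit under a lib/ directory inside nutch-2.0-dev.job, which is how Hadoop job jars usually carry their dependencies.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

// Hypothetical helper: reports which expected jars are absent from a job archive.
// Assumption: dependencies are stored as "lib/<jar name>" entries.
class JobJarCheck {
    static List<String> missingJars(String jobFile, List<String> expectedJars) throws IOException {
        List<String> missing = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jobFile)) {
            for (String jar : expectedJars) {
                // getEntry returns null when the archive has no entry with that name
                if (zip.getEntry("lib/" + jar) == null) {
                    missing.add(jar);
                }
            }
        }
        return missing;
    }
}
```

For example, `JobJarCheck.missingJars("nutch-2.0-dev.job", List.of("cassandra-all-0.8.0.jar", "hector-core-0.8.0-1.jar"))` returns an empty list only if both jars were packaged.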

Re: Nutch 2 and Cassandra

2011-08-01 Thread Alexis
Hi, libthrift is a dependency of cassandra-thrift, as listed here: http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1 During the Nutch build, you have to manually tweak the Ivy configuration depending on your choice of Gora store, in this case Cassandra. Basically you

Re: Nutch 2 and Cassandra

2011-08-01 Thread Alexis
the hector dependency: <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2" conf="*->default"/> -Original Message- From: Alexis [mailto:alexis.detregl...@gmail.com] Sent: Monday, August 01, 2011 2:28 PM To: dev@nutch.apache.org Subject: Re: Nutch 2 and Cassandra Hi

[jira] [Updated] (NUTCH-956) solrindex issues

2011-07-12 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-956: - Attachment: solr.patch2 - NPE related to content-type field - tld field in Solr schema - string comparison
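
A minimal sketch of the kind of null-safe handling the first and last items suggest, assuming "string comparison" refers to the common Java pitfall of comparing strings with == instead of equals(). The class and method names are hypothetical and the code is not taken from solr.patch2.

```java
import java.util.Locale;

// Hypothetical sketch, not the code from solr.patch2.
class ContentTypeField {
    // Returns the bare MIME type to index, or null when there is nothing to index.
    static String indexableContentType(String contentType) {
        if (contentType == null) {
            return null; // no Content-Type: skip the field instead of throwing an NPE
        }
        String trimmed = contentType.trim();
        int semi = trimmed.indexOf(';'); // drop parameters such as "; charset=UTF-8"
        String mime = (semi >= 0 ? trimmed.substring(0, semi) : trimmed).trim();
        return mime.isEmpty() ? null : mime.toLowerCase(Locale.ROOT);
    }

    // Compares string contents with equals(); == would compare object references.
    // Calling equals() on the literal also makes the check null-safe.
    static boolean isHtml(String contentType) {
        return "text/html".equals(indexableContentType(contentType));
    }
}
```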

[jira] Updated: (NUTCH-965) Skip parsing for truncated documents

2011-02-10 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-965: - Summary: Skip parsing for truncated documents (was: Parsing takes up 100% CPU) Skip parsing for truncated

[jira] Updated: (NUTCH-965) Parsing takes up 100% CPU

2011-02-08 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-965: - Attachment: parserJob.patch In the parser mapper, compare the Content-Length header to the size of the content
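
The comparison described above can be sketched as follows. This is a hypothetical helper, not the code in parserJob.patch; it assumes the header value arrives as a string and treats a missing or malformed header as "not truncated", so such documents are still parsed.

```java
// Hypothetical helper, not the code from parserJob.patch.
class TruncationCheck {
    // True when fewer bytes were stored than the Content-Length header announced.
    static boolean isTruncated(String contentLengthHeader, byte[] content) {
        if (contentLengthHeader == null) {
            return false; // nothing to compare against
        }
        long declared;
        try {
            declared = Long.parseLong(contentLengthHeader.trim());
        } catch (NumberFormatException e) {
            return false; // malformed header: let the document be parsed
        }
        int actual = (content == null) ? 0 : content.length;
        return actual < declared;
    }
}
```

A parser mapper would then skip (and perhaps log) any page for which `isTruncated` returns true instead of handing partial content to the parser.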

[jira] Commented: (NUTCH-955) Ivy configuration

2011-01-18 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983125#action_12983125 ] Alexis commented on NUTCH-955: -- Sorry, please disregard the first bullet (nutch.root).

[jira] Created: (NUTCH-956) solrindex issues

2011-01-13 Thread Alexis (JIRA)
solrindex issues Key: NUTCH-956 URL: https://issues.apache.org/jira/browse/NUTCH-956 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 2.0 Reporter: Alexis I ran into a few

[jira] Updated: (NUTCH-956) solrindex issues

2011-01-13 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-956: - Attachment: solr.patch Here are the changes: - Avoid multiple values for id field. (NUTCH-819) - Allow multiple
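
Avoiding multiple values for the id field amounts to replacing, rather than appending, when the field is written. SolrJ's SolrInputDocument draws the same distinction between addField (append) and setField (replace); the small document model below is a hypothetical stand-in for illustration, not the code in solr.patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document model: field name -> list of values.
class IndexDoc {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Appends a value; fine for genuinely multi-valued fields.
    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Replaces existing values, so a single-valued field such as "id" keeps one entry.
    void set(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    List<String> get(String name) {
        return fields.getOrDefault(name, List.of());
    }
}
```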

[jira] Created: (NUTCH-955) Ivy configuration

2011-01-10 Thread Alexis (JIRA)
Ivy configuration - Key: NUTCH-955 URL: https://issues.apache.org/jira/browse/NUTCH-955 Project: Nutch Issue Type: Improvement Components: build Affects Versions: 2.0 Reporter: Alexis As mentioned

[jira] Updated: (NUTCH-955) Ivy configuration

2011-01-10 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-955: - Attachment: ivy.patch In the patch, the required dependencies for MySQL and HBase are included in the Ivy config

[jira] Created: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-01 Thread Alexis (JIRA)
Reporter: Alexis 1. crawl command (nutch1.patch) The class was renamed to Crawler but the references to it were not updated. 2. URL filter (nutch2.patch) This avoids an NPE on bogus URLs whose host does not have a suffix. 3. Content-Length limit (nutch3.patch) This is related to NUTCH-899
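
Item 2 can be illustrated with a null guard around the suffix lookup. In this sketch KNOWN_SUFFIXES and lookupSuffix are hypothetical stand-ins for Nutch's real domain-suffix data, which they do not reproduce; the point is only that a host with no recognizable suffix is rejected instead of triggering an NPE.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: KNOWN_SUFFIXES is a tiny stand-in for real domain-suffix data.
class SuffixUrlFilter {
    private static final Set<String> KNOWN_SUFFIXES = Set.of("com", "org", "net", "fr");

    // Returns the host's suffix if it is known, or null when there is none.
    static String lookupSuffix(String host) {
        int dot = host.lastIndexOf('.');
        if (dot < 0) {
            return null; // e.g. "localhost": no suffix at all
        }
        String tld = host.substring(dot + 1).toLowerCase(Locale.ROOT);
        return KNOWN_SUFFIXES.contains(tld) ? tld : null;
    }

    // Returns the URL if it is accepted, or null if it is filtered out.
    static String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparsable URL
        }
        if (host == null) {
            return null; // no host component
        }
        String suffix = lookupSuffix(host);
        if (suffix == null) {
            return null; // guard: using the lookup result unchecked is where an NPE would occur
        }
        return url;
    }
}
```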

[jira] Updated: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-01 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-950: - Attachment: nutch4.patch Content-Length limit, URL filter and few minor issues

[jira] Updated: (NUTCH-899) java.sql.BatchUpdateException: Data truncation: Data too long for column 'content' at row 1

2010-12-18 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-899: - Attachment: httpContentLimit.patch We stick with the default Gora schema for the MySQL backend, which says
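
Capping the stored content at a byte limit is one way to keep rows within the column size (a MySQL BLOB column, for instance, holds at most 65,535 bytes). The sketch below follows the convention documented for Nutch's http.content.limit property, where a negative value disables truncation; the class name is hypothetical and this is not the code in httpContentLimit.patch.

```java
import java.util.Arrays;

// Hypothetical helper, not the code from httpContentLimit.patch.
class ContentLimiter {
    // Truncates content to at most maxBytes; a negative limit means no truncation,
    // following the convention documented for http.content.limit.
    static byte[] limit(byte[] content, int maxBytes) {
        if (maxBytes < 0 || content.length <= maxBytes) {
            return content;
        }
        return Arrays.copyOf(content, maxBytes);
    }
}
```

Content truncated this way ends up shorter than its Content-Length header, which is the situation NUTCH-965 addresses by skipping parsing.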

Re: Does Nutch 2.0 in good enough shape to test?

2010-12-18 Thread Alexis
for it). The whole patch is here: https://issues.apache.org/jira/secure/attachment/12466548/httpContentLimit.patch Alexis

[jira] Commented: (NUTCH-899) java.sql.BatchUpdateException: Data truncation: Data too long for column 'content' at row 1

2010-12-10 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970336#action_12970336 ] Alexis commented on NUTCH-899: -- I ran into the exact same issue, with MySQL. The blob column

Fetch command returns immediately

2010-12-05 Thread Alexis
)
@@ -174,6 +174,7 @@
     } else {
       currentJob.setNumReduceTasks(numTasks);
     }
+    currentJob.waitForCompletion(true);
     ToolUtil.recordJobStatus(null, currentJob, results);
     return results;
   }
Alexis
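
The one-line fix matters because Hadoop's Job.waitForCompletion(boolean verbose) submits the job if it has not been submitted yet and blocks until it finishes, so without it the method can return before any fetching is done and recordJobStatus sees an unfinished job. The sketch below is only an analogy using java.util.concurrent rather than Hadoop's API: Future.get() plays the role of waitForCompletion.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Analogy only: java.util.concurrent stands in for Hadoop's Job API.
class SubmitVsWait {
    // Runs a slow "job" and reports whether it had finished at the point where the
    // caller records its status (where the patch calls ToolUtil.recordJobStatus).
    static boolean run(boolean waitForCompletion) throws Exception {
        AtomicBoolean done = new AtomicBoolean(false);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> job = pool.submit(() -> {
                try {
                    Thread.sleep(200); // simulate a long-running fetch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.set(true);
            });
            if (waitForCompletion) {
                job.get(); // blocks until the task finishes, like waitForCompletion(true)
            }
            return done.get(); // without the wait this is typically still false
        } finally {
            pool.shutdown(); // lets an unfinished task complete, then stops the thread
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without wait: " + run(false));
        System.out.println("with wait:    " + run(true));
    }
}
```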