to implement search. They use HBase, which is, by
the way, compatible with Nutch 2.0.
Take a look:
http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22 (sorry I
don't think any video of the summit is available yet, not sure why)
Alexis
On Mon, Sep 19, 2011 at 1:05 AM, Julien Nioche
Hi Tom,
I'm having the same issue.
The two jars missing from nutch-2.0-dev.job, cassandra-all-0.8.0.jar
and hector-core-0.8.0-1.jar, have been manually uploaded into the
gora-cassandra/lib-ext SVN directory for the Gora build to work, because
for some reason I did not get them downloaded through
Hi, libthrift is a dependency of cassandra-thrift, as listed here:
http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1
During the Nutch build, you have to manually tweak the Ivy configuration
depending on your choice of Gora store, in this case Cassandra.
Basically you
the hector dependency:
<dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2"
conf="*->default"/>
-----Original Message-----
From: Alexis [mailto:alexis.detregl...@gmail.com]
Sent: Monday, August 01, 2011 2:28 PM
To: dev@nutch.apache.org
Subject: Re: Nutch 2 and Cassandra
Hi
[
https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-956:
-
Attachment: solr.patch2
- NPE related to content-type field
- tld field in Solr schema
- string comparison
[
https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-965:
-
Summary: Skip parsing for truncated documents (was: Parsing takes up 100%
CPU)
Skip parsing for truncated
[
https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-965:
-
Attachment: parserJob.patch
In the parser mapper, compare the Content-Length header to the size of the content
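A minimal sketch of the check described above, assuming the response headers are available as a plain map and the fetched body as a byte array. The names TruncationCheck and isTruncated are hypothetical and are not taken from parserJob.patch; the actual patch may read the header and the content differently.

```java
import java.util.Map;

// Hypothetical helper, not the code from parserJob.patch.
final class TruncationCheck {

    private TruncationCheck() {}

    /**
     * Returns true when the Content-Length header declares more bytes
     * than were actually fetched, i.e. the document looks truncated.
     * Assumption: a missing or unparsable header means "not truncated".
     * Simplification: the header name is looked up case-sensitively.
     */
    static boolean isTruncated(Map<String, String> headers, byte[] content) {
        String value = headers.get("Content-Length");
        if (value == null) {
            return false;
        }
        try {
            long declared = Long.parseLong(value.trim());
            return declared > content.length;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

A parser mapper could call such a check before handing the content to the parser and skip the document when it returns true.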
[
https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983125#action_12983125
]
Alexis commented on NUTCH-955:
--
Sorry, please disregard the first bullet about nutch.root
solrindex issues
Key: NUTCH-956
URL: https://issues.apache.org/jira/browse/NUTCH-956
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 2.0
Reporter: Alexis
I ran into a few
[
https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-956:
-
Attachment: solr.patch
Here are the changes:
- Avoid multiple values for id field. (NUTCH-819)
- Allow multiple
Ivy configuration
-
Key: NUTCH-955
URL: https://issues.apache.org/jira/browse/NUTCH-955
Project: Nutch
Issue Type: Improvement
Components: build
Affects Versions: 2.0
Reporter: Alexis
As mentioned
[
https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-955:
-
Attachment: ivy.patch
In the patch, the required dependencies for MySQL and HBase are included in the
Ivy config
Reporter: Alexis
1. crawl command (nutch1.patch)
The class was renamed to Crawler but the references to it were not updated.
2. URL filter (nutch2.patch)
This avoids an NPE on bogus URLs whose host does not have a suffix.
3. Content-Length limit (nutch3.patch)
This is related to NUTCH-899
[
https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-950:
-
Attachment: nutch4.patch
Content-Length limit, URL filter and a few minor issues
[
https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-899:
-
Attachment: httpContentLimit.patch
We stick with the default gora schema for the MySQL backend, which says
for it). The whole patch
is here:
https://issues.apache.org/jira/secure/attachment/12466548/httpContentLimit.patch
Alexis
[
https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970336#action_12970336
]
Alexis commented on NUTCH-899:
--
I ran into the exact same issue, with MySQL. The blob column
)
@@ -174,6 +174,7 @@
     } else {
       currentJob.setNumReduceTasks(numTasks);
     }
+    currentJob.waitForCompletion(true);
     ToolUtil.recordJobStatus(null, currentJob, results);
     return results;
   }
Alexis