[
https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978832#action_12978832
]
Julien Nioche commented on NUTCH-950:
-------------------------------------
Have committed the first 3 sub-issues.
Regarding the last one, I haven't tested the first point (version changes), but
here are a few comments about the other points:
* HBase + MySQL: these backends should not be provided by default, and neither
should the MySQL connector. One option would be to add them to the ivy file but
comment them out and give a short explanation, e.g. "uncomment this if you
want to use xxx as a GORA backend"
* the dependency com.jcraft/jsch should be placed in the ivy file of the
corresponding plugin, not in the main one
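
As a rough illustration of the two points above, the ivy entries might look
something like the fragment below. This is only a sketch: the X.Y.Z revisions
and the ORG values are placeholders rather than values taken from
nutch4.patch, and the plugin ivy file location (src/plugin/protocol-sftp/ivy.xml)
is an assumption about the source layout.

```xml
<!-- Main ivy file: optional GORA backends, disabled by default.
     All rev values and the ORG values are placeholders. -->
<dependencies>
  <!-- Uncomment this if you want to use MySQL as a GORA backend
  <dependency org="mysql" name="mysql-connector-java" rev="X.Y.Z"/>
  -->

  <!-- Uncomment this if you want to use HBase as a GORA backend
  <dependency org="ORG" name="gora-hbase" rev="X.Y.Z"/>
  <dependency org="ORG" name="zookeeper" rev="X.Y.Z"/>
  -->
</dependencies>

<!-- Ivy file of the protocol-sftp plugin (assumed location:
     src/plugin/protocol-sftp/ivy.xml), not the main one. -->
<dependencies>
  <dependency org="com.jcraft" name="jsch" rev="X.Y.Z"/>
</dependencies>
```

Keeping the optional entries commented out means a default build pulls in no
backend jars, while the required coordinates stay documented in place.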
Alexis, could you please create a new issue for this and then mark this issue as
resolved? Having a single JIRA number for completely separate issues is a bad
idea and does not help keep things in sync with the svn commits.
Thanks a lot for your contributions.
Julien
> Content-Length limit, URL filter and few minor issues
> -----------------------------------------------------
>
> Key: NUTCH-950
> URL: https://issues.apache.org/jira/browse/NUTCH-950
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 2.0
> Reporter: Alexis
> Attachments: nutch1.patch, nutch2.patch, nutch3.patch, nutch4.patch
>
>
> 1. crawl command (nutch1.patch)
> The class was renamed to Crawler but the references to it were not updated.
> 2. URL filter (nutch2.patch)
> This avoids an NPE on bogus URLs whose host does not have a suffix.
> 3. Content-Length limit (nutch3.patch)
> This is related to NUTCH-899.
> The patch prevents the entire flush operation on the Gora datastore from
> crashing when the MySQL blob limit is exceeded by a few bytes. Both the
> protocol-http and protocol-httpclient plugins were affected.
> 4. Ivy configuration (nutch4.patch)
> - Change xercesImpl and restlet versions. These 2 version changes are
> required. The first one currently makes a JUnit test crash; the second one is
> missing from the default Maven repository.
> - Add gora-hbase and zookeeper, which is an HBase dependency. Add the MySQL
> connector. These jars are necessary to run Gora with HBase or MySQL
> datastores. (more a suggestion than a requirement here)
> - Add com.jcraft/jsch, which is a protocol-sftp plugin dependency.