Content-Length limit, URL filter and few minor issues
-----------------------------------------------------
Key: NUTCH-950
URL: https://issues.apache.org/jira/browse/NUTCH-950
Project: Nutch
Issue Type: Bug
Affects Versions: 2.0
Reporter: Alexis

1. crawl command (nutch1.patch)
The class was renamed to Crawler, but the references to it were not updated.

2. URL filter (nutch2.patch)
This avoids an NPE on bogus URLs whose host does not have a suffix.

3. Content-Length limit (nutch3.patch)
This is related to NUTCH-899. The patch prevents the entire flush operation on the Gora datastore from crashing when the MySQL blob limit is exceeded by a few bytes. Both the protocol-http and protocol-httpclient plugins were affected.

4. Ivy configuration (nutch4.patch)
- Change the xercesImpl and restlet versions. These two version changes are required: the xercesImpl version currently in use makes a JUnit test crash, and the restlet version currently in use is missing from the default Maven repository.
- Add gora-hbase and zookeeper (an HBase dependency), plus the MySQL connector. These jars are necessary to run Gora with the HBase or MySQL datastores. (This is more a suggestion than a requirement.)
- Add com.jcraft/jsch, which is a protocol-sftp plugin dependency.
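
To illustrate item 3, here is a minimal sketch of the kind of guard a protocol plugin could apply before handing content to the datastore. It is not the contents of nutch3.patch. The class name, the method name, and the 65,535-byte cap (the maximum size of a MySQL BLOB column) are assumptions chosen for illustration; the real limit would come from the configuration and the column type actually in use.

```java
import java.util.Arrays;

public class ContentLimitSketch {

    // Assumed cap for illustration: a MySQL BLOB column holds at most 65,535 bytes.
    public static final int MAX_CONTENT_BYTES = 65535;

    /**
     * Returns the content unchanged if it fits within maxBytes,
     * otherwise a copy cut down to exactly maxBytes.
     * A negative maxBytes is treated as "no limit".
     */
    public static byte[] truncate(byte[] content, int maxBytes) {
        if (content == null || maxBytes < 0 || content.length <= maxBytes) {
            return content;
        }
        return Arrays.copyOf(content, maxBytes);
    }

    public static void main(String[] args) {
        // A body that exceeds the cap by a few bytes, as described in item 3.
        byte[] fetched = new byte[MAX_CONTENT_BYTES + 3];
        byte[] stored = truncate(fetched, MAX_CONTENT_BYTES);
        System.out.println(fetched.length + " -> " + stored.length);
    }
}
```

Enforcing the limit on the byte array itself, rather than trusting the Content-Length header, means a response that overshoots by a few bytes is cut down before it can make the whole batch flush fail.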