lewismc opened a new pull request, #1380: URL: https://github.com/apache/bigtop/pull/1380
### Description of PR

[BIGTOP-284](https://issues.apache.org/jira/browse/BIGTOP-284) seeks to introduce [Apache Nutch](https://nutch.apache.org) smoke tests into the Bigtop ecosystem. I commented on the original ticket back in 2011 and never did anything about it. This PR seeks to address that.

Nutch is a highly extensible, highly scalable, mature, production-ready web crawler that enables fine-grained configuration and accommodates a wide variety of data acquisition tasks. Nutch relies on Apache Hadoop data structures and is great for batch processing large data volumes via MapReduce jobs, but it can also be tailored to smaller jobs.

### How was this patch tested?

Testing is ongoing. The goal is for the Nutch community to test this patch and hopefully update this thread with feedback. More details to follow.

### For code changes:

- [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'BIGTOP-3638. Your PR title ...')?
- [X] Make sure that newly added files do not have any licensing issues. When in doubt refer to https://www.apache.org/licenses/
