Lewis John McGibbney created BIGTOP-4518:
--------------------------------------------
Summary: Integrate Apache Nutch into Bigtop
Key: BIGTOP-4518
URL: https://issues.apache.org/jira/browse/BIGTOP-4518
Project: Bigtop
Issue Type: New Feature
Reporter: Lewis John McGibbney
I'd like to introduce [Apache Nutch|https://nutch.apache.org/] into the Bigtop
ecosystem so it can be built, packaged, deployed, and smoke-tested as a stack
component that runs on a Hadoop cluster using the deploy runtime and HDFS.
Nutch is a highly extensible, highly scalable, mature, production-ready Web
crawler that enables fine-grained configuration and accommodates a wide
variety of data acquisition tasks. Because Nutch relies on Apache Hadoop data
structures, it is well suited to batch processing of large data volumes via
MapReduce jobs, but it can also be tailored to smaller jobs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)