Lewis John McGibbney created BIGTOP-4518:
--------------------------------------------

             Summary: Integrate Apache Nutch into Bigtop
                 Key: BIGTOP-4518
                 URL: https://issues.apache.org/jira/browse/BIGTOP-4518
             Project: Bigtop
          Issue Type: New Feature
            Reporter: Lewis John McGibbney


I'd like to introduce [Apache Nutch|https://nutch.apache.org/] into the Bigtop
ecosystem so that it can be built, packaged, deployed, and smoke-tested as a
stack component that runs on a Hadoop cluster using the deploy runtime and HDFS.

Nutch is a highly extensible, highly scalable, mature, production-ready web
crawler that supports fine-grained configuration and accommodates a wide
variety of data acquisition tasks. Because Nutch relies on Apache Hadoop data
structures, it is well suited to batch processing large data volumes via
MapReduce jobs, but it can also be tailored to smaller jobs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
