Hi Lewis,

thanks! Since Common Crawl runs Nutch on a Bigtop cluster,
any deeper integration into the Bigtop ecosystem is very welcome.
The smoke tests are probably the most useful part. Every time
Bigtop is updated, it takes a while to verify that Nutch and all
its plugins are running smoothly.

But I have no good idea about packaging. All the Bigtop packages
are infrastructure, providing core components or services. The
way Nutch is used and deployed on a Hadoop cluster is entirely
in "user space": the jar and configuration files are specific
to a particular Nutch job setup, and their classpath does not
interfere with that of other jobs.

"Nutch server" would fit more easily as a Bigtop package. But Nutch
server was never designed to run on a cluster, only in local mode.

~Sebastian
