[ 
https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782415#comment-16782415
 ] 

Sebastian Nagel commented on NUTCH-2292:
----------------------------------------

Hi [~lewismc], hi [~thammegowda],
I've tried to rebase the branch onto the current master, but it wasn't
successful: there are already more than 40 conflicts for the first commit on
NUTCH-2292. I gave up after one hour of trying to resolve them, and I expect
more conflicts to come up when the next commits are applied on top of the
current master. Even if the conflicts are resolved, we would need to manually
port all changes to the ant and ivy files (all dependency upgrades and
additions). Given that the last merge/rebase of master and NUTCH-2292 dates
back 2 years and there have been significant changes to master since then
(mainly NUTCH-2375, NUTCH-2583 and many new plugins), it might be less painful
to pick the pom.xml and other new files and redo the remaining changes, mainly
moving source files into the directories required by Maven.

I have a couple of questions:
- [10b23cd0](https://github.com/commoncrawl/nutch/commit/10b23cd0) removes all
calls of {{System.exit(jobResult)}}. How are the results of the Nutch jobs
(inject, generate, fetch, parse, index, etc.) signaled to the calling script
(bin/crawl) to control the crawl flow? If a step fails, the workflow needs to
be stopped. Some tools even signal by the exit value that there is nothing to
do, e.g. if the fetch list is empty.
- Are there any notes on how to proceed? Are the following points still open,
and is there a draft/plan/implementation for addressing them?
-- NUTCH-2293 (integration test), e.g., we cannot seriously test Fetcher
without a protocol plugin already built
-- selection of plugins for installation
-- build the binary package
-- assemble the job file (required if run on Hadoop)
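
To illustrate the first question, here is a minimal sketch of one possible 
convention: each step returns a status, and only {{main}} turns it into the 
process exit value that a calling script could branch on. The class 
{{ExitCodeSketch}}, the method {{runStep}} and the exit values are 
hypothetical, chosen for illustration; they are not taken from the Nutch code 
base or from the commit above.

```java
import java.util.List;

public class ExitCodeSketch {
    // Hypothetical exit values, assumed for this sketch only.
    static final int SUCCESS = 0;
    static final int FAILURE = 1;
    static final int NOTHING_TO_DO = 2;

    /**
     * Hypothetical stand-in for a job step (e.g. a fetch). It returns a
     * status instead of calling System.exit itself, so it stays usable
     * from tests and from other Java code.
     */
    static int runStep(List<String> fetchList, boolean jobSucceeded) {
        if (fetchList.isEmpty()) {
            return NOTHING_TO_DO; // the caller may stop the loop gracefully
        }
        return jobSucceeded ? SUCCESS : FAILURE;
    }

    public static void main(String[] args) {
        // Treat the command-line arguments as the fetch list for the demo.
        int status = runStep(List.of(args), true);
        // Only the outermost entry point converts the status into the
        // process exit value seen by the calling script.
        System.exit(status);
    }
}
```

With such a convention a calling script could still stop the workflow on a 
non-zero exit value and treat the "nothing to do" value separately.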

Thanks, great work! It would be good to finally have it, especially given that
the ant builds on Jenkins are failing most of the time (cf. NUTCH-2669,
NUTCH-2672, NUTCH-2697).

> Mavenize the build for nutch-core and nutch-plugins
> ---------------------------------------------------
>
>                 Key: NUTCH-2292
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2292
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Thamme Gowda
>            Assignee: Thamme Gowda
>            Priority: Major
>             Fix For: 1.16
>
>
> Convert the build system of nutch-core as well as plugins to Apache Maven.
> *Plan :*
> Create a multi-module Maven project with the following structure
> {code}
> nutch-parent
>   |-- pom.xml (POM)
>   |-- nutch-core
>   |       |-- pom.xml (JAR)
>   |       |--src    : sources
>   |-- nutch-plugins
>           |-- pom.xml (POM)
>           |-- plugin1
>           |    |-- pom.xml (JAR)
>           | .....
>           |-- pluginN
>                |-- pom.xml (JAR)
> {code}
> NOTE: watch out for cyclic dependencies between nutch-core and plugins, 
> introduce another POM to break the cycle if required.
>          



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
