[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micah Whitacre updated CRUNCH-272:
----------------------------------
    Attachment: CRUNCH-272.patch

So here is a patch that extends the prototype but also includes a custom CrunchActionExecutor + schema for configuring Crunch in a workflow. The schema is pretty much a copy of the Java Action schema. This is still probably a prototype rather than something that will be merged immediately.

Some things to note with the patch:
* Oozie does not deploy endstates to a Maven repository (https://issues.apache.org/jira/browse/OOZIE-1842), which means if we want to depend on Oozie we need to pick a distribution (I rolled with Cloudera only because that is what I had access to right now). If we didn't own the CrunchActionExecutor then we wouldn't need that dependency.
* There isn't a formal contract between the CrunchActionExecutor and what actually launches the Crunch Pipeline(s). The launching code can either use convenience methods to do the reporting or extend the CrunchOozieLauncher. This doesn't exactly align with most Oozie executors, which have a more formal contract, it seems. Do we want to start controlling how pipelines are launched?

[~rkanter], this is obviously my first stab at a Crunch Action in Oozie. If you have any suggestions or alternate routes to go with this, I'd be interested in hearing them.

> Unable to correlate crunch jobs within Oozie
> --------------------------------------------
>
>                 Key: CRUNCH-272
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-272
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Mike Zimmerman
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-272.patch, CRUNCH-272_prototype.patch
>
>
> I'm not really sure if this should be logged to Oozie or to Crunch, so please
> feel free to move as needed.
> I would like to request a way to decorate map/reduce jobs that are spawned by
> a Crunch pipeline so that I can programmatically determine their origin. The
> primary use case for this is integration with Oozie.
> Oozie launches a single
> map job to run a java action (in our case this java action runs a crunch
> job). Traceability from this original "launcher" job to the jobs created by
> the crunch job is impossible without trolling logs. This leaves a big black
> hole for the system operator to assess the performance/impact of these jobs.
> My initial thought was to provide a simple way to indicate a correlationId or
> similar on a map/reduce job and then make it accessible within Oozie to query
> for. Obviously, that request would have to come after the correlation
> feature was available within map/reduce.

--
This message was sent by Atlassian JIRA
(v6.2#6252)