[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

Mike Zimmerman (JIRA) Wed, 02 Oct 2013 13:14:21 -0700

    [ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784369#comment-13784369 ]


Mike Zimmerman commented on CRUNCH-272:
---------------------------------------

Josh, I think that is a great first step.  Micah and I talked offline about 
this JIRA after I logged it and walked through my use cases in more detail.  
My target users are system administrators and developers who are trying to 
monitor and tune Oozie workflows running on a Hadoop cluster.

The first part of the problem is finding a way to mark which jobs belong to a 
higher-level operation, such as a Crunch job launched through an Oozie 
workflow.  (Your suggestion may help with this.)  The second and more 
difficult part is locating those marked jobs after the parent process has 
completed.

My first thought is that it would be awesome if I could query the JobTracker 
with a correlation id and have it return a list of matching jobs.  I don't 
believe this is possible today, and the idea is also flawed because all of 
that data would be lost if the JobTracker instance were restarted.  My second 
thought is to harvest the information from log data, but that seems like a lot 
of overhead and load on the cluster for something that should be relatively 
simple.  My final thought is to write custom code that, while the Crunch job 
is executing, logs this information to a store that can be queried.

Any recommendations you have are very much appreciated.  I believe the 
solution to this problem probably lies outside of the Crunch project, so if 
you need to close this issue, please feel free to do so.
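
As a rough illustration of that last idea (a store that outlives the 
JobTracker and can be queried by correlation id), here is a minimal sketch in 
Python using SQLite.  Nothing in it is part of Crunch, Oozie, or Hadoop: the 
class name CorrelationStore, the table name job_correlation, and the id 
formats are all hypothetical, and how the pipeline would learn each job id 
and call record() is left open.

```python
import sqlite3


class CorrelationStore:
    """Hypothetical store mapping a correlation id to the job ids run under it."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" so the data survives restarts.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS job_correlation ("
            " correlation_id TEXT NOT NULL,"
            " job_id TEXT NOT NULL,"
            " PRIMARY KEY (correlation_id, job_id))"
        )

    def record(self, correlation_id, job_id):
        """Remember that job_id was spawned under correlation_id (idempotent)."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR IGNORE INTO job_correlation VALUES (?, ?)",
                (correlation_id, job_id),
            )

    def jobs_for(self, correlation_id):
        """Return the job ids recorded under correlation_id, sorted by id."""
        rows = self.conn.execute(
            "SELECT job_id FROM job_correlation"
            " WHERE correlation_id = ? ORDER BY job_id",
            (correlation_id,),
        )
        return [row[0] for row in rows]
```

The launcher would call something like store.record(workflow_id, job_id) 
once per spawned job, and an operator could later call 
store.jobs_for(workflow_id) even after the parent process has completed.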

> Unable to correlate crunch jobs within Oozie
> --------------------------------------------
>
>                 Key: CRUNCH-272
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-272
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Mike Zimmerman
>
> I'm not really sure if this should be logged to Oozie or to Crunch, so please 
> feel free to move as needed.
> I would like to request a way to decorate MapReduce jobs that are spawned by 
> a Crunch pipeline so that I can programmatically determine their origin.  The 
> primary use case for this is integration with Oozie.  Oozie launches a single 
> map job to run a Java action (in our case this Java action runs a Crunch 
> job).  Tracing from this original "launcher" job to the jobs created by 
> the Crunch job is impossible without trawling through logs.  This leaves a 
> big black hole for the system operator trying to assess the performance and 
> impact of these jobs.  My initial thought was to provide a simple way to set 
> a correlationId or similar on a MapReduce job and then make it possible to 
> query for it within Oozie.  Obviously, that request would have to come after 
> the correlation feature was available within MapReduce.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
