[ 
https://issues.apache.org/jira/browse/OOZIE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029573#comment-16029573
 ] 

Clay B. commented on OOZIE-2876:
--------------------------------

Hi [~gezapeti], good question!

My team maintains Hadoop cluster infrastructure for all of Bloomberg (a 
multi-thousand user software engineering community). In the past, we have 
offered pre-written Java action code for interacting with Git which was 
difficult for users to take advantage of. (A Java action requires an extra JAR 
to deal with and, as they are not Java developers, a new action type for them 
to debug -- with no error handling.) We would prefer a more first-class 
experience for deploying Hadoop code to production clusters to make using 
Hadoop easier for our application teams. We have a very dev-ops-y model here 
where if you write it, you also own it in production.

We see Oozie as a workflow manager for our Hadoop clusters and similarly a 
central point for privilege escalation. This makes Oozie the logical place to 
implement a CD pipeline -- without having to distribute production Kerberos 
credentials or build a bespoke Jenkins infrastructure to do Hadoop cluster 
(deployment) workflow management.

For more thoughts, I presented at ApacheCon Big Data the week before last to 
get feedback. I found a few other users who saw value in this as an aid to CD, 
but sadly no video of the presentation was recorded. The key question is how 
do you:
* Empower teams to deploy their own code
* Deploy code in a highly-available model
* Provide a model where a cluster ensures it has all necessary code (limit 
external dependencies)
* Not distribute keytabs to application teams (limiting ability to log in to 
production systems)

I'm happy to walk anyone through the 
[slides|http://events.linuxfoundation.org/sites/events/files/slides/ApacheCon%20Oozie%20CD_0.pdf]
 over the phone if that would help too.

> Provide deployment primitives
> -----------------------------
>
>                 Key: OOZIE-2876
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2876
>             Project: Oozie
>          Issue Type: Wish
>          Components: action
>            Reporter: Clay B.
>            Priority: Minor
>
> Today one can schedule workflows which run on a cluster using many 
> pre-existing artifacts. However, today there are no helpful primitives for 
> deploying those artifacts to the cluster.
> For example, one may use a Spark JAR (hosted in a Maven repository) stored on 
> a cluster's HDFS to talk with data stored in a Hive table (the schema of 
> which is likely tracked in a source code management system somewhere) and use 
> JDBC to talk to an arbitrary database off the cluster. (Which database is 
> used depends on whether the cluster is Dev, Beta or PROD, all mapped in a 
> simple configuration file.) Further, the data may be verified by a simple Pig 
> script (also stored in a source code management repository).
> As a user, I'd like some way to get my binaries and ASCII configuration on a 
> cluster without significant ad hoc shell actions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
