[
https://issues.apache.org/jira/browse/OOZIE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029573#comment-16029573
]
Clay B. commented on OOZIE-2876:
--------------------------------
Hi [~gezapeti], good question!
My team maintains Hadoop cluster infrastructure for all of Bloomberg (a
multi-thousand user software engineering community). In the past, we have
offered pre-written Java action code for interacting with Git, which was
difficult for users to take advantage of. (A Java action means an extra JAR
to deal with and a new action type for them to debug, with no error handling,
and they are not Java developers.) We would prefer a more first-class
experience for deploying Hadoop code to production clusters to make using
Hadoop easier for our application teams. We have a very dev-ops-y model here
where if you write it, you also own it in production.
We see Oozie as a workflow manager for our Hadoop clusters and similarly a
central point for privilege escalation. This makes Oozie the logical place to
implement a CD pipeline without having to distribute production Kerberos
credentials or build a bespoke Jenkins infrastructure to do Hadoop cluster
(deployment) workflow management.
For more thoughts, I presented at ApacheCon Big Data the week before last to get
feedback. I found a few other users who saw value in this as an aid to CD, but
sadly no video of the presentation was recorded. The key question is how you:
* Empower teams to deploy their own code
* Deploy code in a highly-available model
* Provide a model where a cluster ensures it has all necessary code (limit
external dependencies)
* Not distribute keytabs to application teams (limiting ability to log in to
production systems)
I'm happy to walk anyone through the
[slides|http://events.linuxfoundation.org/sites/events/files/slides/ApacheCon%20Oozie%20CD_0.pdf]
over the phone if that would help too.
> Provide deployment primitives
> -----------------------------
>
> Key: OOZIE-2876
> URL: https://issues.apache.org/jira/browse/OOZIE-2876
> Project: Oozie
> Issue Type: Wish
> Components: action
> Reporter: Clay B.
> Priority: Minor
>
> Today one can schedule workflows which run on a cluster using many
> pre-existing artifacts. However, there are no helpful primitives for
> deploying those artifacts to the cluster.
> For example, one may use a Spark JAR (hosted in a Maven repository) stored on
> a cluster's HDFS to talk with data stored in a Hive table (the schema of
> which is likely tracked in a source code management system somewhere) and use
> JDBC to talk to an arbitrary database off the cluster. (Which database is used
> depends on whether the cluster is Dev, Beta or PROD, all mapped in a
> simple configuration file.) Further, the data may be verified by a simple Pig
> script (also stored in a source code management repository).
> As a user, I'd like some way to get my binaries and ASCII configuration onto a
> cluster without significant ad hoc shell actions.