[
https://issues.apache.org/jira/browse/OOZIE-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Evgenij Kozhevnikov updated OOZIE-2534:
---------------------------------------
Description:
Case #1:
Some projects use a lib folder next to workflow.xml in HDFS to hold the
dependencies for a particular module. These jars often contain the version in
their name (for example, some-internal-lib-1.2.3.jar). Before updating the
project you have to make sure that there are no RUNNING instances of the
workflow; otherwise you risk getting a "lib not found" exception.
Case #2:
Several separate coordinators can run in Oozie with common sources in HDFS
(with different properties). Some projects I've seen have more than 30
coordinators with common sources in HDFS. Some of them run daily, some hourly,
and others every 10 minutes. Moreover, these workflows can be called as
subworkflows from other sources. It is very difficult to manage all these
projects during updates. You have to suspend all dependents, make sure there
are no running instances, and in any case stop your cluster activities for
that period, so the load on the cluster increases afterwards.
Solution:
I suggest having the same mechanism as sharedLibs, but for action libs. If
actionLib contains any jars in its root, they will be put on the classpath as
before, so the change is backward compatible. At the same time, you can create
a subdirectory with the prefix "lib_" followed by a timestamp or an
incremental version. JavaActionExecutor will take only the latest version
subdirectory and put all artifacts from it into the job cache.
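
The selection rule described above could be sketched roughly as follows. This
is only an illustration, not the code from the attached patch:
VersionedActionLibs, latestVersionDir and classpathEntries are hypothetical
names, the directory listing is passed in as plain strings instead of being
read from HDFS, and the sketch assumes that the part after "lib_" is purely
numeric (a timestamp or a counter), so "latest" means the largest number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Oozie or of the patch. */
final class VersionedActionLibs {
    static final String PREFIX = "lib_";

    private VersionedActionLibs() {}

    /** Returns the "lib_<number>" name with the largest numeric suffix, if any. */
    static Optional<String> latestVersionDir(List<String> subdirNames) {
        String best = null;
        long bestVersion = -1L;
        for (String name : subdirNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // not a versioned subdirectory
            }
            String suffix = name.substring(PREFIX.length());
            // Assumption: only numeric suffixes count (up to 18 digits fits a long).
            if (!suffix.matches("\\d{1,18}")) {
                continue;
            }
            long version = Long.parseLong(suffix);
            if (version > bestVersion) {
                bestVersion = version;
                best = name;
            }
        }
        return Optional.ofNullable(best);
    }

    /** Root jars stay as before; the newest versioned subdirectory is appended. */
    static List<String> classpathEntries(List<String> rootJars, List<String> subdirNames) {
        List<String> entries = new ArrayList<>(rootJars);
        latestVersionDir(subdirNames).ifPresent(entries::add);
        return entries;
    }
}
```

With this rule, a new release can be uploaded into a fresh "lib_" subdirectory
while running instances keep reading the older one, and a lib folder that has
no versioned subdirectories behaves exactly as it does today.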
> Versioned action libs (similar as sharedLibs works)
> ---------------------------------------------------
>
> Key: OOZIE-2534
> URL: https://issues.apache.org/jira/browse/OOZIE-2534
> Project: Oozie
> Issue Type: Improvement
> Components: core
> Affects Versions: 4.2.0
> Reporter: Evgenij Kozhevnikov
> Priority: Trivial
> Attachments: versionedActionLibs.diff
>