[ 
https://issues.apache.org/jira/browse/OOZIE-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Evgenij Kozhevnikov updated OOZIE-2534:
---------------------------------------
    Description: 
Case #1:
Some projects use a lib folder next to workflow.xml in HDFS to hold the
dependencies of a particular module. These jars often contain the version in
their name (for example, some-internal-lib-1.2.3.jar). Before updating the
project you have to make sure that there are no RUNNING instances of the
workflow; otherwise you risk getting a "lib not found" exception.

Case #2:
Several separate coordinators can run in Oozie from common sources in HDFS
(with different properties). Some projects I've seen have more than 30
coordinators sharing common sources in HDFS. Some of them run daily, some
hourly, and others every 10 minutes. On top of that, these workflows can be
called as subworkflows from other sources. It is very difficult to manage all
these projects during updates. You have to suspend all dependents, make sure
there are no running instances, and in any case stop your cluster activity for
that period, so the load on the cluster increases afterwards.

I suggest adding the same mechanism as sharedLibs, but for action libs. If
actionLib contains any jars in its root, they will be put on the classpath as
before, so the change is backward compatible. At the same time, you can create
a subdirectory with the prefix "lib_" followed by a timestamp or an
incremental version. JavaActionExecutor will take only the latest versioned
subdirectory and put all artifacts from it into the job cache.
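
To illustrate the selection step, here is a minimal sketch in Java. It is not
taken from the attached patch: the class name, the method names, and the rule
that the suffix after "lib_" must be purely numeric (compared as a number, so
both timestamps and incremental counters order correctly) are all assumptions
made for this example. It works on plain directory names rather than on the
HDFS API.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Oozie: picks the newest "lib_<digits>" subdirectory.
public class VersionedActionLibs {
    static final String PREFIX = "lib_";

    // Returns the numeric suffix of a "lib_<digits>" name, or empty if the name does not match.
    static Optional<BigInteger> version(String dirName) {
        if (!dirName.startsWith(PREFIX)) {
            return Optional.empty();
        }
        String suffix = dirName.substring(PREFIX.length());
        boolean allDigits = !suffix.isEmpty()
                && suffix.chars().allMatch(c -> c >= '0' && c <= '9');
        return allDigits ? Optional.of(new BigInteger(suffix)) : Optional.empty();
    }

    // Returns the name with the highest numeric suffix, or empty if none is versioned.
    // An empty result means only the jars in the lib root apply (the existing behaviour).
    static Optional<String> latestVersionedDir(List<String> subdirNames) {
        return subdirNames.stream()
                .filter(name -> version(name).isPresent())
                .max(Comparator.comparing((String name) -> version(name).get()));
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
                "lib_20160501120000", "lib_20160601090000", "libs", "lib_tmp");
        // Prints: lib_20160601090000
        System.out.println(latestVersionedDir(dirs).orElse("<no versioned dir>"));
    }
}
```

Comparing the suffix numerically rather than as a string keeps "lib_10" ahead
of "lib_9" when incremental versions are used; with fixed-width timestamps
either comparison gives the same result.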

  was:
Case #1:
Some projects use a lib folder next to workflow.xml in HDFS to hold the
dependencies of a particular module. These jars often contain the version in
their name (for example, some-internal-lib-1.2.3.jar). Before updating the
project you have to make sure that there are no RUNNING instances of the
workflow; otherwise you risk getting a "lib not found" exception.

Case #2:
Several separate coordinators can run in Oozie from common sources in HDFS
(with different properties). Some projects I've seen have more than 30
coordinators sharing common sources in HDFS. Some of them run daily, some
hourly, and others every 10 minutes. On top of that, these workflows can be
called as subworkflows from other sources. It is very difficult to manage all
these projects during updates. You have to suspend all dependents, make sure
there are no running instances, and in any case stop your cluster activity for
that period, so the load on the cluster increases afterwards.

I suggest adding the same mechanism as sharedLibs, but for action libs. If
actionLib contains any jars in its root, they will be put on the classpath as
before, so the change is backward compatible. At the same time, you can create
a subdirectory with the prefix "lib_" followed by a timestamp or an
incremental version. JavaActionExecutor will take only the latest versioned
subdirectory and put all artifacts from it into the job cache.

Initial implementation is attached.


> Versioned action libs (similar as sharedLibs works)
> ---------------------------------------------------
>
>                 Key: OOZIE-2534
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2534
>             Project: Oozie
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 4.2.0
>            Reporter: Evgenij Kozhevnikov
>            Priority: Trivial
>         Attachments: versionedActionLibs.diff



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
