[ 
https://issues.apache.org/jira/browse/FALCON-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086807#comment-15086807
 ] 

Ajay Yadava commented on FALCON-1728:
-------------------------------------

[~bvellanki] It is most definitely not a bug, and as I said earlier, a lot of 
users depend heavily on this feature, so removing it is not an option. If you 
can tell us what you want to achieve that this feature is preventing, we can 
probably suggest alternatives. If the use case is just to stop certain users 
from creating certain hypothetical bad configurations, removing this feature 
is still not the solution, and we should consider other alternatives. 

Even without this feature there are many ways in which two processes can end up 
overwriting the same location, so it is up to the users to configure their 
entities properly. 

{quote}
Now you have a process ProcessOne whose output feed is FeedOne. The process is 
run on clusters ClusterTwo and ClusterThree. When oozie runs the process 
instance, the user expects the output data to be generated in
ClusterOne/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}
{quote}
When the process runs only on ClusterTwo and ClusterThree, why does the user 
expect the data to be generated in ClusterOne?

If two instances of the process on different clusters are writing to the same 
output directory, it means your feed has the same location on different 
clusters. Why do you want that? 
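
To make the question about shared locations concrete, here is a minimal sketch 
of how one could check whether a feed uses the same path template on more than 
one cluster. It is plain Python, not Falcon code: the function name 
find_shared_locations and the dictionary layout are assumptions made for 
illustration, and the paths only mirror the example quoted above. A shared 
template is not necessarily a problem on its own; it matters only if those 
paths end up on the same filesystem.

```python
from collections import defaultdict


def find_shared_locations(feed_locations):
    """Return path templates that are configured for more than one cluster.

    feed_locations maps a cluster name to the data path template the feed
    uses on that cluster (a hypothetical layout, not Falcon's own model).
    """
    clusters_by_path = defaultdict(list)
    for cluster, path in feed_locations.items():
        # Ignore trailing slashes so "/a/b/" and "/a/b" compare as equal.
        clusters_by_path[path.rstrip("/")].append(cluster)
    return {
        path: sorted(names)
        for path, names in clusters_by_path.items()
        if len(names) > 1
    }


# Hypothetical configuration: both clusters use the same path template.
shared = {
    "ClusterTwo": "/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}",
    "ClusterThree": "/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}/",
}

# Hypothetical configuration: each cluster writes under its own directory.
distinct = {
    "ClusterTwo": "/apps/falcon/feedOne/clusterTwo/{YEAR}-{MONTH}-{DAY}-{HOUR}",
    "ClusterThree": "/apps/falcon/feedOne/clusterThree/{YEAR}-{MONTH}-{DAY}-{HOUR}",
}

print(find_shared_locations(shared))
print(find_shared_locations(distinct))
```

With the second layout nothing is reported, which is the kind of per-cluster 
configuration I would expect users to set up themselves.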

Are you suggesting that replication will overwrite the data? In that case, did 
you configure the partitions properly?

> Process entity definition allows multiple clusters when it has output Feed 
> defined. 
> ------------------------------------------------------------------------------------
>
>                 Key: FALCON-1728
>                 URL: https://issues.apache.org/jira/browse/FALCON-1728
>             Project: Falcon
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 0.9
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>            Priority: Critical
>
> Process XSD allows user to specify multiple clusters per process entity. I am 
> guessing this would allow a user to run duplicate instance of the process on 
> multiple clusters at the same time (I do not really see a need for this). 
> When the process has an output feed defined, you can have duplicate process 
> instances writing to same feed instance, causing data corruption/failures. 
> The solution is to 
> 1. Do not allow multiple clusters per process. Let the user define a 
> duplicate process if user wants to run duplicate instances.  
> OR
> 2. Allow multiple clusters, but only when there is no output feed defined.
> [~sriksun] please let me know if there is any other reason for allowing 
> multiple clusters in a process. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
