[jira] [Comment Edited] (FALCON-1686) Support for reprocessing

Ajay Yadava (JIRA) Wed, 23 Dec 2015 03:46:44 -0800

    [ 
https://issues.apache.org/jira/browse/FALCON-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069538#comment-15069538
 ]


Ajay Yadava edited comment on FALCON-1686 at 12/23/15 11:45 AM:
----------------------------------------------------------------

This is the use case I was talking about: you want to reprocess data that was 
already processed by the old code. The *effective time update* feature should 
solve this. 

From what I understand, [~sriksun] is talking about another use case: you run 
your process and find that the code is correct, but you have missed some 
instances because of an incorrect start date. In this case you want 
*new instances* for a time range earlier than the start time, and you need to 
update the start date of your process to an earlier time. This is also a valid 
use case, but it won't be solved by the effective time update feature.
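
To make the distinction between the two cases concrete, here is a minimal 
sketch in Python. It is not Falcon code or a Falcon API: the function name 
`split_reprocess_range` and its parameters are hypothetical, and it assumes 
instances sit on a fixed-frequency grid anchored at the process start date, 
ignoring the process end date and whether an instance has actually run.

```python
from datetime import datetime, timedelta


def split_reprocess_range(process_start, frequency, range_start, range_end):
    """Split the instance times in [range_start, range_end) into two groups.

    missing  -- times before process_start: these would be *new instances*,
                so the process start date would have to move earlier.
    existing -- times at or after process_start: these fall inside the
                process validity, so they are candidates for reprocessing.
    """
    # Find the first grid point at or after range_start. Floor division of
    # two timedeltas yields an int and rounds toward negative infinity.
    steps = (range_start - process_start) // frequency
    t = process_start + steps * frequency
    if t < range_start:
        t += frequency

    missing, existing = [], []
    while t < range_end:
        if t < process_start:
            missing.append(t)
        else:
            existing.append(t)
        t += frequency
    return missing, existing


# Hypothetical daily process that starts on 2015-12-10.
missing, existing = split_reprocess_range(
    process_start=datetime(2015, 12, 10),
    frequency=timedelta(days=1),
    range_start=datetime(2015, 12, 8),
    range_end=datetime(2015, 12, 12),
)
```

In this example the instances for 8 and 9 December land in `missing` (the 
start date case), while 10 and 11 December land in `existing` (the 
reprocessing case).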


was (Author: ajayyadava):
I think this is the use case which I was talking about. You want to reprocess 
the instances which were already processed by the old code, this should be 
solved by the *effective time update* feature. 

From what I understand Srikanth Sundarrajan is talking about another use case. 
That is the case when you run your process and figure out that the code is 
correct but you have missed some instances because of incorrect start date. In 
this case you want *new instances* for a time range earlier than the start 
time and need to update start date of your process to an earlier time.  This 
is also a valid use case but won't be solved by effective time update feature.

> Support for reprocessing
> ------------------------
>
>                 Key: FALCON-1686
>                 URL: https://issues.apache.org/jira/browse/FALCON-1686
>             Project: Falcon
>          Issue Type: Improvement
>    Affects Versions: 0.7
>            Reporter: Mass Dosage
>
> We have a number of ETL jobs which we schedule to run on a regular basis with 
> Falcon. This works fine. However, we often have cases where we need to run 
> the exact same jobs over past date ranges in order to reprocess data after a 
> code change. There doesn't seem to be any easy way to do this in Falcon at 
> the moment. Ideally we'd have a controlled way of saying "run this process 
> for dates between X and Y". There should also be a way to control whether 
> downstream processes are triggered by the data being reprocessed or not. In 
> some cases you may want downstream jobs to also run on the new data but in 
> other cases you might not. 
> With Oozie, if one wants to reprocess data from any time in history, one can 
> update the start & end-dates (using the job.properties file) and submit a new 
> coordinator to run alongside the existing one. As the coordinator-ids are 
> unique they do not clash. In Falcon, processes are defined by their readable 
> name so one would need to update that in the process file directly. 
> We are currently working around this issue by making a copy of the original 
> Falcon process, giving it a different name and changing the dates. This isn't 
> ideal and leads to a lot of XML duplication. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
