[ https://issues.apache.org/jira/browse/FALCON-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068646#comment-15068646 ]

Ajay Yadava commented on FALCON-1686:
-------------------------------------

{quote}
 in order to *reprocess* data after a code change.
{quote}

Since the description mentions reprocessing data, I assumed the time range 
lay between the process start time and end time. Effective time won't help in 
scenarios where one needs to process data from before the start date. I am not 
sure what the exact use case for [[email protected]] is, but I think effective 
time and updating the start date are two separate issues, and it probably makes 
sense to track them in two different JIRAs.
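To make the distinction in the comment concrete, here is a minimal sketch that compares a requested reprocessing range with a process validity window. The helper `classify_reprocess_range` and its return labels are hypothetical and not part of Falcon; the sketch assumes both the validity window and the requested range are half-open intervals of `datetime` values.

```python
from datetime import datetime


def classify_reprocess_range(validity_start: datetime, validity_end: datetime,
                             range_start: datetime, range_end: datetime) -> str:
    """Hypothetical helper, not a Falcon API.

    Compares a requested reprocessing range [range_start, range_end) with a
    process validity window [validity_start, validity_end).
    """
    if range_start >= range_end:
        raise ValueError("range_start must be before range_end")
    if range_start < validity_start:
        # Part of the range predates the start date: per the comment, effective
        # time does not help here, so the start date itself would have to change.
        return "before-start"
    if range_end <= validity_end:
        # The whole range lies inside the existing validity window.
        return "within-validity"
    # The range starts inside the window but runs past the end date.
    return "after-end"


if __name__ == "__main__":
    start, end = datetime(2015, 1, 1), datetime(2016, 1, 1)
    print(classify_reprocess_range(start, end,
                                   datetime(2014, 6, 1), datetime(2015, 2, 1)))
```

Running it prints `before-start`, which is the case the comment treats as a separate issue from effective time.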


> Support for reprocessing
> ------------------------
>
>                 Key: FALCON-1686
>                 URL: https://issues.apache.org/jira/browse/FALCON-1686
>             Project: Falcon
>          Issue Type: Improvement
>    Affects Versions: 0.7
>            Reporter: Mass Dosage
>
> We have a number of ETL jobs which we schedule to run on a regular basis with 
> Falcon. This works fine. However, we often have cases where we need to run 
> the exact same jobs over past date ranges in order to reprocess data after a 
> code change. There doesn't seem to be any easy way to do this in Falcon at 
> the moment. Ideally we'd have a controlled way of saying "run this process 
> for dates between X and Y". There should also be a way to control whether 
> downstream processes are triggered by the data being reprocessed or not. In 
> some cases you may want downstream jobs to also run on the new data but in 
> other cases you might not. 
> With Oozie, if one wants to reprocess data from any point in history, one can 
> update the start and end dates (using the job.properties file) and submit a new 
> coordinator to run alongside the existing one. As coordinator IDs are unique, 
> the two do not clash. In Falcon, processes are identified by their 
> human-readable name, so one would need to change that name in the process file 
> directly.
> We are currently working around this issue by making a copy of the original 
> Falcon process, giving it a different name and changing the dates. This isn't 
> ideal and leads to a lot of XML duplication.
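The copy-and-rename workaround described in the issue can at least be scripted. This is a minimal sketch, assuming the Falcon process XML uses the default namespace `uri:falcon:process:0.1` and a `<clusters>/<cluster>/<validity start=... end=...>` layout; verify both against your own process files. The function name `clone_process` and the sample entity and cluster names are hypothetical. It only automates producing the duplicate; it does not remove the duplication itself.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of Falcon process entities; check it against your own XML.
FALCON_NS = "uri:falcon:process:0.1"


def clone_process(process_xml: str, new_name: str, start: str, end: str) -> str:
    """Return a copy of a process definition with a new name and a new
    validity window on every <cluster> (hypothetical helper)."""
    # Serialize the Falcon namespace as the default xmlns instead of "ns0:".
    ET.register_namespace("", FALCON_NS)
    root = ET.fromstring(process_xml)
    root.set("name", new_name)
    ns = {"f": FALCON_NS}
    for validity in root.findall("f:clusters/f:cluster/f:validity", ns):
        validity.set("start", start)
        validity.set("end", end)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = (
        '<process name="etl-daily" xmlns="uri:falcon:process:0.1">'
        '<clusters><cluster name="primary">'
        '<validity start="2015-01-01T00:00Z" end="2099-12-31T00:00Z"/>'
        "</cluster></clusters></process>"
    )
    print(clone_process(original, "etl-daily-reprocess",
                        "2014-06-01T00:00Z", "2014-07-01T00:00Z"))
```

The renamed copy can then be submitted and scheduled as a separate entity, which matches the manual workaround; whether downstream processes react to its output still depends on how the feeds are defined.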



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
