[ 
https://issues.apache.org/jira/browse/FALCON-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066792#comment-15066792
 ] 

pavan kumar kolamuri commented on FALCON-1686:
----------------------------------------------

Hi [~massdosage]. I have a few questions about your description; could you please 
clarify the following?

 We have a number of ETL jobs which we schedule to run on a regular basis with 
Falcon. This works fine. However, we often have cases where we need to run the 
exact same jobs over past date ranges in order to reprocess data after a code 
change. 
           
        Why wouldn't a rerun of that process help you in this case? 

There should also be a way to control whether downstream processes are 
triggered by the data being reprocessed or not. In some cases you may want 
downstream jobs to also run on the new data but in other cases you might not.
       
         How should the rerun of downstream processes be determined? Is it based 
on the data that arrives? Or do only some processes need a rerun while others 
don't? Wouldn't Falcon's late data processing help in this case?
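For reference, a rerun over a past date range can already be requested through the Falcon CLI's instance management commands. A minimal sketch (the process name and dates here are placeholders, and the exact flags may vary between Falcon versions):

```shell
# Rerun all instances of a process between two nominal times.
# "my-etl-process" is a hypothetical entity name for illustration.
falcon instance -type process -name my-etl-process -rerun \
    -start "2015-11-01T00:00Z" -end "2015-11-08T00:00Z"

# Check the status of the rerun instances afterwards.
falcon instance -type process -name my-etl-process -status \
    -start "2015-11-01T00:00Z" -end "2015-11-08T00:00Z"
```

This only reruns instances that already exist within the process's validity window, which may be why it doesn't cover your use case; that's part of what I'd like to understand.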


          


> Support for reprocessing
> ------------------------
>
>                 Key: FALCON-1686
>                 URL: https://issues.apache.org/jira/browse/FALCON-1686
>             Project: Falcon
>          Issue Type: Improvement
>    Affects Versions: 0.7
>            Reporter: Mass Dosage
>
> We have a number of ETL jobs which we schedule to run on a regular basis with 
> Falcon. This works fine. However, we often have cases where we need to run 
> the exact same jobs over past date ranges in order to reprocess data after a 
> code change. There doesn't seem to be any easy way to do this in Falcon at 
> the moment. Ideally we'd have a controlled way of saying "run this process 
> for dates between X and Y". There should also be a way to control whether 
> downstream processes are triggered by the data being reprocessed or not. In 
> some cases you may want downstream jobs to also run on the new data but in 
> other cases you might not. 
> With Oozie, if one wants to reprocess data from any time in history, one can 
> update the start & end-dates (using the job.properties file) and submit a new 
> coordinator to run alongside the existing one. As the coordinator-ids are 
> unique they do not clash. In Falcon, processes are defined by their readable 
> name so one would need to update that in the process file directly. 
> We are currently working around this issue by making a copy of the original 
> Falcon process, giving it a different name and changing the dates. This isn't 
> ideal and leads to a lot of XML duplication. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
