[
https://issues.apache.org/jira/browse/FALCON-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069538#comment-15069538
]
Ajay Yadava edited comment on FALCON-1686 at 12/23/15 11:45 AM:
----------------------------------------------------------------
This is the use case which I was talking about. You want to reprocess the data
that was already processed by the old code; this should be solved by the
*effective time update* feature.
From what I understand, [~sriksun] is talking about another use case: you run
your process and find that the code is correct, but you have missed some
instances because of an incorrect start date. In this case you want *new
instances* for a time range earlier than the start time, and you need to update
the start date of your process to an earlier time. This is also a valid use
case, but it won't be solved by the effective time update feature.
was (Author: ajayyadava):
I think this is the use case which I was talking about. You want to reprocess
the instances which were already processed by the old code; this should be
solved by the *effective time update* feature.
From what I understand Srikanth Sundarrajan is talking about another use case.
That is the case when you run your process and figure out that the code is
correct but you have missed some instances because of incorrect start date. In
this case you want *new instances* for a time range earlier than the start
time and need to update start date of your process to an earlier time. This
is also a valid use case but won't be solved by effective time update feature.
> Support for reprocessing
> ------------------------
>
> Key: FALCON-1686
> URL: https://issues.apache.org/jira/browse/FALCON-1686
> Project: Falcon
> Issue Type: Improvement
> Affects Versions: 0.7
> Reporter: Mass Dosage
>
> We have a number of ETL jobs which we schedule to run on a regular basis with
> Falcon. This works fine. However, we often have cases where we need to run
> the exact same jobs over past date ranges in order to reprocess data after a
> code change. There doesn't seem to be any easy way to do this in Falcon at
> the moment. Ideally we'd have a controlled way of saying "run this process
> for dates between X and Y". There should also be a way to control whether
> downstream processes are triggered by the data being reprocessed or not. In
> some cases you may want downstream jobs to also run on the new data but in
> other cases you might not.
>
> With Oozie, if one wants to reprocess data from any time in history, one can
> update the start & end-dates (using the job.properties file) and submit a new
> coordinator to run alongside the existing one. As the coordinator-ids are
> unique they do not clash. In Falcon, processes are defined by their readable
> name so one would need to update that in the process file directly.
>
> We are currently working around this issue by making a copy of the original
> Falcon process, giving it a different name and changing the dates. This isn't
> ideal and leads to a lot of XML duplication.
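The copy-and-rename workaround described in the issue could be scripted rather
than done by hand. The following is only a sketch and not a Falcon feature: the
function name clone_for_reprocessing is made up for illustration, and it
assumes the process entity uses the uri:falcon:process:0.1 namespace with a
validity element (start/end attributes) under each cluster.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of Falcon process entities.
FALCON_NS = "uri:falcon:process:0.1"


def clone_for_reprocessing(process_xml, new_name, start, end):
    """Return a copy of a process definition with a new name and date range.

    Hypothetical helper: it only rewrites the XML text and does not talk to
    Falcon in any way.
    """
    # Serialize with a default xmlns instead of generated ns0: prefixes.
    ET.register_namespace("", FALCON_NS)
    root = ET.fromstring(process_xml)

    # Falcon identifies a process by its name, so the copy needs its own.
    root.set("name", new_name)

    # Point every cluster's validity window at the range to reprocess.
    for validity in root.iter("{%s}validity" % FALCON_NS):
        validity.set("start", start)
        validity.set("end", end)

    return ET.tostring(root, encoding="unicode")
```

The clone would still have to be submitted and scheduled as a separate entity,
so this only reduces the manual XML editing; it does not give any control over
whether downstream processes are triggered.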