[ https://issues.apache.org/jira/browse/FALCON-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068646#comment-15068646 ]
Ajay Yadava commented on FALCON-1686:
-------------------------------------
{quote}
in order to *reprocess* data after a code change.
{quote}
Since it mentions reprocessing of data, I assumed the dates in question fall
between the process's start time and end time. An effective time won't help in
scenarios where one needs to process data from before the start date. I am not
sure what the exact use case for [[email protected]] is, but I think effective
time and updating the start date are two separate issues, and it probably makes
sense to track them in two different JIRAs.
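The distinction drawn above can be sketched as a check of where a requested reprocessing range falls relative to a process's validity window. This is a minimal Python sketch, not part of Falcon: the function name `classify_reprocess_range` and its return labels are hypothetical, and treating the window as half-open `[start, end)` is an assumption made for illustration.

```python
from datetime import datetime


def classify_reprocess_range(validity_start, validity_end, req_start, req_end):
    """Hypothetical helper: compare a requested reprocessing range with a
    process's validity window. The window is treated as half-open
    [validity_start, validity_end), which is an assumption, not Falcon's spec."""
    if req_start >= req_end:
        raise ValueError("requested range is empty")
    if req_start >= validity_start and req_end <= validity_end:
        # Every requested instance lies inside the existing schedule,
        # so the instances already exist and could simply be re-run.
        return "within-validity"
    if req_start < validity_start:
        # Some requested instances precede the start date; an effective
        # time does not cover this, so the start date itself must change.
        return "needs-earlier-start"
    # Start is fine but the range runs past the end of the window.
    return "extends-past-end"


if __name__ == "__main__":
    start, end = datetime(2015, 1, 1), datetime(2016, 1, 1)
    print(classify_reprocess_range(start, end,
                                   datetime(2014, 12, 1), datetime(2015, 2, 1)))
```

Only the `needs-earlier-start` case requires touching the start date, which is why it reads as a separate concern from effective time.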
> Support for reprocessing
> ------------------------
>
> Key: FALCON-1686
> URL: https://issues.apache.org/jira/browse/FALCON-1686
> Project: Falcon
> Issue Type: Improvement
> Affects Versions: 0.7
> Reporter: Mass Dosage
>
> We have a number of ETL jobs which we schedule to run on a regular basis with
> Falcon. This works fine. However, we often have cases where we need to run
> the exact same jobs over past date ranges in order to reprocess data after a
> code change. There doesn't seem to be any easy way to do this in Falcon at
> the moment. Ideally we'd have a controlled way of saying "run this process
> for dates between X and Y". There should also be a way to control whether
> downstream processes are triggered by the data being reprocessed or not. In
> some cases you may want downstream jobs to also run on the new data but in
> other cases you might not.
> With Oozie, if one wants to reprocess data from any point in history, one can
> update the start and end dates (in the job.properties file) and submit a new
> coordinator to run alongside the existing one. Since coordinator IDs are
> unique, the two do not clash. In Falcon, processes are identified by their
> human-readable name, so one would need to change the name in the process file
> directly.
> We are currently working around this issue by making a copy of the original
> Falcon process, giving it a different name and changing the dates. This isn't
> ideal and leads to a lot of XML duplication.
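The copy-and-rename workaround described above can at least be generated rather than edited by hand. Below is a minimal Python sketch, assuming the process definition uses the `uri:falcon:process:0.1` namespace with `<validity start="..." end="..."/>` elements under each `<cluster>`. The helper name `clone_process_for_reprocessing` and the sample process are hypothetical, and the sample is trimmed, so it is not a complete, submittable process definition.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from Falcon's process entity schema.
FALCON_NS = "uri:falcon:process:0.1"

# Trimmed, hypothetical process definition used only for illustration.
SAMPLE_PROCESS = """<process name="daily-etl" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="primary">
      <validity start="2015-06-01T00:00Z" end="2099-12-31T00:00Z"/>
    </cluster>
  </clusters>
  <frequency>days(1)</frequency>
</process>"""


def clone_process_for_reprocessing(process_xml, new_name, start, end):
    """Hypothetical helper: return a copy of a process definition with a new
    name and every cluster's validity window replaced by (start, end)."""
    # Emit the Falcon namespace as the default xmlns instead of "ns0:".
    ET.register_namespace("", FALCON_NS)
    root = ET.fromstring(process_xml)
    # Falcon identifies processes by name, so the copy needs a unique one.
    root.set("name", new_name)
    for validity in root.iter(f"{{{FALCON_NS}}}validity"):
        validity.set("start", start)
        validity.set("end", end)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(clone_process_for_reprocessing(
        SAMPLE_PROCESS, "daily-etl-reprocess",
        "2015-01-01T00:00Z", "2015-03-01T00:00Z"))
```

The generated XML would still have to be submitted and scheduled as a separate process; this only removes the hand-editing step, not the duplication of entities inside Falcon.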
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)