[ 
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560159#comment-14560159
 ] 

Rohini Palaniswamy commented on OOZIE-2216:
-------------------------------------------

The overall design and proposal are good.

bq. The system is supposed to know which data has already been processed. It 
will be read from the DB; I am thinking of keeping the last input directory 
information in the runConf of the COORD_ACTION table.
      What happens if the version directories are produced out of sequence?
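
To make the concern concrete, here is a minimal sketch (not Oozie code; the 
function names and the "v1"/"v2"/"v3" directory names are hypothetical). It 
compares tracking only the last processed directory with tracking the full set 
of processed directories when a directory shows up late:

```python
def unprocessed_by_watermark(available, last_processed):
    """Assumed scheme: only the last processed directory name is stored."""
    return sorted(v for v in available if v > last_processed)


def unprocessed_by_set(available, processed):
    """Alternative scheme: every processed directory name is stored."""
    return sorted(v for v in available if v not in processed)


# Run 1: v1 and v3 exist; v2 has not been produced yet.
run1 = unprocessed_by_watermark(["v1", "v3"], last_processed="")
last_processed = run1[-1]  # the stored watermark is now "v3"

# Run 2: v2 arrives out of sequence.
available = ["v1", "v2", "v3"]
missed = unprocessed_by_watermark(available, last_processed)
caught = unprocessed_by_set(available, processed=set(run1))

print(missed)  # v2 sorts below the watermark, so it is never selected
print(caught)  # the set-based check still finds v2
```

With only the last directory stored, the late v2 would be skipped unless the 
design handles that case explicitly.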

I haven't given much thought to incremental/delta processing in conjunction 
with OOZIE-1976. I will check with [~puru] and some of our users and update 
here with feedback if anything comes out of that discussion.

> Aperiodic Data handling in oozie
> --------------------------------
>
>                 Key: OOZIE-2216
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2216
>             Project: Oozie
>          Issue Type: New Feature
>          Components: coordinator
>            Reporter: Jaydeep Vishwakarma
>            Assignee: Jaydeep Vishwakarma
>         Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently, Oozie scheduling works on periodic datasets. It does not have any 
> mechanism to handle aperiodic datasets, which don't follow a fixed 
> schedule/frequency. 
> Use cases:
> The incoming dataset arrives with no fixed schedule.
> The job needs to be triggered based on all data available since the last run, 
> with a possible cap on the maximum size to process in one run.
> Avoid creating many instances when the input instances are known to be 
> very few.



