[ 
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518199#comment-14518199
 ] 

Robert Kanter commented on OOZIE-2216:
--------------------------------------

[~jaydeepvishwakarma], having this feature would be really great, and something 
that's been on the "To Do" list for a long time.  That said, it's not an easy 
task (which is why it hasn't been done yet :)).  The design document looks okay 
at a high level, but I'm not sure I understand how Oozie will "know" when the 
next data arrives.  Will Oozie be doing very frequent polling of HDFS to check 
for new data?  If so, I'm not sure the NameNode (NN) is going to like that.  One 
thing that might be helpful is HDFS's iNotify feature, which, from my 
understanding, sends out notifications so we don't have to poll HDFS.  I had 
actually created OOZIE-2179 for doing that regardless.  Perhaps you can take 
advantage of iNotify for checking for new data?
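
To make the polling-versus-notification point concrete, here is a rough sketch of the notification-driven shape. It is only an illustration: `CloseEvent`, `NewDataTracker`, and `consume` are hypothetical names, and the in-memory queue stands in for a real event stream (in HDFS the Java entry point is, as far as I know, `HdfsAdmin.getInotifyEventStream()`). The idea is that the consumer reacts to "file closed" events and never lists the dataset directory.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CloseEvent:
    """Hypothetical stand-in for a 'file closed' notification."""
    path: str
    size_bytes: int


class NewDataTracker:
    """Collects newly closed files under one dataset directory (assumed layout)."""

    def __init__(self, dataset_prefix):
        # Normalize so "/data/clicks" does not also match "/data/clicks_old".
        self.dataset_prefix = dataset_prefix.rstrip("/") + "/"
        self.pending = []

    def on_event(self, event):
        # Only events under the dataset directory count as new data.
        if event.path.startswith(self.dataset_prefix):
            self.pending.append(event)

    def drain(self):
        # Hand over everything seen so far and start a fresh list.
        ready, self.pending = self.pending, []
        return ready


def consume(events, tracker):
    """Feed queued events to the tracker; no directory listing is involved."""
    while events:
        tracker.on_event(events.popleft())
```

With something like this, the scheduler would ask the tracker what has arrived instead of asking the NameNode, so the cost scales with the number of events rather than with the polling frequency.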

> Aperiodic Data handling in oozie
> --------------------------------
>
>                 Key: OOZIE-2216
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2216
>             Project: Oozie
>          Issue Type: New Feature
>          Components: coordinator
>            Reporter: Jaydeep Vishwakarma
>            Assignee: Jaydeep Vishwakarma
>         Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently, Oozie scheduling works on periodic datasets. It does not have any 
> mechanism to handle aperiodic datasets, which don’t follow a fixed 
> schedule/frequency. 
> Use cases:
> - The incoming dataset arrives with no fixed schedule.
> - The job needs to be triggered based on all data available since the last 
>   run, with a possible cap on the max size to process in one run.
> - Avoid creating many instances when you know input instances will be 
>   very few.
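
The second use case in the quoted description (process everything that has arrived since the last run, up to a size cap) could be sketched roughly as below. This is not taken from the attached design document: `DataInstance`, `select_batch`, and the rule that an oversized first instance is still accepted are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataInstance:
    """Hypothetical record of one arrived input instance."""
    path: str
    arrival_time: int  # e.g. epoch seconds
    size_bytes: int


def select_batch(instances, last_run_time, max_bytes=None):
    """Pick instances that arrived after last_run_time, oldest first,
    stopping before the running total would exceed max_bytes.

    Assumption: the first fresh instance is always accepted, even if it
    alone exceeds the cap, so a single large file cannot stall the job.
    """
    fresh = sorted(
        (i for i in instances if i.arrival_time > last_run_time),
        key=lambda i: i.arrival_time,
    )
    batch, total = [], 0
    for inst in fresh:
        over_cap = max_bytes is not None and total + inst.size_bytes > max_bytes
        if over_cap and batch:
            break  # leave the remainder for the next run
        batch.append(inst)
        total += inst.size_bytes
    return batch
```

Stopping at the first instance that does not fit, rather than skipping it, keeps processing in arrival order; whatever is left over would be picked up by the next run.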



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
