[
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519085#comment-14519085
]
Jaydeep Vishwakarma commented on OOZIE-2216:
--------------------------------------------
Thanks [~rkanter] for reviewing the doc. The polling mechanism will be the same
as the one we have for periodic coordinator instances.
In aperiodic data handling, however, an instance is created only when data is
available, so at any point in time only one poll runs through the coordinator;
when data is available, an instance is created for that input data. This also
has the advantage of avoiding polling through multiple instances.
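As a rough illustration of the single-poller idea, here is a minimal Python
sketch. It is not Oozie code: AperiodicCoordinator, list_new_data and poll_once
are hypothetical names chosen for the example, and the data check is faked with
a fixed sequence of arrivals.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AperiodicCoordinator:
    """Hypothetical model: one poller per coordinator, not one per instance."""

    # Hypothetical data-availability check; returns the paths that arrived
    # since the previous poll (an empty list when nothing is new).
    list_new_data: Callable[[], List[str]]
    instances: List[List[str]] = field(default_factory=list)
    polls: int = 0

    def poll_once(self) -> bool:
        """Run a single poll; create an instance only if new data is present."""
        self.polls += 1
        new_data = self.list_new_data()
        if not new_data:
            return False  # nothing arrived, so no instance is materialized
        self.instances.append(new_data)  # one instance for the available data
        return True


# Simulated arrivals for four consecutive polls (assumed data, for illustration).
arrivals = iter([[], ["/data/a"], [], ["/data/b", "/data/c"]])
coord = AperiodicCoordinator(list_new_data=lambda: next(arrivals))
results = [coord.poll_once() for _ in range(4)]
```

In this sketch four polls produce only two instances, because an instance is
created only on the polls where data was actually found.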
That said, we should use the iNotify feature, which can be used for both kinds
of datasets through some common methods.
If you have any other suggestions or questions about this, they are most welcome.
I am still looking for more feedback from other committers and contributors as
well, before I finalize the doc and start work on it.
Thanks in advance.
> Aperiodic Data handling in oozie
> --------------------------------
>
> Key: OOZIE-2216
> URL: https://issues.apache.org/jira/browse/OOZIE-2216
> Project: Oozie
> Issue Type: New Feature
> Components: coordinator
> Reporter: Jaydeep Vishwakarma
> Assignee: Jaydeep Vishwakarma
> Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently Oozie scheduling works on periodic datasets. It does not have any
> mechanism to handle aperiodic datasets, which don’t follow a fixed
> schedule/frequency.
> Use cases:
> When the incoming dataset arrives with no fixed schedule.
> Need to trigger the job based on all data available since the last run, with a
> possible cap on the maximum size to process in one run.
> Try to avoid creating many instances when you know input instances will be
> very few.