[
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532558#comment-14532558
]
Srikanth Sundarrajan commented on OOZIE-2216:
---------------------------------------------
[~jaydeepvishwakarma], this would be a nice addition to Oozie. Looking at the
design, the major shift you plan to bring about seems to be moving from eager
materialization followed by an input check to lazy materialization upon input
availability, for coordinators that are marked as gating on aperiodic datasets.
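The eager-versus-lazy distinction can be pictured with a toy model. This is only a sketch under my own assumptions, not Oozie's materialization code: `Action`, `eager_materialize`, and `lazy_materialize` are hypothetical names, and a "slot" stands for one expected instance of a periodic dataset (24 hourly slots here, of which only two actually receive data).

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Action:
    """Hypothetical stand-in for a materialized coordinator action."""
    inputs: List[str]
    status: str


def eager_materialize(slots: List[str], arrived: Set[str]) -> List[Action]:
    # Eager: one action per expected slot is created up front; the input
    # check runs afterwards, so slots whose data never came stay WAITING.
    return [Action([s], "READY" if s in arrived else "WAITING") for s in slots]


def lazy_materialize(arrived: Set[str], consumed: Set[str]) -> Optional[Action]:
    # Lazy: nothing is created until new input shows up; then a single
    # action covers every arrived instance not yet consumed.
    new = sorted(arrived - consumed)
    return Action(new, "READY") if new else None


slots = [f"h{h:02d}" for h in range(24)]  # 24 hourly slots (assumed)
arrived = {"h03", "h17"}                  # only two instances actually landed
eager = eager_materialize(slots, arrived)
lazy = lazy_materialize(arrived, consumed=set())
print(len(eager), sum(a.status == "WAITING" for a in eager))  # 24 22
print(lazy.inputs)  # ['h03', 'h17']
```

In this model the eager path leaves 22 actions waiting on data that never arrives, while the lazy path creates one action only once input exists; the cost moves from stored actions to however often availability is polled.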
The approach seems simple enough. Perhaps you can share your thinking on the following:
1. How frequent will the polling be when materialization is lazy (to gauge the
effect this would have on the NameNode)?
2. What is the behavior when a coordinator requires both periodic and aperiodic
datasets? Is that supported?
3. How will this co-exist with the features outlined in OOZIE-1976?
4. You seem to imply that there would be no schema changes. Would you need any
additional state maintained for this? If so, where do you plan to maintain
it?
5. Do you expect the DB to be loaded more heavily than it is today?
Thanks for taking this up.
> Aperiodic Data handling in oozie
> --------------------------------
>
> Key: OOZIE-2216
> URL: https://issues.apache.org/jira/browse/OOZIE-2216
> Project: Oozie
> Issue Type: New Feature
> Components: coordinator
> Reporter: Jaydeep Vishwakarma
> Assignee: Jaydeep Vishwakarma
> Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently, Oozie scheduling works on periodic datasets. It does not have any
> mechanism to handle aperiodic datasets, which don’t follow a fixed
> schedule/frequency.
> Use cases
> When an incoming dataset arrives with no fixed schedule.
> Need to trigger the job based on all data available since the last run, with a
> possible cap on the max size to process in one run.
> Try to avoid creating too many instances when you know input instances will be
> very few.
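The "all data since the last run, with a cap on the max size per run" use case could be served by a batching step like the one below. This is a sketch under assumptions the issue does not state: sizes are in bytes, instances are consumed in arrival order, a single instance larger than the cap runs on its own rather than being skipped, and `next_batch`/`plan_runs` are hypothetical names, not Oozie APIs.

```python
from typing import List, Tuple

Instance = Tuple[str, int]  # (instance name, size in bytes); assumed shape


def next_batch(pending: List[Instance], max_bytes: int) -> List[Instance]:
    """Take pending instances in arrival order until the size cap is reached."""
    batch: List[Instance] = []
    total = 0
    for name, size in pending:
        # Stop before exceeding the cap, but always take at least one
        # instance so an oversized instance cannot block progress forever.
        if batch and total + size > max_bytes:
            break
        batch.append((name, size))
        total += size
    return batch


def plan_runs(pending: List[Instance], max_bytes: int) -> List[List[str]]:
    """Split everything available since the last run into capped runs."""
    runs: List[List[str]] = []
    while pending:
        batch = next_batch(pending, max_bytes)
        runs.append([name for name, _ in batch])
        pending = pending[len(batch):]  # the rest waits for the next run
    return runs


arrivals = [("a", 40), ("b", 50), ("c", 30), ("d", 120), ("e", 10)]
print(plan_runs(arrivals, max_bytes=100))
# [['a', 'b'], ['c'], ['d'], ['e']]
```

Carrying leftover instances into the next run keeps each run bounded while still guaranteeing that everything that arrived is eventually processed.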