[
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536173#comment-14536173
]
Jaydeep Vishwakarma commented on OOZIE-2216:
--------------------------------------------
Thanks for reviewing, [~sriksun]. My replies are inline.
1. How periodic will the polling be when materialization is lazy (to gauge the
effect this would have on NN) ?
I am not quite clear on the question. Can you please elaborate?
2. What is the behavior when some periodic and aperiodic datasets are required
for a coordinator. Is that supported ?
When a coordinator has both periodic and aperiodic datasets, the aperiodic
datasets take priority. To run an instance, the system first checks whether
the aperiodic data is available; as soon as it is, the system checks for the
periodic dataset.
If the aperiodic dataset is available but the periodic dataset is still
missing past the next nominal time of the periodic dataset, the system will
refresh the input sets.
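The ordering above can be sketched roughly as follows. This is only an
illustration of the proposed check order, not existing Oozie code: the class,
enum, and parameter names are hypothetical, and times are assumed to be epoch
milliseconds.

```java
// Hypothetical sketch of the proposed check order; not part of the Oozie codebase.
class InputCheckSketch {

    enum Decision { WAIT_FOR_APERIODIC, WAIT_FOR_PERIODIC, REFRESH_INPUTS, RUN }

    // nowMillis and nextPeriodicNominalMillis are assumed to be epoch milliseconds.
    static Decision decide(boolean aperiodicAvailable,
                           boolean periodicAvailable,
                           long nowMillis,
                           long nextPeriodicNominalMillis) {
        // Aperiodic data has priority: nothing else is checked until it arrives.
        if (!aperiodicAvailable) {
            return Decision.WAIT_FOR_APERIODIC;
        }
        // Aperiodic data is present, so the periodic dataset is checked next.
        if (periodicAvailable) {
            return Decision.RUN;
        }
        // Periodic data is still missing past its next nominal time:
        // refresh the input sets instead of waiting on stale inputs.
        if (nowMillis > nextPeriodicNominalMillis) {
            return Decision.REFRESH_INPUTS;
        }
        return Decision.WAIT_FOR_PERIODIC;
    }
}
```

For example, decide(true, false, 2000L, 1000L) yields REFRESH_INPUTS, because
the periodic dataset is overdue while the aperiodic data is already present.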
3. How will this co-exist with features outlined in OOZIE-1976.
Only the third use case, "delta datasets (process data incrementally)", can be
handled by aperiodic data handling. Waiting for [~puru] to confirm it.
4. You seem to imply that there would be no schema changes. Would you need any
additional state maintained for this, if so where is that planned to be
maintained?
The coord definition will contain the datasets that indicate aperiodic data
handling. Since the DB already stores the coord job definition, I only need to
read and interpret it in code.
5. Do you expect the DB to be loaded more than what it is today?
The system needs to know which data has already been processed; this will be
read from the DB. I am thinking of keeping the last input directory
information in the runConf of the COORD_ACTION table.
I will put this information in the doc once we finalize it.
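As a rough illustration of how the stored last input directory could be used,
the sketch below picks the directories that arrived after it, up to a size
cap (the issue description mentions a possible cap on the size processed in
one run). Everything here is an assumption made for the example: the class and
method names are hypothetical, directories are assumed to be listed in arrival
order, and sizes are in bytes. It does not read from COORD_ACTION or any real
Oozie API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of the Oozie codebase.
class PendingInputsSketch {

    // dirSizes maps directory path -> size in bytes, iterated in arrival order (assumed).
    // lastProcessedDir is the directory recorded by the previous run, or null if none.
    // If lastProcessedDir is not found in dirSizes, nothing is selected.
    static List<String> selectPending(LinkedHashMap<String, Long> dirSizes,
                                      String lastProcessedDir,
                                      long maxBytes) {
        List<String> selected = new ArrayList<>();
        boolean afterLast = (lastProcessedDir == null);
        long total = 0;
        for (Map.Entry<String, Long> entry : dirSizes.entrySet()) {
            if (!afterLast) {
                // Skip everything up to and including the last processed directory.
                if (entry.getKey().equals(lastProcessedDir)) {
                    afterLast = true;
                }
                continue;
            }
            // Stop before the cap would be exceeded; the rest waits for the next run.
            if (total + entry.getValue() > maxBytes) {
                break;
            }
            selected.add(entry.getKey());
            total += entry.getValue();
        }
        return selected;
    }
}
```

With directories a, b, c, d of sizes 10, 20, 30, 40, a last processed
directory of a, and a cap of 50 bytes, the sketch selects b and c and leaves d
for a later run.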
> Aperiodic Data handling in oozie
> --------------------------------
>
> Key: OOZIE-2216
> URL: https://issues.apache.org/jira/browse/OOZIE-2216
> Project: Oozie
> Issue Type: New Feature
> Components: coordinator
> Reporter: Jaydeep Vishwakarma
> Assignee: Jaydeep Vishwakarma
> Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently Oozie scheduling works on periodic datasets. It does not have any
> mechanism to handle aperiodic datasets, which don’t follow a fixed
> schedule/frequency.
> Use cases
> When an incoming dataset arrives with no fixed schedule.
> Need to trigger the job based on all data available since the last run, with
> a possible cap on the max size to process in one run.
> Try to avoid creating many instances when you know input instances will be
> very few.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)