[ 
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536173#comment-14536173
 ] 

Jaydeep Vishwakarma commented on OOZIE-2216:
--------------------------------------------

Thanks for reviewing, [~sriksun]. My replies are inline.
1. How periodic will the polling be when materialization is lazy (to gauge the 
effect this would have on NN)?
  I am not quite clear on the question. Could you please elaborate?
2. What is the behavior when some periodic and aperiodic datasets are required 
for a coordinator? Is that supported?
When a coordinator has both periodic and aperiodic datasets, the aperiodic 
datasets take priority. To run an instance, the system first checks the 
availability of the aperiodic data; as soon as that is available, it checks 
the periodic dataset.
If the aperiodic dataset is available but the periodic dataset is still 
missing past the next nominal time of the periodic dataset, the system will 
refresh the input sets.
3. How will this co-exist with features outlined in OOZIE-1976?
Only the third use case, "delta datasets (process data incrementally)", can be 
handled by aperiodic data handling. Waiting for [~puru] to confirm.
4. You seem to imply that there would be no schema changes. Would you need any 
additional state maintained for this? If so, where is that planned to be 
maintained?
The coord definition will contain the datasets, which will indicate aperiodic 
data handling. As the DB already stores the coord job definition, I just need 
to read and interpret it in code.
5. Do you expect the DB to be loaded more than what it is today?
The system needs to know which data has already been processed, and it will 
read that from the DB. I am thinking of keeping the last input directory 
information in runConf of the COORD_ACTION table.
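
To make the ordering in reply 2 concrete, here is a minimal sketch of the check as I read it. This is not Oozie code: the names `Decision` and `decide` are hypothetical, and treating "missing past the next nominal time" as a simple timestamp comparison is an assumption.

```python
from datetime import datetime
from enum import Enum


class Decision(Enum):
    """Possible outcomes of one availability check (hypothetical names)."""
    WAIT_APERIODIC = "wait for aperiodic data"
    WAIT_PERIODIC = "wait for periodic data"
    REFRESH_INPUTS = "refresh input sets"
    RUN = "run the instance"


def decide(aperiodic_available: bool,
           periodic_available: bool,
           now: datetime,
           next_periodic_nominal_time: datetime) -> Decision:
    # Aperiodic data has priority: nothing else is checked until it arrives.
    if not aperiodic_available:
        return Decision.WAIT_APERIODIC
    # Aperiodic data is present, so the periodic dataset is checked next.
    if periodic_available:
        return Decision.RUN
    # Assumption: once the next periodic nominal time has passed while the
    # periodic data is still missing, the input sets are refreshed.
    if now > next_periodic_nominal_time:
        return Decision.REFRESH_INPUTS
    return Decision.WAIT_PERIODIC
```

For example, `decide(True, False, now, nominal)` gives `WAIT_PERIODIC` while `now` is before `nominal`, and `REFRESH_INPUTS` once `now` is later.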

I will put this information in the doc once we finalize it.

> Aperiodic Data handling in oozie
> --------------------------------
>
>                 Key: OOZIE-2216
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2216
>             Project: Oozie
>          Issue Type: New Feature
>          Components: coordinator
>            Reporter: Jaydeep Vishwakarma
>            Assignee: Jaydeep Vishwakarma
>         Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently Oozie scheduling works on periodic datasets. It does not have any 
> mechanism to handle aperiodic datasets, which don’t follow a fixed 
> schedule/frequency. 
> Use cases
> When an incoming dataset arrives with no fixed schedule.
> Need to trigger the job based on all data available since the last run, with 
> a possible cap on the max size to process in one run.
> Try to avoid creating so many instances when you know input instances will be 
> very few.
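
The second use case above (all data since the last run, with an optional size cap) could be sketched roughly as follows. This is illustrative only and not from the attached design: the function `select_new_inputs`, the `(path, size_bytes)` tuples, and the assumption that directory paths sort in arrival order are all hypothetical.

```python
from typing import List, Optional, Tuple


def select_new_inputs(available: List[Tuple[str, int]],
                      last_processed: Optional[str] = None,
                      max_bytes: Optional[int] = None) -> List[str]:
    """Pick unprocessed input directories for one run.

    available      -- (path, size_bytes) pairs; paths are assumed to sort
                      in arrival order (e.g. timestamped directory names)
    last_processed -- last input directory consumed by the previous run
    max_bytes      -- optional cap on the total size taken in one run
    """
    selected: List[str] = []
    total = 0
    for path, size in sorted(available):
        # Skip everything up to and including the last processed directory.
        if last_processed is not None and path <= last_processed:
            continue
        # Stop at the cap, but always take at least one directory so a
        # single oversized input cannot block progress forever.
        if max_bytes is not None and selected and total + size > max_bytes:
            break
        selected.append(path)
        total += size
    return selected
```

The last element of the returned list would then be the value to record as the new "last input directory" for the next run.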



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
