[ 
https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116017#comment-14116017
 ] 

Robert Kanter commented on OOZIE-1976:
--------------------------------------

Here's my feedback on the v2 document:
- "Design considerations"
-- On #1 about embedding the control info in the missing dependencies string to 
avoid DB changes, that sounds fine for now.  We may want to revisit this if it 
gets really messy or requires a lot of string parsing etc.  
- "Logical Changes"
-- On Case 1, is it possible to do nested logical combinations?  e.g. something 
like {{(A || B) && (C || D)}} or {{(A && B) || (C && D)}}.  This may get too 
complicated if we allow too much here though...
- "Oozie job -info API for coord action"
-- I like this.  It should be very helpful in figuring out why these more 
complicated coords didn't trigger when the user thinks they should.
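
To make the nesting question on Case 1 concrete, here is a minimal sketch of
how nested combinations could be evaluated.  It is not taken from the design
document: the tuple-based expression tree and the name is_satisfied are
hypothetical, and the dataset names A-D are just the placeholders used above.

```python
# Hypothetical expression tree: a leaf is a dataset name (str); an inner
# node is a tuple (op, children) where op is "and" or "or".
def is_satisfied(expr, available):
    """Return True if the dependency expression holds for the given set
    of available dataset names."""
    if isinstance(expr, str):
        return expr in available
    op, children = expr
    results = [is_satisfied(child, available) for child in children]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError("unknown operator: %r" % (op,))


# (A || B) && (C || D)
either_pair = ("and", [("or", ["A", "B"]), ("or", ["C", "D"])])
# (A && B) || (C && D)
both_pairs = ("or", [("and", ["A", "B"]), ("and", ["C", "D"])])

print(is_satisfied(either_pair, {"A", "D"}))
print(is_satisfied(both_pairs, {"A", "D"}))
```

Recursion keeps arbitrary nesting cheap to evaluate; the harder part is
probably how users would express and debug such combinations, which is where
a cap on nesting depth might help.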

> Specifying coordinator input datasets in more logical ways
> ----------------------------------------------------------
>
>                 Key: OOZIE-1976
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1976
>             Project: Oozie
>          Issue Type: New Feature
>          Components: coordinator
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Mona Chitnis
>             Fix For: trunk
>
>         Attachments: OOZIE-1976-rough-design-2.pdf, 
> OOZIE-1976-rough-design.pdf
>
>
> All dataset instances specified as input to a coordinator currently work on 
> AND logic, i.e. ALL of them must be available for the workflow to start. We 
> should enhance this to include more logical ways of specifying availability 
> criteria, e.g.:
>  * OR between instances
>  * minimum N out of K instances
>  * delta datasets (process data incrementally)
> Use-cases for this:
>  * Different datasets are BCP, and workflow can run with either, whichever 
> arrives earlier.
>  * Data is not guaranteed, and while $coord:latest allows skipping to 
> available ones, the workflow will never trigger unless the specified number 
> of instances is found.
>  * Workflow is like a ‘refining’ algorithm which should run after minimum 
> required datasets are ready, and should only process the delta for efficiency.
> This JIRA is to discuss the design and then review the implementation for 
> some or all of the above features.
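
For reference, the "minimum N out of K instances" criterion quoted above can
be sketched as follows.  This is only an illustration under assumed names
(min_instances_available and its parameters are hypothetical, not part of
Oozie or of the design document); the instance strings are placeholders.

```python
def min_instances_available(instances, available, minimum):
    """Check an N-out-of-K criterion.

    instances: the K dataset instances listed as input (ordered).
    available: set of instances that currently exist.
    minimum:   N, the number of instances required before triggering.
    Returns (ready, found), where found lists the available instances
    in their original order.
    """
    if not 0 <= minimum <= len(instances):
        raise ValueError("minimum must be between 0 and len(instances)")
    found = [inst for inst in instances if inst in available]
    return len(found) >= minimum, found


ready, found = min_instances_available(
    ["hour-00", "hour-01", "hour-02", "hour-03"],
    {"hour-01", "hour-03"},
    minimum=2,
)
print(ready, found)
```

Returning the matched instances along with the verdict is one way the job
-info output could explain why an action did or did not trigger.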



--
This message was sent by Atlassian JIRA
(v6.2#6252)
