[ 
https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949698#comment-14949698
 ] 

Robert Kanter commented on OOZIE-1976:
--------------------------------------

I took a first pass through the code on RB.  

I also had a general question: Should we add NOT or XOR?  I don’t have specific 
use cases, but it might be helpful to have these logical operators.  

It looks like you’ve added an async dataset option in the XSD, but I don’t see 
any code using it anywhere, and it’s not mentioned in the doc or JIRA.  Is this 
something you’re planning on adding in a later patch?

> Specifying coordinator input datasets in more logical ways
> ----------------------------------------------------------
>
>                 Key: OOZIE-1976
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1976
>             Project: Oozie
>          Issue Type: New Feature
>          Components: coordinator
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Purshotam Shah
>             Fix For: trunk
>
>         Attachments: Input-check.docx, OOZIE-1976-WIP.patch, 
> OOZIE-1976-rough-design-2.pdf, OOZIE-1976-rough-design.pdf
>
>
> All dataset instances specified as input to coordinator, currently work on 
> AND logic i.e. ALL of them should be available for workflow to start. We 
> should enhance this to include more logical ways of specifying availability 
> criteria e.g.
>  * OR between instances
>  * minimum N out of K instances
>  * delta datasets (process data incrementally)
> Use-cases for this:
>  * Different datasets are BCP, and workflow can run with either, whichever 
> arrives earlier.
>  * Data is not guaranteed, and while $coord:latest allows skipping to 
> available ones, workflow will never trigger unless mentioned number of 
> instances are found.
>  * Workflow is like a ‘refining’ algorithm which should run after minimum 
> required datasets are ready, and should only process the delta for efficiency.
> This JIRA is to discuss the design and then the review the implementation for 
> some or all of the above features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to