[jira] [Created] (GEARPUMP-68) If-statement support in DAG

Manu Zhang (JIRA) Thu, 21 Apr 2016 01:01:04 -0700

Manu Zhang created GEARPUMP-68:
----------------------------------

             Summary: If-statement support in DAG
                 Key: GEARPUMP-68
                 URL: https://issues.apache.org/jira/browse/GEARPUMP-68
             Project: Apache Gearpump
          Issue Type: New Feature
            Reporter: Manu Zhang



imported from [https://github.com/gearpump/gearpump/issues/1456] on behalf of 
[~whjiang]


h1. Goal

Currently, a Gearpump publisher publishes each message to all of its 
subscriptions. However, some use cases need to publish selectively to 
certain subscriptions. E.g. in a fraud detection use case, a threshold is 
checked to determine which route a message takes (a good user, a bad user or a 
suspicious user?). Basically, this routing can be represented as an if-statement.

{code}
if (is_from_good_user(message)) 
    no more check needed
else if(is_from_bad_user(message))
    alert and no more check needed
else
    perform additional check to decide
{code}

To support such routing, we need to route selectively at the processor level. 
(#1343 works at the task level instead of the processor level.)
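
As a rough illustration of the routing decision above, the three-way branch can be sketched as a pure function. This is only a sketch: the Message type, the riskScore field and both threshold values are assumptions made up for the example and are not part of Gearpump.

```scala
object RoutingSketch {
  // Hypothetical message type; the riskScore field is an assumption.
  final case class Message(userId: String, riskScore: Double)

  sealed trait Route
  case object Good extends Route       // no more checks needed
  case object Bad extends Route        // alert, no more checks needed
  case object Suspicious extends Route // perform additional checks

  // Assumed thresholds, chosen only for illustration.
  val GoodBelow = 0.2
  val BadAtOrAbove = 0.8

  // Mirrors the if / else if / else structure of the pseudocode.
  def route(msg: Message): Route =
    if (msg.riskScore < GoodBelow) Good
    else if (msg.riskScore >= BadAtOrAbove) Bad
    else Suspicious
}
```

Each Route value would correspond to one downstream processor that should receive the message, which is exactly what the current publish-to-all behaviour cannot express.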

h1. Solution

h2. Solution 1

Solution 1 is a workaround. No change is needed in Gearpump core. 
Each if-then-else statement is represented as two filter processors:

{code}
# conditionTrueFilter filters out the messages for which the condition is false
upstream ~> conditionTrueFilter ~> thenClause
# conditionFalseFilter filters out the messages for which the condition is true
upstream ~> conditionFalseFilter ~> elseClause
{code}

The main advantage of this solution is that no code in Gearpump core needs to 
change.
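
The workaround can be imitated with plain Scala collections to see what it costs. This is a minimal sketch, not Gearpump code: the Msg type, the even/odd condition and the run helper are hypothetical, and the delivery count only models the publish-to-all behaviour described in the Goal section.

```scala
object TwoFilterSketch {
  type Msg = Int

  // Hypothetical condition; both filters depend on it and must stay in sync.
  val condition: Msg => Boolean = _ % 2 == 0

  // Stand-ins for conditionTrueFilter and conditionFalseFilter.
  def conditionTrueFilter(in: Seq[Msg]): Seq[Msg] = in.filter(condition)
  def conditionFalseFilter(in: Seq[Msg]): Seq[Msg] = in.filterNot(condition)

  // The publisher sends every message to both subscriptions, so the
  // number of deliveries is twice the number of upstream messages.
  def run(upstream: Seq[Msg]): (Seq[Msg], Seq[Msg], Int) = {
    val deliveries = upstream.size * 2
    (conditionTrueFilter(upstream), conditionFalseFilter(upstream), deliveries)
  }
}
```

Every message is delivered twice and evaluated twice, although only one of the two filters keeps it.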

The main disadvantages are:

* It is hard to maintain. E.g. for a dynamic DAG, if the condition needs to 
change some day, both nodes must be changed carefully. Otherwise, they will be 
inconsistent.
* Bad performance. If the two condition filters are not in the same JVM as 
upstream, each message is sent over the network to both of them even though 
only one will keep it, which is significant unnecessary network traffic.
* Hard to understand. Users need to learn this BKM (best known method) to 
write an if-statement.
* Hard to express on the UI. From the DAG structure, it is impossible to tell 
which branch is the then clause and which is the else clause. So users get no 
insight into how the condition check behaves, e.g. does the condition check 
succeed in most cases?

h2. Solution 2

Solution 2 is to add built-in support for the if-statement. Basically, the 
design is:
# Allow a processor to have more than one output channel. Each channel has a 
name. Each processor has a default output channel named "out". A processor can 
add alias names.
# Each channel can have multiple subscribers. Thus, for a dynamic DAG, users 
can dynamically add/remove subscriptions for a certain channel.
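
The two design points can be modelled in memory as follows. This is a sketch under stated assumptions rather than a proposed Gearpump API: MultiChannelProcessor and its methods addChannel, addAlias, subscribe and unsubscribe are hypothetical names, and only the channel name "out" and the output(channel, msg) call shape come from this proposal.

```scala
import scala.collection.mutable

object ChannelSketch {
  type Subscriber[T] = T => Unit

  // Hypothetical processor with named output channels.
  class MultiChannelProcessor[T] {
    // Channel name -> subscribers; "out" is the default channel.
    private val channels =
      mutable.Map[String, mutable.ListBuffer[Subscriber[T]]](
        "out" -> mutable.ListBuffer.empty[Subscriber[T]])
    // Alias name -> real channel name.
    private val aliases = mutable.Map.empty[String, String]

    def addChannel(name: String): Unit =
      channels.getOrElseUpdate(name, mutable.ListBuffer.empty[Subscriber[T]])

    def addAlias(alias: String, channel: String): Unit = {
      require(channels.contains(channel), s"unknown channel: $channel")
      aliases(alias) = channel
    }

    private def resolve(name: String): String = aliases.getOrElse(name, name)

    // Subscriptions can be added and removed at runtime (dynamic DAG).
    // An unknown channel name throws NoSuchElementException.
    def subscribe(channel: String, s: Subscriber[T]): Unit =
      channels(resolve(channel)) += s
    def unsubscribe(channel: String, s: Subscriber[T]): Unit =
      channels(resolve(channel)) -= s

    // A message reaches only the subscribers of the chosen channel.
    def output(channel: String, msg: T): Unit =
      channels(resolve(channel)).foreach(_(msg))
    def output(msg: T): Unit = output("out", msg)
  }
}
```

Because a message is delivered only to the subscribers of the channel it is written to, no downstream filter is needed to discard it.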

With this design, it is quite easy to implement if-statement support:

* A special IFProcessor is created. It has two output channels, "then" and 
"else". The UI can show the channel name on each edge. Inside IFProcessor, user 
code can call output("then", msg) to output to the "then" channel.
* The DSL can be expressed like this:

{code}
  upstream.if(condition, thenClause, elseClause)
{code}

or

{code}
   val ifStmt = upstream.if(condition)
   ifStmt.then(thenClause)
   ifStmt.else(elseClause)
{code}
* The low-level API can be expressed as:

{code}
  A#"then" ~ partitioner ~> B
{code}
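
A minimal, self-contained sketch of how such an IFProcessor might behave is given below. It is an illustration only: the class body, onMessage, thenDo and elseDo are hypothetical names, and only the channel names "then" and "else" and the output("then", msg) call come from this proposal. The sketch uses thenDo/elseDo because `if` and `else` are reserved words in Scala (and `then` is reserved in newer versions), so the method names in the DSL snippets would need backticks or different names in real code.

```scala
object IfProcessorSketch {
  // Hypothetical stand-in for a processor with "then" and "else" channels.
  class IFProcessor[T](condition: T => Boolean) {
    private var thenSubs = Vector.empty[T => Unit]
    private var elseSubs = Vector.empty[T => Unit]

    // Register a downstream clause on the "then" or "else" channel.
    def thenDo(clause: T => Unit): this.type = { thenSubs :+= clause; this }
    def elseDo(clause: T => Unit): this.type = { elseSubs :+= clause; this }

    // Deliver a message to the subscribers of the named channel only.
    private def output(channel: String, msg: T): Unit = channel match {
      case "then" => thenSubs.foreach(_(msg))
      case "else" => elseSubs.foreach(_(msg))
      case other  => throw new IllegalArgumentException(s"unknown channel: $other")
    }

    // The condition is evaluated once per message, in one place.
    def onMessage(msg: T): Unit =
      if (condition(msg)) output("then", msg) else output("else", msg)
  }
}
```

Compared with Solution 1, the condition lives in a single processor and each message is delivered to exactly one branch.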




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
