Manu Zhang created GEARPUMP-68:
----------------------------------
Summary: If-statement support in DAG
Key: GEARPUMP-68
URL: https://issues.apache.org/jira/browse/GEARPUMP-68
Project: Apache Gearpump
Issue Type: New Feature
Reporter: Manu Zhang
imported from [https://github.com/gearpump/gearpump/issues/1456] on behalf of
[~whjiang]
h1. Goal
Currently, in Gearpump, a publisher publishes each message to all of its
subscriptions. However, some cases need to publish selectively to certain
subscriptions. E.g. in a fraud detection use case, a threshold is checked to
determine which route to take (a good user, a bad user or a suspicious user?).
Basically, this routing can be represented as an IF-statement.
{code}
if (is_from_good_user(message))
  no more check needed
else if (is_from_bad_user(message))
  alert and no more check needed
else
  perform additional check to decide
{code}
To support such routing, we need to route selectively at the processor level.
(#1343 works at the task level instead of the processor level.)
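The routing decision above can be sketched in plain Scala as a function from a message to a route. This is only an illustration, not Gearpump code: the Message type, the riskScore field, the two thresholds and the Route names are all hypothetical and chosen just to make the three branches concrete.

```scala
object FraudRouting {
  // Hypothetical message type; riskScore is an assumed field, not part of Gearpump.
  final case class Message(userId: String, riskScore: Double)

  // The three possible routes from the pseudocode above.
  sealed trait Route
  case object NoFurtherCheck extends Route
  case object Alert extends Route
  case object AdditionalCheck extends Route

  // Assumed thresholds, for illustration only.
  val GoodThreshold = 0.2
  val BadThreshold = 0.8

  def isFromGoodUser(m: Message): Boolean = m.riskScore < GoodThreshold
  def isFromBadUser(m: Message): Boolean = m.riskScore > BadThreshold

  // Each message takes exactly one route, chosen by the if/else chain.
  def route(m: Message): Route =
    if (isFromGoodUser(m)) NoFurtherCheck
    else if (isFromBadUser(m)) Alert
    else AdditionalCheck
}
```

The point of the sketch is that the decision yields exactly one route per message, which is what publishing to all subscriptions cannot express directly.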
h1. Solution
h2. Solution 1
Solution 1 is a workaround. No change is needed in the Gearpump core.
Each if-then-else statement is represented as 2 processors:
{code}
# conditionTrueFilter filters out the messages for which the condition is false
upstream ~> conditionTrueFilter ~> thenClause
# conditionFalseFilter filters out the messages for which the condition is true
upstream ~> conditionFalseFilter ~> elseClause
{code}
The main advantage of this solution is that no code in the Gearpump core needs
to change.
The main disadvantages are:
# It is hard to maintain. E.g. for a dynamic DAG, if some day in the future we
need to change the condition, we must carefully change both nodes. Otherwise,
they will be inconsistent.
# Bad performance. If the two condition filters are not in the same JVM as
upstream, every message is sent to both filters, which means significant and
unnecessary network transport.
# Hard to understand. Users need to learn this BKM (best known method) to write
an if-statement.
# Bad to express on the UI. From the DAG structure, it is impossible to know
which one is the then clause and which one is the else clause. So, the user is
unable to gain insight into how good the condition check is. E.g. does the
condition check succeed in most cases?
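The workaround and its maintenance risk can be illustrated with plain Scala collections. This is a sketch under assumptions, not Gearpump code: a List stands in for the upstream stream, filter stands in for the two filter processors, and all names and the sample condition are hypothetical.

```scala
object TwoFilterWorkaround {
  // A List plays the role of the upstream message stream (assumption).
  val upstream: List[Int] = List(1, 5, 10, 15, 20)

  // The same condition has to be written twice, once per filter processor.
  val conditionTrue: Int => Boolean = x => x >= 10
  val conditionFalse: Int => Boolean = x => !(x >= 10)

  // Every message reaches both filters; each filter keeps only its own branch.
  val thenInput: List[Int] = upstream.filter(conditionTrue)
  val elseInput: List[Int] = upstream.filter(conditionFalse)

  // Suppose only the "true" filter is later updated to a new condition.
  val updatedConditionTrue: Int => Boolean = x => x >= 15

  // Messages matching neither filter are now silently dropped.
  val dropped: List[Int] =
    upstream.filterNot(x => updatedConditionTrue(x) || conditionFalse(x))
}
```

Here the message 10 ends up in neither branch once the two filters disagree, which is the inconsistency described in the first disadvantage.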
h2. Solution 2
Solution 2 is to add built-in support for if-statements. Basically, the design
is:
# Allow a processor to have more than one output channel. Each channel has a
name. Each processor has a default output channel named "out". A processor can
add alias names.
# Each channel can have multiple subscribers. Thus, for a dynamic DAG, the user
can dynamically add/remove subscriptions for a certain channel.
So, it is quite easy to implement if-statement support using this solution:
a special IFProcessor is created. It has two output channels, then and else.
The UI can show the channel name on each edge. Inside IFProcessor, the user can
write output("then", msg) to output to the then channel.
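A minimal sketch of the named-channel idea in plain Scala follows. None of these types exist in Gearpump: ChannelProcessor, IfProcessor, subscribe, unsubscribe and onMessage are hypothetical names, subscribers are modeled as simple callbacks, and channel aliases are left out. Only the output(channel, msg) call shape and the "out", "then" and "else" channel names come from the proposal.

```scala
import scala.collection.mutable

object ChannelSketch {
  // A subscriber is modeled as a callback (assumption, not a Gearpump type).
  type Subscriber[T] = T => Unit

  // A processor with named output channels; "out" is always present as the default.
  class ChannelProcessor[T](channelNames: Set[String]) {
    private val channels: Map[String, mutable.ListBuffer[Subscriber[T]]] =
      (channelNames + "out")
        .map(name => name -> mutable.ListBuffer.empty[Subscriber[T]])
        .toMap

    // Subscriptions can be added or removed per channel at runtime.
    def subscribe(channel: String, s: Subscriber[T]): Unit = channels(channel) += s
    def unsubscribe(channel: String, s: Subscriber[T]): Unit = channels(channel) -= s

    // Publish a message only to the subscribers of the given channel.
    def output(channel: String, msg: T): Unit = channels(channel).foreach(s => s(msg))
  }

  // The if-statement processor: two extra channels, "then" and "else".
  class IfProcessor[T](condition: T => Boolean)
      extends ChannelProcessor[T](Set("then", "else")) {
    def onMessage(msg: T): Unit =
      output(if (condition(msg)) "then" else "else", msg)
  }
}
```

In this sketch the condition lives in one place and each message is delivered to a single branch, in contrast to the two-filter workaround of Solution 1.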
In the DSL, this can be expressed as:
{code}
upstream.if(condition, thenClause, elseClause)
{code}
or
{code}
val ifStmt = upstream.if(condition)
ifStmt.then(thenClause)
ifStmt.else(elseClause)
{code}
With the low-level API, it can be expressed as:
{code}
A#"then" ~ partitioner ~> B
{code}