[ 
https://issues.apache.org/jira/browse/FLINK-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502103#comment-16502103
 ] 

Hequn Cheng commented on FLINK-9528:
------------------------------------

You are right. The duplicate delete messages in upsert mode won't affect the 
correctness. However it may bring some extra side effects, for instance, user 
can not filter the data what they don't need(actually), although the redundant 
messages won't affect the correctness.

To avoid the duplicate delete messages, we can keep a state in Filter. The 
Filter only send one delete message when the predicate evaluates from true to 
false. But this will bring state or shuffle cost.

I was thinking whether there is a way that we can send a bundle message from 
the upstream aggregate node. The bundle message contains a retract message and 
an update message(retract-old&insert-new). For the {{Filter}}, it will turn the 
bundle message into a normal message when the insert-new is filtered. And for 
the {{UpsertTableSink}}, it can throw away the useless retract-old message when 
received the bundle.



> Incorrect results: Filter does not treat Upsert messages correctly.
> -------------------------------------------------------------------
>
>                 Key: FLINK-9528
>                 URL: https://issues.apache.org/jira/browse/FLINK-9528
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.3.3, 1.5.0, 1.4.2
>            Reporter: Fabian Hueske
>            Assignee: Hequn Cheng
>            Priority: Critical
>
> Currently, Filters (i.e., Calcs with predicates) do not distinguish between 
> retraction and upsert mode. A Calc looks at record (regardless of its update 
> semantics) and either discard it (predicate evaluates to false) or pass it on 
> (predicate evaluates to true).
> This works fine for messages with retraction semantics but is not correct for 
> upsert messages.
> The following test case (can be pasted into {{TableSinkITCase}}) shows the 
> problem:
> {code:java}
>   @Test
>   def testUpsertsWithFilter(): Unit = {
>     val env = StreamExecutionEnvironment.getExecutionEnvironment
>     env.getConfig.enableObjectReuse()
>     env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>     val tEnv = TableEnvironment.getTableEnvironment(env)
>     val t = StreamTestData.get3TupleDataStream(env)
>       .assignAscendingTimestamps(_._1.toLong)
>       .toTable(tEnv, 'id, 'num, 'text)
>     t.select('text.charLength() as 'len)
>       .groupBy('len)
>       .select('len, 'len.count as 'cnt)
>       // .where('cnt < 7)
>       .writeToSink(new TestUpsertSink(Array("len"), false))
>     env.execute()
>     val results = RowCollector.getAndClearValues
>     val retracted = RowCollector.upsertResults(results, Array(0)).sorted
>     val expectedWithoutFilter = List(
>       "2,1", "5,1", "9,9", "10,7", "11,1", "14,1", "25,1").sorted
>     val expectedWithFilter = List(
>     "2,1", "5,1", "11,1", "14,1", "25,1").sorted
>     assertEquals(expectedWithoutFilter, retracted)
>     // assertEquals(expectedWithFilter, retracted)
>   }
> {code}
> When we add a filter on the aggregation result, we would expect that all rows 
> that do not fulfill the condition are removed from the result. However, the 
> filter only removes the upsert message such that the previous version remains 
> in the result.
> One solution could be to make a filter aware of the update semantics (retract 
> or upsert) and convert the upsert message into a delete message if the 
> predicate evaluates to false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to