[ 
https://issues.apache.org/jira/browse/FLINK-30562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Wozniakowski updated FLINK-30562:
----------------------------------------
    Attachment: flink-asf-30562-clean.zip

> CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-30562
>                 URL: https://issues.apache.org/jira/browse/FLINK-30562
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream, Library / CEP
>    Affects Versions: 1.16.0, 1.15.3
>         Environment: Problem observed in:
> Production:
> Dockerised Flink cluster running in AWS Fargate, sourced from AWS Kinesis and 
> sink to AWS SQS
> Local:
> Completely local MiniCluster based test with no external sinks or sources
>            Reporter: Thomas Wozniakowski
>            Priority: Major
>         Attachments: flink-asf-30562-clean.zip
>
>
> (Apologies for the speculative and somewhat vague ticket, but I wanted to 
> raise this while I am investigating to see if anyone has suggestions to help 
> me narrow down the problem.)
> We are encountering an issue where our streaming Flink job has stopped 
> working correctly since Flink 1.15.3. This problem is also present on Flink 
> 1.16.0. The Keyed CEP operators that our job uses are no longer emitting 
> Patterns reliably, but critically *this is only happening when parallelism is 
> set to a value greater than 1*. 
> Our local build tests were previously set up using in-JVM `MiniCluster` 
> instances, or dockerised Flink clusters all set with a parallelism of 1, so 
> this problem was not caught and it caused an outage when we upgraded the 
> cluster version in production.
> Observing the job using the Flink console in production, I can see that 
> events are *arriving* into the Keyed CEP operators, but no Pattern events are 
> being emitted out of any of the operators. Furthermore, all the reported 
> Watermark values are zero, though I don't know if that is a red herring as it 
> seems Watermark reporting seems to have changed since 1.14.x.
> I am currently attempting to create a stripped down version of our streaming 
> job to demonstrate the problem, but this is quite tricky to set up. In the 
> meantime I would appreciate any hints that could point me in the right 
> direction.
> I have isolated the problem to the Keyed CEP operator by removing our real 
> sinks and sources from the failing test. I am still seeing the erroneous 
> behaviour when setting up a job as:
> # Events are read from a list using `env.fromCollection( ... )`
> # CEP operator processes events
> # Output is captured in another list for assertions
> My best guess at the moment is something to do with Watermark emission? There 
> seems to have been changes related to watermark alignment, perhaps this has 
> caused some kind of regression in the CEP library? To reiterate, *this 
> problem only occurs with parallelism of 2 or more. Setting the parallelism to 
> 1 immediately fixes the issue*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to