[
https://issues.apache.org/jira/browse/FLINK-38827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emre Kartoglu updated FLINK-38827:
----------------------------------
Component/s: (was: Connectors / Firehose)
(was: Connectors / Kinesis)
Fix Version/s: (was: 1.17.0)
(was: 1.16.1)
(was: 1.15.4)
(was: aws-connector-4.1.0)
(was: aws-connector-3.1.0)
Affects Version/s: aws-connector-5.0.0
(was: 1.16.0)
(was: 1.15.3)
(was: aws-connector-3.0.0)
(was: aws-connector-4.0.0)
Description: (was: AWS Sinks based on Async Sink can enter a
deadlock situation if the AWS async client throws an error outside of the future.
We observed this with a local application:
{code:java}
java.lang.NullPointerException
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.utils.NettyUtils.closedChannelMessage(NettyUtils.java:135)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.utils.NettyUtils.decorateException(NettyUtils.java:71)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:310)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequestListener(NettyRequestExecutor.java:189)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.CancellableAcquireChannelPool.lambda$acquire$1(CancellableAcquireChannelPool.java:58)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.ensureAcquiredChannelIsHealthy(HealthCheckedChannelPool.java:114)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.lambda$tryAcquire$1(HealthCheckedChannelPool.java:97)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise.access$200(DefaultPromise.java:35)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:502)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at org.apache.flink.kinesis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at org.apache.flink.kinesis.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:829){code}
Related AWS SDK issues that can cause this:
* [https://github.com/aws/aws-sdk-java-v2/issues/3435]
* [https://github.com/aws/aws-sdk-java-v2/issues/1812]
If an error is thrown and not handled by the future, the AsyncSink will
never decrement {{inFlightRequestCount}}. The job will then hang while trying
to flush for a checkpoint.
!sink-deadlock.png|width=736,height=374!)
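The failure mode in the removed description can be sketched in isolation. The following is a minimal illustration only, not the Flink or AWS SDK code: the class {{InFlightGuardSketch}} and its methods are hypothetical names, and the sketch covers just the simplest variant, where the client call throws synchronously on the calling thread. In the trace above the exception is raised on a Netty event loop thread, so a call-site guard like this one would presumably not be sufficient there.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from Flink or the AWS SDK.
final class InFlightGuardSketch {

    // Stands in for the sink's in-flight request counter.
    private final AtomicInteger inFlightRequestCount = new AtomicInteger();

    int inFlight() {
        return inFlightRequestCount.get();
    }

    // Unguarded: if asyncCall.get() throws synchronously, no future exists,
    // the completion callback is never registered, and the counter stays
    // incremented. A flush that waits for the counter to reach zero would hang.
    void submitUnguarded(Supplier<CompletableFuture<String>> asyncCall) {
        inFlightRequestCount.incrementAndGet();
        asyncCall.get().whenComplete(
                (result, error) -> inFlightRequestCount.decrementAndGet());
    }

    // Guarded: a synchronous throw is converted into a failed future, so the
    // same completion callback runs and the counter is always decremented.
    void submitGuarded(Supplier<CompletableFuture<String>> asyncCall) {
        inFlightRequestCount.incrementAndGet();
        CompletableFuture<String> future;
        try {
            future = asyncCall.get();
        } catch (RuntimeException e) {
            future = new CompletableFuture<>();
            future.completeExceptionally(e);
        }
        future.whenComplete(
                (result, error) -> inFlightRequestCount.decrementAndGet());
    }

    public static void main(String[] args) {
        // A client call that fails before it returns any future.
        Supplier<CompletableFuture<String>> throwingClient = () -> {
            throw new NullPointerException("thrown outside of the future");
        };

        InFlightGuardSketch unguarded = new InFlightGuardSketch();
        try {
            unguarded.submitUnguarded(throwingClient);
        } catch (NullPointerException expected) {
            // The caller sees the exception, but the counter was not released.
        }

        InFlightGuardSketch guarded = new InFlightGuardSketch();
        guarded.submitGuarded(throwingClient);

        System.out.println("unguarded in-flight: " + unguarded.inFlight());
        System.out.println("guarded in-flight: " + guarded.inFlight());
    }
}
```

The point of the sketch is that the decrement lives only in the completion callback, so any error path that bypasses the future also bypasses the decrement.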
Priority: Major (was: Critical)
Summary: [Draft] NullPointerException and job restart when
consuming from DynamoDB CDC (was: NullPointerException and job restart when
consuming from DynamoDB CDC)
> [Draft] NullPointerException and job restart when consuming from DynamoDB CDC
> -----------------------------------------------------------------------------
>
> Key: FLINK-38827
> URL: https://issues.apache.org/jira/browse/FLINK-38827
> Project: Flink
> Issue Type: Bug
> Components: Connectors / DynamoDB
> Affects Versions: aws-connector-5.0.0
> Reporter: Emre Kartoglu
> Assignee: Danny Cranmer
> Priority: Major
>