[ https://issues.apache.org/jira/browse/FLINK-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Knauf updated FLINK-10353: ------------------------------------- Description: If a KafkaProducer with {{Semantic.EXACTLY_ONCE}} is restored from a savepoint written with {{Semantic.AT_LEAST_ONCE}} the job fails on restore with the NPE below. This makes it impossible to upgrade an AT_LEAST_ONCE pipeline to an EXACTL_ONCE pipeline statefully. {quote} java.lang.NullPointerException at java.util.Hashtable.put(Hashtable.java:460) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initTransactionalProducer(FlinkKafkaProducer011.java:955) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.recoverAndCommit(FlinkKafkaProducer011.java:733) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.recoverAndCommit(FlinkKafkaProducer011.java:93) at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.recoverAndCommitInternal(TwoPhaseCommitSinkFunction.java:373) at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.initializeState(TwoPhaseCommitSinkFunction.java:333) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initializeState(FlinkKafkaProducer011.java:867) at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178) at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160) at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:254) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) at java.lang.Thread.run(Thread.java:748){quote} The reason is, that for {{Semantic.AT_LEAST_ONCE}} the snapshotted state of the {{TwoPhaseCommitFunction}} is of the form "TransactionHolder\{handle=KafkaTransactionState [transactionalId=null, producerId=-1, epoch=-1], transactionStartTime=1537175471175}". was: If a KafkaProducer with `Semantic.EXACTLY_ONCE` is restored from a savepoint written with `Semantic.AT_LEAST_ONCE` the job fails on restore with the NPE below. This makes it impossible to upgrade an AT_LEAST_ONCE pipeline to an EXACTL_ONCE pipeline statefully. ``` java.lang.NullPointerException at java.util.Hashtable.put(Hashtable.java:460) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initTransactionalProducer(FlinkKafkaProducer011.java:955) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.recoverAndCommit(FlinkKafkaProducer011.java:733) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.recoverAndCommit(FlinkKafkaProducer011.java:93) at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.recoverAndCommitInternal(TwoPhaseCommitSinkFunction.java:373) at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.initializeState(TwoPhaseCommitSinkFunction.java:333) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initializeState(FlinkKafkaProducer011.java:867) at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178) at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160) at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:254) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) at java.lang.Thread.run(Thread.java:748) ``` The reason is, that for `Semantic.AT_LEAST_ONCE` the snapshotted state of the `TwoPhaseCommitFunction` is of the form ``` TransactionHolder{handle=KafkaTransactionState [transactionalId=null, producerId=-1, epoch=-1], transactionStartTime=1537175471175} ``` > Restoring a KafkaProducer with Semantic.EXACTLY_ONCE from a savepoint written > with Semantic.AT_LEAST_ONCE fails with NPE > ------------------------------------------------------------------------------------------------------------------------ > > Key: FLINK-10353 > URL: https://issues.apache.org/jira/browse/FLINK-10353 > Project: Flink > Issue Type: Improvement > Components: Kafka Connector > Affects Versions: 1.5.3, 1.6.0 > Reporter: Konstantin Knauf > Priority: Critical > > If a KafkaProducer with {{Semantic.EXACTLY_ONCE}} is restored from a > savepoint written with {{Semantic.AT_LEAST_ONCE}} the job fails on restore > with the NPE below. This makes it impossible to upgrade an AT_LEAST_ONCE > pipeline to an EXACTL_ONCE pipeline statefully. > {quote} > java.lang.NullPointerException > at java.util.Hashtable.put(Hashtable.java:460) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initTransactionalProducer(FlinkKafkaProducer011.java:955) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.recoverAndCommit(FlinkKafkaProducer011.java:733) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.recoverAndCommit(FlinkKafkaProducer011.java:93) > at > org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.recoverAndCommitInternal(TwoPhaseCommitSinkFunction.java:373) > at > org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.initializeState(TwoPhaseCommitSinkFunction.java:333) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initializeState(FlinkKafkaProducer011.java:867) > at > org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178) > at > org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160) > at > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:254) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748){quote} > The reason is, that for {{Semantic.AT_LEAST_ONCE}} the snapshotted state of > the {{TwoPhaseCommitFunction}} is of the form > "TransactionHolder\{handle=KafkaTransactionState [transactionalId=null, > producerId=-1, epoch=-1], transactionStartTime=1537175471175}". -- This message was sent by Atlassian JIRA (v7.6.3#76005)