Yordan Pavlov created FLINK-28802:
-------------------------------------

             Summary: EOFException when recovering from a checkpoint
                 Key: FLINK-28802
                 URL: https://issues.apache.org/jira/browse/FLINK-28802
             Project: Flink
          Issue Type: Bug
            Reporter: Yordan Pavlov


Flink version: 1.14.2

When recovering from a checkpoint, the TaskManager throws an EOFException 
at this point:

[https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1681]

It looks like the transactional id is missing on deserialization. One 
interesting observation: before I tried to recover from this checkpoint, 
the job had failed with:

 
{code:java}
Aug 2, 2022 @ 16:50:40.107    Caused by: org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
Aug 2, 2022 @ 16:50:40.106    2022-08-02 13:50:40.106 [flink-akka.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Time correct sink UTXO Balances -> Sink Save to Kafka UTXO Balances bch-balances-v4 (1/1) (d2847bd31137561e3ba075a7cf70ee7c) switched from RUNNING to FAILED on 10.42.238.71:37353-fc3e48 @ bch-balances-v4-flink-taskmanager-0.bch-balances-v4-flink-taskmanager.flink.svc.cluster.local (dataPort=42553).
Aug 2, 2022 @ 16:50:40.106    org.apache.flink.util.FlinkRuntimeException: Failed to send data to Kafka bch-balances-v4-0@-1 with FlinkKafkaInternalProducer{transactionalId='bch-balances-v4-0-690507', inTransaction=true, closed=false}
{code}
 

Could it be that Flink actually failed to construct the checkpoint but still 
marked it as completed? How could I check this? Is there something like a 
checksum byte that is verified on recovery?
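
To illustrate the kind of failure I suspect, here is a minimal standalone 
sketch. It uses plain java.io only and is not Flink's serializer: the class 
name, the field layout (id string, producer id, epoch) and the producer id 
and epoch values are hypothetical. It shows that reading a string field from 
truncated or empty bytes fails with an EOFException rather than returning a 
partial value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for a serialized transaction state; NOT Flink code.
class TruncatedStateDemo {

    // Writes the assumed layout: UTF string, then a long, then a short.
    static byte[] write(String transactionalId, long producerId, short epoch)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(transactionalId);
            out.writeLong(producerId);
            out.writeShort(epoch);
        }
        return bytes.toByteArray();
    }

    // Reads back only the first field; throws EOFException if the bytes
    // end before the length prefix or the string body is complete.
    static String readTransactionalId(byte[] data) throws IOException {
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] full = write("bch-balances-v4-0-690507", 42L, (short) 1);
        System.out.println("full state: " + readTransactionalId(full));

        // Simulate state that was cut off while being written.
        byte[] truncated = Arrays.copyOf(full, 10);
        try {
            readTransactionalId(truncated);
        } catch (EOFException e) {
            System.out.println("truncated state: EOFException");
        }
    }
}
```

If the state bytes in the checkpoint were cut short in a similar way, that 
would match the exception I see, but I have not confirmed that this is what 
happened.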
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
