[
https://issues.apache.org/jira/browse/FLINK-40016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rui Fan updated FLINK-40016:
----------------------------
Attachment:
UnalignedCheckpointRescaleITCase_upscale_pipeline_20_to_21_FAILED.log
> UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint fails with
> "Corrupt stream, found tag" (in-flight stream desync) on rescale recovery
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-40016
> URL: https://issues.apache.org/jira/browse/FLINK-40016
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Reporter: Martijn Visser
> Assignee: Rui Fan
> Priority: Major
> Attachments:
> UnalignedCheckpointRescaleITCase_upscale_pipeline_20_to_21_FAILED.log
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76170&view=results]
> (leg: test_cron_azure tests)
> {code:java}
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:180)
> at
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:217)
> at
> org.apache.flink.test.checkpointing.UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint(UnalignedCheckpointRescaleITCase.java:683)
> Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by
> FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1,
> backoffTimeMS=100)
> Caused by: java.io.IOException: Can't get next record for channel
> InputChannelInfo{gateIdx=0, inputChannelIdx=5}
> Caused by: java.io.IOException: Corrupt stream, found tag: 8
> {code}
> {{org.apache.flink.test.checkpointing.UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint}}
> (parameter {{{}[19]{}}}) fails on rescale recovery: the restored in-flight
> byte stream is misaligned, so {{StreamElementSerializer}} reads a data byte
> where an element tag is expected. Valid tags are 0-6; "tag 8" is out of
> range, confirming a stream desync rather than a format mismatch.
> Suspected to be in the recovered in-flight-buffer / filtering-recovery path
> that landed in master 2026-02 .. 2026-04 (FLINK-39018 buffer migration from
> RecoveredInputChannel to physical channels; FLINK-38930 filtering record
> before processing without spilling / heap-buffer fallback during recovery;
> FLINK-39519 reusable heap segment for pre-filter source buffers). The failing
> build's head commit (4902753429, FLINK-35562) is a flink-table-planner
> test-only change and cannot have introduced this, so the defect is latent in
> master.
> Related: FLINK-22197 (same test/method but a {{{}NoSuchFileException{}}},
> different cause), FLINK-38643 (hang), FLINK-35351 / FLINK-39162 (closed
> UC-corruption, custom-partitioner specific).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)