zhijiangW commented on issue #8242: [FLINK-6227][network] Introduce the DataConsumptionException for downstream task failure
URL: https://github.com/apache/flink/pull/8242#issuecomment-490377541

Yes, I think we are on the same page now. I would focus on cases `b, c` in this PR and open separate PRs for the other cases in the future. In particular, for `DataConnectionException` we might add a retry mechanism, if possible, when connecting to the server, and only throw `PartitionNotFoundException` or `DataConnectionException` after the retries fail.

I have already updated the code for the following points:

- Extend `PartitionNotFoundException` to carry an internal `Throwable` cause. This exception might be caused by an `IOException` while opening the blocking result partition file, so the cause can be used for tracing and debugging the root problem.
- Do not transform a received `PartitionNotFoundException` into an `IOException` on the consumer side.
- Catch the parent `IOException` instead of `PartitionNotFoundException` in `LocalInputChannel` or `PartitionRequestServerHandler` when requesting a partition. An `IOException` can only happen in a blocking result partition while opening the disk file, which might indicate that the file is corrupted or deleted. This way, cases `b, c` above are handled by the same process.

There might be a concern here, though:

1. In pipelined mode, the consumer might ask the JM for the partition state to retrigger the request after receiving `PartitionNotFoundException`.
2. In blocking mode, the consumer should fail directly, because the partition state is already known to be `FINISHED`.

So I think we should adjust the logic of `triggerPartitionProducerStateCheck` when a `PartitionNotFoundException` is received on the consumer side. Otherwise this PR might cause unnecessary retries when the exception is caused by a corrupted partition rather than a released one.

Once you confirm this approach is correct, I will add new unit tests covering these cases. :)
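A minimal Java sketch of the proposed handling, under simplifying assumptions. Every name in it (`ConsumerFailureSketch`, `PartitionMode`, `Action`, `requestPartition`, `openPartitionFile`, `decide`) is hypothetical, and the `PartitionNotFoundException` defined here is a simplified stand-in, not Flink's actual class or constructor. It only illustrates two ideas: the `IOException` from opening a partition file is kept as the cause, and a pipelined partition leads to a producer-state check while a blocking one fails directly.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical sketch; not Flink's real classes or signatures. */
public class ConsumerFailureSketch {

    /** Hypothetical, simplified stand-in for the result partition mode. */
    enum PartitionMode { PIPELINED, BLOCKING }

    /** What the consumer does after a failed partition request. */
    enum Action { CHECK_PRODUCER_STATE_AND_RETRY, FAIL }

    /** Simplified exception that keeps the underlying cause for debugging. */
    static class PartitionNotFoundException extends IOException {
        PartitionNotFoundException(String partitionId, Throwable cause) {
            super("Partition " + partitionId + " not found.", cause);
        }
    }

    /** Server side: catch the parent IOException and rethrow it with the cause attached. */
    static void requestPartition(String partitionId, boolean fileExists)
            throws PartitionNotFoundException {
        try {
            openPartitionFile(partitionId, fileExists);
        } catch (IOException e) {
            // The original IOException (e.g. a deleted or corrupted file) is preserved.
            throw new PartitionNotFoundException(partitionId, e);
        }
    }

    /** Simulates opening a blocking partition's file; fails if the file is missing. */
    private static void openPartitionFile(String partitionId, boolean fileExists)
            throws IOException {
        if (!fileExists) {
            throw new FileNotFoundException("File for partition " + partitionId + " is missing");
        }
    }

    /** Consumer side: only a pipelined partition is worth re-checking and retrying. */
    static Action decide(PartitionMode mode) {
        // A blocking partition is already known to be FINISHED, so a missing
        // partition means it was lost and a retry would not help.
        return mode == PartitionMode.PIPELINED
                ? Action.CHECK_PRODUCER_STATE_AND_RETRY
                : Action.FAIL;
    }

    public static void main(String[] args) {
        try {
            requestPartition("p1", false);
        } catch (PartitionNotFoundException e) {
            System.out.println(decide(PartitionMode.BLOCKING) + ", cause: " + e.getCause());
        }
    }
}
```

Keeping the decision in one place keyed on the partition mode is a design choice of this sketch; in the real code the equivalent branch would presumably sit around `triggerPartitionProducerStateCheck`.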
