zhijiangW commented on issue #8242: [FLINK-6227][network] Introduce the 
DataConsumptionException for downstream task failure
URL: https://github.com/apache/flink/pull/8242#issuecomment-490377541
 
 
   Yes, I think we are on the same page now.
   
   I will focus on `b, c` in this PR and open separate PRs for the other 
cases in the future. For `DataConnectionException` in particular, we might add 
a retry mechanism, if possible, while connecting to the server, and throw 
`PartitionNotFoundException` or `DataConnectionException` once the retries fail.
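   
   A minimal sketch of what such a retry could look like. `ConnectRetrySketch`, 
`ConnectAttempt` and the `DataConnectionException` constructor below are 
hypothetical names used only for illustration, not existing Flink code, and a 
real version would probably also want a backoff between attempts:

```java
import java.io.IOException;

public class ConnectRetrySketch {

    /** Hypothetical stand-in for the proposed exception; not an existing Flink class. */
    public static class DataConnectionException extends IOException {
        public DataConnectionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical abstraction of one attempt to connect to the producer's server. */
    @FunctionalInterface
    public interface ConnectAttempt<T> {
        T connect() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times. If every attempt fails with an
     * IOException, the last failure is wrapped in a DataConnectionException.
     */
    public static <T> T connectWithRetry(ConnectAttempt<T> attempt, int maxAttempts)
            throws DataConnectionException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.connect();
            } catch (IOException e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        throw new DataConnectionException(
                "Connecting failed after " + maxAttempts + " attempts", lastFailure);
    }
}
```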
   
   I have already updated the code for the following points:
   
   -  Extend `PartitionNotFoundException` to carry an internal `Throwable`. 
This exception might be caused by an `IOException` while opening the blocking 
result partition file, so the internal throwable can be used for 
tracing/debugging the root problem.
   
   - Do not transform the received `PartitionNotFoundException` into an 
`IOException` on the consumer side.
   
   - Catch the parent `IOException` instead of `PartitionNotFoundException` in 
`LocalInputChannel` or `PartitionRequestServerHandler` when requesting a 
partition. An `IOException` can only happen for a blocking result partition 
while opening the disk file, which might indicate that the file is corrupted 
or deleted. So the above cases `b, c` are unified in this process. However, 
there is a concern here: 
   
   1.  For the pipelined mode, the consumer might ask the JM for the partition 
state to re-trigger the request after receiving `PartitionNotFoundException`. 
   2. For the blocking mode, the consumer should fail directly because the 
partition state is already known to be `FINISHED`. 
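   
   The producer-side changes listed above could be sketched roughly as follows. 
This is only one reading of the description, with simplified, hypothetical 
stand-ins (`PartitionRequestSketch`, a string partition id, a 
`PartitionNotFoundException` constructor taking a cause) rather than the actual 
Flink classes:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PartitionRequestSketch {

    /** Simplified, hypothetical version of the exception that carries a root cause. */
    public static class PartitionNotFoundException extends IOException {
        public PartitionNotFoundException(String partitionId, Throwable cause) {
            super("Partition " + partitionId + " not found.", cause);
        }
    }

    /** Hypothetical stand-in for opening the disk file of a blocking result partition. */
    static byte[] openBlockingPartitionFile(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    /**
     * Catches the parent IOException when requesting the partition, so that a
     * missing or unreadable file surfaces as PartitionNotFoundException with
     * the original failure kept as the cause.
     */
    public static byte[] requestPartition(String partitionId, Path file)
            throws PartitionNotFoundException {
        try {
            return openBlockingPartitionFile(file);
        } catch (PartitionNotFoundException e) {
            throw e; // already the right type, pass it through unchanged
        } catch (IOException e) {
            throw new PartitionNotFoundException(partitionId, e);
        }
    }
}
```

   On the consumer side, `getCause()` would then expose the original 
`IOException` for debugging.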
   
   So I think we should adjust the logic of 
`triggerPartitionProducerStateCheck` when a `PartitionNotFoundException` is 
received on the consumer side. Otherwise this PR might cause unnecessary 
retries when the exception is caused by a corrupted partition rather than a 
released one.
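   
   The adjusted decision could look roughly like this. It is only a sketch of 
the idea: the enums and `onPartitionNotFound` are hypothetical names, and the 
real change would live in the code path around 
`triggerPartitionProducerStateCheck`:

```java
public class ProducerStateCheckSketch {

    /** Hypothetical, reduced view of the partition modes discussed above. */
    public enum PartitionMode { PIPELINED, BLOCKING }

    /** Hypothetical outcome of handling a PartitionNotFoundException on the consumer. */
    public enum Action { CHECK_PRODUCER_STATE, FAIL_CONSUMER }

    /**
     * Pipelined: the producer may not have registered the partition yet, so ask
     * the JM for the producer state and possibly re-trigger the request.
     * Blocking: the partition is already known to be FINISHED, so a missing
     * partition cannot be fixed by retrying and the consumer fails directly.
     */
    public static Action onPartitionNotFound(PartitionMode mode) {
        switch (mode) {
            case PIPELINED:
                return Action.CHECK_PRODUCER_STATE;
            case BLOCKING:
                return Action.FAIL_CONSUMER;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}
```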
   
   Once you confirm that this approach is correct, I will add new unit tests 
to cover these cases. :)
   
