A thread reading the result of an orphaned task can indeed hang. Good catch.
Exceptions, like task results, can be passed across threads using
std::shared_future. If a task thread exits with an exception, the caller of
std::future::get receives that exception, provided the exiting thread stored it
in the corresponding std::promise.
No exception should escape the boundary of the thread that threw it.
The top-level thread can then translate the exception into an error string and
report it back gracefully.
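
To make this concrete, here is a minimal sketch of the pattern in standard C++.
It is not MXNet code: run_task, worker, and run_and_report are hypothetical
names made up for illustration, and the error message is invented. The worker
catches everything at its thread boundary and stores either the result or the
exception in the std::promise; the top-level caller reads the
std::shared_future and turns any exception into an error string.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical task: returns a value, or throws to simulate a failed operator.
int run_task(bool fail) {
    if (fail) throw std::runtime_error("operator A failed");
    return 42;
}

// Thread entry point. Nothing escapes this function: the result or the
// exception is stored in the promise instead.
void worker(std::promise<int> p, bool fail) {
    try {
        p.set_value(run_task(fail));
    } catch (...) {
        p.set_exception(std::current_exception());
    }
}

// Top-level caller: get() rethrows the stored exception in this thread,
// where it is translated into an error string.
std::string run_and_report(bool fail) {
    std::promise<int> p;
    std::shared_future<int> result = p.get_future().share();
    std::thread t(worker, std::move(p), fail);

    std::string report;
    try {
        report = "ok: " + std::to_string(result.get());
    } catch (const std::exception& e) {
        report = std::string("error: ") + e.what();
    }
    t.join();
    return report;
}
```

Because a std::shared_future can be copied, several reader threads could each
hold a copy and would all see the same exception from get().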

Eftiquar 

 


On 1/19/18, 1:47 PM, "Li, Mu" <[email protected]> wrote:

    Very good document, thanks!
    
     One issue with approach 1 is that resuming the operators after the failed
    one may cause errors or even a system hang. Say op A writes var V while op B
    reads V. Then B will not be executed if A fails unless we clear their
    dependencies, but clearing them will lead to wrong results as well.
    
    Best
    Mu
    
    > On Jan 19, 2018, at 10:07 AM, Anirudh <[email protected]> wrote:
    > 
    > Hi,
    > 
    > I have outlined the approach and proof of concept for Better Exception
    > Handling in MXNet. Please provide feedback/comments/suggestions in the
    > comments section of the wiki.
    > 
    > 
    > https://cwiki.apache.org/confluence/display/MXNET/Improved+exception+handling+in+MXNet
    > 
    > 
    > Note: Responses will be delayed till 01/22/2018.
    > 
    > Anirudh
    
