Would it make sense to have a developer best practices section on the Apache wiki where such guidance can be documented for future reference?
Bhavin Thaker. On Thu, Jan 11, 2018 at 9:56 AM Anirudh <anirudh2...@gmail.com> wrote: > Hi, > > > I have been thinking about exception handling specifically inside spawned > threads. > > As Tianqi mentioned, there is already a mechanism with LOG(FATAL) or CHECK > for exception handling inside main > > Thread. For exception handling inside spawned threads I see two places: > iterators and operators. > > > > For iterators, we can use exception_ptr to transport the exceptions from > child thread to the main thread. > > This can be implemented in the threadediter class template. Since > PrefetchingIter is used by most iterators in MXNet, > > and this uses threadediter, we should be able to cover most of our use > cases. > > > > For operators, I was thinking that we can transport the exception down the > dependency path. > > For example, when an exception is caught inside ExecuteOprBlock for a > single operator, > > We store the exception_ptr in the operator. We then propagate the > exception_ptr down to all the vars that the > > Operator writes to. Similarly, if an operator’s read vars has exception_ptr > attached to it, we propagate it down to the operator itself. > > > > We can then check if the var has an associated exception_ptr in > wait_to_read. > > One problem I see with the approach is that even if an operator fails we > may need to run subsequent operators. One way to avoid this > > Would be an onstart callback, which would mark the operator to not execute > if any of the read vars have an exception_ptr attached to it. > > > > Anirudh > > On Thu, Jan 11, 2018 at 9:02 AM, Tianqi Chen <tqc...@cs.washington.edu> > wrote: > > > I am all for RAII when possible in most of the code. The only reason some > > of the raw ptr occur in dmlc codebase was legacy-issue, and that can be > > resolved by wrapping returning ptr via unique_ptr or shared_ptr. One > > notable property of RAII is exception safe, which makes the code handle > > resource correctly when it throws in the middle. There are cases where > > memory allocation needs to be explicitly handled(e.g. GPU memory > > management) and reused where we need to do explicit management when > needed. > > > > > > As for exception handling, we do have a mechanism for handling > exceptions. > > when you do LOG(FATAL) or CHECK is caught at the C API boundary, which > > translates to return code -1 and an error is thrown on the python side. > > Throwing exception from another thread is a more tricky thing, which > > involves catching them in the engine, and usually, the state is not > correct > > in such case. But most of the cases when we need exception handling are > the > > simple case of opening a file and use CHECK should suffice. > > > > A better approach might be defining a new macro for errors intended to > > throw to a user and handled correctly, something like DMLC_EXPECT. But I > > find it might be a burden to put developer to distinguish what should be > a > > user error and a normal check, so we just use CHECK for now > > > > Tianqi > > > > On Thu, Jan 11, 2018 at 3:09 AM, Pedro Larroy < > > pedro.larroy.li...@gmail.com> > > wrote: > > > > > Hi > > > > > > I would like to encourage contributors to use RAII idioms in C++ > > > whenever possible to avoid resource leaks. > > > > > > RAII is an ugly acronym that stands for Resource Acquisition Is > > > Initialization, which basically means that you should almost never use > > > explicit new and delete operators and instead use std::make_shared, > > > std::make_unique and std::vector<uint8_t> and .data() for raw > > > buffers. Also always allocating OS resources in constructors releasing > > > them in destructors such as file descriptors. > > > > > > Asides from forgetting to call delete on an allocation, explicit > > > deletes are bad because an exception thrown in the middle prevents > > > delete from running entirely. > > > > > > This helps a lot writing correct, secure and exception safe code > > > without memory leaks. > > > > > > Another problem that I think is worth a discussion, is how to handle > > > exceptions and errors. Right now, I don't think there's a good way to > > > throw an exception in some functions without crashing the python > > > interpreter. I think we should come with a smart way to propagate > > > exceptions from the library up to the user runtime (python, scala...) > > > > > > As an example of what I'm talking about is this suspicious code that I > > > saw in a PR, which has several bugs in a few lines of code related to > > > what I'm discussing in this thread, crashing Python when trying to > > > open a file that doesn't exist. (How to propagate an exception in this > > > case?) > > > > > > https://github.com/apache/incubator-mxnet/pull/9370/files > > > > > > Please excuse the clickbait subject, just trying to grab your > > > attention in a humorous way now that the weekend is approaching. > > > > > > Pedro. > > > > > >