Error reporting in Mesos is based on Stout’s Error class and a variety of other classes, such as ErrnoError and WindowsError, that derive from Error. One goal of these classes is to simplify the construction and storage of error message strings. As an example, the ErrnoError constructor automatically incorporates std::string(strerror(errno)) into its error message.
Stout catches exceptions and converts them into Try<T> and Result<T> objects which are used as polymorphic return values that can express either an error condition or a proper return value. One consequence of catching exceptions is that call stacks from throw sites and error sites are lost. This makes it harder to debug production environment crashes post hoc. One way to improve debugability would be to add more information about the error site to the error message. This could be done by convention, but it seems that the vast majority of the error messages in Mesos currently don’t include much information about the site that originally generated an error. Because much of the code in Mesos uses Error and ErrnoError, we have an opportunity to add diagnostic information automatically with only a small amount of churn. What we’d need to do is provide Error creation macros that would pass the source file name and line number to the constructors of the various Error classes. Concretely this would involve - class Error - Rename declaration and definition to ErrorImpl (or ErrorBase). This change is not error prone. - Rename references to class’ type. These changes are mostly limited to result.hpp and try.hpp. - Don’t rename calls to constructor. These will now just invoke the macro. - Modify the constructor to take a line number and source filename in addition to a message. - Define a macro called Error(message) that has the same prototype as the original Error constructor. This macro would invoke ErrorImpl::ErrorImpl, passing the current line number and source filename along with the user-provided message. The macro would make use of __FILE__ and __LINE__, which are predefined in gcc and Visual Studio. - class ErrnoError, WindowsError. - Similar pattern as above. *Pros.* - Codebase is more debuggable in production environment because the vast majority of error sites will document the file name and line number in the error message. - Most error sites don’t require a code change. *Cons.* - Introduces a macro. In general, we’d like to move away from macros wherever possible. - Discourages subclassing of Error because each subclass would need its own special macro to call its constructor. A brief search of the code base seems to indicate that Mesos programmers don’t tend to declare subclasses of Error. Right now, it seems that subclasses exist mainly to provide different construction semantics. The big question is whether one would want to use subclasses down the road to distinguish between equivalence classes of errors, say those that should bring the process down vs those that are recoverable. - Some amount of code churn, mainly in error.hpp, result.hpp, and try.hpp. *Limitations.* - This approach by itself won’t address error sites that invoke convenience functions such as Try<T>::error() and Result<T>::error(). These could be addressed with a similar macro, but it would need to somehow deal with the template parameter as well. *Risks and Potential Complications.* - The rename process might fail to correctly distinguish between error sites (i.e. calls to Error::Error()) and other uses of the term “Error”. - There may be important compilers that don’t support __FILE__ and __LINE__. These predefined macros have been in the ANSI/ISO standard for a while, but it would still be wise to check each compiler we care about. - General hazards associated with macro expansion. - Users of Mesos may have written diagnostic tools that depend on the current values of the error strings. - Unit tests may depend on exact values of error strings. If this were the case, we'd have to provide a convenience function that strips off the file names and line numbers so that unit tests don't break when error sites move around due to edits elsewhere in the file. Let me know what you think of this idea. If the approach seems reasonable and a worthwhile, I would be willing to prototype it to see if it has some fatal flaw. Thanks for your input! -Mike