[caiman-discuss] Draft BE Error observability design

Danek Duvall Sat, 22 Aug 2009 11:24:46 -0700

Evan Layton wrote:

> Error String definitions:
>     The entry points into the library would be changed to pass back
>     a structure containing strings with information about the failure.
>     The idea here is that the caller will then be able to build their
>     own error message based on this available information instead of
>     being locked into the error message that we construct.
>         - for example:
> 
>           typedef struct err_info_str{
>               char *cmd_str; /* who called into the library (beadm, pkg(5)) */
>               char *op_str; /* operation such as BE activate */
>               char *failed_at; /* where we failed */
>               char *fail_msg_str; /* error string from failure */
>               char *fail_fixit_str; /* Context for error, instructions on
>                                        what to check or link to html content
>                                        if instructions can't be determined. */
>            } err_info_str_t;


I think strings are a poor choice for this sort of interface, particularly
informational strings, which will need to be localized anyway.

>         char *cmd_str - This is the name of the command or function that has
>                         called into the library. It is expected that the
>                         caller will provide this information before calling
>                         into the library if they which this to be part of the
>                         error strings. At some later point it is expected that
>                         this would be used for logging purposes.

I don't understand the purpose of this member at all.  The caller knows
what it is, and so has no use for this information.  And I don't see what
use the library internals would have for this, either.  You talk about
logging, but you don't say whether it's the caller or the library that
would be doing the logging.  I'm dubious that the library should do any
logging, but if it did, I don't know why this information would need to be
passed *back* -- it would need to be passed *in*, and this isn't that
interface (which would more likely be getexecname() or some thread-local
storage).

>         char *op_str - This is the higher level operation being performed, 
> such
>                        as activating a BE.
>         char *failed_at - This is the functional area or call that failed. 
> This
>                            would be things like zfs_mount or installgrub

These two are great candidates for #defines or enums.

>         char *fail_msg_str - This is any error string returned form the failed
>                          call. This would be things like the contents of the
>                          string returned by libzfs_error_description() or the
>                          captured error string from installgrub.

This one should be derivable from the previous two, and thus simply a
library call.  In the case of command output, the data would need to be
stored by the library for later retrieval, rather than being a static
string or a simple wrapper around another library function.

>         char *fail_fixit_str - This string will be generated by the code 
> calling
>                            into the code that will fill in the structure. This
>                            generated string will contain information on what
>                            the user can do to attempt to fix the cause of the
>                            failure. If this can not be determined then instead
>                            of this information a link to more information with
>                            possible fixes for errors in this area will be used
>                            for this string.

This also seems derivable from the operation and failure codes, but also
has the property that different clients will likely want to suggest
different ways forward.  A commandline app won't talk about "click this,
select that menu item", while a GUI app won't talk about what to type next.
Given that, this may be a good candidate for a code rather than a string.

I think johansen may have mentioned it, but you don't have any form of
multiple-error encapsulation, either a heirarchy or a list.  Given that
your operations are complex, you may run into several errors along the way,
some of which you managed to recover from, and some of which you didn't.
Although a normal user might not want to see everything, having that
debugging information available may prove invaluable.  A list is probably
sufficient here.

You'd also need some way of distinguishing classes of errors -- an error
which was safely handled, an error which could not be handled, an error
which occurred trying to handle a previous error, etc.

Danek

[caiman-discuss] Draft BE Error observability design

Reply via email to