[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

[email protected] Thu, 20 Aug 2009 14:24:34 -0700

Danek asked the pkg(5) team to review this proposal, too.

High level comments:

- Given that many components of install and pkg use python, and that
  the ZFS team is thinking about switching to python for libzfs, I'm
  surprised that libbe is written in C.  It seems like python might be a
  better choice, especially if most of its consumers use python.

- The error structure defined below is not sufficiently extensible.  It
  may work for today's errors, but it's not flexible enough to grow with
  any further changes to libbe.  Consumers aren't going to want to
  change their code every 6 months, when new functionality gets added.

- If you're going to encapsulate the error information in an object, and
  only allow access through the accessor functions, they need to be
  refactored.  The error data should stay in the err_info_str.  These
  functions should also be able to expand, should more errors get added
  later.

- I don't see any proposal for a Python wrapper for this code.  If
  you're intent upon writing this in C, there should be a corresponding
  Python module with an exception handling heirarchy that knows how to
  deal with your error reporting structures.

On Thu, Aug 20, 2009 at 01:21:03PM -0700, Danek Duvall wrote:
> If someone on the team has a little bit of spare time (Brock? Johansen?),
> could they take a look at the BE error observability document Evan sent out
> on caiman-discuss? 

> Date: Tue, 18 Aug 2009 17:40:34 -0600
> From: Evan Layton <Evan.Layton at sun.com>
> Subject: [caiman-discuss] Draft BE Error observability design
> To: caiman-discuss <caiman-discuss at opensolaris.org>

> Error String definitions:
>     The entry points into the library would be changed to pass back
>     a structure containing strings with information about the failure.
>     The idea here is that the caller will then be able to build their
>     own error message based on this available information instead of
>     being locked into the error message that we construct.
>         - for example:
> 
>           typedef struct err_info_str{
>               char *cmd_str; /* who called into the library (beadm, pkg(5)) */
>               char *op_str; /* operation such as BE activate */
>               char *failed_at; /* where we failed */
>               char *fail_msg_str; /* error string from failure */
>               char *fail_fixit_str; /* Context for error, instructions on
>                                        what to check or link to html content
>                                        if instructions can't be determined. */
>            } err_info_str_t;

I have a couple of issues with this approach.

In my high-level comments, I observed that this structure doesn't allow
libbe to grow or add any other error information beyond what's provided
here.  At a minimum, there should be a field, or linked list, that allow
callers in the error handling chain to add ancillary information as an
operation progresses.  The initial failure may be that a device was in
use, but as the error is propagated up the call chain, there may be
other failures that occur too.  How do you plan to handle this case?
Augmenting the error object with other information is one approach, but
there are certainly others.

There are other extensibility issues to consider.  In particular, this
format is fixed to contain 5 pointers to char *.  I think it would make
more sense to define a generic err_info structure that contains a code
that describes the type of error that follows.  The sockaddr structure
is a good example of something similar.  Depending upon what socket
method is being used, the pointer to sockaddr is copied in and then read
as a particular structure.  I can imagine that we might want to have
multiple different types of err_info structures, but the current
approach precludes that option.

As a practical matter, enforcing encapsulation when you have a series of
pointers in an object is difficult.  I would consider using fixed length
fields, if practical.  That way, it's less likely that the error strings
will get leaked, or used after they've been freed.  I have some more
comments on this, but I'll save them for the section on the accessor
functions.

> Accessor Functions:
> These functions are used to access the fields in the data structure as the 
> structure itself will be encapsulated.
>     - err_set_cmd_str(err_info_str_t *err_info, char *cmd_str)
>         - This is used by the caller before calling into the library to set 
> the
>           name of the command or function calling into the library.
>     - err_set_op_str(err_info_str_t *err_info, char *op_str)
>         - This function adds the operation to the error string structure.
>     - err_get_op_str(err_info_str_t *err_info)
>        - this function will return the contents of the option string from
>          err_info_str structure.
>     - err_set_fail_strs(err_info_str_t *err_info, char *failed_at,
>       char *fail_msg_str, char *fail_fixit_str)
>         - This function fills in the failed_at, fail_str and fail_info into 
> the
>           err_info_str_t structure
>         - This functions can be used to fill in any or all of these fields in
>           the structure.
>     - err_get_failed_at_str(err_info_str_t *err_info)
>         - retrieves the failed_at string from the structure
>     - err_get_fail_msg_str(err_info_str_t *err_info)
>         - retrieves the fail_msg_str string from the structure
>     - err_get_fixit_str(err_info_str_t *err_info)
>         - retrieves the fail_fixit_str string from the structure

I think you could pare this down to to two functions:

        int err_get_str(err_info_str_t *ei, err_option_t eo, char *outstr,
                size_t len);

        int err_set_str(err_info_t *ei, err_option_t eo, restrict char *instr);

Introducing a function for each member of the structure means that
callers of the library get stuck using the various calls for all time.
If you add another field, you have to add another method.  If you remove
a field, then you either have to change all of the consumers, or leave a
useless routine around.  You can always add more error option values,
and enum or #define them to something sensible.  With this approach, the
call interface doesn't change, but you can add or remove details from
the encapsulated structure.

In addition to refining the interface, I would make sure that you have a
good way of constructing and destroying these objects.  If they're tied
to a boot-environment specifc handle that's great.  Otherwise, you'll
have to make sure that the caller calls some destroy_my_error_object()
routine to get these cleaned up.  It's not optimal, especially since
some applications are haphazardly constructed.

The previous set of accessor functions appear to just return a pointer
to the data that's contained in the err_info_str_t.  I'd like to advise
against that approach since it violates encapsulation, and makes keeping
track of these strings more difficult.  I would propose that in the case
of set_str() the instr argument is copied and set_str allocates the
storage, if it's available, for the string.  For get_str, the caller
must provide the storage for get_str to copy the string into.  This
makes it a lot easier to keep track of what routines are responsible for
allocating and freeing memory.

This looks like a good start, but there's a lot more work to do.

-j

[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

Reply via email to