[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

Evan Layton Thu, 20 Aug 2009 16:08:20 -0600

johansen at sun.com wrote:
> Danek asked the pkg(5) team to review this proposal, too.
> 
> High level comments:
> 
> - Given that many components of install and pkg use python, and that
>   the ZFS team is thinking about switching to python for libzfs, I'm
>   surprised that libbe is written in C.  It seems like python might be a
>   better choice, especially if most of its consumers use python.


The libbe library is written in c currently to take advantage of
certain functionality specific to libzfs which made for a much simpler
interface and code.

Re-writing this in python is well beyond the scope of what we are
attempting to do here. We will not be re-writing libbe in python simply
to handle error observability. We will also not be re-writing it
in python as long as libzfs is in c. If they really are planning on
doing this soon then we will indeed take a look at the possibility of
doing the re-write at that time.

> 
> - The error structure defined below is not sufficiently extensible.  It
>   may work for today's errors, but it's not flexible enough to grow with
>   any further changes to libbe.  Consumers aren't going to want to
>   change their code every 6 months, when new functionality gets added.

Can you describe what you are referring to here? I'm not sure I see
what new functionality within the the consumer would cause this to
need to be more extensible. The consumer decides what goes into the
strings described.

Is there something specific you have in mind here?

> 
> - If you're going to encapsulate the error information in an object, and
>   only allow access through the accessor functions, they need to be
>   refactored.  The error data should stay in the err_info_str.  These
>   functions should also be able to expand, should more errors get added
>   later.

These strings are generated outside the encapsulation and are not
static strings that are looked up. The error information is not
encapsulated only the structure is. The error codes will continue to
be passed back as an be_errno_t as it is today. Any current consumer
will continue to work without any changes.

Since the errors themselves are not part of the encapsulation and
the related strings are generated outside the accessor functions there
will be nothing to add when new errors are created. The errors live
outside this structure and only make use of the structure as a way to
return the strings but not the error itself.

> 
> - I don't see any proposal for a Python wrapper for this code.  If
>   you're intent upon writing this in C, there should be a corresponding
>   Python module with an exception handling heirarchy that knows how to
>   deal with your error reporting structures.

There may be in the future but for this project providing a separate
python wrapper is out of scope. The python changes that are in scope
are the changes to the libbe python wrapper that already exists with
be changed to handle this data.


-evan

> 
> On Thu, Aug 20, 2009 at 01:21:03PM -0700, Danek Duvall wrote:
>> If someone on the team has a little bit of spare time (Brock? Johansen?),
>> could they take a look at the BE error observability document Evan sent out
>> on caiman-discuss? 
> 
>> Date: Tue, 18 Aug 2009 17:40:34 -0600
>> From: Evan Layton <Evan.Layton at sun.com>
>> Subject: [caiman-discuss] Draft BE Error observability design
>> To: caiman-discuss <caiman-discuss at opensolaris.org>
> 
>> Error String definitions:
>>     The entry points into the library would be changed to pass back
>>     a structure containing strings with information about the failure.
>>     The idea here is that the caller will then be able to build their
>>     own error message based on this available information instead of
>>     being locked into the error message that we construct.
>>         - for example:
>>
>>           typedef struct err_info_str{
>>               char *cmd_str; /* who called into the library (beadm, pkg(5)) 
>> */
>>               char *op_str; /* operation such as BE activate */
>>               char *failed_at; /* where we failed */
>>               char *fail_msg_str; /* error string from failure */
>>               char *fail_fixit_str; /* Context for error, instructions on
>>                                        what to check or link to html content
>>                                        if instructions can't be determined. 
>> */
>>            } err_info_str_t;
> 
> I have a couple of issues with this approach.
> 
> In my high-level comments, I observed that this structure doesn't allow
> libbe to grow or add any other error information beyond what's provided
> here.  At a minimum, there should be a field, or linked list, that allow
> callers in the error handling chain to add ancillary information as an
> operation progresses.  The initial failure may be that a device was in
> use, but as the error is propagated up the call chain, there may be
> other failures that occur too.  How do you plan to handle this case?
> Augmenting the error object with other information is one approach, but
> there are certainly others.
> 
> There are other extensibility issues to consider.  In particular, this
> format is fixed to contain 5 pointers to char *.  I think it would make
> more sense to define a generic err_info structure that contains a code
> that describes the type of error that follows.  The sockaddr structure
> is a good example of something similar.  Depending upon what socket
> method is being used, the pointer to sockaddr is copied in and then read
> as a particular structure.  I can imagine that we might want to have
> multiple different types of err_info structures, but the current
> approach precludes that option.
> 
> As a practical matter, enforcing encapsulation when you have a series of
> pointers in an object is difficult.  I would consider using fixed length
> fields, if practical.  That way, it's less likely that the error strings
> will get leaked, or used after they've been freed.  I have some more
> comments on this, but I'll save them for the section on the accessor
> functions.
> 
>> Accessor Functions:
>> These functions are used to access the fields in the data structure as the 
>> structure itself will be encapsulated.
>>     - err_set_cmd_str(err_info_str_t *err_info, char *cmd_str)
>>         - This is used by the caller before calling into the library to set 
>> the
>>           name of the command or function calling into the library.
>>     - err_set_op_str(err_info_str_t *err_info, char *op_str)
>>         - This function adds the operation to the error string structure.
>>     - err_get_op_str(err_info_str_t *err_info)
>>        - this function will return the contents of the option string from
>>          err_info_str structure.
>>     - err_set_fail_strs(err_info_str_t *err_info, char *failed_at,
>>       char *fail_msg_str, char *fail_fixit_str)
>>         - This function fills in the failed_at, fail_str and fail_info into 
>> the
>>           err_info_str_t structure
>>         - This functions can be used to fill in any or all of these fields in
>>           the structure.
>>     - err_get_failed_at_str(err_info_str_t *err_info)
>>         - retrieves the failed_at string from the structure
>>     - err_get_fail_msg_str(err_info_str_t *err_info)
>>         - retrieves the fail_msg_str string from the structure
>>     - err_get_fixit_str(err_info_str_t *err_info)
>>         - retrieves the fail_fixit_str string from the structure
> 
> I think you could pare this down to to two functions:
> 
>       int err_get_str(err_info_str_t *ei, err_option_t eo, char *outstr,
>               size_t len);
> 
>       int err_set_str(err_info_t *ei, err_option_t eo, restrict char *instr);
> 
> Introducing a function for each member of the structure means that
> callers of the library get stuck using the various calls for all time.
> If you add another field, you have to add another method.  If you remove
> a field, then you either have to change all of the consumers, or leave a
> useless routine around.  You can always add more error option values,
> and enum or #define them to something sensible.  With this approach, the
> call interface doesn't change, but you can add or remove details from
> the encapsulated structure.
> 
> In addition to refining the interface, I would make sure that you have a
> good way of constructing and destroying these objects.  If they're tied
> to a boot-environment specifc handle that's great.  Otherwise, you'll
> have to make sure that the caller calls some destroy_my_error_object()
> routine to get these cleaned up.  It's not optimal, especially since
> some applications are haphazardly constructed.
> 
> The previous set of accessor functions appear to just return a pointer
> to the data that's contained in the err_info_str_t.  I'd like to advise
> against that approach since it violates encapsulation, and makes keeping
> track of these strings more difficult.  I would propose that in the case
> of set_str() the instr argument is copied and set_str allocates the
> storage, if it's available, for the string.  For get_str, the caller
> must provide the storage for get_str to copy the string into.  This
> makes it a lot easier to keep track of what routines are responsible for
> allocating and freeing memory.
> 
> This looks like a good start, but there's a lot more work to do.
> 
> -j
> _______________________________________________
> caiman-discuss mailing list
> caiman-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

Reply via email to