-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Stone wrote:
| On Tue, Mar 04, 2008 at 08:22:31PM -0500, Benjamin M. Schwartz wrote:
|> Michael Stone wrote:
|> | My central error-handling goal has been to compactly express my
|> | assumptions in a form that will prevent them from being violated in
|> | ignorance. Should I have different goals?
|>
|> 1. I find Rainbow very impressive, and I am sure you are well aware of the
|> various arguments made regarding error handling.
|
| Thank you. While it's true that I'm aware of some arguments regarding error
| handling, I'm always interested in improving. It seems like one of the
| most regularly failed challenges in the craft of programming.
|
|> In my view, restricting assertions to internal invariants provides an
|> easy way of distinguishing problems in Rainbow from problems in
|> Activities and other parts of the system.
|
| True, but the convention that I have established of separating error
| messages into contract-violations and 'everything else', recorded in
| per-activity logs and in a daemon-wide log (/var/log/rainbow) would seem
| to accomplish similar goals.

I have not read the relevant Rainbow source, so I cannot comment very
intelligently on this. However, if Rainbow wishes to log a contract
violation, it should insert the phrase "contract violation" into the
logfile. Otherwise, how is a person reading the log to know this?

|> 2. Among your goals, you might consider maximizing the ability of novice
|> programmers to figure out what they've done wrong.
|
| It's not my primary goal, but I'll agree that it's worth considering.
|
|> The wiki page on translation even goes so far as to
|> recommend using gettext for error strings, so that users and
|> administrators may debug the system without knowing English.

I used the phrase "debug the system". That was a poor choice. I should
say "recognize bugs in the system", and additionally "distinguish between
bugs in the system and bugs in the activities they're developing".

|
| I'm still not convinced. Wouldn't we be better served by translating the
| source code itself, or an overview of the source code like my 'Taste the
| Rainbow' pages?
|
| Consider: in my experience, debugging consists of searching the diff
| between one's mental model and reality from which it follows that the
| material which should be translated is the material which provides the
| clearest, most accurate mental model of the problem.

Your experience is extremely unusual and non-representative. You are an
expert computer scientist who frequently reads source code written by
others. You are familiar with the OLPC operating system details,
including D-Bus and the Bitfrost requirements, perhaps more so than anyone
else in the world.

The people who will be reading these logfiles will be developers who are
trying to debug their activities. The activity may have crashed because
it attempted to violate a Bitfrost rule and was killed by Rainbow. These
developers (ideally mostly children) will likely be building their
activities by making small modifications to existing activities.
That means most won't even understand their own code. How could you
possibly expect them to understand yours?

| Also consider: had there been an actual bug in Rainbow, which would have
| been more useful to Waqas in diagnosing and fixing the problem:
| translated error messages or better written or documented source code?

Not fixing. It is absurd to imagine that any appreciable number of users
will be able to fix Rainbow bugs. Rather, when Rainbow experiences an
internal error, it should be extremely obvious that the problem is with
Rainbow. For example, an excellent type of behavior would be for Rainbow
to print, in the logfile:

RAINBOW BUG: Rainbow has encountered an internal error. This indicates a
bug in Rainbow. The error code is 752.

This line would be sufficient for activity developers to understand that
the problem is not simply in their code. It also makes it possible for
users to participate usefully in the development process, by reporting
the bug in an unambiguous way. Error codes are also important because
they allow users to identify problems even when e-mailing logfiles is
impossible due to software bugs or lack of connectivity. This error line
is also nice because it only needs to be translated once, with the error
code number substituted programmatically.

This output could be improved further by adding an additional sentence,
such as:

This error code indicates that Rainbow's directory permissions have
reached an inconsistent state.

This line, like a BSOD, serves mainly to make users feel like the
system's designers want them to know what's going on in case of a
failure. However, the implementation overhead is undeniably high,
especially given the need for many translations. On the plus side, these
strings also serve as documentation when reading the source code.

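To make the idea concrete, here is a rough Python sketch of such a log
line. It is only an illustration: format_internal_error and
ERROR_DESCRIPTIONS are names I made up, not anything in Rainbow, and the
only code in the table is the 752 example from above. I am assuming
gettext is used for translation, as the wiki page recommends.

```python
import gettext

# With no translation catalog installed, gettext.gettext returns the
# message unchanged, so this sketch runs as-is in English.
_ = gettext.gettext

# Hypothetical table of optional per-code explanations. The single entry
# is the example from this message, not a real Rainbow error code.
ERROR_DESCRIPTIONS = {
    752: _("This error code indicates that Rainbow's directory permissions "
           "have reached an inconsistent state."),
}


def format_internal_error(code):
    """Return the logfile line for an internal error with the given code."""
    # The generic sentence is translated once; the number is substituted
    # programmatically.
    line = _("RAINBOW BUG: Rainbow has encountered an internal error. "
             "This indicates a bug in Rainbow. "
             "The error code is %(code)d.") % {"code": code}
    detail = ERROR_DESCRIPTIONS.get(code)
    if detail is None:
        return line
    return line + " " + detail
```

Codes without an entry in the table still produce the generic line, so
the per-code sentences can be added (and translated) incrementally.
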
|
| Put another way, doesn't this kind of error message uselessly duplicate
| information that is best recorded in the failing assertion itself (and
| in the name of the function containing it, in this case,
|
| check_cwd(... [cwd=]/home/olpc/Activities/Qirat.activity)
| assert ck.negative(W_OK, 0)
|
| ?

I have no idea what any of those names mean, despite having looked at the
source. I can now guess that "cwd" means "current working directory" and
"ck" means "check", but I still have no idea what the code actually does.

Reading code is hard, and you should never expect anyone to do it unless
they are planning on modifying that code.

|
|> 3. Did this assertion failure result in the termination of the Rainbow
|> daemon?
|
| The present implementation calls clone() before executing any
| activity-launching code. Termination of the child by failure to handle
| the AssertionError is a design goal.
|
|> Raising exceptions for input errors has the distinct
|> advantage of allowing one to catch exceptions thrown further down the call
|> stack, instead of exiting. Note that when I say "specific exceptions", it
|> would be perfectly reasonable to wrap up all errors due to permissions in
|> a PermissionsException, etc.
|
| First, what can I reasonably expect to accomplish by catching such an
| exception?

You can print a sensible error message, such as "The current activity
(Qirat) could not be launched because the permissions on its bundle
directory are insecure."

| Second, given that the exception is being raised in a child
| process that may have been compromised by malicious data, I'm not
| terribly interested in informing the main daemon to the particulars of
| the failure; the log file is quite sufficient for my purposes.

I agree; there is no need to send information up to the main daemon. I
think specialized exceptions make it easier to achieve informative
logfiles.

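Something along these lines is what I have in mind. This is a sketch, not
Rainbow code: PermissionsException, check_bundle_dir and launch are
hypothetical names, and the writability test is only a stand-in, since I
do not know what the quoted assertion really checks. Everything happens
in the child, and the only output is a line in the logfile.

```python
import os


class PermissionsException(Exception):
    """Hypothetical wrapper for all errors caused by insecure permissions."""

    def __init__(self, activity, path):
        super().__init__(
            "The current activity (%s) could not be launched because the "
            "permissions on its bundle directory are insecure." % activity)
        self.path = path


def check_bundle_dir(activity, path):
    # Stand-in check: treat a bundle directory that the current process
    # can write to as insecure. The real rule may well differ.
    if os.access(path, os.W_OK):
        raise PermissionsException(activity, path)


def launch(activity, path, log):
    """Return True if the checks pass; otherwise log the reason and return False."""
    try:
        check_bundle_dir(activity, path)
    except PermissionsException as exc:
        # Nothing is reported to the parent daemon; the logfile is enough.
        log.write("%s\n" % exc)
        return False
    return True
```

The point of the specific exception type is that the message is written
where the failure is understood, while the caller only decides where the
message goes.
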
- --Ben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHzhlqUJT6e6HFtqQRAk4pAJ9xab+6sXvc6RSOqBLFkalBo4UFtgCff5B6
HJg89MaTolZ9rPryVhyzOAU=
=DGoW
-----END PGP SIGNATURE-----