RATIONALE:

To improve the serviceability of Lustre, we need a common format for messages (errors and information) printed by Lustre.

GOALS:

- Consistent, non-reusable message IDs. Each ID must uniquely identify a message. This will make it easy to find messages in troubleshooting documentation, and allow us to produce a web-based tool that can interpret messages automatically.

- It would be nice to have a consistent format for all variables printed as part of a message.

DETAILS OF PROPOSED FORMAT:

Messages are of the form:

    Lustre [ID 1234]: MESSAGE

Messages must be as readable and useful to a Lustre administrator as is practicable. Messages should not contain any information that is only useful to engineers familiar with Lustre internals; such information should be saved to the debug log instead.

The message ID (1234 above) is a 4-digit decimal number. This should meet our needs for several years. To allow for future growth, message IDs beyond 4 digits are OK and must be accepted by any tool that parses these messages.

Message IDs must never be reused. If a message or its format changes, a new message ID must be allocated and used instead.

These message IDs will be assigned using a page on the Lustre wiki listing the engineer allocating the message and the source file in which it will be used. The listing will go out of date quickly, but that's OK - we only need to know that a message ID has been "claimed", by whom, and for what purpose.

Once code that prints a message using a particular message ID has been committed to any branch of CVS, the format of the message may no longer be changed.

Details of what the message means and how to interpret any variables in the message must be sent to an email address (to be determined). This will go to the team(s) responsible for updating the troubleshooting documentation and the web-based analysis tool.

Variables printed as part of messages must be formatted so that parsing by the web-based analysis tool can be done with regular expressions, and should be as readable and grammatical as possible.
The value of a variable can be inserted as part of the message text, or it can be printed in the format 'name: value', whichever is more legible. Values need not be human-readable (for example, printing -ERRNO return codes is still acceptable) provided that they can be translated into human-readable form by the web-based analysis tool.

EXAMPLE MESSAGES:

    Lustre [ID 1000]: Lustre version 1.5.97 loaded

    Lustre [ID 1234]: Server handling error on server [EMAIL PROTECTED]: transaction 11602746/0, opcode 42 returned -2
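
To illustrate how a tool might consume this format, here is a rough Python sketch of parsing with regular expressions. It is not part of the proposal. The per-ID patterns, the group names (version, transno, subno, opcode, rc) and both function names are hypothetical, and the sketch assumes IDs are always printed with at least four digits. It also assumes the errno numbering on the machine running the tool matches the machine that printed the message.

```python
import errno
import os
import re

# Assumed envelope: "Lustre [ID <digits>]: <message text>".
# Four or more digits are accepted, since IDs may grow beyond 4 digits.
MESSAGE_RE = re.compile(r"^Lustre \[ID (?P<id>\d{4,})\]: (?P<text>.*)$")

# Hypothetical per-ID patterns for extracting variables from the text.
# A real tool would load these from the troubleshooting documentation.
VARIABLE_PATTERNS = {
    1000: re.compile(r"^Lustre version (?P<version>\S+) loaded$"),
    1234: re.compile(
        r"transaction (?P<transno>\d+)/(?P<subno>\d+), "
        r"opcode (?P<opcode>\d+) returned (?P<rc>-?\d+)$"
    ),
}


def parse_message(line):
    """Return (message_id, text, variables), or None if the line does not match."""
    match = MESSAGE_RE.match(line.strip())
    if match is None:
        return None
    message_id = int(match.group("id"))
    text = match.group("text")
    variables = {}
    pattern = VARIABLE_PATTERNS.get(message_id)
    if pattern is not None:
        found = pattern.search(text)
        if found is not None:
            variables = found.groupdict()
    return message_id, text, variables


def describe_return_code(rc):
    """Translate a -ERRNO style return code into a readable string."""
    code = abs(int(rc))
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"
```

Because the ID selects the pattern, a change to a message's wording or variables needs a new ID and a new pattern, which is one practical reason for the rule that IDs are never reused.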
