Hi Peter, Jody, Andy:
I believe that what is being undertaken will cover everything promised
to customers. Doing less is not efficient, but if I've missed
something, let me know.
The discussion below demonstrates the process that I think should be
followed in improving error messages: complete understanding, a
detailed consideration of whether the message is worth having at all
(Jody and I discussed that), and then getting the most out of it.
I hope this is satisfactory.
- Peter -
1. Improve understandability, diagnostic content, and troubleshooting
instructions. Jody and I discussed the gobbledygook at
ptlrpc/client.c:578 (iirc) and changed it to:
"[Msgid:4711] Server XXXX encountered an error processing request
xid:XX, subsys:N, opcode:N, export:ABC. Check server logs."
2. The precise explanation was arrived at in discussion - "This error
is only printed when normal execution of the request at the server could
not take place. This can be caused by failed devices, unexpected setup
problems etc. This error message may under such circumstances appear at
multiple clients, but the cause of the problem lies on the server. The
message was generated in response to an error reply sent by the server
to the client, so communications are not suspect when this message is
printed. NEXT STEP: check server logs."
The quoted text can go into the message catalogue for Msgid 4711. We
really don't want a wiki page with explanations; we want this collection
of pages, in line with Andy's RAS architecture.
3. Flow chart - the flow chart information was added implicitly: "Go to
the server and check the logs" is the next step for this one. It is a
bit too early to draw the flow chart, but clearly we can find the
information required once we have done a few more messages.
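
To make points 1 and 2 concrete, here is a minimal sketch of how the
short log line and its catalogue entry could fit together. This is not
the code in ptlrpc/client.c; the struct, the table, and both function
names (msg_catalogue_lookup, format_server_error) are hypothetical and
only illustrate the idea that the log line carries the Msgid while the
longer explanation lives in the catalogue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical catalogue entry; field names are illustrative only. */
struct msg_catalogue_entry {
    int         msgid;        /* stable ID printed in the log line */
    const char *explanation;  /* long-form text kept out of the log */
    const char *next_step;    /* troubleshooting action for the admin */
};

/* Assumed in-memory catalogue holding the one entry discussed above. */
static const struct msg_catalogue_entry msg_catalogue[] = {
    {
        4711,
        "Normal execution of the request at the server could not take "
        "place. The cause lies on the server; communications are not "
        "suspect.",
        "Check server logs.",
    },
};

/* Return the entry for msgid, or NULL if the ID is not catalogued. */
static const struct msg_catalogue_entry *msg_catalogue_lookup(int msgid)
{
    size_t i;

    for (i = 0; i < sizeof(msg_catalogue) / sizeof(msg_catalogue[0]); i++) {
        if (msg_catalogue[i].msgid == msgid)
            return &msg_catalogue[i];
    }
    return NULL;
}

/*
 * Format the client-side line for Msgid 4711. Returns the snprintf()
 * result: the length the full message needs, excluding the NUL.
 */
static int format_server_error(char *buf, size_t len, const char *server,
                               unsigned long long xid, int subsys,
                               int opcode, const char *export_name)
{
    const struct msg_catalogue_entry *e = msg_catalogue_lookup(4711);

    assert(buf != NULL && e != NULL);
    return snprintf(buf, len,
                    "[Msgid:%d] Server %s encountered an error processing "
                    "request xid:%llu, subsys:%d, opcode:%d, export:%s. %s",
                    e->msgid, server, xid, subsys, opcode, export_name,
                    e->next_step);
}
```

A caller would pass a buffer and the request details to
format_server_error() and print the result; a tool reading the logs
could then use the Msgid to pull the full explanation from the
catalogue rather than from a wiki page.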