Re: [Gluster-users] Disappointing documentation?

Joe Julian Tue, 05 Mar 2013 13:14:25 -0800

On 03/05/2013 09:57 AM, Brian Candler wrote:

On Tue, Mar 05, 2013 at 08:33:28AM -0800, Joe Julian wrote:

It comes up on this list from time to time that there's not
sufficient documentation on troubleshooting. I assume that's what
some people mean when they refer to disappointing documentation, as
the current documentation is far more detailed and useful than it
was 3 years ago when I got started. I'm not really sure what's being
asked for here, nor am I sure how one would document how to
troubleshoot. In my mind, if there's a trouble that can be
documented with a clear path to resolution, then a bug report should
be filed and that should be fixed. Any other cases that cannot be
coded for require human intervention and are already documented.

When people come to this list and say "I am seeing split brain errors" or
"ls shows question marks for file attributes"

Article(s) on the official Q&A site, but that [censored] site can't find them with a search. Grrr.

or "I need to replace a failed server with a new one"

Articles are also on the official Q&A site, but again search isn't finding them.

I'll try to grab the contents of those and paste them into the wiki somewhere (unless you do it first. It is a wiki after all).

or "probing a server fails",

Agreed. This would be good. Does anyone actually know how to answer this? Please write it up on the wiki. Even I sometimes have trouble figuring out why someone's probe fails.

I don't think there's
any official documentation to help them.
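For the "probing a server fails" case, one first check a write-up might include is whether the peer's glusterd management port is reachable at all. The Python sketch below assumes glusterd is listening on its default TCP port, 24007. It only tests plain TCP reachability (for example, ruling a firewall in or out) and says nothing about other causes of a failed probe.

```python
import socket

# Assumption: glusterd uses its default management port (24007/tcp).
GLUSTERD_PORT = 24007


def glusterd_reachable(host: str, port: int = GLUSTERD_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

Called as `glusterd_reachable("server2")` from the node issuing the probe (the host name is hypothetical), a `False` result suggests looking at name resolution, the network path, or whether glusterd is running on the peer at all, before digging any deeper.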

"Documenting how to troubleshoot" would include what log messages you should
look for and what they mean, what xattrs you should expect to see on the
bricks and what they mean (for each case of distributed, replicated etc).
Given a basic checklist of these things, it would be easy for users to
report to the list "I checked A, B and C and the output from B was XXXX when
the docs say it should be YYYY on a working system", which is at least a
starting point.
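As an illustration of the kind of checklist item meant here, the sketch below reads the extended attributes of a file on a brick and decodes AFR changelog values. It assumes the replicate translator's `trusted.afr.<volume>-client-<N>` attributes hold three unsigned 32-bit big-endian pending counters (data, metadata, entry), and it has to run as root on the brick server itself, because the `trusted.*` namespace is hidden from ordinary users. This is an unofficial sketch, not a documented procedure.

```python
import os
import struct


def brick_xattrs(path: str) -> dict[str, bytes]:
    """Return every extended attribute on path (Linux only, symlinks not followed)."""
    return {
        name: os.getxattr(path, name, follow_symlinks=False)
        for name in os.listxattr(path, follow_symlinks=False)
    }


def decode_afr_changelog(value: bytes) -> dict[str, int]:
    """Split an AFR changelog value into its pending counters.

    Assumed layout: 12 bytes, three unsigned 32-bit big-endian integers.
    Non-zero counters record operations this brick believes are still
    pending on the peer named in the attribute.
    """
    if len(value) != 12:
        raise ValueError(f"expected 12 bytes, got {len(value)}")
    data, metadata, entry = struct.unpack(">III", value)
    return {"data": data, "metadata": metadata, "entry": entry}


def report(path: str) -> None:
    """Print each xattr in hex, decoding the ones that look like AFR changelogs."""
    for name, value in sorted(brick_xattrs(path).items()):
        line = f"{name} = 0x{value.hex()}"
        if name.startswith("trusted.afr.") and len(value) == 12:
            line += f"  {decode_afr_changelog(value)}"
        print(line)
```

Running `report()` on the same file on each brick of a replica pair (for a hypothetical brick path such as `/export/brick1/some/file`) and comparing the counters would give exactly the sort of "I checked B and got XXXX" output described above.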

This is where all open source seems to hit problems. Sure, there are error messages (at least they're not "Error ##" like MySQL does...), but they generally seem to make sense only to whoever wrote the software. There are 7216 log entries in the source. That's a lot of man-hours to document all of those, even without any degree of detail.
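The message doesn't say how the log entries were counted. One hedged way to reproduce a tally like this is to scan the C sources for the `GF_LOG_<LEVEL>` constants that GlusterFS passes to `gf_log()`. The sketch below assumes every occurrence of such a constant corresponds to one log call, which over-counts any other use of the constants (level comparisons, for instance), so its totals will not exactly match the figures quoted here.

```python
import re
from collections import Counter
from pathlib import Path

# Matches level constants such as GF_LOG_ERROR or GF_LOG_CRITICAL
# and captures the level name.
LEVEL_RE = re.compile(r"\bGF_LOG_([A-Z]+)\b")


def count_log_levels(src_root: str) -> Counter:
    """Tally GF_LOG_<LEVEL> tokens across all .c files under src_root.

    Assumption: one token is roughly one log call; uses of the constants
    outside logging calls are counted too.
    """
    counts: Counter = Counter()
    for path in Path(src_root).rglob("*.c"):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        counts.update(m.group(1) for m in LEVEL_RE.finditer(text))
    return counts
```

Pointed at a source checkout, `counts["CRITICAL"]`, `counts["ERROR"]` and `sum(counts.values())` give per-level and overall figures comparable to the ones in this thread; the same scan, extended to capture the message strings, could also produce the list of messages to document one or two at a time.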

Now, there are only 136 critical errors, but I'm not sure I've ever seen one of those. There are 2991 at the "error" level, so I'm really not sure how that could be handled. Even if someone could volunteer 8 hours/day to spend 15 minutes describing each error message, it would take them around 4 1/2 months. That's longer than a production cycle (granted, once they were documented, the production cycle would be unlikely to produce nearly 3000 new error messages).
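The estimate in the paragraph above can be checked with a few lines of arithmetic. The working-days-per-month figure is an assumption (about 21, i.e. five-day weeks with no holidays), since the message doesn't say whether weekends are counted.

```python
ERROR_MESSAGES = 2991      # messages logged at the "error" level
MINUTES_EACH = 15          # time spent describing one message
HOURS_PER_DAY = 8          # volunteer time per day
WORKDAYS_PER_MONTH = 21    # assumption: five-day weeks, no holidays

total_hours = ERROR_MESSAGES * MINUTES_EACH / 60      # 747.75 hours
working_days = total_hours / HOURS_PER_DAY            # ~93.5 days
months = working_days / WORKDAYS_PER_MONTH           # ~4.45 months

print(f"{total_hours:.2f} h, {working_days:.1f} days, {months:.2f} months")
```

This prints `747.75 h, 93.5 days, 4.45 months`, consistent with "around 4 1/2 months" if only weekdays count; counting every calendar day, the same 93.5 days would be closer to three months.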


I'd be willing to make the list and document 1 or 2 a day. Anyone else?

As far as I'm aware, the official admin guide is completely oblivious to
internals like this.

Users may be able to find suggestions by perusing mailing list archives, or
by trying gluster 2.x wiki documentation (which may be stale), or some blog
postings.

Thanks for pointing these out. Some I (obviously) wasn't even aware were a problem.

By the way, if anyone wants to copy-paste stuff from my blog into the wiki, feel free. I keep meaning to, but I've been behind schedule at work and just haven't had enough free time lately.

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
