I looked through your logs a bit and noticed that the OSD on node01 is crashing
due to high latencies on disk access (I think the default behavior in this case
is to assert out if there's no progress after 10 minutes or so).

Based on that, I pretty much have to guess that there's just too much stress on
your disk and it's going to keep causing problems. You can try loosening the
various configurable timeouts to let it run longer, but it really seems like you
just need beefier disks for the amount of work you're putting on them. IIRC
you're running a monitor and an OSD on the same 2.5" physical disk, which means
they're contending with each other on things like sync() calls.
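To illustrate the kind of timeout I mean (a daemon that asserts out when an
operation makes no progress for too long), here's a minimal, hypothetical Python
sketch. It is NOT Ceph's actual code. The class name, the 600-second value
(standing in for "10 minutes or so"), and the injectable clock are all
assumptions made just for illustration.

```python
import time


class ProgressWatchdog:
    """Hypothetical watchdog: fail hard if no progress is reported in time.

    Not Ceph's implementation; it only shows the general pattern of an
    internal timeout that asserts out when work (e.g. disk I/O) stalls.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        # Raising timeout_s is the "loosen the timeout" knob: slow I/O gets
        # more time before the daemon gives up, but the disk is no faster.
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_progress = clock()

    def report_progress(self):
        # Called by the worker whenever it completes a unit of work.
        self.last_progress = self.clock()

    def check(self):
        # Called periodically; raises if the worker has been stuck too long,
        # otherwise returns how many seconds it has gone without progress.
        stalled_for = self.clock() - self.last_progress
        if stalled_for > self.timeout_s:
            raise AssertionError(
                f"no progress for {stalled_for:.0f}s (timeout {self.timeout_s}s)"
            )
        return stalled_for
```

With a fake clock you can see the behavior: after 300s of no progress `check()`
passes, but once the stall exceeds 600s it raises, which is roughly what the
OSD is doing when the disk can't keep up.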

This general slowness doesn't explain the MDS log corruption, although it might
be one of the trigger conditions. I added another assert in the Journaler code
which might have caused the problem (though I don't think it could have), but I
don't have any other new ideas.
-Greg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
