Hi John,

Can you dump the running threads? Run 'echo t > /proc/sysrq-trigger' and then attach the output (from dmesg or kern.log).
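A minimal sketch of that capture step, assuming a root shell on the hung client. The helper name capture_task_dump and the output file name are hypothetical choices for illustration, not anything prescribed in this thread.

```shell
#!/bin/sh
# Sketch only: capture_task_dump is a hypothetical helper name.
# Run as root on the client that has locked up.

capture_task_dump() {
    out="$1"
    # The second argument exists only so the function can be dry-run
    # against a scratch file; it defaults to the real sysrq trigger.
    trigger="${2:-/proc/sysrq-trigger}"

    # 't' asks the kernel to log the state and stack of every task.
    echo t > "$trigger" || return 1

    # The dump lands in the kernel ring buffer; save a copy to a file.
    dmesg > "$out"
}
```

Calling `capture_task_dump sysrq-t.txt` would leave a file that can be attached to a reply. If the ring buffer is too small to hold the whole dump, the copy in kern.log is likely to be more complete.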
Thanks!
sage

On Sun, 26 Aug 2012, John Wright wrote:

> Hi All,
>
> We're running ceph 0.48 on a small three-node test cluster. We've had good
> stability with I/O using dd and iozone, especially after upgrading to 0.48.
> However, we're running into a repeatable lockup of the Linux ceph client
> (3.3.5-2.fc16.x86_64) when running an MPI program that does simple I/O on a
> ceph mount. This is an MPI program running processes on two nodes. It is the
> remote node on which the ceph client locks up. The client becomes immediately
> unresponsive, and any attempt to access the mounted volume produces a process
> with status 'D'. I can see no indication in the server logs that it is ever
> contacted. Regular serial processes run fine on the volume. MPI runs on the
> nodes work fine when not using the ceph volume.
>
> So any suggestions on where to look? Does anyone have experience testing
> parallel programs on ceph?
>
> thanks,
> -john
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
