On Thu, 2010-04-08 at 10:56 -0400, Lawrence Sorrillo wrote:
> I am about to try to build lustre again as I am getting hangs with the
> lustre mounts in my previous build.
>
> "Apr 7 09:09:30 host0 kernel: LustreError:
> 5270:0:(o2iblnd_cb.c:2883:kiblnd_check_txs()) Timed out tx: active_txs,
> 9 seconds
> Apr 7 09:09:30 host0 kernel: LustreError:
> 5270:0:(o2iblnd_cb.c:2945:kiblnd_check_conns()) Timed out RDMA with
> 172.17.1....@o2ib (84)"

What makes you think that this is a software problem and that rebuilding the software stack will resolve it? FWIW, every time I have seen this type of problem reported, the fabric was flaky.

> Here is the plan. Lustre 1.8.2 on rhel5 x86_64 using the ofed in the rhel5
> kernel.

In case it's not what you mean, why don't you just use the pre-built packages that we have built and extensively tested in our QA department for you?

> I have gathered the following packages from the lustre site:
> e2fsprogs-1.41.6.sun1-0redhat.rhel5.x86_64.rpm
> kernel-2.6.18-164.6.1.0.1.el5.src.rpm

Why do you need a kernel src.rpm?

> lustre-client-1.8.2-2.6.18_164.6.1.0.1.el5_lustre.1.8.2.x86_64.rpm
> lustre-client-modules-1.8.2-2.6.18_164.6.1.0.1.el5_lustre.1.8.2.x86_64.rpm
>
> I want to get the kernel-2.6.18-164.6.1.0.1.el5.x86_64.rpm binary from
> kernel-2.6.18-164.6.1.0.1.el5.src.rpm.

Why not just use the binary kernel we provide instead of rebuilding your own? It's the *exact* same kernel that we used in our QA testing and therefore a known quantity.

b.
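
One rough way to check whether the timeouts point at the fabric is to tally the "Timed out RDMA" messages per peer NID; if they pile up on a few peers, that may suggest a bad link, port or cable rather than a software fault. The following is only a minimal Python sketch under stated assumptions: the regular expression is modeled on the kiblnd_check_conns() line quoted above (other Lustre versions may word it differently), and the function name count_rdma_timeouts and the NIDs in SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the kiblnd_check_conns() message quoted in this thread.
# Assumption: each syslog entry sits on a single line and ends the peer NID
# with "@o2ib" optionally followed by a network number.
RDMA_TIMEOUT = re.compile(
    r"kiblnd_check_conns\(\)\) Timed out RDMA with (?P<nid>\S+@o2ib\d*)"
)


def count_rdma_timeouts(lines):
    """Return a Counter mapping peer NID -> number of RDMA timeout messages."""
    counts = Counter()
    for line in lines:
        match = RDMA_TIMEOUT.search(line)
        if match:
            counts[match.group("nid")] += 1
    return counts


# Hypothetical sample lines; the NIDs are invented, not taken from the report.
SAMPLE = [
    "Apr 7 09:09:30 host0 kernel: LustreError: 5270:0:(o2iblnd_cb.c:2883:"
    "kiblnd_check_txs()) Timed out tx: active_txs, 9 seconds",
    "Apr 7 09:09:30 host0 kernel: LustreError: 5270:0:(o2iblnd_cb.c:2945:"
    "kiblnd_check_conns()) Timed out RDMA with 10.0.0.5@o2ib (84)",
    "Apr 7 09:12:11 host0 kernel: LustreError: 5270:0:(o2iblnd_cb.c:2945:"
    "kiblnd_check_conns()) Timed out RDMA with 10.0.0.5@o2ib (84)",
    "Apr 7 09:15:02 host0 kernel: LustreError: 5270:0:(o2iblnd_cb.c:2945:"
    "kiblnd_check_conns()) Timed out RDMA with 10.0.0.9@o2ib (12)",
]

if __name__ == "__main__":
    # Print peers with the most timeouts first.
    for nid, hits in count_rdma_timeouts(SAMPLE).most_common():
        print(nid, hits)
```

To run it against a real log, pass an open file instead of SAMPLE, e.g. count_rdma_timeouts(open("/var/log/messages")); the log path is an assumption and depends on the syslog configuration.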
