OT: Can we create a wiki page or some other form of knowledge pooling on benchmarking Lustre?
Right now I'm using slides from 2009 as my source, which may not be ideal:
http://wiki.lustre.org/images/4/40/Wednesday_shpc-2009-benchmarking.pdf

OT2: Did I miss the release announcement, or was 2.10 never announced on this list?

Thanks!
Eli

On Fri, Aug 4, 2017 at 8:49 PM, Patrick Farrell <p...@cray.com> wrote:
> Brian,
>
> What is the actual crash? Null pointer, failed assertion/LBUG...?
> Probably just a few more lines back in the log would show that.
>
> Also, Lustre 2.10 has been released; you might benefit from switching to
> that. There are almost certainly more bugs in this pre-2.10 development
> version you're running than in the release.
>
> - Patrick
> ------------------------------
> *From:* lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
> behalf of Brian Andrus <toomuc...@gmail.com>
> *Sent:* Friday, August 4, 2017 12:12:59 PM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] nodes crash during ior test
>
> All,
>
> I am trying to run some IOR benchmarking on a small system.
> It only has 2 OSSes.
>
> I have been having some trouble where one of the clients will reboot and
> do a crash dump somewhat arbitrarily. The runs work most of the time,
> but every 5 or so runs a client reboots, and it is not always the same
> client.
>
> The call trace seems to point to LNet:
>
> [72095.973865] Call Trace:
> [72095.973892] [<ffffffffa070e856>] ? cfs_percpt_unlock+0x36/0xc0 [libcfs]
> [72095.973936] [<ffffffffa0779851>] lnet_return_tx_credits_locked+0x211/0x480 [lnet]
> [72095.973973] [<ffffffffa076c770>] lnet_msg_decommit+0xd0/0x6c0 [lnet]
> [72095.974006] [<ffffffffa076d0f9>] lnet_finalize+0x1e9/0x690 [lnet]
> [72095.974037] [<ffffffffa06baf45>] ksocknal_tx_done+0x85/0x1c0 [ksocklnd]
> [72095.974068] [<ffffffffa06c3277>] ksocknal_handle_zcack+0x137/0x1e0 [ksocklnd]
> [72095.974101] [<ffffffffa06becf1>] ksocknal_process_receive+0x3a1/0xd90 [ksocklnd]
> [72095.974134] [<ffffffffa06bfa6e>] ksocknal_scheduler+0xee/0x670 [ksocklnd]
> [72095.974165] [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
> [72095.974193] [<ffffffffa06bf980>] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
> [72095.974222] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
> [72095.974244] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
> [72095.974272] [<ffffffff81697758>] ret_from_fork+0x58/0x90
> [72095.974296] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
>
> I am currently using Lustre 2.9.59_15_g107b2cb, built for kmod.
>
> Is there something I can do to track this down and hopefully remedy it?
>
> Brian Andrus
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
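[Regarding Patrick's question about the lines just before the call trace: on RHEL-family clients with kdump enabled, the crash directory usually contains a vmcore-dmesg.txt alongside the vmcore, and the messages immediately preceding "Call Trace:" normally name the actual failure (null dereference, LustreError/LBUG, etc.). A minimal sketch, assuming such a file exists — the path is illustrative and will vary by distribution:]

```shell
# Show the 20 log lines leading up to the call trace; the real failure
# (oops message, LustreError, LBUG) is usually among them.
# /var/crash/<timestamp>/vmcore-dmesg.txt is written by kdump on
# RHEL-family systems; adjust the glob for your setup.
grep -B 20 'Call Trace:' /var/crash/*/vmcore-dmesg.txt
```

[If only a raw vmcore is available, the same log can typically be extracted with the `crash` utility's `log` command against the matching debug kernel.]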