There is a similar thread on this mailing list:
http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/afe24159554cd3ff/8b37bababf848123?lnk=gst&q=I%2FO+error+on+clients#

There is also an open bug that reports a similar problem:
https://bugzilla.lustre.org/show_bug.cgi?id=23190
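
In case it helps with the log check suggested in the quoted thread below, here is a minimal Python sketch that scans log lines for the two messages quoted there: the client-side eviction notice and the server-side lock callback timeout. The function name scan_lustre_log and both regular expressions are my own assumptions, built only from the wording of those two messages; other Lustre versions may phrase them differently.

```python
import re

# Patterns built from the two log messages quoted in this thread.
# Assumption: the wording may differ between Lustre versions.
EVICTED = re.compile(r"This client was evicted by (\S+?);")
LOCK_TIMEOUT = re.compile(r"lock callback timer expired after (\d+)s")


def scan_lustre_log(lines):
    """Return a list of (line_number, kind, detail) for matching lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Both messages are logged with the LustreError prefix.
        if "LustreError" not in line:
            continue
        match = EVICTED.search(line)
        if match:
            # detail is the name of the target that evicted the client
            hits.append((number, "client evicted", match.group(1)))
            continue
        match = LOCK_TIMEOUT.search(line)
        if match:
            # detail is how long the server waited before evicting
            hits.append((number, "lock callback timeout", match.group(1) + "s"))
    return hits
```

On a client or an OSS this could be called as scan_lustre_log(open("/var/log/messages", errors="replace")) and the returned tuples printed one per line.
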
On 23 July 2010 10:02, Larry <[email protected]> wrote:

> We have the same problem sometimes when running NAMD on Lustre. The
> console log suggests that a file lock expired, but I don't know why.
>
> On Fri, Jul 23, 2010 at 8:12 AM, Wojciech Turek <[email protected]> wrote:
> > Hi Richard,
> >
> > If the cause of the I/O errors is Lustre, there will be some message in
> > the logs. I am seeing a similar problem with some applications that run
> > on our cluster. The symptoms are always the same: just before the
> > application crashes with an I/O error, the node gets evicted with a
> > message like this:
> >
> > LustreError: 167-0: This client was evicted by ddn_data-OST000f; in
> > progress operations using this service will fail.
> >
> > The OSS that mounts the OST from the above message has the following
> > line in its log:
> >
> > LustreError: 0:0:(ldlm_lockd.c:305:waiting_locks_callback()) ### lock
> > callback timer expired after 101s: evicting client at 10.143....@tcp
> > ns: filter-ddn_data-OST000f_UUID lock:
> > ffff81021a84ba00/0x744b1dd4481e38b2 lrc: 3/0,0 mode: PR/PR
> > res: 34959884/0 rrc: 2 type: EXT [0->18446744073709551615]
> > (req 0->18446744073709551615) flags: 0x20 remote: 0x1d34b900a905375d
> > expref: 9 pid: 1506 timeout 8374258376
> >
> > Can you please check your logs for similar messages?
> >
> > Best regards
> >
> > Wojciech
> >
> > On 22 July 2010 23:43, Andreas Dilger <[email protected]> wrote:
> >>
> >> On 2010-07-22, at 14:59, Richard Lefebvre wrote:
> >> > I have a problem with the scalable molecular dynamics software NAMD.
> >> > It writes restart files once in a while, but sometimes the binary
> >> > write crashes. When it crashes is not constant. The only constant
> >> > thing is that it happens when it writes to our Lustre file system.
> >> > When it writes to something else, it is fine. I can't seem to find
> >> > any errors in any of the /var/log/messages. Has anyone had any
> >> > problems with NAMD?
> >> >
> >> Rarely has anyone complained about Lustre not providing error messages
> >> when there is a problem, so if there is nothing in /var/log/messages on
> >> either the client or the server then it is hard to know whether it is a
> >> Lustre problem or not...
> >>
> >> If possible, you could try running the application under strace (limited
> >> to the IO calls, or it would be much too much data) to see which system
> >> call the error is coming from.
> >>
> >> Cheers, Andreas
> >> --
> >> Andreas Dilger
> >> Lustre Technical Lead
> >> Oracle Corporation Canada Inc.
> >>
> >> _______________________________________________
> >> Lustre-discuss mailing list
> >> [email protected]
> >> http://lists.lustre.org/mailman/listinfo/lustre-discuss
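
On the strace suggestion above: one way to limit the trace to IO calls would be something like "strace -f -e trace=open,read,write,close -o namd.strace" followed by the usual NAMD command line. The syscall list and the output file name here are only assumptions and will likely need adjusting for the system in question. Below is a minimal Python sketch for picking the failed calls out of such a trace. The function name failed_calls is hypothetical, and the regular expression assumes the default strace line format, without timestamp options such as -t or -r.

```python
import re

# Matches failed calls in default strace output, for example:
#   1506 write(3, "..."..., 4096) = -1 EIO (Input/output error)
# The leading PID appears when strace is run with both -f and -o.
FAILED_CALL = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?"       # optional PID prefix
    r"(?P<call>\w+)\((?P<args>.*)\)"  # syscall name and raw arguments
    r"\s+=\s+-1\s+"                # a return value of -1 marks failure
    r"(?P<errno>E[A-Z0-9]+)"       # symbolic errno, e.g. EIO
    r"\s+\((?P<text>[^)]*)\)"      # human-readable error text
)


def failed_calls(lines, errno_name="EIO"):
    """Return (pid, syscall, args) for each call that failed with errno_name."""
    results = []
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match and match.group("errno") == errno_name:
            results.append(
                (match.group("pid"), match.group("call"), match.group("args"))
            )
    return results
```

Calling failed_calls(open("namd.strace")) would then list the calls that returned EIO, which should show which system call the I/O error is coming from.
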
