Hi Richard,

If Lustre is the cause of the I/O errors, there will be some message in the logs. I am seeing a similar problem with some applications that run on our cluster. The symptoms are always the same: just before the application crashes with an I/O error, the node gets evicted with a message like this:

LustreError: 167-0: This client was evicted by ddn_data-OST000f; in progress operations using this service will fail.
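If it helps with checking the client logs, here is a minimal Python sketch that looks for eviction messages of the form quoted above and reports which target evicted the client. The regular expression is derived only from that one sample message, and the names `EVICTION_RE`, `find_evictions` and `scan_file` are made up for illustration; other Lustre versions may word the message differently.

```python
import re

# Pattern derived from the single sample message quoted above (an assumption:
# other Lustre versions may phrase the eviction message differently).
EVICTION_RE = re.compile(
    r"LustreError: .*This client was evicted by (?P<target>\S+);"
)


def find_evictions(lines):
    """Return a list of (line_number, target) for each eviction message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = EVICTION_RE.search(line)
        if match:
            hits.append((number, match.group("target")))
    return hits


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as log:
        return find_evictions(log)
```

Calling `scan_file("/var/log/messages")` on a client node (the path is the one mentioned in this thread; adjust it for your syslog setup) returns an empty list when no eviction was logged.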
The OSS that mounts the OST named in the above message has the following line in its log:

LustreError: 0:0:(ldlm_lockd.c:305:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.143....@tcp ns: filter-ddn_data-OST000f_UUID lock: ffff81021a84ba00/0x744b1dd4481e38b2 lrc: 3/0,0 mode: PR/PR res: 34959884/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x1d34b900a905375d expref: 9 pid: 1506 timeout 8374258376

Can you please check your logs for similar messages?

Best regards,

Wojciech

On 22 July 2010 23:43, Andreas Dilger <[email protected]> wrote:

> On 2010-07-22, at 14:59, Richard Lefebvre wrote:
> > I have a problem with the Scalable molecular dynamics software NAMD. It
> > write restart files once in a while. But sometime the binary write
> > crashes. The when it crashes is not constant. The only constant thing is
> > it happens when it writes on our Lustre file system. When it write on
> > something else, it is fine. I can't seem find any errors in any of the
> > /var/log/messages. Anyone had any problems with NAMD?
>
> Rarely has anyone complained about Lustre not providing error messages when
> there is a problem, so if there is nothing in /var/log/messages on either
> the client or the server then it is hard to know whether it is a Lustre
> problem or not...
>
> If possible, you could try running the application under strace (limited to
> the IO calls, or it would be much too much data) to see which system call
> the error is coming from.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
