Hello Folks,
Recently we've had a fair number of messages like the following
coming from our OSS syslog:
n0004: LustreError: 168-f: lrc-OST0002: BAD WRITE CHECKSUM: changed in transit
before arrival at OST from 12345-10.4.8....@o2ib inum 1409775/2324736913 object
1771080/0 extent [401408-2809855]
n0004: LustreError: Skipped 13 previous similar messages
n0004: LustreError: 10839:0:(ost_handler.c:1169:ost_brw_write()) client csum
ae09a542, original server csum cfb6ab4b, server csum now cfb6ab4b
There appear to be no specific clients, OSSs or OSTs in common. We commonly
get a block of messages concerning one OST, with different clients involved,
and then the messages move on to another OST. For that reason, I doubt this is
a memory issue. Previous posts on this list mention mmap, but these messages
make no mention of it. Any ideas?
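To back up the "nothing in common" observation with numbers, the syslog can be tallied per OSS host, per OST and per client NID. The sketch below is a minimal illustration, not a Lustre tool. It assumes each message sits on a single line shaped like the excerpt above once unwrapped (hostname, then "LustreError:", then the OST name and the "from <NID>" clause). Real syslog lines usually carry a timestamp prefix, so the pattern may need adjusting. The names `BAD_CSUM` and `tally` are hypothetical. Because Lustre rate-limits console output ("Skipped 13 previous similar messages"), these counts are a lower bound.

```python
import re
from collections import Counter

# Assumed (hypothetical) line shape, matching the excerpt once unwrapped:
#   <host>: LustreError: <code>: <fsname>-OSTxxxx: BAD WRITE CHECKSUM: ... from <client NID> inum ...
BAD_CSUM = re.compile(
    r"(\S+): LustreError: \S+: (\S+-OST\w+): BAD WRITE CHECKSUM:.*?\bfrom (\S+@\w+)"
)


def tally(lines):
    """Count BAD WRITE CHECKSUM events per OSS host, per OST and per client NID."""
    by_oss, by_ost, by_client = Counter(), Counter(), Counter()
    for line in lines:
        m = BAD_CSUM.search(line)
        if m is None:
            # Not a checksum error line (e.g. the "Skipped N previous similar messages" summary).
            continue
        oss, ost, client = m.groups()
        by_oss[oss] += 1
        by_ost[ost] += 1
        by_client[client] += 1
    return by_oss, by_ost, by_client
```

For example, `by_oss, by_ost, by_client = tally(open("/var/log/messages"))` followed by `by_client.most_common(10)` would show whether a handful of clients dominate. The log path is an assumption and varies by distribution.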
----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss