On Jul 24, 2009 10:33 -0400, Craig Prescott wrote:
> We've been testing some 1.8.0.1 patchless clients (RHEL5.3, x86_64, RPMs
> from the Sun download page) with our 1.6.4.2 servers.
>
> The OSS nodes started logging these LustreErrors from the 1.8.0.1 clients:
>
> > LustreError: 7302:0:(ost_handler.c:1157:ost_brw_write()) client csum
> > 8448447f, original server csum 66fb7cff, server csum now 66fb7cff
> > LustreError: 7302:0:(ost_handler.c:1157:ost_brw_write()) Skipped 1 previous
> > similar message
> > LustreError: 7391:0:(ost_handler.c:1095:ost_brw_write()) client csum
> > 9d8c7d6a, server csum 2cfdcb47
> > LustreError: 168-f: ufhpc-OST0004: BAD WRITE CHECKSUM: changed in transit
> > before arrival at OST from 12345-10.13.28...@tcp inum 38470778/1485322248
> > object 67094039/0 extent [0-1023]
>
> Is this a known issue with running 1.8.0.1 clients against 1.6.4.2
> servers? We aren't seeing these messages in relation to our 1.6 clients.
This is a known issue if the clients are using mmap IO, which can change the kernel pages without notifying the kernel, so the data can change after the client has computed its checksum. It would be possible to fix this warning by adding a "file is mmapped" flag to the RPC, suppressing the console error on the server, and printing the subsequent error message only if the IO never makes it to the server intact at least once in the next 5 retries. Unfortunately, since this is a non-fatal error, nobody has worked on fixing it yet.

> Looking through the Lustre bugzilla, I see bug 18296, which discusses
> these messages, but it was logged against Lustre version 1.6.6.

The 1.6 and 1.8 code is very similar, with only a handful of isolated features added.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
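To make the proposed fix concrete, here is a minimal, simplified Python model of it. This is not Lustre code: `BulkWrite`, the `mmapped` field, `server_check`, `write_with_retries`, and `make_attempts` are all hypothetical names, and CRC32 is only a stand-in checksum, not a claim about which algorithm Lustre uses. The model shows an mmap writer changing a page after the client checksums it, the server suppressing the console error when the hypothetical "file is mmapped" flag is set, and a single error being reported only if none of the original send plus 5 retries arrives intact.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List

MAX_RETRIES = 5  # "the next 5 retries" from the message above


@dataclass
class BulkWrite:
    """Hypothetical model of one bulk write RPC."""
    payload: bytes     # bytes as they arrive at the OST
    client_csum: int   # checksum the client computed before sending
    mmapped: bool      # hypothetical "file is mmapped" RPC flag


def checksum(data: bytes) -> int:
    # CRC32 is only a stand-in; not claimed to be Lustre's checksum algorithm.
    return zlib.crc32(data) & 0xFFFFFFFF


def server_check(req: BulkWrite, log: List[str]) -> bool:
    """Return True if the write arrived intact; log a mismatch unless mmapped."""
    server_csum = checksum(req.payload)
    if server_csum == req.client_csum:
        return True
    if not req.mmapped:
        log.append(f"BAD WRITE CHECKSUM: client csum {req.client_csum:08x}, "
                   f"server csum {server_csum:08x}")
    return False


def write_with_retries(make_request: Callable[[], BulkWrite], log: List[str]) -> bool:
    """Send once plus MAX_RETRIES resends; for mmapped files, log only if all fail."""
    mmapped = False
    for _ in range(1 + MAX_RETRIES):
        req = make_request()
        mmapped = req.mmapped
        if server_check(req, log):
            return True
    if mmapped:
        log.append("write never arrived intact after retries")
    return False


def make_attempts(page: bytes, dirty_attempts: int, mmapped: bool) -> Callable[[], BulkWrite]:
    """Hypothetical client: the first `dirty_attempts` sends see the page
    modified after checksumming, as an application writing via mmap could cause."""
    sent_count = 0

    def make_request() -> BulkWrite:
        nonlocal sent_count
        csum = checksum(page)  # client checksums the page...
        sent = page
        if sent_count < dirty_attempts:
            # ...then the page changes before the data is transferred.
            sent = bytes([page[0] ^ 0xFF]) + page[1:]
        sent_count += 1
        return BulkWrite(sent, csum, mmapped)

    return make_request
```

For example, `write_with_retries(make_attempts(bytes(1024), 1, True), log)` succeeds on the first resend and leaves `log` empty, whereas the same call with `mmapped=False` succeeds but records one "BAD WRITE CHECKSUM" line, matching the kind of console noise reported above. Folding client resends and server-side logging into one loop is a deliberate simplification for illustration.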
