I was wrong about only one client having problems. It seems to be all of them, except the mds server (see below), so it is a problem of the filesystem (not the client) after all.
> Could you elaborate about how "broken" the files are?

When I do an 'ls', the filenames are flashing in red (this is for example the case for broken symbolic links). Permissions, date and owner are missing, like in the middle of the next three lines:

-rw-------  1 root     root  18308319 Jul 16  2009 stat_1247756353.gz
?---------  ? ?        ?            ?            ? stat_1248125742.gz
drwxr-xr-x  2 stephane ukmhd     4096 Jul  8  2009 stephane

Attempting to access the file more closely results in an I/O error:

[r...@mhdc ~]# ls -l /workspace/ls-lR_2009-01-20
ls: /workspace/ls-lR_2009-01-20: Input/output error
[r...@mhdc ~]# cp /workspace/ls-lR_2009-01-20 /tmp
cp: cannot stat `/workspace/ls-lR_2009-01-20': Input/output error

> From your description and the error message you provide, I suspect that
> one (or some) of the OSTs went down. What does `lctl dl` show?

The files are accessible from the mds server, and the OSTs seem visible from the "broken" clients:

[r...@mhdc ~]# lctl dl
  0 UP mgc mgc192.168.101....@tcp 63568484-f714-da05-c5c2-b96db1b22962 5
  1 UP lov home-clilov-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 4
  2 UP mdc home-MDT0000-mdc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  3 UP osc home-OST0001-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  4 UP osc home-OST0003-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  5 UP osc home-OST0002-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  6 UP osc home-OST0005-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  7 UP osc home-OST0004-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  8 UP osc home-OST0000-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5

Does this help?

  Herbert

> On 2010-11-18, at 8:18 PM, Herbert Fruchtl wrote:
>
>> I have a Lustre (1.6.7) system that looks OKish (as far as I can see) from
>> the mds and most of the clients. From one client however (the users' login
>> machine) it looks broken. Some files are missing, some seem broken, and the
>> df command hangs.
>>
>> Rebooting the client doesn't change anything. Is it broken, or is there some
>> persistent information that I need to flush? When I do an ls on a partially
>> broken directory, I get the following two lines in /var/log/messages:
>>
>> Nov 18 12:13:53 mhdc kernel: [ 7093.751196] LustreError: 10919:0:(file.c:999:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
>> Nov 18 12:13:53 mhdc kernel: [ 7093.761098] LustreError: 10919:0:(file.c:999:ll_glimpse_size()) Skipped 9 previous similar messages
>>
>> Any ideas how to proceed with the least disruption?
>>
>> Thanks in advance,
>>
>> Herbert
>> --
>> Herbert Fruchtl
>> Senior Scientific Computing Officer
>> School of Chemistry, School of Mathematics and Statistics
>> University of St Andrews
>> --
>> The University of St Andrews is a charity registered in Scotland:
>> No SC013532
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
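
P.S. For comparing the `lctl dl` listing across several clients, a small script may be easier than reading the output by eye. The following is only a minimal sketch: it assumes each device line has exactly six whitespace-separated fields (index, status, type, name, UUID, refcount), as in the listing above. The names `Device`, `parse_lctl_dl`, `not_up` and `osc_targets` are hypothetical helpers, not part of Lustre. Note also that, as far as I understand, "UP" only says the local device is set up, not that the target behind it is actually reachable.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    index: int
    status: str
    type: str
    name: str
    uuid: str
    refcount: int


# Device lines copied from the `lctl dl` output above (elided address kept as-is).
SAMPLE = """\
  0 UP mgc mgc192.168.101....@tcp 63568484-f714-da05-c5c2-b96db1b22962 5
  1 UP lov home-clilov-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 4
  2 UP mdc home-MDT0000-mdc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  3 UP osc home-OST0001-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  4 UP osc home-OST0003-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  5 UP osc home-OST0002-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  6 UP osc home-OST0005-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  7 UP osc home-OST0004-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  8 UP osc home-OST0000-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
"""


def parse_lctl_dl(text: str) -> List[Device]:
    """Parse device lines; skip anything that does not have the assumed shape."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: index status type name uuid refcount
        if len(fields) != 6 or not fields[0].isdigit() or not fields[5].isdigit():
            continue
        index, status, dev_type, name, uuid, refcount = fields
        devices.append(Device(int(index), status, dev_type, name, uuid, int(refcount)))
    return devices


def not_up(devices: List[Device]) -> List[Device]:
    """Return devices whose status column is anything other than UP."""
    return [d for d in devices if d.status != "UP"]


def osc_targets(devices: List[Device]) -> List[str]:
    """Return sorted target names of the osc devices, e.g. 'home-OST0001'."""
    return sorted(d.name.split("-osc-")[0] for d in devices if d.type == "osc")


if __name__ == "__main__":
    devs = parse_lctl_dl(SAMPLE)
    print("devices not UP:", [d.name for d in not_up(devs)])
    print("OSTs seen by this client:", osc_targets(devs))
```

Running the same script on the output from a working node and from a "broken" client and comparing the two OST lists would show whether any target is missing on one side.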
