Hi Brian,
Brian J. Murrell wrote:
On Thu, 2009-01-22 at 15:44 +0000, Wojciech Turek wrote:
Hello,
Hi,
Lustre MDS report following error:
Jan 22 15:20:40 mds01.beowulf.cluster kernel: LustreError:
24680:0:(lov_request.c:692:lov_update_create_set()) error creating fid
0xeb79c9d sub-object on OST idx 4/1: rc = -28
-28 is ENOSPC.
Which I translate to mean that one of the OSTs (index 4/1) is full and has
no space left on the device.
Yes.
OSS seem to be consistent and says:
Jan 22 15:21:15 storage08.beowulf.cluster kernel: LustreError:
23507:0:(filter_io_26.c:721:filter_commitrw_write()) error starting
transaction: rc = -30
Hrm. I'm not sure a -30 (EROFS) would translate to a -28 at the MDS; I
think it would also be a -30. So are you sure you are looking at
correlating messages? The timestamps, if the two nodes' clocks are in
sync, also seem to indicate a lack of correlation, with 35s of disparity.
Perhaps there is an actual -28 in the OSS log prior to the -30 one?
Yes, you are right: there are plenty of messages with -30 in the logs, and
they are probably not related to the -28.
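As an aside, the negative rc values in these Lustre messages are just negated Unix errno codes, so they can be decoded mechanically. A minimal sketch using Python's standard library, with the two rc values from this thread:

```python
import errno
import os

# Lustre console messages report rc as a negated errno value.
for rc in (-28, -30):
    code = -rc
    # errno.errorcode maps the number to its symbolic name.
    print(rc, errno.errorcode[code], "-", os.strerror(code))
# -28 ENOSPC - No space left on device
# -30 EROFS - Read-only file system
```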
Which I translate to mean that a client would like to write to an existing
file but can't, because the file system is read-only.
Indeed. But why is it read-only? There should be an event in the OSS
log saying that it was turning the filesystem read-only.
The OST device is still mounted with the rw option.
Yeah. That's just the state at mount time. Lustre will set a device
read-only in the case of filesystem errors, as one example.
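One way to see the live state rather than the mount-time options is to read /proc/mounts, which the kernel updates when it remounts a filesystem read-only. A small sketch, assuming the standard /proc/mounts format (the device name /dev/dm-8 is taken from your log; the helper name is mine):

```python
def mount_flags(target, mounts_file="/proc/mounts"):
    """Return the current option list for a device or mount point,
    as recorded in /proc/mounts, or None if it is not mounted."""
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            dev, mnt, opts = fields[0], fields[1], fields[3]
            if target in (dev, mnt):
                return opts.split(",")
    return None

# e.g. "ro" in (mount_flags("/dev/dm-8") or [])
# tells you whether the OST device has been remounted read-only.
```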
Now the main question is why Lustre thinks that OST(idx4) is full?
No, I think the main question is why is it read-only. The full
situation may have been transient where it filled up momentarily and
then some objects were removed. In any case, this is a secondary issue
and really only needs to be considered once the read-only situation is
understood.
Thank you for putting me on the right track. I found these in the syslog:
Jan 22 10:18:40 storage08.beowulf.cluster kernel: LDISKFS-fs error
(device dm-8): mb_free_blocks: double-free of inode 16203779's block
688627718(bit 8198 in group 21015)
Jan 22 10:18:40 storage08.beowulf.cluster kernel:
Jan 22 10:18:40 storage08.beowulf.cluster kernel: Remounting filesystem
read-only
Does this mean that the file system may be corrupted? I am going to run
fsck -f on this device and try to mount it back; is that the right
procedure?
I did not find any errors on my S2A9500 storage, so I am not sure when
this corruption could have occurred.
Is it possible that this OST has many orphaned objects which take up
all the available space?
That would be reflected in the df. If you suspect there may be orphan
objects though, you could run lfsck to verify and clean them up.
Is there a way of reclaiming this space?
If you mean orphaned OST objects, then lfsck.
b.
Cheers
Wojciech
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss