Hi Brian,

Brian J. Murrell wrote:
On Thu, 2009-01-22 at 15:44 +0000, Wojciech Turek wrote:
Hello,

Hi,

The Lustre MDS reports the following error:
Jan 22 15:20:40 mds01.beowulf.cluster kernel: LustreError:
24680:0:(lov_request.c:692:lov_update_create_set()) error creating fid
0xeb79c9d sub-object on OST idx 4/1: rc = -28

-28 is ENOSPC.
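(For quick reference, the standard errno tables confirm these mappings; e.g. in Python:)

```python
import errno
import os

# Translate the numeric rc values seen in the Lustre logs into errno names.
# On Linux: 28 -> ENOSPC "No space left on device", 30 -> EROFS "Read-only file system".
for rc in (28, 30):
    print(rc, errno.errorcode[rc], "-", os.strerror(rc))
```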

Which I interpret as meaning that one of the OSTs (index 4/1) is full and has
no space left on the device.

Yes.

The OSS seems consistent with this; it says:
Jan 22 15:21:15 storage08.beowulf.cluster kernel: LustreError:
23507:0:(filter_io_26.c:721:filter_commitrw_write()) error starting
transaction: rc = -30

Hrm.  I'm not sure a -30 (EROFS) would translate to a -28 at the MDS.  I
think it would also be a -30.  So are you sure you are looking at
correlated messages?  The timestamps, if the two nodes are in sync, also
seem to indicate a lack of correlation, with 35s of disparity.

Perhaps there is an actual -28 in the OSS log prior to the -30 one?

Yes, you are right: there are plenty of messages with -30 in the logs, and they are probably not related to the -28.
Which I interpret as: a client would like to write to an existing file
but can't, because the filesystem is read-only.

Indeed.  But why is it read-only?  There should be an event in the OSS
log saying that it was turning the filesystem read-only.

The OST device is still mounted with the rw option.

Yeah.  That's just the state at mount time.  Lustre will set a device
read-only in the case of filesystem errors, as one example.
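(One way to check the live state, rather than the options recorded at mount time, is statvfs, which reports the kernel's current flags.  A minimal sketch; the mountpoint path is a placeholder, point it at the actual OST mountpoint:)

```python
import os

def is_mounted_read_only(mountpoint):
    """Report the kernel's current mount flags via statvfs.

    When the kernel remounts a filesystem read-only after an error,
    the ST_RDONLY bit is set here even though the options shown for
    the mount may still say 'rw'.
    """
    return bool(os.statvfs(mountpoint).f_flag & os.ST_RDONLY)

# Placeholder path for illustration; substitute the real OST mountpoint.
print(is_mounted_read_only("/"))
```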


Now the main question is: why does Lustre think that OST (idx 4) is full?

No, I think the main question is why it is read-only.  The full
situation may have been transient: the OST filled up momentarily and
then some objects were removed.  In any case, this is a secondary issue
and really only needs to be considered once the read-only situation is
understood.

Thank you for putting me on the right track. I found these messages in the syslog:
Jan 22 10:18:40 storage08.beowulf.cluster kernel: LDISKFS-fs error (device dm-8): mb_free_blocks: double-free of inode 16203779's block 688627718(bit 8198 in group 21015)
Jan 22 10:18:40 storage08.beowulf.cluster kernel: 
Jan 22 10:18:40 storage08.beowulf.cluster kernel: Remounting filesystem read-only

Does this mean that the filesystem may be corrupted? I am going to run fsck -f on this device and then try to mount it again; is that the right procedure?
I did not find any errors on my S2A9500 storage, so I am not sure when this corruption could have occurred.


Is it possible that this OST has many orphaned objects which take up
all the available space?

That would be reflected in the df.  If you suspect there may be orphan
objects though, you could run lfsck to verify and clean.
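(A quick programmatic equivalent of df, for the record; the path here is a placeholder standing in for the OST mountpoint:)

```python
import shutil

# Placeholder path for illustration; point this at the OST mountpoint to
# compare the reported "full" state against actual block usage.
usage = shutil.disk_usage("/")
print(f"total={usage.total} used={usage.used} free={usage.free} "
      f"({usage.free / usage.total:.1%} free)")
```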

Is there a way of reclaiming this free space?

If you mean orphaned OST objects, then lfsck.

b.

Cheers,

Wojciech

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
