Hello!

On Oct 30, 2016, at 8:33 AM, Thomas Roth wrote:

> Hi all,
> 
> we have a large number of files that show ??? on 'ls' and give the error
> "Cannot allocate memory".
> The corresponding error on the OSS is
> "lvbo_init failed for resource ... rc = -2"
> 
> This seems similar to LU-5457 (although the OSTs do not go into disconn
> state).
> Our filesystem is on Lustre 2.5.3, zfs 0.6.3, from the start. So per
> Oleg's explanation,
> "this could be fallout from earlier sync failures where OST announced it
> created some objects, failed to sync that to disk and then after dying
> and restarting the objects that were handed out by MDTs out of this pool
> are no longer there"
> 
> The affected OSTs are evenly distributed, however.
> Finding the creation time of those files is difficult at best, but I am
> not aware of any series of crashes of so many OSSes in the recent months.
> And how can this happen with ZFS-OSTs? Should this be possible so easily?


   First of all, 2.5.3 is kind of old.

   The error itself means that you have a file on the MDS, but no
   corresponding objects on the OSTs.
   The explanation in LU-5457 is just one possible scenario; there might be
   others that cause the objects to be deleted.

   Is there a pattern to the files? I.e., were all such files created at
   around the same time? (If you cannot tell just by the filename/location,
   you might use debugfs, or whatever the ZFS equivalent is, to look at the
   inode modification time.)

   If they are distributed in time across different OSTs, but localised in
   time for each individual OST, it might be a good idea to check the OST
   logs from that period.
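
   To make the "localised per OST" check concrete, here is a rough sketch of
   how one could group the affected files by OST and look at the time span
   on each. It assumes you have already collected, by whatever means works
   on your system, a list of (path, OST index, modification time as epoch
   seconds) for the broken files. The function name, the record layout and
   the one-hour window are all made up for illustration; nothing here is a
   Lustre tool or API.

```python
from collections import defaultdict
from datetime import datetime, timezone


def summarize_by_ost(records, window_seconds=3600):
    """Group affected files by OST and report the time span on each.

    records: iterable of (path, ost_index, mtime_epoch) tuples
             (hypothetical layout, collected separately).
    window_seconds: assumed threshold below which the timestamps on one
             OST are treated as "localised" in time.
    """
    times_by_ost = defaultdict(list)
    for _path, ost_index, mtime in records:
        times_by_ost[ost_index].append(mtime)

    summary = {}
    for ost_index, times in times_by_ost.items():
        first, last = min(times), max(times)
        summary[ost_index] = {
            "count": len(times),
            "first": first,
            "last": last,
            # True when all affected files on this OST fall in one window.
            "localised": (last - first) <= window_seconds,
        }
    return summary


def fmt(ts):
    """Render an epoch timestamp as a UTC string for matching against logs."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    sample = [
        ("/lustre/a", 3, 1470000000),
        ("/lustre/b", 3, 1470000900),
        ("/lustre/c", 7, 1475000000),
        ("/lustre/d", 7, 1475400000),
    ]
    for ost, s in sorted(summarize_by_ost(sample).items()):
        print(ost, s["count"], fmt(s["first"]), fmt(s["last"]), s["localised"])
```

   For any OST that comes out as localised, the first/last timestamps give
   you the period to look at in that OST's logs.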


Bye,
    Oleg
_______________________________________________
lustre-discuss mailing list
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
