Re: [lustre-discuss] Accessing files with bad PFL causing MDS kernel panics

Colin Faber via lustre-discuss Tue, 25 Oct 2022 14:16:20 -0700

Hi Nathan, looks like you're hitting
https://jira.whamcloud.com/browse/LU-16152


-cf


On Tue, Oct 25, 2022 at 2:43 PM Nathan Crawford <[email protected]> wrote:

> Hi All,
>
>   I'm looking for possible workarounds to recover data from some
> mis-migrated files (as seen in LU-16152). Basically, there's a bug in "lfs
> setstripe --yaml" where extent start/end values >= 2 GiB in the YAML file
> overflow to 16 EiB - 2 GiB.
>
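A minimal Python sketch of how a wrap like this could produce that exact number. It assumes, without confirmation from the Lustre source, that the YAML extent value passes through a signed 32-bit integer before being stored in a 64-bit unsigned extent field. `wrap_through_int32` is a hypothetical name used only for illustration.

```python
import struct

GiB = 1 << 30
EiB = 1 << 60  # 16 EiB == 2**64


def wrap_through_int32(value: int) -> int:
    """Hypothetical model of the bug: keep the low 32 bits of `value`,
    reinterpret them as a signed int32, then sign-extend into an unsigned
    64-bit field (as a C cast from int to __u64 would)."""
    as_int32 = struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]
    return as_int32 & 0xFFFFFFFFFFFFFFFF


for size in (1 * GiB, 2 * GiB):
    print(size, "->", wrap_through_int32(size))
```

Under this model 1 GiB passes through unchanged, while exactly 2 GiB becomes 18446744071562067968, i.e. 16 EiB - 2 GiB. Any input from 2 GiB up to just under 4 GiB lands somewhere in [16 EiB - 2 GiB, 16 EiB).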
>   Using lfs_migrate, I re-striped many files in directories whose default
> striping pattern contains these values. I'm pretty sure the data still
> exists (I was trying to purge an older OST, and disk usage on the other OSTs
> increased as usage on the purged OST decreased), and an lfsck run completes
> without complaint after a day or so. Unfortunately, attempts to access or
> re-migrate the files trigger a kernel panic on the MDS with:
>
> LustreError: 12576:0:(osd_io.c:311:kmem_to_page()) ASSERTION( !((unsigned long)addr & ~(~(((1UL) << 12)-1))) ) failed:
> LustreError: 12576:0:(osd_io.c:311:kmem_to_page()) LBUG
> Kernel panic - not syncing: LBUG
>
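For readers decoding the assertion: the double complement cancels, so the check reduces to `!(addr & 0xFFF)`. In other words, `addr` must sit on a 4 KiB page boundary, with `12` acting as the page shift. The Python below restates the C expression using 64-bit masking. It illustrates the arithmetic only and is not Lustre code. `offset_mask` and `assertion_holds` are hypothetical names, and the addresses are arbitrary examples.

```python
PAGE_SHIFT = 12               # the literal 12 in the assertion (4 KiB pages)
ULONG_MASK = (1 << 64) - 1    # emulate a 64-bit C unsigned long


def offset_mask() -> int:
    # (1UL << 12) - 1 == 0xFFF; the inner ~ clears those bits,
    # and the outer ~ flips them back, leaving 0xFFF again.
    inner = ~((1 << PAGE_SHIFT) - 1) & ULONG_MASK
    return ~inner & ULONG_MASK


def assertion_holds(addr: int) -> bool:
    # !(addr & 0xFFF): true only when addr is 4 KiB aligned.
    return (addr & offset_mask()) == 0


print(hex(offset_mask()))                                          # 0xfff
print(assertion_holds(0x7F0000001000), assertion_holds(0x7F0000001008))  # True False
```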
>   The servers run Lustre 2.12.8 on OpenZFS 0.8.5 on CentOS 7.9. The output
> from "lfs getstripe -v badfile" is attached.
>
>   I can use lfs find to search for files with these bad extent endpoint
> values, then move them to a quarantine area on the same FS. That should
> (hopefully) keep the rest of the system up, but I still need to recover
> the data.
>
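Here is a hedged sketch for cross-checking candidate files before quarantining them. It scans saved `lfs getstripe -v` text for extent start/end values at or above 16 EiB - 2 GiB. Several details are assumptions about the output format and should be checked against the attached getstripe output: the `lcme_extent.e_start` / `lcme_extent.e_end` field names, and the printing of open-ended components as the literal `EOF`. `suspicious_extents` is a hypothetical helper, and the sample text is made up for illustration.

```python
import re

U64 = 1 << 64
LUSTRE_EOF = U64 - 1              # ~0ULL, assumed to mean "to end of file"
SUSPECT_LOW = U64 - (1 << 31)     # 16 EiB - 2 GiB

# Assumed field names from `lfs getstripe -v` PFL output; verify before use.
EXTENT_RE = re.compile(r"lcme_extent\.e_(start|end):\s*(\d+)")


def suspicious_extents(getstripe_text: str) -> list[tuple[str, int]]:
    """Return (field, value) pairs whose value looks like a wrapped 32-bit number.
    Non-numeric values such as 'EOF' never match the regex and are skipped."""
    hits = []
    for field, raw in EXTENT_RE.findall(getstripe_text):
        value = int(raw)
        if SUSPECT_LOW <= value < LUSTRE_EOF:
            hits.append((f"e_{field}", value))
    return hits


# Made-up sample resembling a mis-striped two-component layout.
sample = """
      lcme_extent.e_start: 0
      lcme_extent.e_end:   18446744071562067968
      lcme_extent.e_start: 18446744071562067968
      lcme_extent.e_end:   EOF
"""
print(suspicious_extents(sample))
# [('e_end', 18446744071562067968), ('e_start', 18446744071562067968)]
```

This only flags layouts for review. It does not repair them, and it relies on the text format rather than on any Lustre API.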
> Thanks!
> Nate
>
> --
>
> Dr. Nathan Crawford              [email protected]
> Director of Scientific Computing
> School of Physical Sciences
> 164 Rowland Hall                 Office: 152 Rowland Hall
> University of California, Irvine  Phone: 949-824-1380
> Irvine, CA 92697-2025, USA
>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
