Hi Nathan, looks like you're hitting https://jira.whamcloud.com/browse/LU-16152
-cf

On Tue, Oct 25, 2022 at 2:43 PM Nathan Crawford <[email protected]> wrote:

> Hi All,
>
> I'm looking for possible work-arounds to recover data from some
> mis-migrated files (as seen in LU-16152). Basically, there's a bug in "lfs
> setstripe --yaml" where extent start/end values in the yaml file >= 2GiB
> overflow to 16 EiB - 2 GiB.
>
> Using lfs_migrate, I re-striped many files in directories with a default
> striping pattern containing these values. I'm pretty sure that the data
> exists (was trying to purge an older OST, and disk usage on the other OSTs
> increased as the purged OST decreased), and an lfsck procedure happily
> returns after a day or so. Unfortunately, attempts to access or re-migrate
> the files triggers a kernel panic on the MDS with:
>
> LustreError: 12576:0:(osd_io.c:311:kmem_to_page()) ASSERTION( !((unsigned
> long)addr & ~(~(((1UL) << 12)-1))) ) failed:
> LustreError: 12576:0:(osd_io.c:311:kmem_to_page()) LBUG
> Kernel panic - not syncing: LBUG
>
> The servers are lustre 2.12.8 on OpenZFS 0.8.5 on CentOS 7.9. The output
> from "lfs getstripe -v badfile" is attached.
>
> I can use lfs find to search for files with these bad extent endpoint
> values, then move them to a quarantine area on the same FS. This will allow
> the rest of the system to stay up (hopefully) but recovering the data is
> still needed.
>
> Thanks!
> Nate
>
> --
>
> Dr. Nathan Crawford [email protected]
> Director of Scientific Computing
> School of Physical Sciences
> 164 Rowland Hall Office: 152 Rowland Hall
> University of California, Irvine Phone: 949-824-1380
> Irvine, CA 92697-2025, USA
>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
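For anyone following the numbers in the quoted report, here is a small sketch of the arithmetic. It assumes (this is not confirmed in the thread) that the extent value passes through a signed 32-bit integer before being stored as an unsigned 64-bit offset; that path turns exactly 2 GiB into 16 EiB - 2 GiB, which matches the reported value. The second helper just spells out what the kmem_to_page() assertion text checks: with a shift of 12, the address must be aligned to 4096 bytes. The names through_int32 and is_page_aligned are made up for illustration and are not Lustre functions.

```python
GIB = 1 << 30
EIB = 1 << 60
U64_MASK = (1 << 64) - 1


def through_int32(value):
    """Hypothetical helper: truncate to 32 bits, reinterpret as a signed
    32-bit integer, then sign-extend into an unsigned 64-bit value.
    This is an assumed mechanism, not taken from the Lustre source."""
    low = value & 0xFFFFFFFF
    signed = low - (1 << 32) if low & 0x80000000 else low
    return signed & U64_MASK


PAGE_SHIFT = 12  # from the "(1UL) << 12" in the quoted assertion


def is_page_aligned(addr):
    """Mirror of the quoted check: !(addr & ~(~((1UL << 12) - 1)))."""
    page_mask = ~((1 << PAGE_SHIFT) - 1)
    return (addr & ~page_mask) == 0


if __name__ == "__main__":
    # Values below 2 GiB survive unchanged; 2 GiB wraps to 16 EiB - 2 GiB.
    print(through_int32(1 * GIB) == 1 * GIB)
    print(through_int32(2 * GIB) == 16 * EIB - 2 * GIB)
    # An address with any of the low 12 bits set fails the alignment check.
    print(is_page_aligned(0x1000), is_page_aligned(0x1001))
```

Under that assumption only values below 2 GiB pass through intact, which is consistent with the ">= 2GiB" threshold in the report.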
