Workload is mixed. We ran a rados cppool to back up the metadata pool.
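
For the archives, the copy was roughly the plain rados pool copy below, plus an export as Sergey suggests further down, since the export format preserves the omap data (pool and file names here are just our placeholders):

    # plain pool copy (roughly what we ran)
    rados cppool cephfs_metadata cephfs_metadata_backup

    # serialized export of the pool, which keeps the omap data intact
    rados -p cephfs_metadata export /backup/cephfs_metadata.export
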
So you're thinking that truncating the journal and purge queue (we are on
Luminous) along with a reset could bring us back online, missing just data
from that day (mostly from when the issue started). If so, we could continue
our scan into our recovery partition and give it a try tomorrow after
discussions with our recovery team.

On Mon, Nov 5, 2018 at 7:40 PM Sergey Malinin <[email protected]> wrote:

> What was your recent workload? There are chances not to lose much if it
> was mostly read ops. If so, you *must back up your metadata pool via
> "rados export" in order to preserve omap data*, then try truncating the
> journals (along with the purge queue if supported by your ceph version),
> wiping the session table, and resetting the fs.
>
>
> On 6.11.2018, at 03:26, Rhian Resnick <[email protected]> wrote:
>
> That was our original plan. So we migrated to bigger disks and have space,
> but "recover dentries" uses up all our memory (128 GB) and crashes out.
>
> On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin <[email protected]> wrote:
>
>> I had the same problem with multi-MDS. I solved it by freeing up a little
>> space on the OSDs, doing "recover dentries", truncating the journal, and
>> then "fs reset". After that I was able to revert to a single active MDS and
>> kept on running for a year until it failed on the 13.2.2 upgrade :))
>>
>>
>> On 6.11.2018, at 03:18, Rhian Resnick <[email protected]> wrote:
>>
>> Our metadata pool went from 700 MB to 1 TB in size in a few hours. It used
>> all the space on the OSDs, and now 2 ranks report damage. The recovery tools
>> on the journal fail as they run out of memory, leaving us with the option of
>> truncating the journal and losing data, or recovering using the scan tools.
>>
>> Any ideas on solutions are welcome. I posted all the logs and cluster
>> design previously but am happy to do so again. We are not desperate, but we
>> are hurting with this long downtime.
>>
>> On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin <[email protected]> wrote:
>>
>>> What kind of damage have you had? Maybe it is worth trying to get the MDS
>>> to start and back up valuable data instead of doing a long-running
>>> recovery?
>>>
>>>
>>> On 6.11.2018, at 02:59, Rhian Resnick <[email protected]> wrote:
>>>
>>> Sounds like I get to have some fun tonight.
>>>
>>> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin <[email protected]> wrote:
>>>
>>>> Inode linkage (i.e. folder hierarchy) and file names are stored in the
>>>> omap data of objects in the metadata pool. You can write a script that
>>>> traverses the whole metadata pool to find out which file names correspond
>>>> to which objects in the data pool, and fetches the required files via the
>>>> 'rados get' command.
>>>>
>>>> > On 6.11.2018, at 02:26, Sergey Malinin <[email protected]> wrote:
>>>> >
>>>> > Yes, 'rados -h'.
>>>> >
>>>> >
>>>> >> On 6.11.2018, at 02:25, Rhian Resnick <[email protected]> wrote:
>>>> >>
>>>> >> Does a tool exist to recover files from a cephfs data partition? We
>>>> >> are rebuilding metadata but have a user who needs data asap.
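
For anyone following along, the sequence being discussed above would, as far as I understand the Luminous disaster-recovery docs, look roughly like this (fs name is ours, adjust for your cluster, and only after backing up the metadata pool):

    # salvage whatever dentries can be read from the journal, then truncate it
    cephfs-journal-tool event recover_dentries summary
    cephfs-journal-tool journal reset

    # reset the purge queue as well, if your version supports it
    cephfs-journal-tool --journal=purge_queue journal reset

    # wipe the session table and reset the fs before restarting the MDS
    cephfs-table-tool all reset session
    ceph fs reset cephfs --yes-i-really-mean-it
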
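And on Sergey's earlier suggestion about pulling individual files straight out of the data pool, a rough sketch of the idea, assuming you already know the file's inode number: data-pool objects are named <inode-hex>.<block>, and directory entries (file names) can be listed with 'rados listomapkeys' on the dirfrag objects in the metadata pool. Pool, inode, and file names below are placeholders:

    #!/bin/bash
    # Hypothetical sketch only: reassemble one file from the CephFS data pool.
    POOL=cephfs_data        # data pool name (placeholder)
    INODE=10000000123       # the file's inode number in hex (placeholder)
    OUT=recovered_file

    rm -f "$OUT"
    # Data objects are named <inode>.<block>; block numbers are zero-padded hex,
    # so a lexical sort gives the right order. Holes in sparse files are skipped.
    for obj in $(rados -p "$POOL" ls | grep "^${INODE}\." | sort -t. -k2); do
        rados -p "$POOL" get "$obj" chunk.tmp
        cat chunk.tmp >> "$OUT"
    done
    rm -f chunk.tmp
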
