So my colleague Sean Crosby and I were looking through the logs (with debug mds 
= 10) and found some references to an inode number just before the crash. We 
converted it from hex to decimal and got something like 109953*5*627776 (last 
few digits not necessarily correct). We bumped one digit up, i.e. to 
109953*6*627776, and used that as the value for take_inos, i.e.:

$ cephfs-table-tool all take_inos 1099536627776
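For the record, the conversion step looks roughly like this (the hex value below is an illustrative stand-in, not the real value from our log, which differed in the last few digits):

```shell
# Illustrative only: 0x100016e3600 is a made-up stand-in for the inode
# number seen in the MDS log.
dec_ino=$(printf '%d\n' 0x100016e3600)
echo "$dec_ino"          # 1099535627776
# We then bumped a digit up (to 1099536627776) to leave a safety margin
# before passing it to take_inos.
```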


After that, the MDS could start successfully and we have a HEALTH_OK cluster 
once more!


It would still be useful if `show inode` in cephfs-table-tool actually showed 
us the max inode number, though. And I think take_inos should be documented in 
the Disaster Recovery guide as well. :)


We'll be monitoring the cluster for the next few days. Hopefully nothing too 
interesting to share after this! 😉


Cheers,

Linh

________________________________
From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Linh Vu 
<v...@unimelb.edu.au>
Sent: Monday, 25 June 2018 7:06:45 PM
To: ceph-users
Subject: [ceph-users] Help! Luminous 12.2.5 CephFS - MDS crashed and now won't 
start (failing at MDCache::add_inode)


Hi all,


We have a Luminous 12.2.5 cluster, running entirely just CephFS with 1 active 
and 1 standby MDS. The active MDS crashed and now won't start again with this 
same error:

#######

     0> 2018-06-25 16:11:21.136203 7f01c2749700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/mds/MDCache.cc:
 In function 'void MDCache::add_inode(CInode*)' thread 7f01c2749700 time 
2018-06-25 16:11:21.133236
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/mds/MDCache.cc:
 262: FAILED assert(!p)
#######

Right before that point is just a bunch of client connection requests.

There are also a few other inode errors such as:

#######
2018-06-25 09:30:37.889166 7f934c1e5700 -1 log_channel(cluster) log [ERR] : 
loaded dup inode 0x1000098f00a [2,head] v3426852030 at 
~mds0/stray5/1000098f00a, but inode 0x1000098f00a.head v3426838533 already 
exists at ~mds0/stray2/1000098f00a
#######

We've done this for recovery:

$ make sure all MDS are shut down (all crashed by this point anyway)
$ ceph fs set myfs cluster_down true
$ cephfs-journal-tool journal export backup.bin
$ cephfs-journal-tool event recover_dentries summary
Events by type:
  FRAGMENT: 9
  OPEN: 29082
  SESSION: 15
  SUBTREEMAP: 241
  UPDATE: 171835
Errors: 0
$ cephfs-table-tool all reset session
{
    "0": {
        "data": {},
        "result": 0
    }
}
$ cephfs-table-tool all reset inode
{
    "0": {
        "data": {},
        "result": 0
    }
}
$ cephfs-journal-tool --rank=myfs:0 journal reset

old journal was 35714605847583~423728061

new journal start will be 35715031236608 (1660964 bytes past old end)
writing journal head
writing EResetJournal entry
done
$ ceph mds fail 0
$ ceph fs reset hpc_projects --yes-i-really-mean-it
$ start up MDS again

However, we keep getting the same error as above.

We found this: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-December/023136.html 
which describes a similar issue, with some suggestions on using the 
cephfs-table-tool take_inos command, as our problem looks like we can't create 
new inodes. However, we don't quite understand the show inode or take_inos 
commands. On our cluster, we see this:

$ cephfs-table-tool 0 show inode
{
    "0": {
        "data": {
            "version": 1,
            "inotable": {
                "projected_free": [
                    {
                        "start": 1099511627776,
                        "len": 1099511627776
                    }
                ],
                "free": [
                    {
                        "start": 1099511627776,
                        "len": 1099511627776
                    }
                ]
            }
        },
        "result": 0
    }
}
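For what it's worth, our reading of this output (an assumption on our part, not something from the docs): the "start" of the first "free" range is the lowest inode number the table considers unallocated, so everything below it has been handed out at some point. On a fresh filesystem that start is 0x10000000000, which we believe is the base of the CephFS client inode range:

```shell
# 0x10000000000 appears to be the base of the client inode range; a
# brand-new inotable reports its free range starting here.
printf '%d\n' 0x10000000000    # 1099511627776, matching "start" above
```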

Our test cluster shows the exact same output. Running `cephfs-table-tool all 
take_inos 100000` (on the test cluster) doesn't seem to change the output 
above, and the inode numbers assigned to newly created files don't jump +100K 
from where they were (likely we've misunderstood how take_inos works). On our 
test cluster (no recovery nor reset has been run on it), the latest max inode, 
from creating files and running `ls -li`, is 1099511627792, just a tiny bit 
bigger than the "start" value above, which seems to match the number of files 
we've created on it.
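The way we checked allocated inode numbers on the test cluster was simply to create a file on the CephFS mount and read its inode back (the mount path below is just an example):

```shell
# Create a probe file on the CephFS mount (example path) and print its
# inode number -- the same number `ls -li` shows in its first column.
touch /mnt/cephfs/ino-probe
stat -c '%i' /mnt/cephfs/ino-probe
```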

How do we find out what is our latest max inode on our production cluster, when 
`show inode` doesn't seem to show us anything useful?


Also, FYI: over a week ago we had a network failure and had to perform 
recovery then. The recovery seemed OK, but some clients were still running 
jobs from before the failure and appeared to have recovered, so we were still 
in the process of draining and rebooting them as their jobs finished. Some 
came back with bad files, but nothing that caused trouble until now.

Very much appreciate any help!

Cheers,

Linh
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
