We decided to go ahead and try truncating the journal, but we wanted to back it
up first. However, the header contains ridiculous values. The tool can't write
a backup of a journal this large because (I presume) my ext4 filesystem can't
seek to that position in the (sparse) output file.
I would not be surprised to learn that memory allocation attempts something
similar, hence the tool consuming all available memory. This looks like a new
kind of journal corruption that isn't being reported correctly.
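A minimal sketch of what I suspect is happening (the helper below is
hypothetical, not part of cephfs-journal-tool): the export has to extend a
sparse backup file out to the journal's byte offsets, and ext4 with 4 KiB
blocks caps a single file at 16 TiB, so an offset near 22 TiB is rejected
(as EINVAL or EFBIG depending on the code path).

```python
import os

def can_hold_offset(path, offset):
    """Probe whether the filesystem at `path` can hold a sparse file
    extending to `offset` bytes, without allocating data blocks."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # ftruncate only updates the file size; a sparse file this large
        # consumes no space -- unless the fs rejects the size outright.
        os.ftruncate(fd, offset)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

On a filesystem without the 16 TiB limit this succeeds for the journal's
offsets; on my ext4 volume it would presumably fail, matching the error below.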
[root@lima /]# time cephfs-journal-tool --cluster=prodstore journal export backup.bin
journal is 24652730602129~673601102
2019-04-01 17:49:52.776977 7fdcb999e040 -1 Error 22 ((22) Invalid argument)
seeking to 0x166be9401291
Error ((22) Invalid argument)
real 0m27.832s
user 0m2.028s
sys 0m3.438s
[root@lima /]# cephfs-journal-tool --cluster=prodstore event get summary
Events by type:
EXPORT: 187
IMPORTFINISH: 182
IMPORTSTART: 182
OPEN: 3133
SUBTREEMAP: 129
UPDATE: 42185
Errors: 0
[root@lima /]# cephfs-journal-tool --cluster=prodstore header get
{
    "magic": "ceph fs volume v011",
    "write_pos": 24653404029749,
    "expire_pos": 24652730602129,
    "trimmed_pos": 24652730597376,
    "stream_format": 1,
    "layout": {
        "stripe_unit": 4194304,
        "stripe_count": 1,
        "object_size": 4194304,
        "pool_id": 2,
        "pool_ns": ""
    }
}
[root@lima /]# printf "%x\n" "24653404029749"
166c1163c335
[root@lima /]# printf "%x\n" "24652730602129"
166be9401291
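For the record, the header arithmetic checks out against the seek error
(plain Python, offsets copied from the header dump above):

```python
# Offsets copied from the `header get` output above.
write_pos = 24653404029749
expire_pos = 24652730602129

print(hex(expire_pos))              # 0x166be9401291 -- the offset in the seek error
print(round(write_pos / 2**40, 1))  # write_pos in TiB: ~22.4, past ext4's 16 TiB limit
print(write_pos - expire_pos)       # un-expired span in bytes, roughly 642 MiB
```

So the actual live journal is well under a gigabyte; it's only the absolute
positions that are absurd.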
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com