We decided to go ahead and try truncating the journal, but we wanted to back it
up first. However, the header contains ridiculous values. The tool can't write
a backup of a journal this large because (I presume) my ext4 filesystem can't
seek to that position in the (sparse) output file.
I would not be surprised to learn that memory allocation attempts something
similar, hence the tool consuming all available memory. This looks like a new
kind of journal corruption that isn't being reported correctly.
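A minimal sketch of what I suspect is happening (the helper below is
hypothetical, not part of cephfs-journal-tool): the export has to extend a
sparse backup file out to the journal's byte offsets, and ext4 with 4 KiB
blocks caps a single file at 16 TiB, so an offset near 22 TiB is rejected
(as EINVAL or EFBIG depending on the code path).

```python
import os

def can_hold_offset(path, offset):
    """Probe whether the filesystem at `path` can hold a sparse file
    extending to `offset` bytes, without allocating data blocks."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # ftruncate only updates the file size; a sparse file this large
        # consumes no space -- unless the fs rejects the size outright.
        os.ftruncate(fd, offset)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

On a filesystem without the 16 TiB limit this succeeds for the journal's
offsets; on my ext4 volume it would presumably fail, matching the error below.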
[root@lima /]# time cephfs-journal-tool --cluster=prodstore journal export backup.bin
journal is 24652730602129~673601102
2019-04-01 17:49:52.776977 7fdcb999e040 -1 Error 22 ((22) Invalid argument)
seeking to 0x166be9401291
Error ((22) Invalid argument)
real 0m27.832s
user 0m2.028s
sys 0m3.438s
[root@lima /]# cephfs-journal-tool --cluster=prodstore event get summary
Events by type:
EXPORT: 187
IMPORTFINISH: 182
IMPORTSTART: 182
OPEN: 3133
SUBTREEMAP: 129
UPDATE: 42185
Errors: 0
[root@lima /]# cephfs-journal-tool --cluster=prodstore header get
{
    "magic": "ceph fs volume v011",
    "write_pos": 24653404029749,
    "expire_pos": 24652730602129,
    "trimmed_pos": 24652730597376,
    "stream_format": 1,
    "layout": {
        "stripe_unit": 4194304,
        "stripe_count": 1,
        "object_size": 4194304,
        "pool_id": 2,
        "pool_ns": ""
    }
}
[root@lima /]# printf "%x\n" "24653404029749"
166c1163c335
[root@lima /]# printf "%x\n" "24652730602129"
166be9401291
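For the record, the header arithmetic checks out against the seek error
(plain Python, offsets copied from the header dump above):

```python
# Offsets copied from the `header get` output above.
write_pos = 24653404029749
expire_pos = 24652730602129

print(hex(expire_pos))              # 0x166be9401291 -- the offset in the seek error
print(round(write_pos / 2**40, 1))  # write_pos in TiB: ~22.4, past ext4's 16 TiB limit
print(write_pos - expire_pos)       # un-expired span in bytes, roughly 642 MiB
```

So the actual live journal is well under a gigabyte; it's only the absolute
positions that are absurd.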
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com