On 15/08/2019 03.40, Jeff Layton wrote:
> On Wed, 2019-08-14 at 19:29 +0200, Ilya Dryomov wrote:
>> Jeff, the oops seems to be a NULL dereference in ceph_lock_message().
>> Please take a look.
>
> (sorry for duplicate mail -- the other one ended up in moderation)
>
> Thanks Ilya,
>
> That function is pretty straightforward. We don't do a whole lot of
> pointer chasing in there, so I'm a little unclear on where this would
> have crashed. Right offhand, that kernel is probably missing
> 1b52931ca9b5b87 (ceph: remove duplicated filelock ref increase), but
> that seems unlikely to result in an oops.
>
> Hector, if you have the debuginfo for this kernel installed on one of
> these machines, could you run gdb against the ceph.ko module and then
> do:
>
>       gdb> list *(ceph_lock_message+0x212)
>
> That may give me a better hint as to what went wrong.

This is what I get:

(gdb)  list *(ceph_lock_message+0x212)
0xd782 is in ceph_lock_message (/build/linux-hwe-B83fOS/linux-hwe-4.18.0/fs/ceph/locks.c:116).
111             req->r_wait_for_completion = ceph_lock_wait_for_completion;
112
113             err = ceph_mdsc_do_request(mdsc, inode, req);
114
115             if (operation == CEPH_MDS_OP_GETFILELOCK) {
116                     fl->fl_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
117                     if (CEPH_LOCK_SHARED == req->r_reply_info.filelock_reply->type)
118                             fl->fl_type = F_RDLCK;
119                     else if (CEPH_LOCK_EXCL == req->r_reply_info.filelock_reply->type)
120                             fl->fl_type = F_WRLCK;

Disasm:

   0x000000000000d77b <+523>:   mov    0x250(%rbx),%rdx
   0x000000000000d782 <+530>:   mov    0x20(%rdx),%rdx
   0x000000000000d786 <+534>:   neg    %edx
   0x000000000000d788 <+536>:   mov    %edx,0x48(%r15)

That means req->r_reply_info.filelock_reply was NULL.
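
To spell out the reasoning: offset 0x212 is decimal 530, so the faulting
instruction is the load at <+530>. The instruction before it fetches a
pointer from 0x250(%rbx), and the faulting one reads 8 bytes at offset 0x20
from that pointer, then negates the result and stores it, which matches
"fl->fl_pid = -...->pid" on line 116. Below is a minimal user-space sketch
of that access with a guard in front of it. It is only an illustration, not
the kernel code and not necessarily how this should be fixed upstream: the
*_sketch names, the struct layout (chosen so that pid lands at offset 0x20),
the constant values, the -EIO return and the final "unlocked" fallback are
all assumptions of mine, and le64_to_cpu() is left out (little-endian host
assumed).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the filelock reply; layout is an assumption. */
struct filelock_reply_sketch {
	uint64_t start;
	uint64_t length;
	uint64_t client;
	uint64_t owner;
	uint64_t pid;	/* offset 0x20, matching "mov 0x20(%rdx),%rdx" */
	uint8_t  type;
};

/* The disassembly reads pid at 0x20; make sure the sketch agrees. */
static_assert(offsetof(struct filelock_reply_sketch, pid) == 0x20,
	      "pid is expected at offset 0x20");

/* Hypothetical constants, not the kernel's CEPH_LOCK_* / F_* definitions. */
enum { SKETCH_LOCK_SHARED = 1, SKETCH_LOCK_EXCL = 2 };
enum { SKETCH_RDLCK = 0, SKETCH_WRLCK = 1, SKETCH_UNLCK = 2 };

/* Hypothetical stand-in for the two struct file_lock fields touched here. */
struct lock_sketch {
	int pid;
	int type;
};

/*
 * Fill *fl from a GETFILELOCK-style reply, but only if the request
 * succeeded and a reply is actually present. Returns 0 on success,
 * the request error if err is non-zero, or -EIO if the reply is NULL.
 */
static int fill_lock_from_reply(struct lock_sketch *fl,
				const struct filelock_reply_sketch *reply,
				int err)
{
	if (err)
		return err;	/* request failed, reply may be absent */
	if (!reply)
		return -EIO;	/* the case that oopsed: no reply to read */

	fl->pid = -(int)reply->pid;
	if (reply->type == SKETCH_LOCK_SHARED)
		fl->type = SKETCH_RDLCK;
	else if (reply->type == SKETCH_LOCK_EXCL)
		fl->type = SKETCH_WRLCK;
	else
		fl->type = SKETCH_UNLCK;	/* assumed fallback */
	return 0;
}
```

Calling fill_lock_from_reply(&fl, NULL, 0) returns -EIO instead of
faulting, while a populated reply fills in fl as before.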


--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub