> On Fri, Sep 3, 2010 at 8:02 AM, Bogdan Lobodzinski <[email protected]>
> wrote:
>>
>> Hello all,
>>
>> let me continue with my troubles; the title can stay the same.
>> As I wrote, my ceph configuration survived my critical test
>> svn co https://root.cern.ch/svn/root/trunk root
>> but suddenly, during the night at 5 o'clock, ceph became stuck again,
>> without any user activity and no work at all in the /ceph directory.
>> The node is running as
>> mds1, mon1, osd0
>>
>> The system log file reports the following (the problem starts with the entry
>> "Sep 2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale"):
>> --------
>> Sep 1 12:40:38 h1farm183 kernel: [10983.398458] Btrfs loaded
>> Sep 1 12:44:25 h1farm183 kernel: [11210.109913] ceph: loaded (mon/mds/osd
>> proto 15/32/24, osdmap 5/5 5/5)
>> Sep 1 13:08:25 h1farm183 kernel: [12650.255052] device fsid
>> 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> Sep 1 14:25:06 h1farm183 kernel: [17251.100851] RPC: Registered udp
>> transport module.
>> Sep 1 14:25:06 h1farm183 kernel: [17251.100854] RPC: Registered tcp
>> transport module.
>> Sep 1 14:25:06 h1farm183 kernel: [17251.100855] RPC: Registered tcp NFSv4.1
>> backchannel transport module.
>> Sep 1 14:25:20 h1farm183 kernel: [17265.404967] device fsid
>> 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> Sep 1 14:25:20 h1farm183 kernel: [17265.562870] udev: starting version 151
>> Sep 1 14:25:26 h1farm183 kernel: [17271.752817] device fsid
>> 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> ...
>> Sep 1 16:41:51 h1farm183 kernel: [25456.385184] device fsid
>> 4940eafa1c110ce7-c14b44192348589f devid 1 transid 12 /dev/sdb1
>> Sep 1 16:42:21 h1farm183 kernel: [25486.297025] ceph: client4100 fsid
>> 4ea08089-acf1-b738-6f72-96c3ed029b71
>> Sep 1 16:42:21 h1farm183 kernel: [25486.297169] ceph: mon0
>> 131.169.74.116:6789 session established
>> Sep 2 02:37:54 h1farm183 rsyslogd: [origin software="rsyslogd"
>> swVersion="4.2.0" x-pid="863" x-info="http://www.rsyslog.com"] rsyslogd was
>> HUPed, type 'lightweight'.
>> Sep 2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale
>> Sep 2 05:44:57 h1farm183 kernel: [72441.976037] ceph: mds0 caps stale
>> Sep 2 05:45:27 h1farm183 kernel: [72472.066320] ceph: mds0 reconnect start
>> Sep 2 05:45:27 h1farm183 kernel: [72472.069681] Modules linked in: nfs
>> lockd nfs_acl auth_rpcgss sunrpc ceph btrfs zlib_deflate crc32c libcrc32c
>> ppdev lp parport openafs(P) ipt_MASQUERADE iptable_nat nf_nat
>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp
>> iptable_filter ip_tables x_tables bridge stp fbcon tileblit font bitblit
>> softcursor vga16fb vgastate radeon ttm mptctl drm_kms_helper bnx2 drm usbhid
>> i5000_edac hid dell_wmi shpchp edac_core agpgart i2c_algo_bit i5k_amb dcdbas
>> psmouse serio_raw mptsas mptscsih mptbase scsi_transport_sas [last unloaded:
>> kvm]
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332]
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] Pid: 6184, comm:
>> ceph-msgr/1 Tainted: P (2.6.32-24-generic-pae #42-Ubuntu)
>> PowerEdge 1950
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EIP: 0060:[<c01ea907>]
>> EFLAGS: 00010246 CPU: 1
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EIP is at
>> kunmap_high+0x97/0xa0
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EAX: 00000000 EBX: f5d17000
>> ECX: c0916848 EDX: 00000292
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] ESI: c17ee940 EDI: f5d18000
>> EBP: f5fb3c6c ESP: f5fb3c64
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] DS: 007b ES: 007b FS: 00d8
>> GS: 00e0 SS: 0068
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] c07d9280 f50b10a0 f5fb3c74
>> c0138307 f5fb3c98 f9ad7d54 00000000 f5fb3cbc
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] <0> 00000038 0000002b
>> eaee1018 ee4bcd70 00000000 f5fb3d14 f9ada09d 00000000
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] <0> eaee108c 0000005c
>> f60bab40 eaee0e00 ee788440 f50b10a0 00000a21 00000000
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [<c0138307>] ?
>> kunmap+0x57/0x60
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [<f9ad7d54>] ?
>> ceph_pagelist_append+0x54/0x110 [ceph]
...
>> The node was completely stuck.
>> Do you know what the reason could be?
The following patch may fix it. I'll push a fix to the unstable
branch; let me know if it works for you.
Thanks,
Yehuda
diff --git a/fs/ceph/pagelist.c b/fs/ceph/pagelist.c
index b6859f4..46a368b 100644
--- a/fs/ceph/pagelist.c
+++ b/fs/ceph/pagelist.c
@@ -5,10 +5,18 @@
 
 #include "pagelist.h"
 
+static void ceph_pagelist_unmap_tail(struct ceph_pagelist *pl)
+{
+	struct page *page = list_entry(pl->head.prev, struct page,
+				       lru);
+	kunmap(page);
+}
+
 int ceph_pagelist_release(struct ceph_pagelist *pl)
 {
 	if (pl->mapped_tail)
-		kunmap(pl->mapped_tail);
+		ceph_pagelist_unmap_tail(pl);
+
 	while (!list_empty(&pl->head)) {
 		struct page *page = list_first_entry(&pl->head, struct page,
 						     lru);
@@ -26,7 +34,7 @@ static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
 	pl->room += PAGE_SIZE;
 	list_add_tail(&page->lru, &pl->head);
 	if (pl->mapped_tail)
-		kunmap(pl->mapped_tail);
+		ceph_pagelist_unmap_tail(pl);
 	pl->mapped_tail = kmap(page);
 	return 0;
 }
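
For anyone following the thread, here is a rough user-space sketch of what
the patch changes. kmap() takes a struct page * and returns the mapped
address, while kunmap() expects the struct page * itself; the old code
passed pl->mapped_tail (the address) to kunmap(). Everything named mock_*
below is hypothetical and only stands in for the kernel types and calls; it
is not the ceph code. Note that the helper resolves the tail page at call
time, so this sketch unmaps the old tail before linking the new page. It
may be worth checking whether the same ordering matters in
ceph_pagelist_addpage().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel definition. */
struct mock_page {
	int map_count;	/* number of outstanding mappings of this page */
	char data[64];	/* stands in for the page contents */
};

/* Like kmap(): takes the page, returns the address of its contents. */
static void *mock_kmap(struct mock_page *page)
{
	page->map_count++;
	return page->data;
}

/* Like kunmap(): takes the page itself, NOT the mapped address. */
static void mock_kunmap(struct mock_page *page)
{
	page->map_count--;
}

/* Hypothetical, simplified pagelist: only the tail page is tracked. */
struct mock_pagelist {
	struct mock_page *tail;	/* last page in the list */
	void *mapped_tail;	/* address returned by mock_kmap(tail) */
};

/* Unmap via the tail page, never via the mapped address. */
static void mock_unmap_tail(struct mock_pagelist *pl)
{
	if (pl->mapped_tail) {
		mock_kunmap(pl->tail);
		pl->mapped_tail = NULL;
	}
}

/* Append a page: unmap the old tail first, then link and map the new one. */
static void mock_addpage(struct mock_pagelist *pl, struct mock_page *page)
{
	mock_unmap_tail(pl);	/* pl->tail still refers to the old page here */
	pl->tail = page;
	pl->mapped_tail = mock_kmap(page);
}
```

After two calls to mock_addpage(), only the second page should still be
mapped, and pl.mapped_tail should point at its data.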