Hi Zahid,

On Tue, 2013-01-15 at 14:36 -0800, Zahid Chowdhury wrote:
> Hello,
> I am running a Centos 5.5 (kernel 2.6.18-194.17.4.el5). I have used the
> Centos distribution with the nilfs kernel module 2.0.22 to statically build
> nilfs into the kernel (that's why I renamed 2.6.18-194.17.4.el5 to
> 2.6.18-194.17.4.el5SSI_NILFS). I have enabled netconsole as the box is mostly
> headless - the kernel panic messages below came up through netconsole. The
> garbage collection daemon is nilfs-utils 2.1.0. The processor is an Intel(R)
> Atom(TM) CPU D510 dual core with 2 contexts. The SSD is an Industrial Grade
> Apacer 16GB SLC. At the time the kernel panicked there were many (> 100)
> soft real-time processes with nice levels of -19 running (the cleanerd runs
> at +19 nice level as we have found that otherwise it disturbs the soft
> real-time processes). These soft real-time processes are also memory hogs &
> cpu hogs (less than a few % idle even with all the cores/contexts), such that
> less than a few K of memory is available at any time (we will be fixing the
> apps, but still nilfs should not panic the kernel). We do allow overcommit and
> all processes are at the normal oom_adj value of 0 except for critical
> processes like syslogd & klogd, sshd, crond, nilfs_cleanerd, ifplugd,
> dbus-daemon. Btw, we did much testing and no kernel panics occurred over
> weeks until I adjusted oom_adj for the critical processes just today.
>
>
> Has anybody seen the kernel panic messages I see below? Is there any fix for
> this in a Centos 5.5 kernel? Would upgrading to a newer nilfs module clear up
> this panic? Would upgrading to a newer kernel clear up this panic? Upgrading
> cleanerd? Any other suggestions/questions are very welcome. Thanks all.
>

First of all, I think it makes sense to try upgrading the kernel and
nilfs-utils. We need to understand whether your issue can be reproduced on
the current state of the NILFS2 code.

Secondly, what value of vm.min_free_kbytes do you have on your system? Do you
have any error messages about page allocation failure in the system log?

Thirdly, I don't yet clearly understand how to reproduce your issue. Could
you describe in more detail which filesystem operations took place before the
issue occurred? Do you have any NILFS2-related error messages in your system
log before the kernel panic?

Thanks,
Vyacheslav Dubeyko.

>
> Zahid
>
> P.S.: Panic flow over netconsole into syslogd - sorry for so many lines, alas
> Solaris syslogd seems to wrap early:
>
> Jan 15 12:22:38 ------------[ cut here ]------------
> Jan 15 12:22:38 kernel BUG at fs/nilfs2/page.c:317!
> Jan 15 12:22:38 invalid opcode: 0000 [#1]
> Jan 15 12:22:38 SMP
> Jan 15 12:22:38
> Jan 15 12:22:38 last sysfs file:
> /devices/pci0000:00/0000:00:1c.0/0000:02:00.0/irq
> Jan 15 12:22:38 Modules linked in:
> Jan 15 12:22:38 netconsole
> Jan 15 12:22:38 autofs4
> Jan 15 12:22:38 dme1737
> Jan 15 12:22:38 hwmon_vid
> Jan 15 12:22:38 hidp
> Jan 15 12:22:38 l2cap
> Jan 15 12:22:38 bluetooth
> Jan 15 12:22:38 sunrpc
> Jan 15 12:22:38 bridge
> Jan 15 12:22:38 ip_nat_ftp
> Jan 15 12:22:38 ip_conntrack_ftp
> Jan 15 12:22:38 ip_conntrack_netbios_ns
> Jan 15 12:22:38 iptable_mangle
> Jan 15 12:22:38 iptable_filter
> Jan 15 12:22:38 ipt_MASQUERADE
> Jan 15 12:22:38 xt_tcpudp
> Jan 15 12:22:38 iptable_nat
> Jan 15 12:22:38 ip_nat
> Jan 15 12:22:38 ip_conntrack
> Jan 15 12:22:38 nfnetlink
> Jan 15 12:22:38 ip_tables
> Jan 15 12:22:38 x_tables
> Jan 15 12:22:38 loop
> Jan 15 12:22:38 dm_mirror
> Jan 15 12:22:38 dm_multipath
> Jan 15 12:22:38 scsi_dh
> Jan 15 12:22:38 video
> Jan 15 12:22:38 backlight
> Jan 15 12:22:38 sbs
> Jan 15 12:22:38 power_meter
> Jan 15 12:22:38 hwmon
> Jan 15 12:22:38 i2c_ec
> Jan 15 12:22:38 dell_wmi
> Jan 15 12:22:38 wmi
> Jan 15 12:22:38 button
> Jan 15 12:22:38 battery
> Jan 15 12:22:38 asus_acpi
> Jan 15 12:22:38 ac
> Jan 15 12:22:38 lp
> Jan 15 12:22:38 snd_hda_intel
> Jan 15 12:22:38 snd_seq_dummy
> Jan 15 12:22:38 sg
> Jan 15 12:22:38 snd_seq_oss
> Jan 15 12:22:38 snd_seq_midi_event
> Jan 15 12:22:38 snd_seq
> Jan 15 12:22:38 snd_seq_device
> Jan 15 12:22:38 snd_pcm_oss
> Jan 15 12:22:38 snd_mixer_oss
> Jan 15 12:22:38 snd_pcm
> Jan 15 12:22:38 snd_timer
> Jan 15 12:22:38 snd_page_alloc
> Jan 15 12:22:38 parport_pc
> Jan 15 12:22:38 e1000e
> Jan 15 12:22:38 pcspkr
> Jan 15 12:22:38 snd_hwdep
> Jan 15 12:22:38 serio_raw
> Jan 15 12:22:38 parport
> Jan 15 12:22:38 i2c_i801
> Jan 15 12:22:38 i2c_core
> Jan 15 12:22:38 snd
> Jan 15 12:22:38 soundcore
> Jan 15 12:22:38 dm_raid45
> Jan 15 12:22:38 dm_message
> Jan 15 12:22:38 dm_region_hash
> Jan 15 12:22:38 dm_log
> Jan 15 12:22:38 dm_mod
> Jan 15 12:22:38 dm_mem_cache
> Jan 15 12:22:38 usb_storage
> Jan 15 12:22:38 ata_piix
> Jan 15 12:22:38 libata
> Jan 15 12:22:38 sd_mod
> Jan 15 12:22:38 scsi_mod
> Jan 15 12:22:38 ext3
> Jan 15 12:22:38 jbd
> Jan 15 12:22:38 uhci_hcd
> Jan 15 12:22:38 ohci_hcd
> Jan 15 12:22:38 ehci_hcd
> Jan 15 12:22:38
> Jan 15 12:22:38 CPU: 0
> Jan 15 12:22:38 EIP: 0060:[<c04c078b>] Not tainted VLI
> Jan 15 12:22:38 EFLAGS: 00010246 (2.6.18-194.17.4.el5SSI_NILFS #1)
> Jan 15 12:22:38 EIP is at nilfs_copy_page+0x29/0x198
> Jan 15 12:22:38 eax: 80010029 ebx: c1329100 ecx: 00000000 edx: c135de00
> Jan 15 12:22:38 esi: 00000000 edi: f6df3f30 ebp: f6df3cf4 esp: f7a14ca8
> Jan 15 12:22:38 ds: 007b es: 007b ss: 0068
> Jan 15 12:22:38 Process nilfs_cleanerd (pid: 1653, ti=f7a14000 task=f79c4000
> task.ti=f7a14000)
> Jan 15 12:22:38
> Jan 15 12:22:38 Stack:
> Jan 15 12:22:38 ec2e8000
> Jan 15 12:22:38 e0461000
> Jan 15 12:22:38 c135de00
> Jan 15 12:22:38 c1585d00
> Jan 15 12:22:38 f6df3f30
> Jan 15 12:22:38 c0458ba8
> Jan 15 12:22:38 c135de00
> Jan 15 12:22:38 c1329100
> Jan 15 12:22:38
> Jan 15 12:22:38
> Jan 15 12:22:38 f6df3f30
> Jan 15 12:22:38 f6df3cf4
> Jan 15 12:22:38 c04c0ff2
> Jan 15 12:22:38 00001f8e
> Jan 15 12:22:38 00000005
> Jan 15 12:22:38 00001f7c
> Jan 15 12:22:38 0000000e
> Jan 15 12:22:38 00000000
> Jan 15 12:22:38
> Jan 15 12:22:38
> Jan 15 12:22:38 c1407240
> Jan 15 12:22:38 c12b5ac0
> Jan 15 12:22:38 c152afe0
> Jan 15 12:22:38 c1462ae0
> Jan 15 12:22:38 c1408c20
> Jan 15 12:22:38 c135de00
> Jan 15 12:22:38 c11fdda0
> Jan 15 12:22:38 c1503320
> Jan 15 12:22:38
> Jan 15 12:22:38 Call Trace:
> Jan 15 12:22:38 [<c0458ba8>]
> Jan 15 12:22:38 find_lock_page+0x1a/0x7e
> Jan 15 12:22:38 [<c04c0ff2>]
> Jan 15 12:22:38 nilfs_copy_back_pages+0xbb/0x1e7
> Jan 15 12:22:38 [<c04d2f3b>]
> Jan 15 12:22:38 nilfs_commit_gcdat_inode+0x83/0xa8
> Jan 15 12:22:38 [<c04cc0de>]
> Jan 15 12:22:38 nilfs_segctor_complete_write+0x1dd/0x301
> Jan 15 12:22:38 [<c04cd337>]
> Jan 15 12:22:38 nilfs_segctor_do_construct+0x1011/0x1384
> Jan 15 12:22:38 [<c045dbea>]
> Jan 15 12:22:38 __set_page_dirty_nobuffers+0xb0/0xd3
> Jan 15 12:22:38 [<c04c17f3>]
> Jan 15 12:22:38 nilfs_mdt_mark_block_dirty+0x41/0x47
> Jan 15 12:22:38 [<c04cd8c1>]
> Jan 15 12:22:38 nilfs_segctor_construct+0x82/0x261
> Jan 15 12:22:38 [<c04ceada>]
> Jan 15 12:22:38 nilfs_clean_segments+0xa9/0x1c4
> Jan 15 12:22:38 [<c04d26e2>]
> Jan 15 12:22:38 nilfs_ioctl+0x444/0x57d
> Jan 15 12:22:38 [<c0465900>]
> Jan 15 12:22:38 free_pgd_range+0x108/0x190
> Jan 15 12:22:38 [<c04d229e>]
> Jan 15 12:22:38 nilfs_ioctl+0x0/0x57d
> Jan 15 12:22:38 [<c048620d>]
> Jan 15 12:22:38 do_ioctl+0x1c/0x5d
> Jan 15 12:22:38 [<c04867a1>]
> Jan 15 12:22:38 vfs_ioctl+0x47b/0x4d3
> Jan 15 12:22:38 [<c041eef6>]
> Jan 15 12:22:38 enqueue_task+0x29/0x39
> Jan 15 12:22:38 [<c0486841>]
> Jan 15 12:22:38 sys_ioctl+0x48/0x5f
> Jan 15 12:22:38 [<c0404f17>]
> Jan 15 12:22:38 syscall_call+0x7/0xb
> Jan 15 12:22:38 =======================
> Jan 15 12:22:38 Code:
> Jan 15 12:22:38 00
> Jan 15 12:22:38 c3
> Jan 15 12:22:38 55
> Jan 15 12:22:38 57
> Jan 15 12:22:38 56
> Jan 15 12:22:38 89
> Jan 15 12:22:38 ce
> Jan 15 12:22:38 53
> Jan 15 12:22:38 89
> Jan 15 12:22:38 c3
> Jan 15 12:22:38 83
> Jan 15 12:22:38 ec
> Jan 15 12:22:38 18
> Jan 15 12:22:38 89
> Jan 15 12:22:38 54
> Jan 15 12:22:38 24
> Jan 15 12:22:38 08
> Jan 15 12:22:38 8b
> Jan 15 12:22:38 00
> Jan 15 12:22:38 f6
> Jan 15 12:22:38 c4
> Jan 15 12:22:38 10
> Jan 15 12:22:38 74
> Jan 15 12:22:38 08
> Jan 15 12:22:38 0f
> Jan 15 12:22:38 0b
> Jan 15 12:22:38 3b
> Jan 15 12:22:38 01
> Jan 15 12:22:38 22
> Jan 15 12:22:38 1b
> Jan 15 12:22:38 66
> Jan 15 12:22:38 c0
> Jan 15 12:22:38 8b
> Jan 15 12:22:38 54
> Jan 15 12:22:38 24
> Jan 15 12:22:38 08
> Jan 15 12:22:38 8b
> Jan 15 12:22:38 02
> Jan 15 12:22:38 f6
> Jan 15 12:22:38 c4
> Jan 15 12:22:38 08
> Jan 15 12:22:38 75
> Jan 15 12:22:38 08
> Jan 15 12:22:38 <0f>
> Jan 15 12:22:38 0b
> Jan 15 12:22:38 3d
> Jan 15 12:22:38 01
> Jan 15 12:22:38 22
> Jan 15 12:22:38 1b
> Jan 15 12:22:38 66
> Jan 15 12:22:38 c0
> Jan 15 12:22:38 8b
> Jan 15 12:22:38 03
> Jan 15 12:22:38 8b
> Jan 15 12:22:38 7c
> Jan 15 12:22:38 24
> Jan 15 12:22:38 08
> Jan 15 12:22:38 f6
> Jan 15 12:22:38 c4
> Jan 15 12:22:38 08
> Jan 15 12:22:38 8b
> Jan 15 12:22:38 6f
> Jan 15 12:22:38 0c
> Jan 15 12:22:38 75
> Jan 15 12:22:38
> Jan 15 12:22:38 EIP: [<c04c078b>]
> Jan 15 12:22:38 nilfs_copy_page+0x29/0x198
> Jan 15 12:22:38 SS:ESP 0068:f7a14ca8
> Jan 15 12:22:38
> Jan 15 12:22:38 Kernel panic - not syncing: Fatal exception
> Jan 15 12:22:38
> ~
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
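
For reference, the three pieces of information requested in the reply (the vm.min_free_kbytes value, page allocation failure messages, and NILFS2-related messages in the system log) could be collected with a small script along the following lines. This is only a sketch under assumptions: the function names are made up for illustration, /var/log/messages is assumed as the syslog file (the usual CentOS default, but it may differ on your box), and the script needs Python 3, which a stock CentOS 5 does not ship, so it is probably easiest to run it on another machine against a copy of the log. On the box itself the sysctl value can simply be read from /proc/sys/vm/min_free_kbytes.

```python
import re
import sys
from pathlib import Path

# Standard procfs location of the sysctl; the syslog path is an assumption.
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")
DEFAULT_SYSLOG = Path("/var/log/messages")

# Case-insensitive patterns for the two kinds of messages asked about.
# The "nilfs" pattern also matches nilfs_cleanerd's own syslog lines.
PATTERNS = {
    "page_alloc": re.compile(r"page allocation failure", re.IGNORECASE),
    "nilfs": re.compile(r"nilfs", re.IGNORECASE),
}


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return vm.min_free_kbytes as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def scan_log(lines, patterns=PATTERNS):
    """Group log lines under the name of every pattern they match."""
    hits = {name: [] for name in patterns}
    for line in lines:
        for name, rx in patterns.items():
            if rx.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Optional first argument: path to a (copied) syslog file.
    log_path = Path(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_SYSLOG
    print("vm.min_free_kbytes =", read_min_free_kbytes())
    try:
        with log_path.open(errors="replace") as fh:
            found = scan_log(fh)
    except OSError as exc:
        print("cannot read", log_path, "-", exc)
    else:
        for name, matched in found.items():
            print(f"{name}: {len(matched)} matching line(s)")
            for entry in matched[-20:]:  # show only the most recent matches
                print("   ", entry)
```

The lines it prints just before the time of the panic are the interesting ones to post back to the list.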
