Hi Zahid,

On Tue, 2013-01-15 at 14:36 -0800, Zahid Chowdhury wrote:
> Hello,
>   I am running CentOS 5.5 (kernel 2.6.18-194.17.4.el5). I have used the
> CentOS distribution with the nilfs kernel module 2.0.22 to statically build
> nilfs into the kernel (that's why I renamed 2.6.18-194.17.4.el5 to
> 2.6.18-194.17.4.el5SSI_NILFS). I have enabled netconsole as the box is mostly
> headless - the kernel panic messages below came up through netconsole. The
> garbage collection daemon is nilfs-utils 2.1.0. The processor is an Intel(R)
> Atom(TM) CPU D510 dual core with 2 contexts. The SSD is an Industrial Grade
> Apacer 16GB SLC. At the time the kernel panicked there were many (> 100) soft
> real-time processes with nice levels of -19 running (the cleanerd runs at +19
> nice level as we have found that otherwise it disturbs the soft real-time
> processes). These soft real-time processes are also memory hogs & cpu hogs
> (less than a few % idle even with all the cores/contexts), such that less
> than a few K of memory is available at any time (we will be fixing the apps,
> but still nilfs should not panic the kernel). We do allow overcommit and
> all processes are at the normal oom_adj value of 0 except for critical
> processes like syslogd & klogd, sshd, crond, nilfs_cleanerd, ifplugd,
> dbus-daemon. Btw, we did much testing and no kernel panics occurred over
> weeks until I changed oom_adj for the critical processes just today.
> 
> 
> Has anybody seen the kernel panic messages I see below? Is there any fix for 
> this in a Centos 5.5 kernel? Would upgrading to a newer nilfs module clear up 
> this panic? Would upgrading to a newer kernel clear up this panic? Upgrading 
> cleanerd? Any other suggestions/questions are very welcome. Thanks all.
> 

First of all, I think it makes sense to try upgrading the kernel and
nilfs-utils. We need to understand whether your issue can be reproduced
with the current state of the NILFS2 code.

Secondly, what value of vm.min_free_kbytes do you have on your system?
Does your system log contain any error messages about page allocation
failures?
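
For the second question, a minimal Python sketch along the following lines
could collect both pieces of information. It is not part of NILFS or
nilfs-utils; the function names and the /var/log/messages path are
assumptions for illustration, so adjust the path to wherever syslogd or
netconsole writes on your setup.

```python
from pathlib import Path


def read_min_free_kbytes(path="/proc/sys/vm/min_free_kbytes"):
    """Return vm.min_free_kbytes as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def find_alloc_failures(log_text):
    """Return the log lines that mention a page allocation failure."""
    return [line for line in log_text.splitlines()
            if "page allocation failure" in line]


if __name__ == "__main__":
    print("vm.min_free_kbytes =", read_min_free_kbytes())
    log_path = Path("/var/log/messages")  # assumed syslog location
    try:
        text = log_path.read_text(errors="replace")
    except OSError:
        text = ""  # log missing or unreadable; nothing to report
    for line in find_alloc_failures(text):
        print(line)
```

An empty result from the log scan only means that no such message reached
that particular file, not that no allocation failed.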

Thirdly, it is not yet clear to me how to reproduce your issue. Could
you describe in more detail which filesystem operations were running
before the issue occurred? Are there any NILFS2-related error messages
in your system log before the kernel panic?
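
As a rough way to pull out such messages, the Python sketch below returns
every log line mentioning NILFS that precedes the first BUG line. The
function name is hypothetical, and the marker string is simply taken from
the "kernel BUG at fs/nilfs2/page.c:317!" line in your report.

```python
def nilfs_lines_before_bug(log_text, marker="kernel BUG at fs/nilfs2"):
    """Return lines mentioning NILFS that precede the first BUG marker.

    If the marker never appears, every NILFS-related line is returned.
    """
    hits = []
    for line in log_text.splitlines():
        if marker in line:
            break  # stop at the panic; later lines belong to the oops itself
        if "nilfs" in line.lower():
            hits.append(line)
    return hits
```

The match is a case-insensitive substring search, so it will also catch
lines from nilfs_cleanerd itself, which may be useful context.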

Thanks,
Vyacheslav Dubeyko.

> 
> Zahid
> 
> P.S.: Panic flow over netconsole into syslogd - sorry for so many lines, alas 
> Solaris syslogd seems to wrap early:
> 
> Jan 15 12:22:38  ------------[ cut here ]------------
> Jan 15 12:22:38  kernel BUG at fs/nilfs2/page.c:317!
> Jan 15 12:22:38  invalid opcode: 0000 [#1]
> Jan 15 12:22:38  SMP
> Jan 15 12:22:38
> Jan 15 12:22:38  last sysfs file: 
> /devices/pci0000:00/0000:00:1c.0/0000:02:00.0/irq
> Jan 15 12:22:38  Modules linked in:
> Jan 15 12:22:38   netconsole
> Jan 15 12:22:38   autofs4
> Jan 15 12:22:38   dme1737
> Jan 15 12:22:38   hwmon_vid
> Jan 15 12:22:38   hidp
> Jan 15 12:22:38   l2cap
> Jan 15 12:22:38   bluetooth
> Jan 15 12:22:38   sunrpc
> Jan 15 12:22:38   bridge
> Jan 15 12:22:38   ip_nat_ftp
> Jan 15 12:22:38   ip_conntrack_ftp
> Jan 15 12:22:38   ip_conntrack_netbios_ns
> Jan 15 12:22:38   iptable_mangle
> Jan 15 12:22:38   iptable_filter
> Jan 15 12:22:38   ipt_MASQUERADE
> Jan 15 12:22:38   xt_tcpudp
> Jan 15 12:22:38   iptable_nat
> Jan 15 12:22:38   ip_nat
> Jan 15 12:22:38   ip_conntrack
> Jan 15 12:22:38   nfnetlink
> Jan 15 12:22:38   ip_tables
> Jan 15 12:22:38   x_tables
> Jan 15 12:22:38   loop
> Jan 15 12:22:38   dm_mirror
> Jan 15 12:22:38   dm_multipath
> Jan 15 12:22:38   scsi_dh
> Jan 15 12:22:38   video
> Jan 15 12:22:38   backlight
> Jan 15 12:22:38   sbs
> Jan 15 12:22:38   power_meter
> Jan 15 12:22:38   hwmon
> Jan 15 12:22:38   i2c_ec
> Jan 15 12:22:38   dell_wmi
> Jan 15 12:22:38   wmi
> Jan 15 12:22:38   button
> Jan 15 12:22:38   battery
> Jan 15 12:22:38   asus_acpi
> Jan 15 12:22:38   ac
> Jan 15 12:22:38   lp
> Jan 15 12:22:38   snd_hda_intel
> Jan 15 12:22:38   snd_seq_dummy
> Jan 15 12:22:38   sg
> Jan 15 12:22:38   snd_seq_oss
> Jan 15 12:22:38   snd_seq_midi_event
> Jan 15 12:22:38   snd_seq
> Jan 15 12:22:38   snd_seq_device
> Jan 15 12:22:38   snd_pcm_oss
> Jan 15 12:22:38   snd_mixer_oss
> Jan 15 12:22:38   snd_pcm
> Jan 15 12:22:38   snd_timer
> Jan 15 12:22:38   snd_page_alloc
> Jan 15 12:22:38   parport_pc
> Jan 15 12:22:38   e1000e
> Jan 15 12:22:38   pcspkr
> Jan 15 12:22:38   snd_hwdep
> Jan 15 12:22:38   serio_raw
> Jan 15 12:22:38   parport
> Jan 15 12:22:38   i2c_i801
> Jan 15 12:22:38   i2c_core
> Jan 15 12:22:38   snd
> Jan 15 12:22:38   soundcore
> Jan 15 12:22:38   dm_raid45
> Jan 15 12:22:38   dm_message
> Jan 15 12:22:38   dm_region_hash
> Jan 15 12:22:38   dm_log
> Jan 15 12:22:38   dm_mod
> Jan 15 12:22:38   dm_mem_cache
> Jan 15 12:22:38   usb_storage
> Jan 15 12:22:38   ata_piix
> Jan 15 12:22:38   libata
> Jan 15 12:22:38   sd_mod
> Jan 15 12:22:38   scsi_mod
> Jan 15 12:22:38   ext3
> Jan 15 12:22:38   jbd
> Jan 15 12:22:38   uhci_hcd
> Jan 15 12:22:38   ohci_hcd
> Jan 15 12:22:38   ehci_hcd
> Jan 15 12:22:38
> Jan 15 12:22:38  CPU:    0
> Jan 15 12:22:38  EIP:    0060:[<c04c078b>]    Not tainted VLI
> Jan 15 12:22:38  EFLAGS: 00010246   (2.6.18-194.17.4.el5SSI_NILFS #1)
> Jan 15 12:22:38  EIP is at nilfs_copy_page+0x29/0x198
> Jan 15 12:22:38  eax: 80010029   ebx: c1329100   ecx: 00000000   edx: c135de00
> Jan 15 12:22:38  esi: 00000000   edi: f6df3f30   ebp: f6df3cf4   esp: f7a14ca8
> Jan 15 12:22:38  ds: 007b   es: 007b   ss: 0068
> Jan 15 12:22:38  Process nilfs_cleanerd (pid: 1653, ti=f7a14000 task=f79c4000 
> task.ti=f7a14000)
> Jan 15 12:22:38
> Jan 15 12:22:38  Stack:
> Jan 15 12:22:38  ec2e8000
> Jan 15 12:22:38  e0461000
> Jan 15 12:22:38  c135de00
> Jan 15 12:22:38  c1585d00
> Jan 15 12:22:38  f6df3f30
> Jan 15 12:22:38  c0458ba8
> Jan 15 12:22:38  c135de00
> Jan 15 12:22:38  c1329100
> Jan 15 12:22:38
> Jan 15 12:22:38
> Jan 15 12:22:38  f6df3f30
> Jan 15 12:22:38  f6df3cf4
> Jan 15 12:22:38  c04c0ff2
> Jan 15 12:22:38  00001f8e
> Jan 15 12:22:38  00000005
> Jan 15 12:22:38  00001f7c
> Jan 15 12:22:38  0000000e
> Jan 15 12:22:38  00000000
> Jan 15 12:22:38
> Jan 15 12:22:38
> Jan 15 12:22:38  c1407240
> Jan 15 12:22:38  c12b5ac0
> Jan 15 12:22:38  c152afe0
> Jan 15 12:22:38  c1462ae0
> Jan 15 12:22:38  c1408c20
> Jan 15 12:22:38  c135de00
> Jan 15 12:22:38  c11fdda0
> Jan 15 12:22:38  c1503320
> Jan 15 12:22:38
> Jan 15 12:22:38  Call Trace:
> Jan 15 12:22:38   [<c0458ba8>]
> Jan 15 12:22:38  find_lock_page+0x1a/0x7e
> Jan 15 12:22:38   [<c04c0ff2>]
> Jan 15 12:22:38  nilfs_copy_back_pages+0xbb/0x1e7
> Jan 15 12:22:38   [<c04d2f3b>]
> Jan 15 12:22:38  nilfs_commit_gcdat_inode+0x83/0xa8
> Jan 15 12:22:38   [<c04cc0de>]
> Jan 15 12:22:38  nilfs_segctor_complete_write+0x1dd/0x301
> Jan 15 12:22:38   [<c04cd337>]
> Jan 15 12:22:38  nilfs_segctor_do_construct+0x1011/0x1384
> Jan 15 12:22:38   [<c045dbea>]
> Jan 15 12:22:38  __set_page_dirty_nobuffers+0xb0/0xd3
> Jan 15 12:22:38   [<c04c17f3>]
> Jan 15 12:22:38  nilfs_mdt_mark_block_dirty+0x41/0x47
> Jan 15 12:22:38   [<c04cd8c1>]
> Jan 15 12:22:38  nilfs_segctor_construct+0x82/0x261
> Jan 15 12:22:38   [<c04ceada>]
> Jan 15 12:22:38  nilfs_clean_segments+0xa9/0x1c4
> Jan 15 12:22:38   [<c04d26e2>]
> Jan 15 12:22:38  nilfs_ioctl+0x444/0x57d
> Jan 15 12:22:38   [<c0465900>]
> Jan 15 12:22:38  free_pgd_range+0x108/0x190
> Jan 15 12:22:38   [<c04d229e>]
> Jan 15 12:22:38  nilfs_ioctl+0x0/0x57d
> Jan 15 12:22:38   [<c048620d>]
> Jan 15 12:22:38  do_ioctl+0x1c/0x5d
> Jan 15 12:22:38   [<c04867a1>]
> Jan 15 12:22:38  vfs_ioctl+0x47b/0x4d3
> Jan 15 12:22:38   [<c041eef6>]
> Jan 15 12:22:38  enqueue_task+0x29/0x39
> Jan 15 12:22:38   [<c0486841>]
> Jan 15 12:22:38  sys_ioctl+0x48/0x5f
> Jan 15 12:22:38   [<c0404f17>]
> Jan 15 12:22:38  syscall_call+0x7/0xb
> Jan 15 12:22:38   =======================
> Jan 15 12:22:38  Code:
> Jan 15 12:22:38  00
> Jan 15 12:22:38  c3
> Jan 15 12:22:38  55
> Jan 15 12:22:38  57
> Jan 15 12:22:38  56
> Jan 15 12:22:38  89
> Jan 15 12:22:38  ce
> Jan 15 12:22:38  53
> Jan 15 12:22:38  89
> Jan 15 12:22:38  c3
> Jan 15 12:22:38  83
> Jan 15 12:22:38  ec
> Jan 15 12:22:38  18
> Jan 15 12:22:38  89
> Jan 15 12:22:38  54
> Jan 15 12:22:38  24
> Jan 15 12:22:38  08
> Jan 15 12:22:38  8b
> Jan 15 12:22:38  00
> Jan 15 12:22:38  f6
> Jan 15 12:22:38  c4
> Jan 15 12:22:38  10
> Jan 15 12:22:38  74
> Jan 15 12:22:38  08
> Jan 15 12:22:38  0f
> Jan 15 12:22:38  0b
> Jan 15 12:22:38  3b
> Jan 15 12:22:38  01
> Jan 15 12:22:38  22
> Jan 15 12:22:38  1b
> Jan 15 12:22:38  66
> Jan 15 12:22:38  c0
> Jan 15 12:22:38  8b
> Jan 15 12:22:38  54
> Jan 15 12:22:38  24
> Jan 15 12:22:38  08
> Jan 15 12:22:38  8b
> Jan 15 12:22:38  02
> Jan 15 12:22:38  f6
> Jan 15 12:22:38  c4
> Jan 15 12:22:38  08
> Jan 15 12:22:38  75
> Jan 15 12:22:38  08
> Jan 15 12:22:38  <0f>
> Jan 15 12:22:38  0b
> Jan 15 12:22:38  3d
> Jan 15 12:22:38  01
> Jan 15 12:22:38  22
> Jan 15 12:22:38  1b
> Jan 15 12:22:38  66
> Jan 15 12:22:38  c0
> Jan 15 12:22:38  8b
> Jan 15 12:22:38  03
> Jan 15 12:22:38  8b
> Jan 15 12:22:38  7c
> Jan 15 12:22:38  24
> Jan 15 12:22:38  08
> Jan 15 12:22:38  f6
> Jan 15 12:22:38  c4
> Jan 15 12:22:38  08
> Jan 15 12:22:38  8b
> Jan 15 12:22:38  6f
> Jan 15 12:22:38  0c
> Jan 15 12:22:38  75
> Jan 15 12:22:38
> Jan 15 12:22:38  EIP: [<c04c078b>]
> Jan 15 12:22:38  nilfs_copy_page+0x29/0x198
> Jan 15 12:22:38   SS:ESP 0068:f7a14ca8
> Jan 15 12:22:38
> Jan 15 12:22:38  Kernel panic - not syncing: Fatal exception
> Jan 15 12:22:38
> ~
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

