** Changed in: linux (Ubuntu)
       Status: Incomplete => In Progress

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Rafael David Tinoco (inaddy)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1607355

Title:
  Task (usually mongod) blocked more 120 seconds (lock-ups) in juju on
  lxc/lxd + zfs

Status in linux package in Ubuntu:
  In Progress

Bug description:
  I was able to reproduce this 2 or 3 times in the last 2 days. I have the
  following setup:

  Containers for Trusty/kilo service machines:

  inaddy@workstation:~$ lxc-ls  | grep tk
  tkcephmon01  RUNNING 0         -      192.168.65.52 -    
  tkcephmon02  RUNNING 0         -      192.168.65.51 -    
  tkcephmon03  RUNNING 0         -      192.168.65.48 -    
  tkcinder     RUNNING 0         -      192.168.65.49 -    
  tkdash       RUNNING 0         -      192.168.65.50 -    
  tkglance     RUNNING 0         -      192.168.65.53 -    
  tkjuju       RUNNING 0         -      192.168.65.15 -    
  tkkeystone   RUNNING 0         -      192.168.65.54 -    
  tkmysql      RUNNING 0         -      192.168.65.55 -    
  tknova       RUNNING 0         -      192.168.65.56 -    
  tkrabbit     RUNNING 0         -      192.168.65.57 -    
  tkswiftproxy RUNNING 0         -      192.168.65.58 -    

  And compute nodes + neutrongw as kvm guests:

  inaddy@workstation:~$ virsh list --all | grep tk
   21    tkcompute01                    running
   22    tkcompute02                    running
   23    tkcompute03                    running
   24    tkneutrongw                    running

  All my LXC containers are on top of ZFS:

  Linux workstation 4.4.0-32-generic #51-Ubuntu SMP Tue Jul 19 18:09:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

  And my KVM guests are on top of ext4 + 1.2 raid0 striped volume.

  I'm getting the lockups below (usually for mongod, from the tkjuju
  container, the Juju controller). After the first lockup appears
  (a schedule timeout, most likely coming from the ZFS sync logic), the
  Juju controller starts giving me errors on "update-status". From "juju
  status":

  glance/0        error  idle  1.25.6  10  9292/tcp  tkglance     hook failed: "update-status"
  keystone/0      error  idle  1.25.6  11            tkkeystone   hook failed: "update-status"
  mysql/0         error  idle  1.25.6  12            tkmysql      hook failed: "config-changed"
  neutron-api/0   error  idle  1.25.6  4   9696/tcp  tkneutrongw  hook failed: "update-status"
  nova-compute/0  error  idle  1.25.6  1             tkcompute01  hook failed: "update-status"
  nova-compute/1  error  idle  1.25.6  2             tkcompute02  hook failed: "update-status"
  nova-compute/2  error  idle  1.25.6  3             tkcompute03  hook failed: "update-status"
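
  The mongod traces in this report show the task blocked inside msync()
  on a ZFS-backed file. A minimal sketch for timing that same call by
  hand is given here; it is not a confirmed reproducer. The function
  name time_msync and its parameters are hypothetical, and it relies on
  Python's mmap.flush() issuing msync(MS_SYNC) on Unix.

```python
import mmap
import os
import tempfile
import time


def time_msync(directory, size=1 << 20, rounds=5):
    """Dirty a shared file mapping and time each msync(MS_SYNC) call.

    `directory` should live on the filesystem under test (for example a
    path inside a ZFS-backed container); this is an assumption of the
    sketch, not something the report prescribes.
    """
    timings = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)
        # mmap.mmap() maps the file MAP_SHARED and read/write by default on Unix.
        with mmap.mmap(fd, size) as mapping:
            for i in range(rounds):
                mapping[:] = bytes([i % 256]) * size  # dirty every page
                start = time.monotonic()
                mapping.flush()  # msync(MS_SYNC) over the whole mapping
                timings.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return timings
```

  Calling time_msync() with a directory on the ZFS dataset and comparing
  the timings against a directory on ext4 may show whether msync
  latency on ZFS is unusually high before the 120 second hung-task
  threshold is reached.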

  Lockups:

  [105601.816578] INFO: task mongod:14480 blocked for more than 120 seconds.
  [105601.816583]       Tainted: P           O    4.4.0-32-generic #51-Ubuntu
  [105601.816584] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [105601.816586] mongod          D ffff88010ec47ba8     0 14480  16855 0x00000100
  [105601.816590]  ffff88010ec47ba8 0000000057992eeb ffff880108e5ee00 ffff880108e58dc0
  [105601.816592]  ffff88010ec48000 ffff88081ecd6d00 7fffffffffffffff ffffffff8182a600
  [105601.816594]  ffff88010ec47d08 ffff88010ec47bc0 ffffffff81829e05 0000000000000000
  [105601.816596] Call Trace:
  [105601.816603]  [<ffffffff8182a600>] ? bit_wait+0x60/0x60
  [105601.816606]  [<ffffffff81829e05>] schedule+0x35/0x80
  [105601.816608]  [<ffffffff8182cf25>] schedule_timeout+0x1b5/0x270
  [105601.816612]  [<ffffffff8118d939>] ? find_get_pages_tag+0x109/0x190
  [105601.816614]  [<ffffffff8182a600>] ? bit_wait+0x60/0x60
  [105601.816616]  [<ffffffff81829334>] io_schedule_timeout+0xa4/0x110
  [105601.816618]  [<ffffffff8182a61b>] bit_wait_io+0x1b/0x70
  [105601.816620]  [<ffffffff8182a1ad>] __wait_on_bit+0x5d/0x90
  [105601.816622]  [<ffffffff8118d04b>] wait_on_page_bit+0xcb/0xf0
  [105601.816625]  [<ffffffff810c3ce0>] ? autoremove_wake_function+0x40/0x40
  [105601.816628]  [<ffffffff8118d163>] __filemap_fdatawait_range+0xf3/0x160
  [105601.816630]  [<ffffffff8118d1e4>] filemap_fdatawait_range+0x14/0x30
  [105601.816632]  [<ffffffff8118f0df>] filemap_write_and_wait_range+0x3f/0x70
  [105601.816682]  [<ffffffffc0902ea8>] zpl_fsync+0x38/0x90 [zfs]
  [105601.816685]  [<ffffffff8124116b>] vfs_fsync_range+0x4b/0xb0
  [105601.816687]  [<ffffffff811cabee>] SyS_msync+0x17e/0x1f0
  [105601.816689]  [<ffffffff8182def2>] entry_SYSCALL_64_fastpath+0x16/0x71
  [117121.961545] INFO: task txg_sync:4589 blocked for more than 120 seconds.
  [117121.961549]       Tainted: P           O    4.4.0-32-generic #51-Ubuntu
  [117121.961550] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [117121.961551] txg_sync        D ffff8807e1fbbaa8     0  4589      2 0x00000000
  [117121.961558]  ffff8807e1fbbaa8 ffff88081ed96d00 ffff8807faf03700 ffff8807e2dbb700
  [117121.961560]  ffff8807e1fbc000 ffff88081ed16d00 7fffffffffffffff ffff88017a8f2208
  [117121.961561]  0000000000000001 ffff8807e1fbbac0 ffffffff81829e05 0000000000000000
  [117121.961563] Call Trace:
  [117121.961569]  [<ffffffff81829e05>] schedule+0x35/0x80
  [117121.961571]  [<ffffffff8182cf25>] schedule_timeout+0x1b5/0x270
  [117121.961574]  [<ffffffff810ac0c2>] ? default_wake_function+0x12/0x20
  [117121.961576]  [<ffffffff810c35e2>] ? __wake_up_common+0x52/0x90
  [117121.961578]  [<ffffffff81829334>] io_schedule_timeout+0xa4/0x110
  [117121.961586]  [<ffffffffc0759bec>] cv_wait_common+0xbc/0x140 [spl]
  [117121.961589]  [<ffffffff810c3ca0>] ? wake_atomic_t_function+0x60/0x60
  [117121.961593]  [<ffffffffc0759cc8>] __cv_wait_io+0x18/0x20 [spl]
  [117121.961633]  [<ffffffffc08fd2fe>] zio_wait+0x10e/0x1f0 [zfs]
  [117121.961653]  [<ffffffffc0886c58>] dsl_pool_sync+0xb8/0x430 [zfs]
  [117121.961676]  [<ffffffffc08a25b6>] spa_sync+0x366/0xb30 [zfs]
  [117121.961677]  [<ffffffff810ac0c2>] ? default_wake_function+0x12/0x20
  [117121.961701]  [<ffffffffc08b3a4a>] txg_sync_thread+0x3ba/0x630 [zfs]
  [117121.961725]  [<ffffffffc08b3690>] ? txg_delay+0x180/0x180 [zfs]
  [117121.961729]  [<ffffffffc0754e31>] thread_generic_wrapper+0x71/0x80 [spl]
  [117121.961732]  [<ffffffffc0754dc0>] ? __thread_exit+0x20/0x20 [spl]
  [117121.961734]  [<ffffffff810a0808>] kthread+0xd8/0xf0
  [117121.961736]  [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
  [117121.961737]  [<ffffffff8182e28f>] ret_from_fork+0x3f/0x70
  [117121.961739]  [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
  [117121.961831] INFO: task mongod:14480 blocked for more than 120 seconds.
  [117121.961832]       Tainted: P           O    4.4.0-32-generic #51-Ubuntu
  [117121.961833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [117121.961834] mongod          D ffff88010ec47ba8     0 14480  16855 0x00000100
  [117121.961836]  ffff88010ec47ba8 0000000057995bf6 ffff8807faf01b80 ffff880108e58dc0
  [117121.961837]  ffff88010ec48000 ffff88081ec96d00 7fffffffffffffff ffffffff8182a600
  [117121.961839]  ffff88010ec47d08 ffff88010ec47bc0 ffffffff81829e05 0000000000000000
  [117121.961841] Call Trace:
  [117121.961843]  [<ffffffff8182a600>] ? bit_wait+0x60/0x60
  [117121.961844]  [<ffffffff81829e05>] schedule+0x35/0x80
  [117121.961846]  [<ffffffff8182cf25>] schedule_timeout+0x1b5/0x270
  [117121.961849]  [<ffffffff8118d939>] ? find_get_pages_tag+0x109/0x190
  [117121.961851]  [<ffffffff8182a600>] ? bit_wait+0x60/0x60
  [117121.961852]  [<ffffffff81829334>] io_schedule_timeout+0xa4/0x110
  [117121.961854]  [<ffffffff8182a61b>] bit_wait_io+0x1b/0x70
  [117121.961856]  [<ffffffff8182a1ad>] __wait_on_bit+0x5d/0x90
  [117121.961857]  [<ffffffff8118d04b>] wait_on_page_bit+0xcb/0xf0
  [117121.961859]  [<ffffffff810c3ce0>] ? autoremove_wake_function+0x40/0x40
  [117121.961861]  [<ffffffff8118d163>] __filemap_fdatawait_range+0xf3/0x160
  [117121.961863]  [<ffffffff8118d1e4>] filemap_fdatawait_range+0x14/0x30
  [117121.961864]  [<ffffffff8118f0df>] filemap_write_and_wait_range+0x3f/0x70
  [117121.961891]  [<ffffffffc0902ea8>] zpl_fsync+0x38/0x90 [zfs]
  [117121.961893]  [<ffffffff8124116b>] vfs_fsync_range+0x4b/0xb0
  [117121.961895]  [<ffffffff811cabee>] SyS_msync+0x17e/0x1f0
  [117121.961897]  [<ffffffff8182def2>] entry_SYSCALL_64_fastpath+0x16/0x71
  [147242.344176] INFO: task txg_sync:4589 blocked for more than 120 seconds.
  [147242.344180]       Tainted: P           O    4.4.0-32-generic #51-Ubuntu
  [147242.344181] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [147242.344182] txg_sync        D ffff8807e1fbbaa8     0  4589      2 0x00000000
  [147242.344185]  ffff8807e1fbbaa8 ffff88081ed56d00 ffff8807faf02940 ffff8807e2dbb700
  [147242.344191]  ffff8807e1fbc000 ffff88081ecd6d00 7fffffffffffffff ffff8801789e5228
  [147242.344192]  0000000000000001 ffff8807e1fbbac0 ffffffff81829e05 0000000000000000
  [147242.344194] Call Trace:
  [147242.344200]  [<ffffffff81829e05>] schedule+0x35/0x80
  [147242.344202]  [<ffffffff8182cf25>] schedule_timeout+0x1b5/0x270
  [147242.344205]  [<ffffffff810ac0c2>] ? default_wake_function+0x12/0x20
  [147242.344207]  [<ffffffff810c35e2>] ? __wake_up_common+0x52/0x90
  [147242.344209]  [<ffffffff81829334>] io_schedule_timeout+0xa4/0x110
  [147242.344217]  [<ffffffffc0759bec>] cv_wait_common+0xbc/0x140 [spl]
  [147242.344219]  [<ffffffff810c3ca0>] ? wake_atomic_t_function+0x60/0x60
  [147242.344224]  [<ffffffffc0759cc8>] __cv_wait_io+0x18/0x20 [spl]
  [147242.344264]  [<ffffffffc08fd2fe>] zio_wait+0x10e/0x1f0 [zfs]
  [147242.344285]  [<ffffffffc0886c58>] dsl_pool_sync+0xb8/0x430 [zfs]
  [147242.344307]  [<ffffffffc08a25b6>] spa_sync+0x366/0xb30 [zfs]
  [147242.344309]  [<ffffffff810ac0c2>] ? default_wake_function+0x12/0x20
  [147242.344333]  [<ffffffffc08b3a4a>] txg_sync_thread+0x3ba/0x630 [zfs]
  [147242.344357]  [<ffffffffc08b3690>] ? txg_delay+0x180/0x180 [zfs]
  [147242.344360]  [<ffffffffc0754e31>] thread_generic_wrapper+0x71/0x80 [spl]
  [147242.344364]  [<ffffffffc0754dc0>] ? __thread_exit+0x20/0x20 [spl]
  [147242.344366]  [<ffffffff810a0808>] kthread+0xd8/0xf0
  [147242.344367]  [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
  [147242.344369]  [<ffffffff8182e28f>] ret_from_fork+0x3f/0x70
  [147242.344370]  [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
  [147242.344461] INFO: task mongod:14480 blocked for more than 120 seconds.
  [147242.344462]       Tainted: P           O    4.4.0-32-generic #51-Ubuntu
  [147242.344463] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [147242.344464] mongod          D ffff88010ec47ba8     0 14480  16855 0x00000100
  [147242.344466]  ffff88010ec47ba8 000000005799d18d ffff8807faf03700 ffff880108e58dc0
  [147242.344467]  ffff88010ec48000 ffff88081ed16d00 7fffffffffffffff ffffffff8182a600
  [147242.344469]  ffff88010ec47d08 ffff88010ec47bc0 ffffffff81829e05 0000000000000000
  [147242.344470] Call Trace:
  [147242.344473]  [<ffffffff8182a600>] ? bit_wait+0x60/0x60
  [147242.344474]  [<ffffffff81829e05>] schedule+0x35/0x80
  [147242.344476]  [<ffffffff8182cf25>] schedule_timeout+0x1b5/0x270
  [147242.344478]  [<ffffffff8118d939>] ? find_get_pages_tag+0x109/0x190
  [147242.344480]  [<ffffffff8182a600>] ? bit_wait+0x60/0x60
  [147242.344482]  [<ffffffff81829334>] io_schedule_timeout+0xa4/0x110
  [147242.344483]  [<ffffffff8182a61b>] bit_wait_io+0x1b/0x70
  [147242.344485]  [<ffffffff8182a1ad>] __wait_on_bit+0x5d/0x90
  [147242.344487]  [<ffffffff8118d04b>] wait_on_page_bit+0xcb/0xf0
  [147242.344488]  [<ffffffff810c3ce0>] ? autoremove_wake_function+0x40/0x40
  [147242.344490]  [<ffffffff8118d163>] __filemap_fdatawait_range+0xf3/0x160
  [147242.344492]  [<ffffffff8118d1e4>] filemap_fdatawait_range+0x14/0x30
  [147242.344493]  [<ffffffff8118f0df>] filemap_write_and_wait_range+0x3f/0x70
  [147242.344520]  [<ffffffffc0902ea8>] zpl_fsync+0x38/0x90 [zfs]
  [147242.344522]  [<ffffffff8124116b>] vfs_fsync_range+0x4b/0xb0
  [147242.344525]  [<ffffffff811cabee>] SyS_msync+0x17e/0x1f0
  [147242.344526]  [<ffffffff8182def2>] entry_SYSCALL_64_fastpath+0x16/0x71
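
  The traces above repeat for the same two tasks. A small sketch for
  summarizing hung-task reports from saved dmesg output is given here,
  assuming the "INFO: task NAME:PID blocked for more than N seconds"
  line format shown in this log. The names HUNG_RE and
  summarize_hung_tasks are hypothetical helpers, not part of any
  existing tool.

```python
import re
from collections import defaultdict

# Matches the first line of each hung-task report as it appears in this log.
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(log_text):
    """Map (task name, pid) to the kernel timestamps of its hung-task reports."""
    reports = defaultdict(list)
    for match in HUNG_RE.finditer(log_text):
        key = (match["name"], int(match["pid"]))
        reports[key].append(float(match["ts"]))
    return dict(reports)
```

  Run against the log in this report, the summary would group the
  reports under mongod (pid 14480) and txg_sync (pid 4589), which makes
  it easier to see which reports share a timestamp window.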

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1607355/+subscriptions

