Hi

I had the same issue a few months ago. One of my clients hung waiting on a
file write, while the other clients seemed unaffected. However, if another
client accessed the same hung directory, it would hang there as well.

My cluster is 12.2.8 and I use the kernel client on the other servers.

What I did was migrate from filestore to bluestore and add an SSD to each
server. Besides that, I increased mds_cache_memory_limit. I know these
operations are not a direct fix for this issue, but I haven't encountered
it again since then.
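
For anyone who wants to try the same mds_cache_memory_limit change, here is a
minimal sketch. The 4 GiB value is only an example, not the value I actually
used, and the limit should also be persisted in ceph.conf under [mds]:

```shell
# Raise the MDS cache memory limit to 4 GiB (4 * 1024^3 bytes) at runtime.
ceph tell mds.* injectargs '--mds_cache_memory_limit=4294967296'

# Confirm the running value via the active MDS's admin socket
# (the daemon name here is an example; substitute your own MDS name).
ceph daemon mds.$(hostname) config get mds_cache_memory_limit
```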

Hope this helps you.

Thanks

Li,Ning <lining916740...@icloud.com> wrote on Wed, Dec 5, 2018 at 10:10 PM:

> Hi Zheng,
>
>
> Thanks for replying.
>
>
> Not the same machine. Our problem is that other processes writing to
> cephfs hang for a long time.
>
> The application node and the storage nodes are independent; cephfs is
> mounted on the application node. We copy large files to the application
> node's cephfs mount point from two other machines. While the copy runs,
> processes that write to cephfs, such as nginx and prometheus, stay hung
> for a long time.
>
> The phenomenon is hard to understand: the incoming data traffic is only
> about 100 MB/s, while the ceph read/write bandwidth is at least 200 MB/s.
> And I'm sure that no other people or processes share the ceph bandwidth
> while we copy.
>
>
> It is worth mentioning that the application node has only 30 GB of
> memory.
>
> According to the stack traces in dmesg, the hung processes seem to be
> waiting for data writeback.
>
> I guess the problem may be on the client side, because when the hang
> happens I can log in to another cephfs client and do direct writes to
> cephfs normally.
>
>
> Please share any ideas or suggestions about troubleshooting if you have
> them.
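
One thing worth checking the next time the hang happens (a suggestion of mine,
not something already tried in this thread) is whether the active MDS itself
has stuck operations; a sketch, using the MDS name visible in the `ceph -s`
output later in this mail:

```shell
# On the node running the active MDS, list operations currently in flight;
# ops stuck here for a long time point at the MDS rather than the client.
ceph daemon mds.skyaxe-storage-1 dump_ops_in_flight

# List client sessions as seen by the MDS; stale sessions are suspicious.
ceph daemon mds.skyaxe-storage-1 session ls
```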
>
> —
> Best Regards
> Li, Ning
>
>
>
> On Dec 5, 2018, at 17:57, Yan, Zheng <uker...@gmail.com> wrote:
>
> Is the cephfs mount on the same machine that run OSD?
>
> On Wed, Dec 5, 2018 at 2:33 PM NingLi <lining916740...@icloud.com> wrote:
>
>
> Hi all,
>
> We found that some processes writing to cephfs hang for a long time
> (> 120 s) when we upload (scp/rsync) large files (50 GB ~ 300 GB in
> total) to the app node's cephfs mountpoint.
>
> This problem is not always reproducible, but when it occurs the web
> server (nginx) and some other services become inaccessible, which is
> embarrassing.
>
>
> Please give some tips.
>
>
> We are using ceph version 13.2.1 and the 3.10.0-693.el7.x86_64 kernel.
>
>
> When this problem occurs, the memory pressure is a bit high.
> # free -h
>               total        used        free      shared  buff/cache   available
> Mem:            30G         11G        230M        3.3M         18G         18G
> Swap:            0B          0B          0B
>
> # mpstat -P ALL
> Average:  CPU   %usr  %nice   %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
> Average:  all   3.80   0.04   1.46    55.12  0.00   0.08    0.00    0.00    0.00  39.49
> Average:    0   1.52   0.00   0.51    97.98  0.00   0.00    0.00    0.00    0.00   0.00
> Average:    1  13.07   0.00   1.51     0.00  0.00   0.00    0.00    0.00    0.00  85.43
> Average:    2   3.00   0.00   3.00     0.50  0.00   0.00    0.00    0.00    0.00  93.50
> Average:    3   2.50   0.00   2.00     0.00  0.00   0.00    0.00    0.00    0.00  95.50
> Average:    4   1.50   0.50   1.50    96.00  0.00   0.50    0.00    0.00    0.00   0.00
> Average:    5   0.50   0.00   1.01    32.16  0.00   0.00    0.00    0.00    0.00  66.33
> Average:    6   2.49   0.00   2.49     0.00  0.00   0.50    0.00    0.00    0.00  94.53
> Average:    7   1.52   0.00   0.51    97.98  0.00   0.00    0.00    0.00    0.00   0.00
> Average:    8   1.50   0.00   1.50    97.00  0.00   0.00    0.00    0.00    0.00   0.00
> Average:    9  14.07   0.00   2.01    46.23  0.00   0.00    0.00    0.00    0.00  37.69
> Average:   10   2.00   0.00   0.50    97.50  0.00   0.00    0.00    0.00    0.00   0.00
> Average:   11   2.01   0.00   1.01    96.98  0.00   0.00    0.00    0.00    0.00   0.00
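
Given the high %iowait above, watching the kernel's dirty/writeback counters
while the copy runs can confirm whether writeback is the bottleneck; a minimal
sketch:

```shell
# Print the dirty and writeback page counters; if Dirty stays in the
# gigabyte range while cephfs writes stall, the client-side writeback
# path is the place to look.
grep -E '^(Dirty|Writeback):' /proc/meminfo
```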
>
> We can see some processes staying in state 'D' for a long time. Here are
> two fragments from dmesg.
>
> [Fri Nov 30 15:10:43 2018] INFO: task java:11662 blocked for more than 120 seconds.
> [Fri Nov 30 15:10:43 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Fri Nov 30 15:10:43 2018] java            D ffffffff816a4180     0 11662  10409 0x00000180
> [Fri Nov 30 15:10:43 2018]  ffff8805f4febbd0 0000000000000082 ffff8807e2edcf10 ffff8805f4febfd8
> [Fri Nov 30 15:10:43 2018]  ffff8805f4febfd8 ffff8805f4febfd8 ffff8807e2edcf10 ffff88086df56cc0
> [Fri Nov 30 15:10:43 2018]  0000000000000000 7fffffffffffffff ffff88087ffc3be8 ffffffff816a4180
> [Fri Nov 30 15:10:43 2018] Call Trace:
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
> [Fri Nov 30 15:10:43 2018]  [<ffffffff81190541>] ? pagevec_lookup_tag+0x21/0x30
> [Fri Nov 30 15:10:43 2018]  [<ffffffffc082b598>] ? ceph_writepages_start+0x1298/0x1520 [ceph]
> [Fri Nov 30 15:10:43 2018]  [<ffffffff810e939c>] ? ktime_get_ts64+0x4c/0xf0
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a3cb5>] __wait_on_bit+0x65/0x90
> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
> [Fri Nov 30 15:10:43 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181de1>] __filemap_fdatawait_range+0x111/0x190
> [Fri Nov 30 15:10:43 2018]  [<ffffffff811b0d79>] ? do_numa_page+0x159/0x1e0
> [Fri Nov 30 15:10:43 2018]  [<ffffffff8118f42e>] ? do_writepages+0x1e/0x40
> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181e74>] filemap_fdatawait_range+0x14/0x30
> [Fri Nov 30 15:10:43 2018]  [<ffffffff81183db6>] filemap_write_and_wait_range+0x56/0x90
> [Fri Nov 30 15:10:43 2018]  [<ffffffffc083326c>] ceph_fsync+0x6c/0x510 [ceph]
> [Fri Nov 30 15:10:43 2018]  [<ffffffff81137273>] ? __secure_computing+0x73/0x240
> [Fri Nov 30 15:10:43 2018]  [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
> [Fri Nov 30 15:10:43 2018]  [<ffffffff81231d85>] do_fsync+0x65/0xa0
> [Fri Nov 30 15:10:43 2018]  [<ffffffff81232083>] SyS_fdatasync+0x13/0x20
> [Fri Nov 30 15:10:43 2018]  [<ffffffff816b1d4e>] tracesys+0xd9/0xde
>
>
> [Fri Nov 30 15:12:43 2018] INFO: task prometheus:9919 blocked for more than 120 seconds.
> [Fri Nov 30 15:12:43 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Fri Nov 30 15:12:43 2018] prometheus      D ffff8807e87dbad8     0  9919   9879 0x00000080
> [Fri Nov 30 15:12:43 2018]  ffff8807e87db970 0000000000000082 ffff8803e273cf10 ffff8807e87dbfd8
> [Fri Nov 30 15:12:43 2018]  ffff8807e87dbfd8 ffff8807e87dbfd8 ffff8803e273cf10 ffff88046e716cc0
> [Fri Nov 30 15:12:43 2018]  0000000000000000 7fffffffffffffff ffffffff816a4180 ffff8807e87dbad8
> [Fri Nov 30 15:12:43 2018] Call Trace:
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
> [Fri Nov 30 15:12:43 2018]  [<ffffffff81090007>] ? local_bh_enable+0x17/0x20
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a3eaf>] __wait_on_bit_lock+0x5f/0xc0
> [Fri Nov 30 15:12:43 2018]  [<ffffffff81182094>] __lock_page+0x74/0x90
> [Fri Nov 30 15:12:43 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
> [Fri Nov 30 15:12:43 2018]  [<ffffffff81182a34>] __find_lock_page+0x54/0x70
> [Fri Nov 30 15:12:43 2018]  [<ffffffff811836e2>] grab_cache_page_write_begin+0x62/0xd0
> [Fri Nov 30 15:12:43 2018]  [<ffffffffc082a203>] ceph_write_begin+0x43/0xe0 [ceph]
> [Fri Nov 30 15:12:43 2018]  [<ffffffff811824ce>] generic_file_buffered_write+0x11e/0x2a0
> [Fri Nov 30 15:12:43 2018]  [<ffffffffc08259c4>] ceph_aio_write+0xac4/0xc60 [ceph]
> [Fri Nov 30 15:12:43 2018]  [<ffffffff8160417a>] ? inet_recvmsg+0x7a/0xa0
> [Fri Nov 30 15:12:43 2018]  [<ffffffff811feb3d>] do_sync_write+0x8d/0xd0
> [Fri Nov 30 15:12:43 2018]  [<ffffffff811ff5fd>] vfs_write+0xbd/0x1e0
> [Fri Nov 30 15:12:43 2018]  [<ffffffff8120040f>] SyS_write+0x7f/0xe0
> [Fri Nov 30 15:12:43 2018]  [<ffffffff816b1b49>] system_call_fastpath+0x16/0x1b
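
Rather than waiting for the 120-second hung-task watchdog, blocked tasks can
be listed on demand; a sketch (the commented sysrq line needs root and writes
its dump into dmesg):

```shell
# List tasks in uninterruptible sleep (state D) together with the kernel
# function they are blocked in (wchan), keeping the header row.
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'

# Optionally dump full stacks of all blocked tasks into dmesg:
# echo w > /proc/sysrq-trigger
```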
>
>
> But the cephfs status shows healthy.
> # ceph -s
>  cluster:
>    id:     b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>    health: HEALTH_OK
>
>  services:
>    mon: 2 daemons, quorum skyaxe-storage-0,skyaxe-storage-1
>    mgr: skyaxe-storage-0(active), standbys: skyaxe-storage-1
>    mds: cephfs-1/1/1 up  {0=skyaxe-storage-1=up:active}, 1 up:standby
>    osd: 8 osds: 8 up, 8 in
>
>  data:
>    pools:   2 pools, 512 pgs
>    objects: 599.7 k objects, 876 GiB
>    usage:   2.6 TiB used, 13 TiB / 15 TiB avail
>    pgs:     512 active+clean
>
>  io:
>    client:   3.7 KiB/s wr, 0 op/s rd, 1 op/s wr
>
>
> We can see some unfinished requests in osdc.
>
> #  cat /sys/kernel/debug/ceph/xxx/osdc
> REQUESTS 98 homeless 0
> 8529    osd1    1.1bfdee5       [1,7]/1 [1,7]/1 100000011be.00000000    0x400024        1       0'0     write
> 8532    osd1    1.2d81c338      [1,6]/1 [1,6]/1 100000011c0.00000000    0x400024        1       0'0     write
> 8536    osd1    1.49324f26      [1,4]/1 [1,4]/1 100000011c4.00000000    0x400024        1       0'0     write
> 8542    osd1    1.1403106f      [1,7]/1 [1,7]/1 100000011c8.00000000    0x400024        1       0'0     write
> 8549    osd1    1.ea273113      [1,4]/1 [1,4]/1 100000011ce.00000000    0x400024        1       0'0     write
> 8562    osd1    1.f2d68e9a      [1,5]/1 [1,5]/1 100000011d8.00000000    0x400024        1       0'0     write
> 8566    osd1    1.1f212283      [1,4]/1 [1,4]/1 100000011db.00000000    0x400024        1       0'0     write
> 8568    osd1    1.2e1984e7      [1,5]/1 [1,5]/1 100000011de.00000000    0x400024        1       0'0     write
> 8576    osd1    1.da7e63d0      [1,5]/1 [1,5]/1 100000011e1.00000000    0x400024        1       0'0     write
> 8581    osd1    1.3b0d280       [1,4]/1 [1,4]/1 100000011e8.00000000    0x400024        1       0'0     write
> 8582    osd1    1.80227f0b      [1,5]/1 [1,5]/1 100000011ea.00000000    0x400024        1       0'0     write
> 8593    osd1    1.1efd9391      [1,6]/1 [1,6]/1 100000011ef.00000000    0x400024        1       0'0     write
> 8594    osd1    1.3db6e880      [1,4]/1 [1,4]/1 100000011f0.00000000    0x400024        1       0'0     write
> 8611    osd1    1.6d20c41d      [1,4]/1 [1,4]/1 1000000028c.00000000    0x400024        1       0'0     write
> 8612    osd1    1.6d20c41d      [1,4]/1 [1,4]/1 1000000028c.00000000    0x400024        1       0'0     write
> 8616    osd1    1.f4d90a4d      [1,7]/1 [1,7]/1 10000001206.00000000    0x400024        1       0'0     write
> 8619    osd1    1.740514fe      [1,5]/1 [1,5]/1 10000001208.00000000    0x400024        1       0'0     write
> 8630    osd1    1.d5fbabe0      [1,6]/1 [1,6]/1 10000001216.00000000    0x400024        1       0'0     write
> 8631    osd1    1.daf28f83      [1,4]/1 [1,4]/1 10000001217.00000000    0x400024        1       0'0     write
> 8635    osd1    1.d29e01a6      [1,5]/1 [1,5]/1 1000000121a.00000000    0x400024        1       0'0     write
> 8640    osd1    1.2de0ff63      [1,7]/1 [1,7]/1 10000001220.00000000    0x400024        1       0'0     write
> 8658    osd1    1.2f91206f      [1,7]/1 [1,7]/1 1000000122c.00000000    0x400064        1       0'0     write,startsync
> ...
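
The osdc check can be repeated across every kernel-client session under
debugfs (the xxx path component above is the client's session id); a small
sketch, assuming debugfs is mounted at /sys/kernel/debug:

```shell
# Dump in-flight OSD requests for every kernel cephfs client session.
for d in /sys/kernel/debug/ceph/*/; do
    echo "== ${d} =="
    cat "${d}osdc"
done
```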
>
>
> Here is our ceph config:
>
> # cat /etc/ceph/ceph.conf
> [global]
> fsid = b27bc9a3-2e32-4782-b60f-b7901b0b26e5
> mon_initial_members = skyaxe-storage-0, skyaxe-storage-1
> mon_host = 10.0.30.111,10.0.30.112
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> osd_pool_default_size = 2
> public_network = 10.0.30.0/24
> cluster_network = 10.0.40.0/24
> osd_pool_default_min_size = 1
>
> max_open_files = 131072
>
> [osd]
> osd_journal_size = 56320
>
> journal_max_write_bytes = 1073741824
> journal_max_write_entries = 10000
> journal_queue_max_ops = 50000
> journal_queue_max_bytes = 10737418240
>
> osd_mkfs_type = xfs
> osd_mkfs_options_xfs = -f
> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
>
> osd_max_write_size = 512
> osd_client_message_size_cap = 2147483648
> osd_deep_scrub_stride = 131072
> osd_op_threads = 8
> osd_disk_threads = 4
> osd_map_cache_size = 1024
> osd_map_cache_bl_size = 128
> osd_recovery_op_priority = 4
> osd_recovery_max_active = 10
> osd_max_backfills = 4
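
On a 30 GB client pushing hundreds of gigabytes of buffered writes, one
client-side mitigation worth experimenting with (my suggestion, not part of
the setup above) is lowering the kernel dirty thresholds so writeback starts
earlier and less dirty data can pile up; the values here are illustrative:

```shell
# Make background writeback start earlier and cap dirty pages lower, so a
# burst of buffered cephfs writes cannot accumulate gigabytes of dirty data.
# Persist these via /etc/sysctl.conf if they prove helpful; needs root.
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
```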
>
> —
> Best Regards
> Li, Ning
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>