Thanks for sharing your expertise.

We are actually already using BlueStore, with SSDs for the journal.

I will give mds_cache_memory_limit a try.
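
For reference, a minimal sketch of how we plan to set it (the 4 GiB value
below is only an example, not a tested recommendation):

  # in ceph.conf on the MDS hosts
  [mds]
  mds_cache_memory_limit = 4294967296

  # or at runtime, without restarting the MDS
  ceph tell mds.* injectargs '--mds_cache_memory_limit=4294967296'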

—
Best Regards
Li, Ning



> On Dec 6, 2018, at 00:44, Zhenshi Zhou <deader...@gmail.com> wrote:
> 
> Hi
> 
> I had the same issue a few months ago. One of my clients hung waiting for 
> a file write, and the other clients did not seem to be affected by it. 
> However, if another client accessed the same hung directory, it would hang 
> there as well. 
> 
> My cluster is on 12.2.8 and I use the kernel client on the other servers.
> 
> What I did was migrate from FileStore to BlueStore and add an SSD to each 
> server. Besides that, I increased mds_cache_memory_limit. I know these 
> operations are not a direct solution to this issue, but I haven't 
> encountered it again since then.
> 
> Hope it helps you.
> 
> Thanks
> 
> Li, Ning <lining916740...@icloud.com> wrote on Wed, Dec 5, 2018 at 10:10 PM:
> Hi Zheng,
> 
> 
> Thanks for replying.
> 
> 
> Not the same machine. Our problem is that other processes writing to CephFS 
> hang for a long time. 
> 
> The application node and the storage nodes are independent; CephFS is 
> mounted on the application node. We copy large files to the application 
> node's CephFS mount point from two other machines. 
> Processes that write to CephFS, such as nginx and prometheus, stay hung for 
> a long time while the copy is running. 
> 
> The phenomenon is hard to understand, because the incoming data traffic 
> bandwidth is only about 100 MB/s, while the Ceph read/write bandwidth is at 
> least 200 MB/s. And I'm sure that no other people or processes share the 
> Ceph bandwidth when we copy.
> 
> 
> It is worth mentioning that the application node has just 30 GB of memory. 
> 
> According to the stack traces in dmesg, the hung processes seem to be 
> waiting for data writeback.
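> 
> (A quick way to watch that writeback pressure during a copy, using plain 
> procfs and nothing Ceph-specific:
> 
>   watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
> )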
> 
> I guess the problem may be on the client side, because when the hang 
> happens, I can log in to another CephFS client and do some direct writes to 
> CephFS normally.
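> 
> (For reference, that direct-write check is just something like the 
> following, with /mnt/cephfs standing in for the mount point on the other 
> client:
> 
>   dd if=/dev/zero of=/mnt/cephfs/ddtest bs=1M count=256 oflag=direct
> )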
> 
> 
> Please share some ideas or suggestions about troubleshooting if you have any.
> 
> —
> Best Regards
> Li, Ning
> 
> 
> 
>> On Dec 5, 2018, at 17:57, Yan, Zheng <uker...@gmail.com> wrote:
>> 
>> Is the CephFS mount on the same machine that runs the OSDs?
>> 
>> On Wed, Dec 5, 2018 at 2:33 PM NingLi <lining916740...@icloud.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> We found that some processes writing to CephFS hang for a long time 
>>> (> 120 s) when uploading (scp/rsync) large files (50 GB ~ 300 GB in 
>>> total) to the app node's CephFS mountpoint.
>>> 
>>> This problem is not always reproducible. But when it occurs, the web 
>>> server (nginx) and some other services can't be accessed, which is 
>>> embarrassing.
>>> 
>>> 
>>> Please give some tips.
>>> 
>>> 
>>> We are using Ceph version 13.2.1 and the 3.10.0-693.el7.x86_64 kernel.
>>> 
>>> 
>>> When this problem occurs, the memory pressure is a bit high.
>>> # free -h
>>>               total        used        free      shared  buff/cache   available
>>> Mem:            30G         11G        230M        3.3M         18G         18G
>>> Swap:            0B          0B          0B
>>> 
>>> # mpstat -P ALL
>>> Average:  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
>>> Average:  all   3.80   0.04  1.46    55.12  0.00   0.08    0.00    0.00    0.00  39.49
>>> Average:    0   1.52   0.00  0.51    97.98  0.00   0.00    0.00    0.00    0.00   0.00
>>> Average:    1  13.07   0.00  1.51     0.00  0.00   0.00    0.00    0.00    0.00  85.43
>>> Average:    2   3.00   0.00  3.00     0.50  0.00   0.00    0.00    0.00    0.00  93.50
>>> Average:    3   2.50   0.00  2.00     0.00  0.00   0.00    0.00    0.00    0.00  95.50
>>> Average:    4   1.50   0.50  1.50    96.00  0.00   0.50    0.00    0.00    0.00   0.00
>>> Average:    5   0.50   0.00  1.01    32.16  0.00   0.00    0.00    0.00    0.00  66.33
>>> Average:    6   2.49   0.00  2.49     0.00  0.00   0.50    0.00    0.00    0.00  94.53
>>> Average:    7   1.52   0.00  0.51    97.98  0.00   0.00    0.00    0.00    0.00   0.00
>>> Average:    8   1.50   0.00  1.50    97.00  0.00   0.00    0.00    0.00    0.00   0.00
>>> Average:    9  14.07   0.00  2.01    46.23  0.00   0.00    0.00    0.00    0.00  37.69
>>> Average:   10   2.00   0.00  0.50    97.50  0.00   0.00    0.00    0.00    0.00   0.00
>>> Average:   11   2.01   0.00  1.01    96.98  0.00   0.00    0.00    0.00    0.00   0.00
>>> 
>>> We can see some processes stuck in state 'D' for a long time. Here are 
>>> two fragments from dmesg.
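>>> 
>>> (For reference, we list the D-state tasks with plain ps; nothing 
>>> Ceph-specific, and the wchan column width is just an example:
>>> 
>>>   ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
>>> )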
>>> 
>>> [Fri Nov 30 15:10:43 2018] INFO: task java:11662 blocked for more than 120 seconds.
>>> [Fri Nov 30 15:10:43 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [Fri Nov 30 15:10:43 2018] java            D ffffffff816a4180     0 11662  10409 0x00000180
>>> [Fri Nov 30 15:10:43 2018]  ffff8805f4febbd0 0000000000000082 ffff8807e2edcf10 ffff8805f4febfd8
>>> [Fri Nov 30 15:10:43 2018]  ffff8805f4febfd8 ffff8805f4febfd8 ffff8807e2edcf10 ffff88086df56cc0
>>> [Fri Nov 30 15:10:43 2018]  0000000000000000 7fffffffffffffff ffff88087ffc3be8 ffffffff816a4180
>>> [Fri Nov 30 15:10:43 2018] Call Trace:
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81190541>] ? pagevec_lookup_tag+0x21/0x30
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffffc082b598>] ? ceph_writepages_start+0x1298/0x1520 [ceph]
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff810e939c>] ? ktime_get_ts64+0x4c/0xf0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a3cb5>] __wait_on_bit+0x65/0x90
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181de1>] __filemap_fdatawait_range+0x111/0x190
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff811b0d79>] ? do_numa_page+0x159/0x1e0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff8118f42e>] ? do_writepages+0x1e/0x40
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181e74>] filemap_fdatawait_range+0x14/0x30
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81183db6>] filemap_write_and_wait_range+0x56/0x90
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffffc083326c>] ceph_fsync+0x6c/0x510 [ceph]
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81137273>] ? __secure_computing+0x73/0x240
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81231d85>] do_fsync+0x65/0xa0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81232083>] SyS_fdatasync+0x13/0x20
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816b1d4e>] tracesys+0xd9/0xde
>>> 
>>> 
>>> [Fri Nov 30 15:12:43 2018] INFO: task prometheus:9919 blocked for more than 120 seconds.
>>> [Fri Nov 30 15:12:43 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [Fri Nov 30 15:12:43 2018] prometheus      D ffff8807e87dbad8     0  9919   9879 0x00000080
>>> [Fri Nov 30 15:12:43 2018]  ffff8807e87db970 0000000000000082 ffff8803e273cf10 ffff8807e87dbfd8
>>> [Fri Nov 30 15:12:43 2018]  ffff8807e87dbfd8 ffff8807e87dbfd8 ffff8803e273cf10 ffff88046e716cc0
>>> [Fri Nov 30 15:12:43 2018]  0000000000000000 7fffffffffffffff ffffffff816a4180 ffff8807e87dbad8
>>> [Fri Nov 30 15:12:43 2018] Call Trace:
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81090007>] ? local_bh_enable+0x17/0x20
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a3eaf>] __wait_on_bit_lock+0x5f/0xc0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81182094>] __lock_page+0x74/0x90
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81182a34>] __find_lock_page+0x54/0x70
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811836e2>] grab_cache_page_write_begin+0x62/0xd0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffffc082a203>] ceph_write_begin+0x43/0xe0 [ceph]
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811824ce>] generic_file_buffered_write+0x11e/0x2a0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffffc08259c4>] ceph_aio_write+0xac4/0xc60 [ceph]
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff8160417a>] ? inet_recvmsg+0x7a/0xa0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811feb3d>] do_sync_write+0x8d/0xd0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811ff5fd>] vfs_write+0xbd/0x1e0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff8120040f>] SyS_write+0x7f/0xe0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816b1b49>] system_call_fastpath+0x16/0x1b
>>> 
>>> 
>>> But the Ceph cluster status shows healthy.
>>> # ceph -s
>>>  cluster:
>>>    id:     b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>>>    health: HEALTH_OK
>>> 
>>>  services:
>>>    mon: 2 daemons, quorum skyaxe-storage-0,skyaxe-storage-1
>>>    mgr: skyaxe-storage-0(active), standbys: skyaxe-storage-1
>>>    mds: cephfs-1/1/1 up  {0=skyaxe-storage-1=up:active}, 1 up:standby
>>>    osd: 8 osds: 8 up, 8 in
>>> 
>>>  data:
>>>    pools:   2 pools, 512 pgs
>>>    objects: 599.7 k objects, 876 GiB
>>>    usage:   2.6 TiB used, 13 TiB / 15 TiB avail
>>>    pgs:     512 active+clean
>>> 
>>>  io:
>>>    client:   3.7 KiB/s wr, 0 op/s rd, 1 op/s wr
>>> 
>>> 
>>> We can see some unfinished requests in the osdc debug file.
>>> 
>>> #  cat /sys/kernel/debug/ceph/xxx/osdc
>>> REQUESTS 98 homeless 0
>>> 8529    osd1    1.1bfdee5       [1,7]/1 [1,7]/1 100000011be.00000000    0x400024        1       0'0     write
>>> 8532    osd1    1.2d81c338      [1,6]/1 [1,6]/1 100000011c0.00000000    0x400024        1       0'0     write
>>> 8536    osd1    1.49324f26      [1,4]/1 [1,4]/1 100000011c4.00000000    0x400024        1       0'0     write
>>> 8542    osd1    1.1403106f      [1,7]/1 [1,7]/1 100000011c8.00000000    0x400024        1       0'0     write
>>> 8549    osd1    1.ea273113      [1,4]/1 [1,4]/1 100000011ce.00000000    0x400024        1       0'0     write
>>> 8562    osd1    1.f2d68e9a      [1,5]/1 [1,5]/1 100000011d8.00000000    0x400024        1       0'0     write
>>> 8566    osd1    1.1f212283      [1,4]/1 [1,4]/1 100000011db.00000000    0x400024        1       0'0     write
>>> 8568    osd1    1.2e1984e7      [1,5]/1 [1,5]/1 100000011de.00000000    0x400024        1       0'0     write
>>> 8576    osd1    1.da7e63d0      [1,5]/1 [1,5]/1 100000011e1.00000000    0x400024        1       0'0     write
>>> 8581    osd1    1.3b0d280       [1,4]/1 [1,4]/1 100000011e8.00000000    0x400024        1       0'0     write
>>> 8582    osd1    1.80227f0b      [1,5]/1 [1,5]/1 100000011ea.00000000    0x400024        1       0'0     write
>>> 8593    osd1    1.1efd9391      [1,6]/1 [1,6]/1 100000011ef.00000000    0x400024        1       0'0     write
>>> 8594    osd1    1.3db6e880      [1,4]/1 [1,4]/1 100000011f0.00000000    0x400024        1       0'0     write
>>> 8611    osd1    1.6d20c41d      [1,4]/1 [1,4]/1 1000000028c.00000000    0x400024        1       0'0     write
>>> 8612    osd1    1.6d20c41d      [1,4]/1 [1,4]/1 1000000028c.00000000    0x400024        1       0'0     write
>>> 8616    osd1    1.f4d90a4d      [1,7]/1 [1,7]/1 10000001206.00000000    0x400024        1       0'0     write
>>> 8619    osd1    1.740514fe      [1,5]/1 [1,5]/1 10000001208.00000000    0x400024        1       0'0     write
>>> 8630    osd1    1.d5fbabe0      [1,6]/1 [1,6]/1 10000001216.00000000    0x400024        1       0'0     write
>>> 8631    osd1    1.daf28f83      [1,4]/1 [1,4]/1 10000001217.00000000    0x400024        1       0'0     write
>>> 8635    osd1    1.d29e01a6      [1,5]/1 [1,5]/1 1000000121a.00000000    0x400024        1       0'0     write
>>> 8640    osd1    1.2de0ff63      [1,7]/1 [1,7]/1 10000001220.00000000    0x400024        1       0'0     write
>>> 8658    osd1    1.2f91206f      [1,7]/1 [1,7]/1 1000000122c.00000000    0x400064        1       0'0     write,startsync
>>> ...
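>>> 
>>> (As an aside, the first field of each object name is the file's inode 
>>> number in hex, since CephFS names data objects <inode-hex>.<block>. So a 
>>> stuck request can be mapped back to a file roughly like this, with 
>>> /mnt/cephfs standing in for the mount point:
>>> 
>>>   printf '%d\n' 0x10000001206           # hex inode -> 1099511632390
>>>   find /mnt/cephfs -inum 1099511632390  # the file behind request 8616
>>> )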
>>> 
>>> 
>>> Here is our ceph config:
>>> 
>>> # cat /etc/ceph/ceph.conf
>>> [global]
>>> fsid = b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>>> mon_initial_members = skyaxe-storage-0, skyaxe-storage-1
>>> mon_host = 10.0.30.111,10.0.30.112
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> 
>>> osd_pool_default_size = 2
>>> public_network = 10.0.30.0/24
>>> cluster_network = 10.0.40.0/24
>>> osd_pool_default_min_size = 1
>>> 
>>> max_open_files = 131072
>>> 
>>> [osd]
>>> osd_journal_size = 56320
>>> 
>>> journal_max_write_bytes = 1073741824
>>> journal_max_write_entries = 10000
>>> journal_queue_max_ops = 50000
>>> journal_queue_max_bytes = 10737418240
>>> 
>>> osd_mkfs_type = xfs
>>> osd_mkfs_options_xfs = -f
>>> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
>>> 
>>> osd_max_write_size = 512
>>> osd_client_message_size_cap = 2147483648
>>> osd_deep_scrub_stride = 131072
>>> osd_op_threads = 8
>>> osd_disk_threads = 4
>>> osd_map_cache_size = 1024
>>> osd_map_cache_bl_size = 128
>>> osd_recovery_op_priority = 4
>>> osd_recovery_max_active = 10
>>> osd_max_backfills = 4
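>>> 
>>> (To double-check which of these values the running daemons actually 
>>> picked up, one option is the admin socket on a storage node, e.g.:
>>> 
>>>   ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active'
>>> )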
>>> 
>>> —
>>> Best Regards
>>> Li, Ning
>>> 
>>> 
>>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
