Thanks for sharing your expertise. We have actually already been using BlueStore and SSD for the journal.

I will give mds_cache_memory_limit a try.
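For reference, something along these lines should raise it (just a sketch, not a recommendation: 8 GiB is an arbitrary example value, the default is 1 GiB, and mds.skyaxe-storage-1 is taken from the ceph -s output quoted below):

  # set it cluster-wide in the config store (Mimic and later)
  ceph config set mds mds_cache_memory_limit 8589934592

  # or inject it into the running MDS without a restart
  ceph tell mds.* injectargs '--mds_cache_memory_limit=8589934592'

  # verify on the MDS host via the admin socket
  ceph daemon mds.skyaxe-storage-1 config get mds_cache_memory_limit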
—
Best Regards
Li, Ning


> On Dec 6, 2018, at 00:44, Zhenshi Zhou <deader...@gmail.com> wrote:
>
> Hi
>
> I had the same issue a few months ago. One of my clients hung waiting on a file write,
> and the other clients did not seem to be affected by it. However, if another client
> accessed the same hung directory, it would hang there as well.
>
> My cluster is 12.2.8 and I use the kernel client on the other servers.
>
> What I did was migrate FileStore to BlueStore and add an SSD to each server. Besides
> that, I increased mds_cache_memory_limit. I know these operations are not the right
> solution for this issue, but I haven't encountered it again since then.
>
> Hope it helps you.
>
> Thanks
>
> On Wed, Dec 5, 2018 at 10:10 PM Li,Ning <lining916740...@icloud.com> wrote:
> Hi Zheng,
>
>
> Thanks for replying.
>
>
> No, not the same machine. Our problem is that other processes writing to CephFS hang
> for a long time.
>
> The application node and the storage nodes are independent; CephFS is mounted on the
> application node. We copy large files to the application node's CephFS mount point
> from two other machines. During the copy, processes that write to CephFS, such as
> nginx and prometheus, stay hung for a long time.
>
> The phenomenon is hard to understand, because the incoming traffic is only about
> 100 MB/s, while the cluster's read/write bandwidth is at least 200 MB/s. And I am sure
> that no other people or processes share the Ceph bandwidth while we copy.
>
>
> It is worth mentioning that the application node has only 30 GB of memory.
>
> According to the stack traces in dmesg, the hung processes seem to be waiting for data
> writeback.
>
> I guess the problem may be on the client side, because when the hang happens I can log
> in to another CephFS client and write directly to CephFS without any problem.
>
>
> Please share some ideas or suggestions about troubleshooting if you have any.
>
> —
> Best Regards
> Li, Ning
>
>
>
>> On Dec 5, 2018, at 17:57, Yan, Zheng <uker...@gmail.com> wrote:
>>
>> Is the cephfs mount on the same machine that runs the OSDs?
>>
>> On Wed, Dec 5, 2018 at 2:33 PM NingLi <lining916740...@icloud.com> wrote:
>>>
>>> Hi all,
>>>
>>> We found that some processes writing to CephFS hang for a long time (> 120s) when
>>> uploading (scp/rsync) large files (50G ~ 300G in total) to the app node's CephFS
>>> mount point.
>>>
>>> This problem is not always reproducible, but when it occurs, the web server (nginx)
>>> and some other services cannot be accessed, which is embarrassing.
>>>
>>>
>>> Please give us some tips.
>>>
>>>
>>> We are using ceph version 13.2.1 and the 3.10.0-693.el7.x86_64 kernel.
>>>
>>>
>>> When this problem occurs, the memory pressure is a bit high.
>>>
>>> # free -h
>>>               total        used        free      shared  buff/cache   available
>>> Mem:            30G         11G        230M        3.3M         18G         18G
>>> Swap:            0B          0B          0B
>>>
>>> # mpstat -P ALL
>>> Average:  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
>>> Average:  all   3.80   0.04   1.46   55.12   0.00   0.08   0.00   0.00   0.00  39.49
>>> Average:    0   1.52   0.00   0.51   97.98   0.00   0.00   0.00   0.00   0.00   0.00
>>> Average:    1  13.07   0.00   1.51    0.00   0.00   0.00   0.00   0.00   0.00  85.43
>>> Average:    2   3.00   0.00   3.00    0.50   0.00   0.00   0.00   0.00   0.00  93.50
>>> Average:    3   2.50   0.00   2.00    0.00   0.00   0.00   0.00   0.00   0.00  95.50
>>> Average:    4   1.50   0.50   1.50   96.00   0.00   0.50   0.00   0.00   0.00   0.00
>>> Average:    5   0.50   0.00   1.01   32.16   0.00   0.00   0.00   0.00   0.00  66.33
>>> Average:    6   2.49   0.00   2.49    0.00   0.00   0.50   0.00   0.00   0.00  94.53
>>> Average:    7   1.52   0.00   0.51   97.98   0.00   0.00   0.00   0.00   0.00   0.00
>>> Average:    8   1.50   0.00   1.50   97.00   0.00   0.00   0.00   0.00   0.00   0.00
>>> Average:    9  14.07   0.00   2.01   46.23   0.00   0.00   0.00   0.00   0.00  37.69
>>> Average:   10   2.00   0.00   0.50   97.50   0.00   0.00   0.00   0.00   0.00   0.00
>>> Average:   11   2.01   0.00   1.01   96.98   0.00   0.00   0.00   0.00   0.00   0.00
>>>
>>> We can see some processes stuck in the 'D' state for a long time. Here are two
>>> fragments from dmesg.
>>>
>>> [Fri Nov 30 15:10:43 2018] INFO: task java:11662 blocked for more than 120 seconds.
>>> [Fri Nov 30 15:10:43 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [Fri Nov 30 15:10:43 2018] java            D ffffffff816a4180     0 11662  10409 0x00000180
>>> [Fri Nov 30 15:10:43 2018]  ffff8805f4febbd0 0000000000000082 ffff8807e2edcf10 ffff8805f4febfd8
>>> [Fri Nov 30 15:10:43 2018]  ffff8805f4febfd8 ffff8805f4febfd8 ffff8807e2edcf10 ffff88086df56cc0
>>> [Fri Nov 30 15:10:43 2018]  0000000000000000 7fffffffffffffff ffff88087ffc3be8 ffffffff816a4180
>>> [Fri Nov 30 15:10:43 2018] Call Trace:
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81190541>] ? pagevec_lookup_tag+0x21/0x30
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffffc082b598>] ? ceph_writepages_start+0x1298/0x1520 [ceph]
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff810e939c>] ? ktime_get_ts64+0x4c/0xf0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a3cb5>] __wait_on_bit+0x65/0x90
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181de1>] __filemap_fdatawait_range+0x111/0x190
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff811b0d79>] ? do_numa_page+0x159/0x1e0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff8118f42e>] ? do_writepages+0x1e/0x40
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181e74>] filemap_fdatawait_range+0x14/0x30
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81183db6>] filemap_write_and_wait_range+0x56/0x90
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffffc083326c>] ceph_fsync+0x6c/0x510 [ceph]
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81137273>] ? __secure_computing+0x73/0x240
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81231d85>] do_fsync+0x65/0xa0
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81232083>] SyS_fdatasync+0x13/0x20
>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816b1d4e>] tracesys+0xd9/0xde
>>>
>>>
>>> [Fri Nov 30 15:12:43 2018] INFO: task prometheus:9919 blocked for more than 120 seconds.
>>> [Fri Nov 30 15:12:43 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [Fri Nov 30 15:12:43 2018] prometheus      D ffff8807e87dbad8     0  9919   9879 0x00000080
>>> [Fri Nov 30 15:12:43 2018]  ffff8807e87db970 0000000000000082 ffff8803e273cf10 ffff8807e87dbfd8
>>> [Fri Nov 30 15:12:43 2018]  ffff8807e87dbfd8 ffff8807e87dbfd8 ffff8803e273cf10 ffff88046e716cc0
>>> [Fri Nov 30 15:12:43 2018]  0000000000000000 7fffffffffffffff ffffffff816a4180 ffff8807e87dbad8
>>> [Fri Nov 30 15:12:43 2018] Call Trace:
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81090007>] ? local_bh_enable+0x17/0x20
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a3eaf>] __wait_on_bit_lock+0x5f/0xc0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81182094>] __lock_page+0x74/0x90
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81182a34>] __find_lock_page+0x54/0x70
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811836e2>] grab_cache_page_write_begin+0x62/0xd0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffffc082a203>] ceph_write_begin+0x43/0xe0 [ceph]
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811824ce>] generic_file_buffered_write+0x11e/0x2a0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffffc08259c4>] ceph_aio_write+0xac4/0xc60 [ceph]
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff8160417a>] ? inet_recvmsg+0x7a/0xa0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811feb3d>] do_sync_write+0x8d/0xd0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811ff5fd>] vfs_write+0xbd/0x1e0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff8120040f>] SyS_write+0x7f/0xe0
>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816b1b49>] system_call_fastpath+0x16/0x1b
>>>
>>>
>>> But the cluster status shows healthy.
>>>
>>> # ceph -s
>>>   cluster:
>>>     id:     b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>>>     health: HEALTH_OK
>>>
>>>   services:
>>>     mon: 2 daemons, quorum skyaxe-storage-0,skyaxe-storage-1
>>>     mgr: skyaxe-storage-0(active), standbys: skyaxe-storage-1
>>>     mds: cephfs-1/1/1 up {0=skyaxe-storage-1=up:active}, 1 up:standby
>>>     osd: 8 osds: 8 up, 8 in
>>>
>>>   data:
>>>     pools:   2 pools, 512 pgs
>>>     objects: 599.7 k objects, 876 GiB
>>>     usage:   2.6 TiB used, 13 TiB / 15 TiB avail
>>>     pgs:     512 active+clean
>>>
>>>   io:
>>>     client: 3.7 KiB/s wr, 0 op/s rd, 1 op/s wr
>>>
>>>
>>> We can see some unfinished requests in osdc.
>>>
>>> # cat /sys/kernel/debug/ceph/xxx/osdc
>>> REQUESTS 98 homeless 0
>>> 8529   osd1   1.1bfdee5   [1,7]/1  [1,7]/1  100000011be.00000000  0x400024  1  0'0  write
>>> 8532   osd1   1.2d81c338  [1,6]/1  [1,6]/1  100000011c0.00000000  0x400024  1  0'0  write
>>> 8536   osd1   1.49324f26  [1,4]/1  [1,4]/1  100000011c4.00000000  0x400024  1  0'0  write
>>> 8542   osd1   1.1403106f  [1,7]/1  [1,7]/1  100000011c8.00000000  0x400024  1  0'0  write
>>> 8549   osd1   1.ea273113  [1,4]/1  [1,4]/1  100000011ce.00000000  0x400024  1  0'0  write
>>> 8562   osd1   1.f2d68e9a  [1,5]/1  [1,5]/1  100000011d8.00000000  0x400024  1  0'0  write
>>> 8566   osd1   1.1f212283  [1,4]/1  [1,4]/1  100000011db.00000000  0x400024  1  0'0  write
>>> 8568   osd1   1.2e1984e7  [1,5]/1  [1,5]/1  100000011de.00000000  0x400024  1  0'0  write
>>> 8576   osd1   1.da7e63d0  [1,5]/1  [1,5]/1  100000011e1.00000000  0x400024  1  0'0  write
>>> 8581   osd1   1.3b0d280   [1,4]/1  [1,4]/1  100000011e8.00000000  0x400024  1  0'0  write
>>> 8582   osd1   1.80227f0b  [1,5]/1  [1,5]/1  100000011ea.00000000  0x400024  1  0'0  write
>>> 8593   osd1   1.1efd9391  [1,6]/1  [1,6]/1  100000011ef.00000000  0x400024  1  0'0  write
>>> 8594   osd1   1.3db6e880  [1,4]/1  [1,4]/1  100000011f0.00000000  0x400024  1  0'0  write
>>> 8611   osd1   1.6d20c41d  [1,4]/1  [1,4]/1  1000000028c.00000000  0x400024  1  0'0  write
>>> 8612   osd1   1.6d20c41d  [1,4]/1  [1,4]/1  1000000028c.00000000  0x400024  1  0'0  write
>>> 8616   osd1   1.f4d90a4d  [1,7]/1  [1,7]/1  10000001206.00000000  0x400024  1  0'0  write
>>> 8619   osd1   1.740514fe  [1,5]/1  [1,5]/1  10000001208.00000000  0x400024  1  0'0  write
>>> 8630   osd1   1.d5fbabe0  [1,6]/1  [1,6]/1  10000001216.00000000  0x400024  1  0'0  write
>>> 8631   osd1   1.daf28f83  [1,4]/1  [1,4]/1  10000001217.00000000  0x400024  1  0'0  write
>>> 8635   osd1   1.d29e01a6  [1,5]/1  [1,5]/1  1000000121a.00000000  0x400024  1  0'0  write
>>> 8640   osd1   1.2de0ff63  [1,7]/1  [1,7]/1  10000001220.00000000  0x400024  1  0'0  write
>>> 8658   osd1   1.2f91206f  [1,7]/1  [1,7]/1  1000000122c.00000000  0x400064  1  0'0  write,startsync
>>> ...
>>>
>>>
>>> Here is our ceph config:
>>>
>>> # cat /etc/ceph/ceph.conf
>>> [global]
>>> fsid = b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>>> mon_initial_members = skyaxe-storage-0, skyaxe-storage-1
>>> mon_host = 10.0.30.111,10.0.30.112
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>>
>>> osd_pool_default_size = 2
>>> public_network = 10.0.30.0/24
>>> cluster_network = 10.0.40.0/24
>>> osd_pool_default_min_size = 1
>>>
>>> max_open_files = 131072
>>>
>>> [osd]
>>> osd_journal_size = 56320
>>>
>>> journal_max_write_bytes = 1073741824
>>> journal_max_write_entries = 10000
>>> journal_queue_max_ops = 50000
>>> journal_queue_max_bytes = 10737418240
>>>
>>> osd_mkfs_type = xfs
>>> osd_mkfs_options_xfs = -f
>>> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
>>>
>>> osd_max_write_size = 512
>>> osd_client_message_size_cap = 2147483648
>>> osd_deep_scrub_stride = 131072
>>> osd_op_threads = 8
>>> osd_disk_threads = 4
>>> osd_map_cache_size = 1024
>>> osd_map_cache_bl_size = 128
>>> osd_recovery_op_priority = 4
>>> osd_recovery_max_active = 10
>>> osd_max_backfills = 4
>>>
>>> —
>>> Best Regards
>>> Li, Ning
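For anyone digging into a similar hang: since the stacks above are all waiting in writeback, a few standard client-side probes may help confirm where data is piling up the next time it reproduces. This is only a sketch; the wildcard stands in for the per-mount <fsid>.<client-id> directory under debugfs, which must be mounted.

  # watch the page-cache writeback backlog on the application node
  watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

  # count in-flight OSD requests from the kernel client
  wc -l /sys/kernel/debug/ceph/*/osdc

  # check for stuck MDS requests as well
  cat /sys/kernel/debug/ceph/*/mdsc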