Hi, thank you in advance for your time.
We are tracking down some very slow pwrite64() calls to a Ceph filesystem -
20965 11:04:24.049186 <... pwrite64 resumed>) = 65536 <4.489594>
20966 11:04:24.069765 <... pwrite64 resumed>) = 65536 <4.508859>
20967 11:04:24.090354 <... pwrite64 resumed>) = 65536 <4.510256>
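(These lines are from strace with per-syscall timing; the value in angle brackets is the seconds spent inside each call. For reference, an invocation along these lines, with <pid> as a placeholder for the app's pid, produces this kind of output:

    strace -f -tt -T -e trace=pwrite64 -o pwrite64.trace -p <pid>

where -f follows all threads, -tt adds timestamps, and -T records the time spent in each call.)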
But other pwrite64()s from the same program, in other threads writing to other files on the same Ceph fs, seem fine. We cannot really reproduce this on demand; it just happens occasionally.
It seems we are spending a good deal of time in ceph_aio_write() when this is happening (see the call graph below).
I've noticed that THP (Transparent Huge Pages) is enabled.
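For anyone who wants to check the same thing, these are the usual sysfs knobs (the bracketed value in each file is the active setting):

    cat /sys/kernel/mm/transparent_hugepage/enabled
    cat /sys/kernel/mm/transparent_hugepage/defrag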
We are running Ceph 15.2.17 (Octopus) on CentOS 7.9.
We do not seem to be under any significant memory pressure when this happens; there are just many threads of this app blocked on I/O in pwrite64().
I am suggesting an upgrade, but until then: do you think this situation involves Ceph, and could it be improved if we disable THP?
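If it would help, the idea would be to turn THP off at runtime first - a rough sketch, assuming the standard sysfs interface on this kernel:

    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

and only make it permanent by adding transparent_hugepage=never to the kernel command line if that actually makes a difference.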
Thanks for any advice or suggestion,
-mark
Call graph of the app while the slow pwrite64()s are happening -
--87.08%--system_call_fastpath
  --58.20%--sys_pwrite64
    --58.03%--vfs_write
      do_sync_write
        ceph_aio_write
          --54.33%--generic_file_buffered_write
            --27.09%--ceph_write_begin
              --17.01%--grab_cache_page_write_begin
                --7.51%--add_to_page_cache_lru
                  --4.28%--__add_to_page_cache_locked
                    --3.95%--mem_cgroup_cache_charge
                      mem_cgroup_charge_common
                        --3.75%--__mem_cgroup_commit_charge
                  --3.23%--lru_cache_add
                    __lru_cache_add
                      --2.94%--pagevec_lru_move_fn
                        --0.70%--mem_cgroup_page_lruvec
                        --0.67%--__pagevec_lru_add_fn
                        --0.57%--release_pages
                --5.31%--__page_cache_alloc
                  --5.00%--alloc_pages_current
                    __alloc_pages_nodemask
                      --4.23%--get_page_from_freelist
                        --1.85%--__rmqueue
                          --1.49%--list_del
                            __list_del_entry
                        --1.74%--list_del
                          __list_del_entry
                --3.92%--__find_lock_page
                  __find_get_page
                    --3.46%--radix_tree_lookup_slot
                      --2.76%--radix_tree_descend
                      --0.70%--__radix_tree_lookup
                        radix_tree_descend
              --9.45%--ceph_update_writeable_page
                --8.94%--readpage_nounlock
                  --8.60%--ceph_osdc_readpages
                    --8.18%--submit_request
                      __submit_request
                        calc_target.isra.50
                          ceph_pg_to_up_acting_osds
                            crush_do_rule
                              crush_choose_firstn
                                --4.89%--crush_choose_firstn
                                  is_out.isra.2.part.3
                                --3.30%--crush_bucket_choose
            --14.50%--copy_user_enhanced_fast_string
            --6.34%--ceph_write_end
              --3.76%--set_page_dirty
                ceph_set_page_dirty
                  --2.86%--__set_page_dirty_nobuffers
                    --0.92%--_raw_spin_unlock_irqrestore
                    --0.59%--radix_tree_tag_set
              --1.88%--unlock_page
                __wake_up_bit
            --3.44%--iov_iter_fault_in_readable
            --2.70%--mark_page_accessed
          --2.54%--mutex_lock
            __mutex_lock_slowpath
              --2.14%--schedule_preempt_disabled
                __schedule
                  --0.82%--finish_task_switch
                    __perf_event_task_sched_in
                      perf_pmu_enable
                        x86_pmu_enable
                          --0.80%--intel_pmu_enable_all
                            --0.74%--__intel_pmu_enable_all.isra.23
                              --0.69%--native_write_msr_safe
                  --0.75%--__perf_event_task_sched_out
          --0.56%--mutex_unlock
            __mutex_unlock_slowpath
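The call graph above is perf output; for reference, an invocation along these lines produces this kind of tree (pid and duration are just placeholders):

    perf record -g -p <pid> -- sleep 30
    perf report --stdio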
