Hi Sage,
        Thanks for your mail. When I turn on filestore sync flush, it seems to work
and the OSD process doesn't suicide any more. I had already disabled the flusher long
ago, since both Mark's and my reports show that disabling the flusher seems to improve
performance (so my original configuration was filestore_flusher=false,
filestore_sync_flush=false (the default)), but now we have to reconsider this. I
would like to look at the internals of ::sync_file_range() to learn more about
how it works. My first guess is that ::sync_file_range pushes the request to the disk
queue and, if the disk queue is full, the call blocks and waits, but I'm not sure.
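For reference, the sync_file_range(2) man page describes SYNC_FILE_RANGE_WRITE as only initiating write-out of the dirty pages in the range (an asynchronous flush), and notes that even this may block if more than the request queue's worth of data is submitted, which is consistent with the guess above. The minimal Linux-only sketch below contrasts the flag FileStore uses with a variant that also waits for completion. `write_and_flush` and the temporary-file path are hypothetical illustrations, not Ceph code.

```cpp
// Linux-only: sync_file_range() is a GNU extension declared in <fcntl.h>.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstdlib>
#include <cstring>

// Write 4 KiB at offset 0 of a temporary file, then flush that range.
// wait == false: only initiate writeback (the SYNC_FILE_RANGE_WRITE case).
// wait == true:  also wait for any prior and the newly started writeback.
// Returns 0 on success, -1 on failure.
int write_and_flush(bool wait) {
  char path[] = "/tmp/sfr_demoXXXXXX";  // hypothetical scratch location
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  unlink(path);  // file is removed once fd is closed

  char buf[4096];
  std::memset(buf, 'x', sizeof(buf));
  if (pwrite(fd, buf, sizeof(buf), 0) != static_cast<ssize_t>(sizeof(buf))) {
    close(fd);
    return -1;
  }

  unsigned int flags = SYNC_FILE_RANGE_WRITE;
  if (wait)
    flags |= SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WAIT_AFTER;

  // With only SYNC_FILE_RANGE_WRITE this returns once writeback is started;
  // it does not guarantee the data (or any metadata) is on disk.
  int r = sync_file_range(fd, 0, sizeof(buf), flags);
  close(fd);
  return r;
}
```

Neither variant flushes file metadata or the disk's write cache, so neither replaces fsync()/syncfs() for durability. They differ only in whether the caller waits for the page writeback.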

        But from the code path (BTW, these lines of code are a bit hard to
follow):
                        if (!should_flush || !m_filestore_flusher ||
                            !queue_flusher(fd, offset, len)) {
                                if (should_flush && m_filestore_sync_flush)
                                        ::sync_file_range(fd, offset, len,
                                                          SYNC_FILE_RANGE_WRITE);
                                lfn_close(fd);
                        }
        With the default setting (m_filestore_flusher = true), the flusher
queue soon fills up (hits filestore_flusher_max_fds). In that situation, if the user
hasn't turned on "m_filestore_sync_flush = true", he/she is likely to hit the same
problem: writes remain in the page cache and the OSD daemon dies when trying to sync.
I suppose the logic should rather be:
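                        if (should_flush) {
                                if (m_filestore_flusher) {
                                        if (!queue_flusher(fd, offset, len)) {
                                                // queue full: flush inline instead
                                                ::sync_file_range(fd, offset, len,
                                                                  SYNC_FILE_RANGE_WRITE);
                                                lfn_close(fd);
                                        }
                                        // else: the flusher thread owns fd and
                                        // closes it, as in the current code
                                } else {
                                        if (m_filestore_sync_flush)
                                                ::sync_file_range(fd, offset, len,
                                                                  SYNC_FILE_RANGE_WRITE);
                                        lfn_close(fd);
                                }
                        } else {
                                lfn_close(fd);
                        }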
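To make the difference between the current condition and the proposed logic explicit, here is a small model of just the flush decision, written as two pure functions. `FlushAction`, `current_logic` and `proposed_logic` are hypothetical names for illustration, and `queue_has_room` stands in for whether queue_flusher() succeeds.

```cpp
#include <cassert>

// Hypothetical model of the flush decision in the FileStore write path.
enum class FlushAction { Queued, InlineFlush, NoFlush };

// Current logic: if the flusher is enabled but queue_flusher() fails
// (queue full), the range is flushed inline only when sync_flush is also set.
FlushAction current_logic(bool should_flush, bool flusher, bool sync_flush,
                          bool queue_has_room) {
  if (should_flush && flusher && queue_has_room)
    return FlushAction::Queued;
  if (should_flush && sync_flush)
    return FlushAction::InlineFlush;
  return FlushAction::NoFlush;
}

// Proposed logic: a full flusher queue always falls back to an inline flush;
// sync_flush only matters when the flusher is disabled.
FlushAction proposed_logic(bool should_flush, bool flusher, bool sync_flush,
                           bool queue_has_room) {
  if (!should_flush)
    return FlushAction::NoFlush;
  if (flusher)
    return queue_has_room ? FlushAction::Queued : FlushAction::InlineFlush;
  return sync_flush ? FlushAction::InlineFlush : FlushAction::NoFlush;
}
```

With the flusher enabled, sync flush off (the defaults) and a full queue, `current_logic(true, true, false, false)` gives `FlushAction::NoFlush`, so the range just stays in the page cache until the eventual sync, whereas `proposed_logic(true, true, false, false)` gives `FlushAction::InlineFlush`.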

Xiaoxi
-----Original Message-----
From: Sage Weil [mailto:[email protected]] 
Sent: March 25, 2013 23:35
To: Chen, Xiaoxi
Cc: '[email protected]' ([email protected]); [email protected]
Subject: Re: [ceph-users] Ceph Crach at sync_thread_timeout after heavy random writes.

Hi Xiaoxi,

On Mon, 25 Mar 2013, Chen, Xiaoxi wrote:
>          From ceph -w, Ceph reports a very high op rate (10000+ ops/s), but
> technically, 80 spindles can provide at most 150*80/2 = 6000 IOPS for 4K
> random writes.
> 
>          When digging into the code, I found that the OSD writes data
> to the page cache and then returns. Although it calls ::sync_file_range,
> this syscall doesn't actually sync the data to disk when it returns; it's an
> asynchronous call.
> So the situation is: random writes are extremely fast since they
> only go to the journal and the page cache, but once syncing starts, it takes
> a very long time. Because of the speed gap between the journal and the OSDs,
> the amount of data that needs to be synced keeps increasing, and the sync will
> eventually exceed 600s.
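The growth argument above can be made concrete with rough arithmetic. Under simplifying assumptions (steady rates, disks draining continuously at the 6000 IOPS ceiling, no client throttling), the backlog grows by 10000 - 6000 = 4000 ops/s and a sync must drain it at 6000 IOPS, so the sync exceeds 600 s once the backlog passes 600 × 6000 = 3.6 million ops, which takes about 900 s of sustained load. A sketch of that back-of-envelope model; the function names are hypothetical and the model is not derived from Ceph's behavior:

```cpp
#include <cassert>

// Back-of-envelope model (assumption, not measured): clients dirty pages at
// incoming_iops, the disks retire at most disk_iops random 4K writes per
// second, and nothing throttles the clients.

// Seconds a sync would take if started after elapsed_s seconds of load:
// backlog = (incoming - disk) * elapsed, drained at disk_iops.
double sync_duration_s(double incoming_iops, double disk_iops, double elapsed_s) {
  double backlog_ops = (incoming_iops - disk_iops) * elapsed_s;
  if (backlog_ops < 0) backlog_ops = 0;  // disks keep up: no backlog
  return backlog_ops / disk_iops;
}

// Seconds of sustained load before a sync would exceed timeout_s.
// Returns -1 if the disks keep up and the timeout is never reached.
double time_to_timeout_s(double incoming_iops, double disk_iops, double timeout_s) {
  if (incoming_iops <= disk_iops) return -1.0;
  return timeout_s * disk_iops / (incoming_iops - disk_iops);
}
```

With the figures from this thread, `time_to_timeout_s(10000, 6000, 600)` is 900 s, and `sync_duration_s(10000, 6000, 900)` is 600 s. In practice periodic syncs and journal throttling will change the numbers, so this only illustrates the trend.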

The sync_file_range is only there to push things to disk sooner, so that the 
eventual syncfs(2) takes less time.  When the async flushing is enabled, there 
is a limit to the number of flushes that are in the queue, but if it hits the 
max it just does

    dout(10) << "queue_flusher ep " << sync_epoch << " fd " << fd << " " << off
             << "~" << len
             << " qlen " << flusher_queue_len 
             << " hit flusher_max_fds " << m_filestore_flusher_max_fds
             << ", skipping async flush" << dendl;
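The skip path above behaves like a queue with a hard cap. A minimal single-threaded sketch of that behavior (no locking, no flusher thread), assuming only what the log message shows: once the queue holds max_fds entries, further requests are refused and counted. `BoundedFlusherQueue` and `FlushRequest` are hypothetical names, not Ceph's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical flush request: which fd and byte range to push to disk.
struct FlushRequest {
  int fd;
  long offset;
  long len;
};

// Hypothetical queue capped at max_fds entries, mirroring
// "hit flusher_max_fds ... skipping async flush".
class BoundedFlusherQueue {
 public:
  explicit BoundedFlusherQueue(std::size_t max_fds) : max_fds_(max_fds) {}

  // Returns false (and counts a skip) when the queue is full; the caller
  // then decides whether to flush inline or not at all.
  bool try_queue(int fd, long offset, long len) {
    if (queue_.size() >= max_fds_) {
      ++skipped_;
      return false;
    }
    queue_.push_back({fd, offset, len});
    return true;
  }

  // Taken by the consumer (a flusher thread in the real design).
  bool pop(FlushRequest& out) {
    if (queue_.empty()) return false;
    out = queue_.front();
    queue_.pop_front();
    return true;
  }

  std::size_t size() const { return queue_.size(); }
  std::size_t skipped() const { return skipped_; }

 private:
  std::size_t max_fds_;
  std::deque<FlushRequest> queue_;
  std::size_t skipped_ = 0;
};
```

If writers add requests faster than the consumer pops them, `try_queue` starts failing and every further write falls through to whatever the caller does on failure, which is why that fallback path matters so much under heavy random writes.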

Can you confirm that the filestore is taking this path?  (debug filestore = 10 
and then reproduce.)
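One way to check this after reproducing with debug filestore = 10 is to count the lines that carry the message from the dout() call above. A small sketch, assuming the OSD log is readable as a text stream (for example via std::ifstream on the OSD's log file); `count_skipped_flushes` is a hypothetical helper, not part of Ceph.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Count log lines emitted on the "flusher queue full" path. The marker is
// the tail of the dout(10) message in queue_flusher quoted above.
std::size_t count_skipped_flushes(std::istream& log) {
  const std::string marker = "skipping async flush";
  std::size_t count = 0;
  std::string line;
  while (std::getline(log, line)) {
    if (line.find(marker) != std::string::npos)
      ++count;
  }
  return count;
}
```

A non-zero count during the heavy random-write run would confirm that writes are bypassing the async flusher; a plain grep for the same string works just as well.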

You may want to try

 filestore flusher = false
 filestore sync flush = true

and see if that changes things--it will make the sync_file_range() happen 
inline after the write.

Anyway, it sounds like you may be queueing up so many random writes that the 
sync takes forever.  I've never actually seen that happen, so if we can confirm 
that's what is going on that will be very interesting.

Thanks-
sage


> 
>          For more information, I have tried to reproduce this with rados
> bench, but failed.
> 
>          Could you please let me know if you need any more
> information, and whether you have any solutions? Thanks
> 
>          Xiaoxi
> 