On 02/05, Christoph Hellwig wrote:
> On Mon, Jan 16, 2017 at 09:32:20AM -0800, Christoph Hellwig wrote:
> > On Fri, Jan 13, 2017 at 11:12:11AM -0800, Jaegeuk Kim wrote:
> > > Previously, I changed the code to issue discard bios asynchronously, but
> > > that turned out not to be enough. When testing an NVMe SSD with the noop
> > > IO scheduler, submit_bio() blocked after every 8 async discard bios,
> > > resulting in a very slow checkpoint process that blocks most other FS
> > > operations.
> > 
> > Where does it block?  Are you running out of request?  What driver is
> > this on top of?
> 
> Ping?  I'm currently spending a lot of effort on fs and block discard
> code, and I'd like to make sure we get common infrastructure instead
> of local hacks.

Sorry for the late response; I was traveling.

When running fstrim on a fresh f2fs image formatted on an Intel NVMe SSD
(model SSDPE2MW012T4), I got the following trace.

...
fstrim-12620 [000] .... 334572.907534: f2fs_issue_discard: dev = (259,1), blkstart = 0x902900, blklen = 0x400
fstrim-12620 [000] .... 334572.907535: block_bio_remap: 259,0 D 75583488 + 8192 <- (259,1) 75581440
fstrim-12620 [000] .... 334572.907535: block_bio_queue: 259,0 D 75583488 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.907535: block_getrq: 259,0 D 75583488 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.907536: block_unplug: [fstrim] 1
fstrim-12620 [000] .... 334572.907536: block_rq_insert: 259,0 D 0 () 75583488 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.907536: block_rq_issue: 259,0 D 0 () 75583488 + 8192 [fstrim]
 < repeat 6 times >
fstrim-12620 [000] .... 334572.907620: f2fs_issue_discard: dev = (259,1), blkstart = 0x904500, blklen = 0x400
fstrim-12620 [000] .... 334572.907620: block_bio_remap: 259,0 D 75640832 + 8192 <- (259,1) 75638784
fstrim-12620 [000] .... 334572.907620: block_bio_queue: 259,0 D 75640832 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.907621: block_getrq: 259,0 D 75640832 + 8192 [fstrim]
<idle>-0     [000] d.h. 334572.907723: block_rq_complete: 259,0 D () 67260416 + 8192 [0]
<idle>-0     [000] d.h. 334572.907942: block_rq_complete: 259,0 D () 67268608 + 8192 [0]
<idle>-0     [000] d.h. 334572.908155: block_rq_complete: 259,0 D () 67276800 + 8192 [0]
<idle>-0     [000] d.h. 334572.908374: block_rq_complete: 259,0 D () 67284992 + 8192 [0]
<idle>-0     [000] d.h. 334572.908597: block_rq_complete: 259,0 D () 67293184 + 8192 [0]
<idle>-0     [000] d.h. 334572.908823: block_rq_complete: 259,0 D () 67301376 + 8192 [0]
<idle>-0     [000] d.h. 334572.909033: block_rq_complete: 259,0 D () 67309568 + 8192 [0]
<idle>-0     [000] d.h. 334572.909216: block_rq_complete: 259,0 D () 67317760 + 8192 [0]
fstrim-12620 [000] .... 334572.909222: block_unplug: [fstrim] 1
fstrim-12620 [000] .... 334572.909223: block_rq_insert: 259,0 D 0 () 75640832 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.909224: block_rq_issue: 259,0 D 0 () 75640832 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.909240: f2fs_issue_discard: dev = (259,1), blkstart = 0x904900, blklen = 0x400
fstrim-12620 [000] .... 334572.909241: block_bio_remap: 259,0 D 75649024 + 8192 <- (259,1) 75646976
fstrim-12620 [000] .... 334572.909241: block_bio_queue: 259,0 D 75649024 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.909241: block_getrq: 259,0 D 75649024 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.909242: block_unplug: [fstrim] 1
fstrim-12620 [000] .... 334572.909242: block_rq_insert: 259,0 D 0 () 75649024 + 8192 [fstrim]
fstrim-12620 [000] .... 334572.909242: block_rq_issue: 259,0 D 0 () 75649024 + 8192 [fstrim]
 < repeat >
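
As a rough check of the timing in the trace above, the length of the stall
can be computed directly from the timestamps. This is only a minimal sketch:
the helper name to_us is made up for illustration, and the values are copied
by hand from the block_getrq, block_rq_complete and block_unplug lines for
the request at sector 75640832.

```python
def to_us(ts):
    """Convert an ftrace timestamp string ("sec.usec") to integer microseconds."""
    sec, frac = ts.split(".")
    return int(sec) * 1_000_000 + int(frac)

# block_getrq and the following block_unplug for sector 75640832.
getrq = to_us("334572.907621")
unplug = to_us("334572.909222")

# The eight block_rq_complete events seen in between.
completions = [to_us(t) for t in (
    "334572.907723", "334572.907942", "334572.908155", "334572.908374",
    "334572.908597", "334572.908823", "334572.909033", "334572.909216",
)]

# Time fstrim spent between getting the request and unplugging it.
stall_us = unplug - getrq

# Spacing between consecutive discard completions.
gaps = [b - a for a, b in zip(completions, completions[1:])]
mean_gap_us = sum(gaps) / len(gaps)

print(f"stall: {stall_us} us")
print(f"mean gap between completions: {mean_gap_us:.1f} us")
```

With these values the stall is 1601 us, and the completions arrive roughly
213 us apart, so nearly all of the stall is spent waiting for the eight
discards to complete one after another.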

So I looked in more detail at why the block_rq_complete events happen there.

The call path at the root of the stall looks like:
 - submit_bio
  - generic_make_request
   - q->make_request_fn
    - blk_mq_make_request
     - blk_mq_map_request
      - blk_mq_alloc_request
       - blk_mq_get_tag
        - __blk_mq_get_tag
         - bt_get
          - blk_mq_run_hw_queue
          - finish_wait
          --> this waits for the 8 pending discard bios!

It seems the problem is that the storage device processes discard commands
much more slowly than normal read/write IOs.
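
To illustrate the effect, here is a toy model of a submitter that needs a
free tag for every request while the device completes discards one at a time.
It is only a sketch under stated assumptions, not the blk-mq tag allocator:
the function name, the 8-tag limit, the 213 us service time and the 1 us
submit cost are all assumptions picked to resemble the trace, and the waiter
here wakes on the first free tag, which may differ from what the kernel does.

```python
import heapq

def simulate_waits(num_requests, num_tags, service_us, submit_cost_us=1):
    """Return how long (us) each submission waits for a free tag.

    Toy model: the device serves one request at a time, each taking
    service_us, and a request holds its tag until it completes.
    """
    now = 0
    device_free_at = 0
    inflight = []  # min-heap of completion times of tagged requests
    waits = []
    for _ in range(num_requests):
        now += submit_cost_us
        # Release the tags of requests that have already completed.
        while inflight and inflight[0] <= now:
            heapq.heappop(inflight)
        waited = 0
        if len(inflight) == num_tags:
            # No free tag: sleep until the earliest completion.
            done = heapq.heappop(inflight)
            waited = done - now
            now = done
        # The device starts this request once it is idle.
        start = max(now, device_free_at)
        device_free_at = start + service_us
        heapq.heappush(inflight, device_free_at)
        waits.append(waited)
    return waits

waits = simulate_waits(num_requests=24, num_tags=8, service_us=213)
print(waits)
```

In this model the first 8 submissions do not wait at all, and every later one
waits for close to a full discard service time, so the submitter ends up
running at the speed of the device no matter how asynchronously the bios are
issued.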

Any thoughts?

Thanks,

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
