On 05.06.2014 22:54, Matthew Ahrens wrote:
On Thu, Jun 5, 2014 at 12:50 PM, Alexander Motin <[email protected]> wrote:

    On 05.06.2014 22:37, Matthew Ahrens wrote:

        Interesting, what platform are you testing on?  I have not seen
        substantial contention on this lock on illumos, testing with up
        to ~1 million IOPS (reads of cached 8k blocks).


    Now I am testing this on a dual IvyBridge Xeon E5-2690 v2 system
    (40 (2x10x2) logical cores).

    The SPEC NFS test mentioned earlier was running on a dual Westmere
    Xeon E5645 system (24 (2x6x2) logical cores). There the problem was
    much less noticeable, but IOPS in that test were much lower too.

    I'd like to note that I've already seen quite a few cases where
    contention barely measurable on 24 Westmere cores just explodes on
    40 IvyBridge cores. This looks like one of them.


Ah, interesting.  I have been testing with up to 24 CPUs.  Will have to
find a larger machine :)

Have you seen contention on the arcs_mtx when manipulating the ARC
lists from add_reference() and remove_reference()?  I typically see
contention on that before I see contention on e.g. the godfather zio.  I
am working on a fix for the arcs_mtx, but it is much more involved
because we need to split the list into per-CPU lists.

With the ARC large enough to hold the whole dataset (avoiding disk I/O), I see very little contention in this test after the last two patches:

09.91%  [241]      _sx_xlock_hard @ /boot/kernel/kernel
 44.81%  [108]       dbuf_find
  94.44%  [102]        dbuf_hold_impl
   100.0%  [102]         dbuf_hold
  05.56%  [6]          dbuf_prefetch
   100.0%  [6]           dmu_zfetch_dofetch
 30.71%  [74]        dbuf_read
  100.0%  [74]         dnode_hold_impl
   100.0%  [74]          dmu_read_uio
 19.50%  [47]        dbuf_rele
  100.0%  [47]         dnode_hold_impl
   100.0%  [47]          dmu_read_uio
 02.07%  [5]         add_reference
  100.0%  [5]          arc_buf_add_ref
   100.0%  [5]           dbuf_hold_impl
 01.24%  [3]         remove_reference
  100.0%  [3]          arc_buf_remove_ref
   100.0%  [3]           dbuf_rele_and_unlock

With the ARC limited to 4GB to force mostly uncacheable I/O with tons of ARC evictions, I see significant contention around the ARC locks:

45.18%  [3935]     _sx_xlock_hard @ /boot/kernel/kernel
 33.70%  [1326]      arc_evict
  100.0%  [1326]       arc_get_data_buf
   100.0%  [1326]        arc_read
    99.92%  [1325]         dbuf_prefetch
     100.0%  [1325]          dmu_zfetch_dofetch
      100.0%  [1325]           dmu_zfetch
       100.0%  [1325]            dbuf_read
    00.08%  [1]            dbuf_read
     100.0%  [1]             dmu_buf_hold_array_by_dnode
      100.0%  [1]              dmu_read_uio
       100.0%  [1]               zfs_freebsd_read
 26.63%  [1048]      buf_hash_find
  66.41%  [696]        arc_read
   53.74%  [374]         dbuf_prefetch
    100.0%  [374]          dmu_zfetch_dofetch
     100.0%  [374]           dmu_zfetch
      100.0%  [374]            dbuf_read
       100.0%  [374]             dmu_buf_hold_array_by_dnode
   46.26%  [322]         dbuf_read
    99.07%  [319]          dmu_buf_hold_array_by_dnode
     100.0%  [319]           dmu_read_uio
      100.0%  [319]            zfs_freebsd_read
       100.0%  [319]             VOP_READ_APV
    00.93%  [3]            dbuf_findbp
     100.0%  [3]             dbuf_prefetch
      100.0%  [3]              dmu_zfetch_dofetch
       100.0%  [3]               dmu_zfetch
  33.59%  [352]        arc_read_done
   100.0%  [352]         zio_done
    100.0%  [352]          zio_execute
     100.0%  [352]           zio_done
      100.0%  [352]            zio_execute
       78.12%  [275]             taskqueue_run_locked
       21.88%  [77]              zio_done
 13.93%  [548]       arc_change_state
  100.0%  [548]        arc_access
   53.83%  [295]         arc_read
    100.0%  [295]          dbuf_prefetch
     100.0%  [295]           dmu_zfetch_dofetch
      100.0%  [295]            dmu_zfetch
       100.0%  [295]             dbuf_read
   46.17%  [253]         arc_read_done
    100.0%  [253]          zio_done
     100.0%  [253]           zio_execute
      100.0%  [253]            zio_done
       100.0%  [253]             zio_execute
 12.12%  [477]       add_reference
  100.0%  [477]        arc_read
   100.0%  [477]         dbuf_read
    100.0%  [477]          dmu_buf_hold_array_by_dnode
     100.0%  [477]           dmu_read_uio
      100.0%  [477]            zfs_freebsd_read
       100.0%  [477]             VOP_READ_APV
 06.81%  [268]       remove_reference
  100.0%  [268]        arc_buf_remove_ref
   100.0%  [268]         dbuf_rele_and_unlock
    100.0%  [268]          dmu_read_uio


        On Thu, Jun 5, 2014 at 12:34 PM, Alexander Motin
        <[email protected]> wrote:

             Hi again! Another day, another patch. :)

             Testing the same setup as in the earlier "Godfather ZIO
             lock congestion" thread (small strided reads from 256
             threads concurrently on a 40-core machine), but with an
             ARC size sufficient to fit all test data, I hit another
             contention point, on the RRW lock
             zfsvfs->z_teardown_lock. Earlier I've seen the same
             contention even on a smaller machine while profiling the
             SPEC NFS benchmark.

             To avoid this contention, the attached patch replaces the
             single teardown RRW lock per struct zfsvfs with a bunch
             (17) of them. Read acquisitions are distributed among
             them based on the curthread pointer, to avoid any
             measurable contention in the hot path. Write acquisitions
             take all the locks, but they should be rare enough not to
             matter.

             As a result, performance in this test setup increased from
             ~475K IOPS to ~1.3M IOPS.

             Any comments?

             --
             Alexander Motin

             _________________________________________________
             developer mailing list
        [email protected] <mailto:[email protected]>
        <mailto:[email protected] <mailto:[email protected]>__>
        http://lists.open-zfs.org/__mailman/listinfo/developer
        <http://lists.open-zfs.org/mailman/listinfo/developer>




    --
    Alexander Motin




--
Alexander Motin
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
