On 05.06.2014 22:54, Matthew Ahrens wrote:
On Thu, Jun 5, 2014 at 12:50 PM, Alexander Motin <[email protected]> wrote:
On 05.06.2014 22:37, Matthew Ahrens wrote:
Interesting, what platform are you testing on? I have not seen substantial contention on this lock on illumos, testing with up to ~1 million IOPS (reads of cached 8k blocks).
Now I am testing this on a dual IvyBridge Xeon E5-2690 v2 system (40 (2x10x2) logical cores).
The SPEC NFS test mentioned earlier was run on a dual Westmere Xeon E5645 system (24 (2x6x2) logical cores). There the problem was much less noticeable, but the IOPS in that test were much lower too.
I'd like to note that I've already seen quite a few cases where contention barely measurable on 24 Westmere cores just explodes on 40 IvyBridge cores. This looks like one of them.
Ah, interesting. I have been testing with up to 24 CPUs. Will have to
find a larger machine :)
Have you seen contention on the arcs_mtx, when manipulating the arc
lists from add_reference() and remove_reference()? I typically see
contention on that before I see contention on e.g. the godfather zio. I
am working on a fix for the arcs_mtx, but it is much more involved
because we need to split the list into per-CPU lists.
With the ARC large enough to hold the entire dataset (avoiding disk I/O), I see quite little contention in this test after the last two patches:
09.91% [241] _sx_xlock_hard @ /boot/kernel/kernel
  44.81% [108] dbuf_find
    94.44% [102] dbuf_hold_impl
      100.0% [102] dbuf_hold
    05.56% [6] dbuf_prefetch
      100.0% [6] dmu_zfetch_dofetch
  30.71% [74] dbuf_read
    100.0% [74] dnode_hold_impl
      100.0% [74] dmu_read_uio
  19.50% [47] dbuf_rele
    100.0% [47] dnode_hold_impl
      100.0% [47] dmu_read_uio
  02.07% [5] add_reference
    100.0% [5] arc_buf_add_ref
      100.0% [5] dbuf_hold_impl
  01.24% [3] remove_reference
    100.0% [3] arc_buf_remove_ref
      100.0% [3] dbuf_rele_and_unlock
With the ARC limited to 4GB to force almost entirely uncacheable I/O with heavy ARC eviction, I see significant contention around the ARC locks:
45.18% [3935] _sx_xlock_hard @ /boot/kernel/kernel
  33.70% [1326] arc_evict
    100.0% [1326] arc_get_data_buf
      100.0% [1326] arc_read
        99.92% [1325] dbuf_prefetch
          100.0% [1325] dmu_zfetch_dofetch
            100.0% [1325] dmu_zfetch
              100.0% [1325] dbuf_read
        00.08% [1] dbuf_read
          100.0% [1] dmu_buf_hold_array_by_dnode
            100.0% [1] dmu_read_uio
              100.0% [1] zfs_freebsd_read
  26.63% [1048] buf_hash_find
    66.41% [696] arc_read
      53.74% [374] dbuf_prefetch
        100.0% [374] dmu_zfetch_dofetch
          100.0% [374] dmu_zfetch
            100.0% [374] dbuf_read
              100.0% [374] dmu_buf_hold_array_by_dnode
      46.26% [322] dbuf_read
        99.07% [319] dmu_buf_hold_array_by_dnode
          100.0% [319] dmu_read_uio
            100.0% [319] zfs_freebsd_read
              100.0% [319] VOP_READ_APV
        00.93% [3] dbuf_findbp
          100.0% [3] dbuf_prefetch
            100.0% [3] dmu_zfetch_dofetch
              100.0% [3] dmu_zfetch
    33.59% [352] arc_read_done
      100.0% [352] zio_done
        100.0% [352] zio_execute
          100.0% [352] zio_done
            100.0% [352] zio_execute
              78.12% [275] taskqueue_run_locked
              21.88% [77] zio_done
  13.93% [548] arc_change_state
    100.0% [548] arc_access
      53.83% [295] arc_read
        100.0% [295] dbuf_prefetch
          100.0% [295] dmu_zfetch_dofetch
            100.0% [295] dmu_zfetch
              100.0% [295] dbuf_read
      46.17% [253] arc_read_done
        100.0% [253] zio_done
          100.0% [253] zio_execute
            100.0% [253] zio_done
              100.0% [253] zio_execute
  12.12% [477] add_reference
    100.0% [477] arc_read
      100.0% [477] dbuf_read
        100.0% [477] dmu_buf_hold_array_by_dnode
          100.0% [477] dmu_read_uio
            100.0% [477] zfs_freebsd_read
              100.0% [477] VOP_READ_APV
  06.81% [268] remove_reference
    100.0% [268] arc_buf_remove_ref
      100.0% [268] dbuf_rele_and_unlock
        100.0% [268] dmu_read_uio
On Thu, Jun 5, 2014 at 12:34 PM, Alexander Motin <[email protected]> wrote:
Hi again! Another day, another patch. :)
Testing the same setup as in the earlier "Godfather ZIO lock congestion" thread (small strided reads from 256 threads concurrently on a 40-core machine), but with an ARC size sufficient to fit all the test data, I hit another lock contention point: the RRW lock zfsvfs->z_teardown_lock.
Earlier I've seen the same contention even on a smaller machine while profiling the SPEC NFS benchmark.
To avoid the contention, the attached patch replaces the single teardown RRW lock per struct zfsvfs with a bunch (17) of them. Read acquisitions are distributed among them based on the curthread pointer, to avoid any measurable contention in the hot path. Write acquisitions take all of the locks, but they should be rare enough not to matter.
As a result, performance on this test setup increased from ~475K IOPS to ~1.3M IOPS.
Any comments?
--
Alexander Motin
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer