On Thu, Jun 5, 2014 at 12:50 PM, Alexander Motin <[email protected]> wrote:

> On 05.06.2014 22:37, Matthew Ahrens wrote:
>
>> Interesting, what platform are you testing on?  I have not seen
>> substantial congestion on this lock on illumos, testing with up to ~1
>> million IOPS (reads of cached 8k blocks).
>>
>
> Now I am testing this on a dual IvyBridge Xeon E5-2690 v2 system (40
> (2x10x2) logical cores).
>
> The SPEC NFS test mentioned earlier was running on a dual Westmere Xeon
> E5645 system (24 (2x6x2) logical cores). There the problem was much less
> noticeable, but IOPS in that test were much lower too.
>
> I'd like to note that I've already seen quite a few cases where
> congestion barely measurable on 24 Westmere cores just explodes on 40
> IvyBridge cores. This looks like one of them.
>

Ah, interesting.  I have been testing with up to 24 CPUs.  Will have to
find a larger machine :)

Have you seen contention on the arcs_mtx when manipulating the arc lists
from add_reference() and remove_reference()?  I typically see contention on
that before I see contention on e.g. the godfather zio.  I am working on a
fix for the arcs_mtx, but it is much more involved because we need to split
the list into per-CPU lists.
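[A minimal userspace sketch of the per-CPU split Matthew describes: one
mutex-protected list becomes an array of sublists, each with its own lock,
and a node remembers which sublist it lives on so it can be removed later
from any thread. All names (multilist_t, NBUCKETS, etc.) are illustrative,
not from the actual fix, and a hashed thread id stands in for the kernel's
current-CPU id.]

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define NBUCKETS 16 /* stand-in for the number of CPUs */

typedef struct node {
	struct node *n_next;
	unsigned n_bucket; /* which sublist this node lives on */
} node_t;

typedef struct multilist {
	pthread_mutex_t ml_lock[NBUCKETS];
	node_t *ml_head[NBUCKETS];
} multilist_t;

static void
multilist_init(multilist_t *ml)
{
	for (int i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&ml->ml_lock[i], NULL);
		ml->ml_head[i] = NULL;
	}
}

/* Pick a sublist; kernel code would use the current CPU id instead. */
static unsigned
sublist_of(void)
{
	return (((uintptr_t)pthread_self() >> 4) % NBUCKETS);
}

/* Insert on this thread's sublist, contending only on its lock. */
static void
multilist_insert(multilist_t *ml, node_t *n)
{
	unsigned i = sublist_of();

	n->n_bucket = i;
	pthread_mutex_lock(&ml->ml_lock[i]);
	n->n_next = ml->ml_head[i];
	ml->ml_head[i] = n;
	pthread_mutex_unlock(&ml->ml_lock[i]);
}

/* Remove from whichever sublist the node was inserted on. */
static void
multilist_remove(multilist_t *ml, node_t *n)
{
	node_t **pp;

	pthread_mutex_lock(&ml->ml_lock[n->n_bucket]);
	for (pp = &ml->ml_head[n->n_bucket]; *pp != NULL;
	    pp = &(*pp)->n_next) {
		if (*pp == n) {
			*pp = n->n_next;
			break;
		}
	}
	pthread_mutex_unlock(&ml->ml_lock[n->n_bucket]);
}
```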

--matt


>
>  On Thu, Jun 5, 2014 at 12:34 PM, Alexander Motin <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>     Hi again! Another day, another patch. :)
>>
>>     Testing the same setup as in the earlier "Godfather ZIO lock
>>     congestion" thread (small strided reads from 256 threads concurrently
>>     on a 40-core machine), but with an ARC size sufficient to fit all the
>>     test data, I hit another lock congestion, on the RRW lock
>>     zfsvfs->z_teardown_lock. Earlier I saw the same congestion even on a
>>     smaller machine while profiling the SPEC NFS benchmark.
>>
>>     To avoid the congestion, the attached patch replaces the single
>>     teardown RRW lock per struct zfsvfs with a bunch (17) of them. Read
>>     acquisitions are randomly distributed among them based on the
>>     curthread pointer, to avoid any measurable congestion in the hot
>>     path. Write acquisitions go to all the locks, but they should be
>>     rare enough not to matter.
>>
>>     As a result, performance on this test setup increased from ~475K
>>     IOPS to ~1.3M IOPS.
>>
>>     Any comments?
>>
>>     --
>>     Alexander Motin
>>
>>     _______________________________________________
>>     developer mailing list
>>     [email protected] <mailto:[email protected]>
>>     http://lists.open-zfs.org/mailman/listinfo/developer
>>
>>
>>
>
> --
> Alexander Motin
>
