On 9/2/15 4:59 PM, Malahal Naineni wrote:
> Matt Benjamin [mbenja...@redhat.com] wrote:
>> Hi Bill,
>>
>> There has not been griping.  Malahal has done performance measurement.
>>
>> IIUC (per IRC) Malahal:
>>
>> 1. has empirical evidence that moving the current Ganesha -dispatch queue- 
>> bands into lanes
>> measurably improves throughput, when the number of worker threads is large 
>> (apparently, 64 and (!) 256)
>
> Matt, correct! "perf record" showed time spent in the pthread_mutex_lock
> and unlock routines there, and having 256 lanes did help performance
> somewhat.
>
Looking forward to the patch.  Can probably merge it with mine.
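For reference, here's roughly the shape I'm expecting (a minimal sketch
only; the names req_lane, NLANES, enqueue_req and the round-robin lane
selection are illustrative, not the actual Ganesha dispatch code):

#include <pthread.h>
#include <stdatomic.h>
#include <sys/queue.h>

#define NLANES 256                      /* e.g. one lane per worker */

struct req_entry {
        TAILQ_ENTRY(req_entry) link;
        /* ... request payload ... */
};

struct req_lane {
        pthread_mutex_t mtx;            /* contention is now 1/NLANES */
        TAILQ_HEAD(, req_entry) q;
};

static struct req_lane lanes[NLANES];
static atomic_uint rr;                  /* round-robin lane selector */

static void lanes_init(void)
{
        for (int i = 0; i < NLANES; i++) {
                pthread_mutex_init(&lanes[i].mtx, NULL);
                TAILQ_INIT(&lanes[i].q);
        }
}

static void enqueue_req(struct req_entry *e)
{
        struct req_lane *l = &lanes[atomic_fetch_add(&rr, 1) % NLANES];

        pthread_mutex_lock(&l->mtx);
        TAILQ_INSERT_TAIL(&l->q, e, link);
        pthread_mutex_unlock(&l->mtx);
}

Workers would each drain one (or a few) lanes, so the pthread_mutex_lock
hot spot in "perf record" is spread across NLANES mutexes instead of one.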


>> That is expected, so I'm looking to Malahal to send a change (and some other 
>> tuning changes) for review.
>>
>> 2. Malahal indicated he found a bug of some kind in the thread fridge, 
>> provoked by the shutdown check
>> in the current dispatch queue code, which he says he fixed, so if he hasn't 
>> already sent a change, I'm
>> expecting to see one soon which addresses this.
>
> I didn't find any bug, but each worker thread calling
> fridgethr_you_should_break() repeatedly before taking each request
> showed too much time spent in pthread_mutex_lock. I just removed the
> lock/unlock around the transitioning field check.
>
Yeah, assuming you changed it to an atomic?  Or was it idempotent?

There are too many locks around setting 1 variable.
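Something along these lines is what I'd expect (a sketch only; the field
and function names here are placeholders, not the real fridgethr code):

#include <stdatomic.h>
#include <stdbool.h>

struct fridge {
        atomic_bool transitioning;      /* set once at shutdown */
        /* ... rest of the fridge state ... */
};

/* Each worker checks this before taking a request.  An atomic load is
 * enough: the flag only ever goes false -> true, and a slightly stale
 * read just costs one extra loop iteration. */
static inline bool fridge_should_break(struct fridge *fr)
{
        return atomic_load_explicit(&fr->transitioning,
                                    memory_order_relaxed);
}

/* The shutdown path still takes whatever lock protects the rest of the
 * state transition; only the per-request check goes lockless. */
static inline void fridge_begin_shutdown(struct fridge *fr)
{
        atomic_store(&fr->transitioning, true);
}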

>> 3. Malahal described in IRC an additional change he made to split the ntirpc 
>> output ioq into
>> lanes, and believed he saw improvement (as of ~2 weeks ago), but was still 
>> benchmarking in order to
>> split out the impact of this change relative to others.
>
> I did this as well, but the gains seemed marginal at best!
>
As I'd suspected.  My patch there will eventually eliminate the
lock entirely.


> We were desperate and made all kinds of changes here and there. The
> system is busy now. Once I get the system back, I will validate the
> individual patches and post them upstream.
>
> perf record shows that too much time is spent in the malloc/free functions.
> The reported functions are alloc_nfs_request, alloc_nfs_res, and a few
> objects in the src/xdr_ioq.c file. alloc_nfs_res seems thread specific, so
> it could be allocated once per thread. If we can make the other pools
> lockless (instead of using malloc), that would be great!
>
Oh, yeah!  We know about this!  (It's not really a pool, just a
spot where a pool would be good, so Adam wrapped some pool logic
around it.)

Those three can be combined into one.  But there's a spot of code that
changes the pointer to the msg.  That needs to be worked out.
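Roughly what combining them would look like (a sketch, assuming the three
objects share a lifetime; the *_part structs are placeholders standing in
for the real request/result/xdr_ioq types, not the actual definitions):

#include <stdlib.h>

struct req_part { char data[256]; };    /* what alloc_nfs_request returns */
struct res_part { char data[256]; };    /* what alloc_nfs_res returns */
struct ioq_part { char data[512]; };    /* per-request xdr_ioq state */

struct request_bundle {
        struct req_part req;
        struct res_part res;
        struct ioq_part ioq;
};

static struct request_bundle *bundle_alloc(void)
{
        /* one calloc instead of three separate allocations */
        return calloc(1, sizeof(struct request_bundle));
}

static void bundle_free(struct request_bundle *b)
{
        free(b);                        /* one free instead of three */
}

The catch is exactly the one above: anything that swaps out the msg
pointer would then be pointing into the middle of the bundle, so that
spot has to be reworked first.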

I've already reduced the number of malloc/frees per request by at least
1/3 earlier in this dev cycle.  Various auth arrays are pre-allocated and
appended onto the request message.  Worker stubs are pre-allocated into
the contexts they'll be queueing.
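The auth-array trick looks roughly like this (illustrative only; the
struct layout, field names, and the AUTH_SLOTS bound are assumptions, not
the actual ntirpc definitions), with the scratch space appended onto the
same allocation as the message so decoding auth doesn't cost a second
malloc:

#include <stdlib.h>
#include <stdint.h>

struct msg_buf {
        uint32_t xid;                   /* ... fixed message fields ... */
        size_t auth_max;
        uint32_t auth[];                /* appended onto the same block */
};

static struct msg_buf *msg_alloc(size_t auth_slots)
{
        struct msg_buf *m =
                calloc(1, sizeof(*m) + auth_slots * sizeof(uint32_t));

        if (m)
                m->auth_max = auth_slots;
        return m;
}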

