Re: Request for testing malloc and multi-threaded applications

2022-09-27 Thread Otto Moerbeek
On Tue, Sep 27, 2022 at 03:31:12PM +0200, Renaud Allard wrote:

> On 1/16/19 19:09, Otto Moerbeek wrote:
> > On Wed, Jan 16, 2019 at 01:25:25PM +, Stuart Henderson wrote:
> > 
> > > On 2019/01/04 08:09, Otto Moerbeek wrote:
> > > > On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
> > > > 
> > > > > 
> > > > > Very little feedback so far. This diff can only give me valid feedback
> > > > > if the coverage of systems and use cases is wide.  If I do not get
> > > > > more feedback, I have to base my decisions on my own testing, which
> > > > > will benefit my systems and use cases, but might harm yours.
> > > > > 
> > > > > So, ladies and gentlemen, start your tests!
> > > > 
> > > > Another reminder. I'd like to make progress on this. That means I need
> > > > tests for various use-cases.
> > > 
> > > I have a map-based website I use that is quite good at stressing things
> > > (high spin% cpu) and have been timing from opening chromium (I'm using
> > > this for the test because it typically performs less well than firefox).
> > > Time is real time from starting the browser set to 'start with previously
> > > opened windows' and the page open, until the page reports that it's
> > > finished loading (i.e. fetching data from the server and rendering it).
> > > 
> > > It's not a perfect test - it depends on network/server conditions etc., and
> > > it's a visualisation of conditions in a game so it may change slightly from
> > > run to run, though there shouldn't be huge changes between the times I've
> > > run it - but it is a bit more repeatable than a subjective "does the browser
> > > feel slow".
> > > 
> > > 4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).
> > > 
> > > I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
> > > more like +++, -, '', -, ++ etc.
> > > 
> > >   +++ 90  98  68
> > >   ++  85  82
> > >   +   87  56  71
> > >   ''  76  60  69  88
> > >   -   77  74  85
> > >   --  48  86  77  67
> > > 
> > > So while it's not very consistent, the fastest times I've seen are on
> > > runs with fewer pools, and the slowest times on runs with more pools,
> > > with '' possibly seeming a bit more consistent from run to run. But
> > > there's not enough consistency with any of it to be able to make any
> > > clear conclusion (and I get the impression it would be hard to
> > > tell without some automated test that can be repeated many times,
> > > with a statistical analysis carried out on the results).
> > > 
> > 
> > Thanks for testing. To be clear: this is with the diff I posted and not the
> > committed code, right? (There is a small change in the committed code
> > to change the default to what one '+' gave with the diff).
> > 
> > -Otto
> > 
> 
> Hello,
> 
> Given that the code has been in base for about 4 years, shouldn't the man page
> be modified to add an explanation for those ++/--? Or is there a reason why it's
> not documented?
> 
> Best Regards
> 


No, this is for internal/development use only and might be removed any time.
It's undocumented on purpose.

-Otto
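
Each '+' in MALLOC_OPTIONS doubles the number of pools and each '-' halves it,
clamped between 1 and _MALLOC_MUTEXES, as the omalloc_parseopt() hunks quoted
further down this thread show; the diff's default was 4, and the committed code
later raised the default to what one '+' gave. A minimal standalone sketch of
that mapping (the function and sample strings here are illustrative, not libc
code):

#include <stdio.h>

#define _MALLOC_MUTEXES 32      /* compile-time maximum, as in the diff */

/*
 * Sketch of the undocumented '+'/'-' handling: each '+' doubles the
 * number of pools (mutexes) and each '-' halves it, clamped to the
 * range [1, _MALLOC_MUTEXES].  The diff's default is 4.
 */
static unsigned int
pools_for_options(const char *opts)
{
    unsigned int n = 4;
    const char *p;

    for (p = opts; *p != '\0'; p++) {
        switch (*p) {
        case '+':
            n <<= 1;
            if (n > _MALLOC_MUTEXES)
                n = _MALLOC_MUTEXES;
            break;
        case '-':
            n >>= 1;
            if (n < 1)
                n = 1;
            break;
        default:
            break;              /* other option characters ignored here */
        }
    }
    return n;
}

int
main(void)
{
    const char *samples[] = { "", "-", "--", "+", "++", "+++" };
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("MALLOC_OPTIONS=%-3s -> %u pools\n",
            samples[i], pools_for_options(samples[i]));
    return 0;
}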



Re: Request for testing malloc and multi-threaded applications

2022-09-27 Thread Renaud Allard

On 1/16/19 19:09, Otto Moerbeek wrote:

On Wed, Jan 16, 2019 at 01:25:25PM +, Stuart Henderson wrote:


On 2019/01/04 08:09, Otto Moerbeek wrote:

On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:



Very little feedback so far. This diff can only give me valid feedback
if the coverage of systems and use cases is wide.  If I do not get
more feedback, I have to base my decisions on my own testing, which
will benefit my systems and use cases, but might harm yours.

So, ladies and gentlemen, start your tests!


Another reminder. I'd like to make progress on this. That means I need
tests for various use-cases.


I have a map-based website I use that is quite good at stressing things
(high spin% cpu) and have been timing from opening chromium (I'm using
this for the test because it typically performs less well than firefox).
Time is real time from starting the browser set to 'start with previously
opened windows' and the page open, until the page reports that it's
finished loading (i.e. fetching data from the server and rendering it).

It's not a perfect test - it depends on network/server conditions etc., and
it's a visualisation of conditions in a game so it may change slightly from
run to run, though there shouldn't be huge changes between the times I've
run it - but it is a bit more repeatable than a subjective "does the browser
feel slow".

4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).

I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
more like +++, -, '', -, ++ etc.

  +++  90  98  68
  ++   85  82
  +    87  56  71
  ''   76  60  69  88
  -    77  74  85
  --   48  86  77  67

So while it's not very consistent, the fastest times I've seen are on
runs with fewer pools, and the slowest times on runs with more pools,
with '' possibly seeming a bit more consistent from run to run. But
there's not enough consistency with any of it to be able to make any
clear conclusion (and I get the impression it would be hard to
tell without some automated test that can be repeated many times,
with a statistical analysis carried out on the results).



Thanks for testing. To be clear: this is with the diff I posted and not the
committed code, right? (There is a small change in the committed code
to change the default to what one '+' gave with the diff).

-Otto



Hello,

Given that the code has been in base for about 4 years, shouldn't the man page
be modified to add an explanation for those ++/--? Or is there a reason why
it's not documented?


Best Regards





Re: Request for testing malloc and multi-threaded applications

2019-01-18 Thread Otto Moerbeek
On Fri, Jan 18, 2019 at 08:41:57AM +0100, Alexandr Nedvedicky wrote:

> Hello Otto,
> 
> I gave it a try with firefox. According to my subjective tests
> I could not spot any differences with the various settings.
> 
> I've decided to try with some memory benchmarks I could find on GitHub [1]. I
> created a fork [2] with my own test runner to try out your diff. To run it,
> just do something like:
> 
> git clone https://github.com/Sashan/Hoard.git
> cd Hoard/benchmarks/
> make
> 
> The benchmarks are from the '90s. A description can be found in the paper kept
> alongside the Hoard project [3].
> 
> The box where I ran the tests has 4 CPUs:
> cpu0: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz, \
>   2997.38 MHz, 06-17-0a
> with 8GB of RAM.
> 
> I used time(1) to measure the running time of test-run.sh with each particular
> MALLOC_OPTIONS setting. The results are as follows:
> 
> Running with MALLOC_OPTIONS=
>  1730.27 real  3289.41 user  3574.28 sys
> Running with MALLOC_OPTIONS=-
>  1726.16 real  3279.37 user  3575.26 sys
> Running with MALLOC_OPTIONS=+
>  1712.40 real  3296.65 user  3483.03 sys
> Running with MALLOC_OPTIONS=--
>  1741.42 real  3290.89 user  3616.37 sys
> Running with MALLOC_OPTIONS=++
>  1765.02 real  3287.75 user  3665.30 sys
> Running with MALLOC_OPTIONS=+++
>  1758.06 real  3300.00 user  3631.57 sys
> 
> As you can see, the differences are insignificant; the spread is ~1 minute. One
> round of tests took ~30 minutes.
> 
> regards
> sashan

Thanks,

-Otto

> 
> [1] https://github.com/emeryberger/Hoard/tree/master/benchmarks
> 
> [2] https://github.com/Sashan/Hoard
> 
> [3] https://github.com/emeryberger/Hoard/blob/master/doc/berger-asplos2000.pdf
> 
> On Wed, Dec 19, 2018 at 11:20:19AM +0100, Otto Moerbeek wrote:
> > On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:
> > 
> > > Hi,
> > > 
> > > This diff implements a more flexible approach for the number of pools
> > > malloc uses in the multi-threaded case. At the moment I do not intend
> > > to commit this as-is, I first need this to get some feedback on what
> > > the proper default should be.
> > > 
> > > Currently the number of pools is fixed at 4. More pools mean less
> > > contention for allocations, but free becomes more expensive since a
> > > thread might need to check other pools increasing contention.
> > > 
> > > I'd like to know how this diff behaves using your favorite
> > > multi-threaded application. Often this will be a web-browser I guess.
> > > 
> > > Test instructions:
> > > 
> > > 0. Make sure you are running current.
> > > 
> > > 1. Do a baseline test of your application.
> > > 
> > > 2. Apply diff, build and install userland.
> > > 
> > > 3. Run your test application with MALLOC_OPTIONS=value, where value is: 
> > > "", +, -, ++, -- and +++.
> > > 
> > > e.g. 
> > > 
> > >   MALLOC_OPTIONS=++ chrome
> > > 
> > > Note performance. Do multiple tests to get better statistics.
> > > 
> > > If you're not able to do full tests, at least general observations are
> > > welcome. Tell a bit about the system you tested on (e.g. number of
> > > cores). Note that due to randomization, different runs might show
> > > different performance numbers since the pools shared by subsets of
> > > threads can turn out differently.
> > > 
> > > Thanks,
> > > 
> > >   -Otto
> > 
> > New diff with problem noted by Janne Johansson fixed.
> > 
> > Index: include/thread_private.h
> > ===
> > RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> > retrieving revision 1.33
> > diff -u -p -r1.33 thread_private.h
> > --- include/thread_private.h5 Dec 2017 13:45:31 -   1.33
> > +++ include/thread_private.h19 Dec 2018 10:18:38 -
> > @@ -7,7 +7,7 @@
> >  
> >  #include <stdio.h>  /* for FILE and __isthreaded */
> >  
> > -#define _MALLOC_MUTEXES 4
> > +#define _MALLOC_MUTEXES 32
> >  void _malloc_init(int);
> >  #ifdef __LIBC__
> >  PROTO_NORMAL(_malloc_init);
> > Index: stdlib/malloc.c
> > ===
> > RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> > retrieving revision 1.257
> > diff -u -p -r1.257 malloc.c
> > --- stdlib/malloc.c 10 Dec 2018 07:57:49 -  1.257
> > +++ stdlib/malloc.c 19 Dec 2018 10:18:38 -
> > @@ -143,6 +143,8 @@ struct dir_info {
> > size_t cheap_reallocs;
> > size_t malloc_used; /* bytes allocated */
> > size_t malloc_guarded;  /* bytes used for guards */
> > +   size_t pool_searches;   /* searches for pool */
> > +   size_t other_pool;  /* searches in other pool */
> >  #define STATS_ADD(x,y) ((x) += (y))
> >  #define STATS_SUB(x,y) ((x) -= (y))
> >  #define STATS_INC(x)   ((x)++)
> > @@ -179,7 +181,9 @@ struct chunk_info {
> >  };
> >  
> >  struct malloc_readonly {
> > -   struct dir_info *malloc_pool[_MALLOC_MUTEXES];  /* Main 

Re: Request for testing malloc and multi-threaded applications

2019-01-17 Thread Alexandr Nedvedicky
Hello Otto,

I gave it a try with firefox. According to my subjective tests
I could not spot any differences with the various settings.

I've decided to try with some memory benchmarks I could find on GitHub [1]. I
created a fork [2] with my own test runner to try out your diff. To run it,
just do something like:

git clone https://github.com/Sashan/Hoard.git
cd Hoard/benchmarks/
make

The benchmarks are from the '90s. A description can be found in the paper kept
alongside the Hoard project [3].

The box where I ran the tests has 4 CPUs:
cpu0: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz, \
2997.38 MHz, 06-17-0a
with 8GB of RAM.

I used time(1) to measure the running time of test-run.sh with each particular
MALLOC_OPTIONS setting. The results are as follows:

Running with MALLOC_OPTIONS=
 1730.27 real  3289.41 user  3574.28 sys
Running with MALLOC_OPTIONS=-
 1726.16 real  3279.37 user  3575.26 sys
Running with MALLOC_OPTIONS=+
 1712.40 real  3296.65 user  3483.03 sys
Running with MALLOC_OPTIONS=--
 1741.42 real  3290.89 user  3616.37 sys
Running with MALLOC_OPTIONS=++
 1765.02 real  3287.75 user  3665.30 sys
Running with MALLOC_OPTIONS=+++
 1758.06 real  3300.00 user  3631.57 sys

As you can see, the differences are insignificant; the spread is ~1 minute. One
round of tests took ~30 minutes.
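
The same sweep can be driven from a small C wrapper instead of a shell script;
a rough sketch, assuming a benchmark script such as test-run.sh sits in the
current directory (the wrapper and its timing are illustrative, not part of the
Hoard harness):

#include <sys/types.h>
#include <sys/wait.h>

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Placeholder benchmark command; the harness above uses test-run.sh. */
static char *const bench_argv[] = { "./test-run.sh", NULL };

/* Run the benchmark once with the given MALLOC_OPTIONS, return wall time. */
static double
run_once(const char *opts)
{
    struct timespec t0, t1;
    pid_t pid;
    int status;

    setenv("MALLOC_OPTIONS", opts, 1);  /* inherited by the child */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid = fork();
    if (pid == 0) {
        execvp(bench_argv[0], bench_argv);
        _exit(127);
    }
    waitpid(pid, &status, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int
main(void)
{
    const char *opts[] = { "", "-", "--", "+", "++", "+++" };
    size_t i;

    for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++)
        printf("MALLOC_OPTIONS=%-3s %.2f real\n", opts[i], run_once(opts[i]));
    return 0;
}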

regards
sashan

[1] https://github.com/emeryberger/Hoard/tree/master/benchmarks

[2] https://github.com/Sashan/Hoard

[3] https://github.com/emeryberger/Hoard/blob/master/doc/berger-asplos2000.pdf

On Wed, Dec 19, 2018 at 11:20:19AM +0100, Otto Moerbeek wrote:
> On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > This diff implements a more flexible approach for the number of pools
> > malloc uses in the multi-threaded case. At the moment I do not intend
> > to commit this as-is, I first need this to get some feedback on what
> > the proper default should be.
> > 
> > Currently the number of pools is fixed at 4. More pools mean less
> > contention for allocations, but free becomes more expensive since a
> > thread might need to check other pools increasing contention.
> > 
> > I'd like to know how this diff behaves using your favorite
> > multi-threaded application. Often this will be a web-browser I guess.
> > 
> > Test instructions:
> > 
> > 0. Make sure you are running current.
> > 
> > 1. Do a baseline test of your application.
> > 
> > 2. Apply diff, build and install userland.
> > 
> > 3. Run your test application with MALLOC_OPTIONS=value, where value is: 
> > "", +, -, ++, -- and +++.
> > 
> > e.g. 
> > 
> > MALLOC_OPTIONS=++ chrome
> > 
> > Note performance. Do multiple tests to get better statistics.
> > 
> > If you're not able to do full tests, at least general observations are
> > welcome. Tell a bit about the system you tested on (e.g. number of
> > cores). Note that due to randomization, different runs might show
> > different performance numbers since the pools shared by subsets of
> > threads can turn out differently.
> > 
> > Thanks,
> > 
> > -Otto
> 
> New diff with problem noted by Janne Johansson fixed.
> 
> Index: include/thread_private.h
> ===
> RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> retrieving revision 1.33
> diff -u -p -r1.33 thread_private.h
> --- include/thread_private.h  5 Dec 2017 13:45:31 -   1.33
> +++ include/thread_private.h  19 Dec 2018 10:18:38 -
> @@ -7,7 +7,7 @@
>  
>  #include <stdio.h>  /* for FILE and __isthreaded */
>  
> -#define _MALLOC_MUTEXES 4
> +#define _MALLOC_MUTEXES 32
>  void _malloc_init(int);
>  #ifdef __LIBC__
>  PROTO_NORMAL(_malloc_init);
> Index: stdlib/malloc.c
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.257
> diff -u -p -r1.257 malloc.c
> --- stdlib/malloc.c   10 Dec 2018 07:57:49 -  1.257
> +++ stdlib/malloc.c   19 Dec 2018 10:18:38 -
> @@ -143,6 +143,8 @@ struct dir_info {
>   size_t cheap_reallocs;
>   size_t malloc_used; /* bytes allocated */
>   size_t malloc_guarded;  /* bytes used for guards */
> + size_t pool_searches;   /* searches for pool */
> + size_t other_pool;  /* searches in other pool */
>  #define STATS_ADD(x,y)   ((x) += (y))
>  #define STATS_SUB(x,y)   ((x) -= (y))
>  #define STATS_INC(x) ((x)++)
> @@ -179,7 +181,9 @@ struct chunk_info {
>  };
>  
>  struct malloc_readonly {
> - struct dir_info *malloc_pool[_MALLOC_MUTEXES];  /* Main bookkeeping 
> information */
> + /* Main bookkeeping information */
> + struct dir_info *malloc_pool[_MALLOC_MUTEXES];
> + u_int   malloc_mutexes; /* how much in actual use? */
>   int malloc_mt;  /* multi-threaded mode? */
>   int malloc_freecheck;   /* Extensive 

Re: Request for testing malloc and multi-threaded applications

2019-01-16 Thread Stuart Henderson
On 2019/01/16 19:09, Otto Moerbeek wrote:
> On Wed, Jan 16, 2019 at 01:25:25PM +, Stuart Henderson wrote:
> 
> > On 2019/01/04 08:09, Otto Moerbeek wrote:
> > > On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
> > > 
> > > > 
> > > > Very little feedback so far. This diff can only give me valid feedback
> > > > if the coverage of systems and use cases is wide.  If I do not get
> > > > more feedback, I have to base my decisions on my own testing, which
> > > > will benefit my systems and use cases, but might harm yours.
> > > > 
> > > > So, ladies and gentlemen, start your tests!
> > > 
> > > Another reminder. I'd like to make progress on this. That means I need
> > > tests for various use-cases.
> > 
> > I have a map-based website I use that is quite good at stressing things
> > (high spin% cpu) and have been timing from opening chromium (I'm using
> > this for the test because it typically performs less well than firefox).
> > Time is real time from starting the browser set to 'start with previously
> > opened windows' and the page open, until the page reports that it's
> > finished loading (i.e. fetching data from the server and rendering it).
> > 
> > It's not a perfect test - it depends on network/server conditions etc., and
> > it's a visualisation of conditions in a game so it may change slightly from
> > run to run, though there shouldn't be huge changes between the times I've
> > run it - but it is a bit more repeatable than a subjective "does the browser
> > feel slow".
> > 
> > 4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).
> > 
> > I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
> > more like +++, -, '', -, ++ etc.
> > 
> >  +++  90  98  68
> >  ++   85  82
> >  +    87  56  71
> >  ''   76  60  69  88
> >  -    77  74  85
> >  --   48  86  77  67
> > 
> > So while it's not very consistent, the fastest times I've seen are on
> > runs with fewer pools, and the slowest times on runs with more pools,
> > with '' possibly seeming a bit more consistent from run to run. But
> > there's not enough consistency with any of it to be able to make any
> > clear conclusion (and I get the impression it would be hard to
> > tell without some automated test that can be repeated many times,
> > with a statistical analysis carried out on the results).
> > 
> 
> Thanks for testing. To be clear: this is with the diff I posted and not the
> committed code, right? (There is a small change in the committed code
> to change the default to what one '+' gave with the diff).
> 
>   -Otto
> 

Ah I missed that it was committed (and thought that the diff as sent
was in snapshots) - this was the committed version then.

(It took a while to test as I was trying to think of something where
I actually had a chance of noticing a difference!).



Re: Request for testing malloc and multi-threaded applications

2019-01-16 Thread Otto Moerbeek
On Wed, Jan 16, 2019 at 01:25:25PM +, Stuart Henderson wrote:

> On 2019/01/04 08:09, Otto Moerbeek wrote:
> > On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
> > 
> > > 
> > > Very little feedback so far. This diff can only give me valid feedback
> > > if the coverage of systems and use cases is wide.  If I do not get
> > > more feedback, I have to base my decisions on my own testing, which
> > > will benefit my systems and use cases, but might harm yours.
> > > 
> > > So, ladies and gentlemen, start your tests!
> > 
> > Another reminder. I'd like to make progress on this. That means I need
> > tests for various use-cases.
> 
> I have a map-based website I use that is quite good at stressing things
> (high spin% cpu) and have been timing from opening chromium (I'm using
> this for the test because it typically performs less well than firefox).
> Time is real time from starting the browser set to 'start with previously
> opened windows' and the page open, until the page reports that it's
> finished loading (i.e. fetching data from the server and rendering it).
> 
> It's not a perfect test - it depends on network/server conditions etc., and
> it's a visualisation of conditions in a game so it may change slightly from
> run to run, though there shouldn't be huge changes between the times I've
> run it - but it is a bit more repeatable than a subjective "does the browser
> feel slow".
> 
> 4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).
> 
> I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
> more like +++, -, '', -, ++ etc.
> 
>  +++  90  98  68
>  ++   85  82
>  +    87  56  71
>  ''   76  60  69  88
>  -    77  74  85
>  --   48  86  77  67
> 
> So while it's not very consistent, the fastest times I've seen are on
> runs with fewer pools, and the slowest times on runs with more pools,
> with '' possibly seeming a bit more consistent from run to run. But
> there's not enough consistency with any of it to be able to make any
> clear conclusion (and I get the impression it would be hard to
> tell without some automated test that can be repeated many times,
> with a statistical analysis carried out on the results).
> 

Thanks for testing. To be clear: this is with the diff I posted and not the
committed code, right? (There is a small change in the committed code
to change the default to what one '+' gave with the diff).

-Otto



Re: Request for testing malloc and multi-threaded applications

2019-01-16 Thread Stuart Henderson
On 2019/01/04 08:09, Otto Moerbeek wrote:
> On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
> 
> > 
> > Very little feedback so far. This diff can only give me valid feedback
> > if the coverage of systems and use cases is wide.  If I do not get
> > more feedback, I have to base my decisions on my own testing, which
> > will benefit my systems and use cases, but might harm yours.
> > 
> > So, ladies and gentlemen, start your tests!
> 
> Another reminder. I'd like to make progress on this. That means I need
> tests for various use-cases.

I have a map-based website I use that is quite good at stressing things
(high spin% cpu) and have been timing from opening chromium (I'm using
this for the test because it typically performs less well than firefox).
Time is real time from starting the browser set to 'start with previously
opened windows' and the page open, until the page reports that it's
finished loading (i.e. fetching data from the server and rendering it).

It's not a perfect test - it depends on network/server conditions etc., and
it's a visualisation of conditions in a game so it may change slightly from
run to run, though there shouldn't be huge changes between the times I've
run it - but it is a bit more repeatable than a subjective "does the browser
feel slow".

4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).

I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
more like +++, -, '', -, ++ etc.

 +++  90  98  68
 ++   85  82
 +    87  56  71
 ''   76  60  69  88
 -    77  74  85
 --   48  86  77  67

So while it's not very consistent, the fastest times I've seen are on
runs with fewer pools, and the slowest times on runs with more pools,
with '' possibly seeming a bit more consistent from run to run. But
there's not enough consistency with any of it to be able to make any
clear conclusion (and I get the impression it would be hard to
tell without some automated test that can be repeated many times,
with a statistical analysis carried out on the results).
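
The statistics side needs very little machinery; a minimal sketch that prints
the mean and sample standard deviation for one row of the timings above (the ''
row is hard-coded purely as an example; link with -lm):

#include <math.h>
#include <stdio.h>

/*
 * Mean and sample standard deviation of repeated timings, so that runs
 * with different MALLOC_OPTIONS can be compared beyond "feels faster".
 */
static void
summarize(const char *label, const double *t, size_t n)
{
    double sum = 0.0, var = 0.0, mean;
    size_t i;

    for (i = 0; i < n; i++)
        sum += t[i];
    mean = sum / n;
    for (i = 0; i < n; i++)
        var += (t[i] - mean) * (t[i] - mean);
    var /= n - 1;               /* sample variance; needs n >= 2 */

    printf("%-3s  mean %.1fs  sd %.1fs  (n=%zu)\n", label, mean, sqrt(var), n);
}

int
main(void)
{
    /* The '' (default) row of the timings above, as an example. */
    double def[] = { 76, 60, 69, 88 };

    summarize("''", def, sizeof(def) / sizeof(def[0]));
    return 0;
}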



Re: Request for testing malloc and multi-threaded applications

2019-01-03 Thread Otto Moerbeek
On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:

> 
> Very little feedback so far. This diff can only give me valid feedback
> if the coverage of systems and use cases is wide.  If I do not get
> more feedback, I have to base my decisions on my own testing, which
> will benefit my systems and use cases, but might harm yours.
> 
> So, ladies and gentlemen, start your tests!

Another reminder. I'd like to make progress on this. That means I need
tests for various use-cases.

Thanks

-Otto
> 
> 
> On Wed, Dec 19, 2018 at 11:20:19AM +0100, Otto Moerbeek wrote:
> 
> > On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:
> > 
> > > Hi,
> > > 
> > > This diff implements a more flexible approach for the number of pools
> > > malloc uses in the multi-threaded case. At the moment I do not intend
> > > to commit this as-is, I first need this to get some feedback on what
> > > the proper default should be.
> > > 
> > > Currently the number of pools is fixed at 4. More pools mean less
> > > contention for allocations, but free becomes more expensive since a
> > > thread might need to check other pools increasing contention.
> > > 
> > > I'd like to know how this diff behaves using your favorite
> > > multi-threaded application. Often this will be a web-browser I guess.
> > > 
> > > Test instructions:
> > > 
> > > 0. Make sure you are running current.
> > > 
> > > 1. Do a baseline test of your application.
> > > 
> > > 2. Apply diff, build and install userland.
> > > 
> > > 3. Run your test application with MALLOC_OPTIONS=value, where value is: 
> > > "", +, -, ++, -- and +++.
> > > 
> > > e.g. 
> > > 
> > >   MALLOC_OPTIONS=++ chrome
> > > 
> > > Note performance. Do multiple tests to get better statistics.
> > > 
> > > If you're not able to do full tests, at least general observations are
> > > welcome. Tell a bit about the system you tested on (e.g. number of
> > > cores). Note that due to randomization, different runs might show
> > > different performance numbers since the pools shared by subsets of
> > > threads can turn out differently.
> > > 
> > > Thanks,
> > > 
> > >   -Otto
> > 
> > New diff with problem noted by Janne Johansson fixed.
> > 
> > Index: include/thread_private.h
> > ===
> > RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> > retrieving revision 1.33
> > diff -u -p -r1.33 thread_private.h
> > --- include/thread_private.h5 Dec 2017 13:45:31 -   1.33
> > +++ include/thread_private.h19 Dec 2018 10:18:38 -
> > @@ -7,7 +7,7 @@
> >  
> >  #include <stdio.h>  /* for FILE and __isthreaded */
> >  
> > -#define _MALLOC_MUTEXES 4
> > +#define _MALLOC_MUTEXES 32
> >  void _malloc_init(int);
> >  #ifdef __LIBC__
> >  PROTO_NORMAL(_malloc_init);
> > Index: stdlib/malloc.c
> > ===
> > RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> > retrieving revision 1.257
> > diff -u -p -r1.257 malloc.c
> > --- stdlib/malloc.c 10 Dec 2018 07:57:49 -  1.257
> > +++ stdlib/malloc.c 19 Dec 2018 10:18:38 -
> > @@ -143,6 +143,8 @@ struct dir_info {
> > size_t cheap_reallocs;
> > size_t malloc_used; /* bytes allocated */
> > size_t malloc_guarded;  /* bytes used for guards */
> > +   size_t pool_searches;   /* searches for pool */
> > +   size_t other_pool;  /* searches in other pool */
> >  #define STATS_ADD(x,y) ((x) += (y))
> >  #define STATS_SUB(x,y) ((x) -= (y))
> >  #define STATS_INC(x)   ((x)++)
> > @@ -179,7 +181,9 @@ struct chunk_info {
> >  };
> >  
> >  struct malloc_readonly {
> > -   struct dir_info *malloc_pool[_MALLOC_MUTEXES];  /* Main bookkeeping 
> > information */
> > +   /* Main bookkeeping information */
> > +   struct dir_info *malloc_pool[_MALLOC_MUTEXES];
> > +   u_int   malloc_mutexes; /* how much in actual use? */
> > int malloc_mt;  /* multi-threaded mode? */
> > int malloc_freecheck;   /* Extensive double free check */
> > int malloc_freeunmap;   /* mprotect free pages PROT_NONE? */
> > @@ -267,7 +271,7 @@ getpool(void)
> > return mopts.malloc_pool[0];
> > else
> > return mopts.malloc_pool[TIB_GET()->tib_tid &
> > -   (_MALLOC_MUTEXES - 1)];
> > +   (mopts.malloc_mutexes - 1)];
> >  }
> >  
> >  static __dead void
> > @@ -316,6 +320,16 @@ static void
> >  omalloc_parseopt(char opt)
> >  {
> > switch (opt) {
> > +   case '+':
> > +   mopts.malloc_mutexes <<= 1;
> > +   if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
> > +   mopts.malloc_mutexes = _MALLOC_MUTEXES;
> > +   break;
> > +   case '-':
> > +   mopts.malloc_mutexes >>= 1;
> > +   if (mopts.malloc_mutexes < 1)
> > +   mopts.malloc_mutexes = 1;
> > +   break;
> > 

Re: Request for testing malloc and multi-threaded applications

2018-12-27 Thread Otto Moerbeek


Very little feedback so far. This diff can only give me valid feedback
if the coverage of systems and use cases is wide.  If I do not get
more feedback, I have to base my decisions on my own testing, which
will benefit my systems and use cases, but might harm yours.

So, ladies and gentlemen, start your tests!

-Otto


On Wed, Dec 19, 2018 at 11:20:19AM +0100, Otto Moerbeek wrote:

> On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > This diff implements a more flexible approach for the number of pools
> > malloc uses in the multi-threaded case. At the moment I do not intend
> > to commit this as-is, I first need this to get some feedback on what
> > the proper default should be.
> > 
> > Currently the number of pools is fixed at 4. More pools mean less
> > contention for allocations, but free becomes more expensive since a
> > thread might need to check other pools increasing contention.
> > 
> > I'd like to know how this diff behaves using your favorite
> > multi-threaded application. Often this will be a web-browser I guess.
> > 
> > Test instructions:
> > 
> > 0. Make sure you are running current.
> > 
> > 1. Do a baseline test of your application.
> > 
> > 2. Apply diff, build and install userland.
> > 
> > 3. Run your test application with MALLOC_OPTIONS=value, where value is: 
> > "", +, -, ++, -- and +++.
> > 
> > e.g. 
> > 
> > MALLOC_OPTIONS=++ chrome
> > 
> > Note performance. Do multiple tests to get better statistics.
> > 
> > If you're not able to do full tests, at least general observations are
> > welcome. Tell a bit about the system you tested on (e.g. number of
> > cores). Note that due to randomization, different runs might show
> > different performance numbers since the pools shared by subsets of
> > threads can turn out differently.
> > 
> > Thanks,
> > 
> > -Otto
> 
> New diff with problem noted by Janne Johansson fixed.
> 
> Index: include/thread_private.h
> ===
> RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> retrieving revision 1.33
> diff -u -p -r1.33 thread_private.h
> --- include/thread_private.h  5 Dec 2017 13:45:31 -   1.33
> +++ include/thread_private.h  19 Dec 2018 10:18:38 -
> @@ -7,7 +7,7 @@
>  
>  #include <stdio.h>  /* for FILE and __isthreaded */
>  
> -#define _MALLOC_MUTEXES 4
> +#define _MALLOC_MUTEXES 32
>  void _malloc_init(int);
>  #ifdef __LIBC__
>  PROTO_NORMAL(_malloc_init);
> Index: stdlib/malloc.c
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.257
> diff -u -p -r1.257 malloc.c
> --- stdlib/malloc.c   10 Dec 2018 07:57:49 -  1.257
> +++ stdlib/malloc.c   19 Dec 2018 10:18:38 -
> @@ -143,6 +143,8 @@ struct dir_info {
>   size_t cheap_reallocs;
>   size_t malloc_used; /* bytes allocated */
>   size_t malloc_guarded;  /* bytes used for guards */
> + size_t pool_searches;   /* searches for pool */
> + size_t other_pool;  /* searches in other pool */
>  #define STATS_ADD(x,y)   ((x) += (y))
>  #define STATS_SUB(x,y)   ((x) -= (y))
>  #define STATS_INC(x) ((x)++)
> @@ -179,7 +181,9 @@ struct chunk_info {
>  };
>  
>  struct malloc_readonly {
> - struct dir_info *malloc_pool[_MALLOC_MUTEXES];  /* Main bookkeeping 
> information */
> + /* Main bookkeeping information */
> + struct dir_info *malloc_pool[_MALLOC_MUTEXES];
> + u_int   malloc_mutexes; /* how much in actual use? */
>   int malloc_mt;  /* multi-threaded mode? */
>   int malloc_freecheck;   /* Extensive double free check */
>   int malloc_freeunmap;   /* mprotect free pages PROT_NONE? */
> @@ -267,7 +271,7 @@ getpool(void)
>   return mopts.malloc_pool[0];
>   else
>   return mopts.malloc_pool[TIB_GET()->tib_tid &
> - (_MALLOC_MUTEXES - 1)];
> + (mopts.malloc_mutexes - 1)];
>  }
>  
>  static __dead void
> @@ -316,6 +320,16 @@ static void
>  omalloc_parseopt(char opt)
>  {
>   switch (opt) {
> + case '+':
> + mopts.malloc_mutexes <<= 1;
> + if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
> + mopts.malloc_mutexes = _MALLOC_MUTEXES;
> + break;
> + case '-':
> + mopts.malloc_mutexes >>= 1;
> + if (mopts.malloc_mutexes < 1)
> + mopts.malloc_mutexes = 1;
> + break;
>   case '>':
>   mopts.malloc_cache <<= 1;
>   if (mopts.malloc_cache > MALLOC_MAXCACHE)
> @@ -395,6 +409,7 @@ omalloc_init(void)
>   /*
>* Default options
>*/
> + mopts.malloc_mutexes = 4;
>   mopts.malloc_junk = 1;
>   mopts.malloc_cache = MALLOC_DEFAULT_CACHE;
>  
> @@ -485,7 +500,7 @@ 

Re: Request for testing malloc and multi-threaded applications

2018-12-19 Thread Otto Moerbeek
On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:

> Hi,
> 
> This diff implements a more flexible approach for the number of pools
> malloc uses in the multi-threaded case. At the moment I do not intend
> to commit this as-is, I first need this to get some feedback on what
> the proper default should be.
> 
> Currently the number of pools is fixed at 4. More pools mean less
> contention for allocations, but free becomes more expensive since a
> thread might need to check other pools increasing contention.
> 
> I'd like to know how this diff behaves using your favorite
> multi-threaded application. Often this will be a web-browser I guess.
> 
> Test instructions:
> 
> 0. Make sure you are running current.
> 
> 1. Do a baseline test of your application.
> 
> 2. Apply diff, build and install userland.
> 
> 3. Run your test application with MALLOC_OPTIONS=value, where value is: 
> "", +, -, ++, -- and +++.
> 
> e.g. 
> 
>   MALLOC_OPTIONS=++ chrome
> 
> Note performance. Do multiple tests to get better statistics.
> 
> If you're not able to do full tests, at least general observations are
> welcome. Tell a bit about the system you tested on (e.g. number of
> cores). Note that due to randomization, different runs might show
> different performance numbers since the pools shared by subsets of
> threads can turn out differently.
> 
> Thanks,
> 
>   -Otto

New diff with problem noted by Janne Johansson fixed.

Index: include/thread_private.h
===
RCS file: /cvs/src/lib/libc/include/thread_private.h,v
retrieving revision 1.33
diff -u -p -r1.33 thread_private.h
--- include/thread_private.h5 Dec 2017 13:45:31 -   1.33
+++ include/thread_private.h19 Dec 2018 10:18:38 -
@@ -7,7 +7,7 @@
 
 #include <stdio.h>  /* for FILE and __isthreaded */
 
-#define _MALLOC_MUTEXES 4
+#define _MALLOC_MUTEXES 32
 void _malloc_init(int);
 #ifdef __LIBC__
 PROTO_NORMAL(_malloc_init);
Index: stdlib/malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.257
diff -u -p -r1.257 malloc.c
--- stdlib/malloc.c 10 Dec 2018 07:57:49 -  1.257
+++ stdlib/malloc.c 19 Dec 2018 10:18:38 -
@@ -143,6 +143,8 @@ struct dir_info {
size_t cheap_reallocs;
size_t malloc_used; /* bytes allocated */
size_t malloc_guarded;  /* bytes used for guards */
+   size_t pool_searches;   /* searches for pool */
+   size_t other_pool;  /* searches in other pool */
 #define STATS_ADD(x,y) ((x) += (y))
 #define STATS_SUB(x,y) ((x) -= (y))
 #define STATS_INC(x)   ((x)++)
@@ -179,7 +181,9 @@ struct chunk_info {
 };
 
 struct malloc_readonly {
-   struct dir_info *malloc_pool[_MALLOC_MUTEXES];  /* Main bookkeeping 
information */
+   /* Main bookkeeping information */
+   struct dir_info *malloc_pool[_MALLOC_MUTEXES];
+   u_int   malloc_mutexes; /* how much in actual use? */
int malloc_mt;  /* multi-threaded mode? */
int malloc_freecheck;   /* Extensive double free check */
int malloc_freeunmap;   /* mprotect free pages PROT_NONE? */
@@ -267,7 +271,7 @@ getpool(void)
return mopts.malloc_pool[0];
else
return mopts.malloc_pool[TIB_GET()->tib_tid &
-   (_MALLOC_MUTEXES - 1)];
+   (mopts.malloc_mutexes - 1)];
 }
 
 static __dead void
@@ -316,6 +320,16 @@ static void
 omalloc_parseopt(char opt)
 {
switch (opt) {
+   case '+':
+   mopts.malloc_mutexes <<= 1;
+   if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
+   mopts.malloc_mutexes = _MALLOC_MUTEXES;
+   break;
+   case '-':
+   mopts.malloc_mutexes >>= 1;
+   if (mopts.malloc_mutexes < 1)
+   mopts.malloc_mutexes = 1;
+   break;
case '>':
mopts.malloc_cache <<= 1;
if (mopts.malloc_cache > MALLOC_MAXCACHE)
@@ -395,6 +409,7 @@ omalloc_init(void)
/*
 * Default options
 */
+   mopts.malloc_mutexes = 4;
mopts.malloc_junk = 1;
mopts.malloc_cache = MALLOC_DEFAULT_CACHE;
 
@@ -485,7 +500,7 @@ omalloc_poolinit(struct dir_info **dp)
for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
LIST_INIT(&d->chunk_dir[i][j]);
}
-   STATS_ADD(d->malloc_used, regioninfo_size);
+   STATS_ADD(d->malloc_used, regioninfo_size + 3 * MALLOC_PAGESIZE);
d->canary1 = mopts.malloc_canary ^ (u_int32_t)(uintptr_t)d;
d->canary2 = ~d->canary1;
 
@@ -1196,7 +1211,7 @@ _malloc_init(int from_rthreads)
if (!mopts.malloc_canary)
omalloc_init();
 
-   max = from_rthreads ? _MALLOC_MUTEXES : 1;
+   max = 

Request for testing malloc and multi-threaded applications

2018-12-19 Thread Otto Moerbeek
Hi,

This diff implements a more flexible approach for the number of pools
malloc uses in the multi-threaded case. At the moment I do not intend
to commit this as-is, I first need this to get some feedback on what
the proper default should be.

Currently the number of pools is fixed at 4. More pools mean less
contention for allocations, but free becomes more expensive since a
thread might need to check other pools increasing contention.

I'd like to know how this diff behaves using your favorite
multi-threaded application. Often this will be a web-browser I guess.

Test instructions:

0. Make sure you are running current.

1. Do a baseline test of your application.

2. Apply diff, build and install userland.

3. Run your test application with MALLOC_OPTIONS=value, where value is: 
"", +, -, ++, -- and +++.

e.g. 

MALLOC_OPTIONS=++ chrome

Note performance. Do multiple tests to get better statistics.

If you're not able to do full tests, at least general observations are
welcome. Tell a bit about the system you tested on (e.g. number of
cores). Note that due to randomization, different runs might show
different performance numbers since the pools shared by subsets of
threads can turn out differently.

Thanks,

-Otto
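
For context on why '+' and '-' double and halve the count: each thread picks
its pool by masking its thread id with (number of pools - 1), as getpool() in
the diff below does, so the count stays a power of two. A minimal sketch of
that mapping with made-up thread ids (illustrative only, not libc code):

#include <stdio.h>

/*
 * Sketch: with a power-of-two pool count, tid & (count - 1) spreads
 * threads over the pools; more pools means fewer threads share one,
 * which is the allocation-contention side of the trade-off above.
 */
int
main(void)
{
    unsigned int counts[] = { 1, 4, 8, 32 };    /* '--', default, '+', '+++' */
    unsigned int tid;
    size_t i;

    for (i = 0; i < sizeof(counts) / sizeof(counts[0]); i++) {
        printf("%2u pools:", counts[i]);
        for (tid = 0; tid < 8; tid++)
            printf(" t%u->p%u", tid, tid & (counts[i] - 1));
        printf("\n");
    }
    return 0;
}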

Index: include/thread_private.h
===
RCS file: /cvs/src/lib/libc/include/thread_private.h,v
retrieving revision 1.33
diff -u -p -r1.33 thread_private.h
--- include/thread_private.h5 Dec 2017 13:45:31 -   1.33
+++ include/thread_private.h19 Dec 2018 06:52:07 -
@@ -7,7 +7,7 @@
 
 #include <stdio.h>  /* for FILE and __isthreaded */
 
-#define _MALLOC_MUTEXES 4
+#define _MALLOC_MUTEXES 32
 void _malloc_init(int);
 #ifdef __LIBC__
 PROTO_NORMAL(_malloc_init);
Index: stdlib/malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.257
diff -u -p -r1.257 malloc.c
--- stdlib/malloc.c 10 Dec 2018 07:57:49 -  1.257
+++ stdlib/malloc.c 19 Dec 2018 06:52:07 -
@@ -143,6 +143,8 @@ struct dir_info {
size_t cheap_reallocs;
size_t malloc_used; /* bytes allocated */
size_t malloc_guarded;  /* bytes used for guards */
+   size_t pool_searches;   /* searches for pool */
+   size_t other_pool;  /* searches in other pool */
 #define STATS_ADD(x,y) ((x) += (y))
 #define STATS_SUB(x,y) ((x) -= (y))
 #define STATS_INC(x)   ((x)++)
@@ -179,7 +181,9 @@ struct chunk_info {
 };
 
 struct malloc_readonly {
-   struct dir_info *malloc_pool[_MALLOC_MUTEXES];  /* Main bookkeeping 
information */
+   /* Main bookkeeping information */
+   struct dir_info *malloc_pool[_MALLOC_MUTEXES];
+   u_int   malloc_mutexes; /* how much in actual use? */
int malloc_mt;  /* multi-threaded mode? */
int malloc_freecheck;   /* Extensive double free check */
int malloc_freeunmap;   /* mprotect free pages PROT_NONE? */
@@ -267,7 +271,7 @@ getpool(void)
return mopts.malloc_pool[0];
else
return mopts.malloc_pool[TIB_GET()->tib_tid &
-   (_MALLOC_MUTEXES - 1)];
+   (mopts.malloc_mutexes - 1)];
 }
 
 static __dead void
@@ -316,6 +320,16 @@ static void
 omalloc_parseopt(char opt)
 {
switch (opt) {
+   case '+':
+   mopts.malloc_mutexes <<= 1;
+   if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
+   mopts.malloc_mutexes = _MALLOC_MUTEXES;
+   break;
+   case '-':
+   mopts.malloc_mutexes >>= 1;
+   if (mopts.malloc_mutexes < 1)
+   mopts.malloc_mutexes = 1;
+   break;
case '>':
mopts.malloc_cache <<= 1;
if (mopts.malloc_cache > MALLOC_MAXCACHE)
@@ -395,6 +409,7 @@ omalloc_init(void)
/*
 * Default options
 */
+   mopts.malloc_mutexes = 4;
mopts.malloc_junk = 1;
mopts.malloc_cache = MALLOC_DEFAULT_CACHE;
 
@@ -485,7 +500,7 @@ omalloc_poolinit(struct dir_info **dp)
for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
LIST_INIT(&d->chunk_dir[i][j]);
}
-   STATS_ADD(d->malloc_used, regioninfo_size);
+   STATS_ADD(d->malloc_used, regioninfo_size + 3 * MALLOC_PAGESIZE);
d->canary1 = mopts.malloc_canary ^ (u_int32_t)(uintptr_t)d;
d->canary2 = ~d->canary1;
 
@@ -1196,7 +1211,7 @@ _malloc_init(int from_rthreads)
if (!mopts.malloc_canary)
omalloc_init();
 
-   max = from_rthreads ? _MALLOC_MUTEXES : 1;
+   max = from_rthreads ? mopts.malloc_mutexes : 1;
if (((uintptr_t)&malloc_readonly & MALLOC_PAGEMASK) == 0)
mprotect(&malloc_readonly, sizeof(malloc_readonly),
PROT_READ | PROT_WRITE);