Re: parallel cost-centre profiling

2010-07-01 Thread Simon Marlow

On 23/06/2010 18:44, Henrique Ferreiro wrote:

2010/6/15 Simon Marlow marlo...@gmail.com:

On 15/06/2010 16:28, Henrique Ferreiro wrote:


I got the most important pieces working (I think). The question now
is, are you interested in this?


It all depends on whether other people would find it useful or not.  At the
moment I'm not convinced that profiling each Capability separately will give
results that are useful, because the assignment of work to Capabilities is
quite arbitrary and will change from run to run.  The only way to get
meaningful results would be to use forkOnIO; this won't be very useful for
par/pseq or Strategies.


I've been thinking about this and have come to the conclusion that
profiling per capability is not only useless but wrong, as it might
build stacks of completely unrelated code.

You said before that per-thread profiling is much harder. So, what
would be the problem of saving and restoring the profiling information
each time a thread is run?


So one option is to make a new CCS root for each thread, and that way 
each thread would end up creating its own tree of CCSs.  That would seem 
to work nicely - you get per-thread stacks almost for free. 
Unfortunately it's not really per-thread profiling, because if one 
thread happens to evaluate a thunk created by another thread then the 
costs of doing so would be attributed to the thread that created the 
thunk (maybe that's what you want, and maybe it's consistent with the 
CCS view of the world, I'm not sure).  This also means you still need to 
lock access to the CCS structures, because two threads might be 
accessing the same one simultaneously.


Do you really want per-thread profiling, anyway?  What happens when 
there are thousands of threads?



I didn't know about this. I've done some really small tests and the
overhead of the HPC system seems lower than the one from the profiling
system. The problem I see is that it may have to be changed a lot to
comply with the cost centre semantics (subsuming costs and the like).


Well, it would be a completely different cost semantics.  Whether that's 
good or bad I can't say.


Cheers,
Simon

___
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc


Re: parallel cost-centre profiling

2010-07-01 Thread Henrique Ferreiro
2010/7/1 Simon Marlow marlo...@gmail.com:
 So one option is to make a new CCS root for each thread, and that way each
 thread would end up creating its own tree of CCSs.  That would seem to work
 nicely - you get per-thread stacks almost for free. Unfortunately it's not
 really per-thread profiling, because if one thread happens to evaluate a
 thunk created by another thread then the costs of doing so would be
 attributed to the thread that created the thunk (maybe that's what you want,
 and maybe it's consistent with the CCS view of the world, I'm not sure).
  This also means you still need to lock access to the CCS structures,
 because two threads might be accessing the same one simultaneously.

Yes, I was thinking about the new CCS root you mention. To solve the
sharing problem I was thinking about storing CCS ids instead of
pointers and making each thread allocate a new CCS each time it finds
a new id. I think it shouldn't be much work, as there shouldn't be
that many shared CCSs.

 Do you really want per-thread profiling, anyway?  What happens when there
 are thousands of threads?

Anyway, I think I can still provide valuable information by doing
offline analysis of the data or by hiding useless information. Also,
integration with ThreadScope or similar tools would only look at the
information relevant to what is being displayed, in the same way you
use different zoom levels.

 I didn't know about this. I've done some really small tests and the
 overhead of the HPC system seems lower than the one from the profiling
 system. The problem I see is that it may have to be changed a lot to
 comply with the cost centre semantics (subsuming costs and the like).

 Well, it would be a completely different cost semantics.  Whether that's
 good or bad I can't say.

 Cheers,
        Simon




Re: parallel cost-centre profiling

2010-06-23 Thread Henrique Ferreiro
2010/6/15 Simon Marlow marlo...@gmail.com:
 On 15/06/2010 16:28, Henrique Ferreiro wrote:

 I got the most important pieces working (I think). The question now
 is, are you interested in this?

 It all depends on whether other people would find it useful or not.  At the
 moment I'm not convinced that profiling each Capability separately will give
 results that are useful, because the assignment of work to Capabilities is
 quite arbitrary and will change from run to run.  The only way to get
 meaningful results would be to use forkOnIO; this won't be very useful for
 par/pseq or Strategies.

I've been thinking about this and came to the conclusion that
profiling per capability isn't only useless but wrong as it might
build stacks of completely unrelated code.

You said before that per-thread profiling is much harder. So, what
would be the problem of saving and restoring the profiling information
each time a thread is run?

 So the question really is: what information do you want out?  I can see a
 use for just doing ordinary profiling on parallel programs, although the
 overheads of profiling may well get in the way of getting useful information
 out.

I definitely want per-thread profiling information. I also wanted to
do event logging at each interval so I get timing information.

 One thing you could do is to use HPC for profiling.  The idea would be to
 record the last tick, and sample the thread/tick at each interval. Then you
 get a per-thread profile, but without the stack information of the
 cost-centre profiler.  Perhaps the overhead of this might be too high
 though.

I didn't know about this. I've done some really small tests and the
overhead of the HPC system seems lower than the one from the profiling
system. The problem I see is that it may have to be changed a lot to
comply with the cost centre semantics (subsuming costs and the like).

 Cheers,
        Simon




Re: parallel cost-centre profiling

2010-06-15 Thread Simon Marlow

On 14/06/2010 08:39, Henrique Ferreiro wrote:

Sorry for the late reply, I thought I made it work but I am still
fighting with it.


Do you really need to do this?  Why not share the stacks and use a mutex to
protect the operations?


My idea is to allow for profiling of each capability, so I need to
keep the stacks independent of each other. Otherwise, we would get the
same output as if it had been run sequentially.


I'm not sure it's useful to profile each Capability separately.  Threads 
migrate between Capabilities under the control of the runtime system, so 
you won't get the same results from run to run.


Perhaps what you really wanted was per-thread profiling?  But that's 
much harder - you'd need per-thread stacks.  It would be a big change to 
the profiling system, and I'm not really sure whether the benefit is 
worth it.


Cheers,
Simon



Re: parallel cost-centre profiling

2010-06-15 Thread Simon Marlow

On 15/06/2010 16:28, Henrique Ferreiro wrote:


I got the most important pieces working (I think). The question now
is, are you interested in this?


It all depends on whether other people would find it useful or not.  At 
the moment I'm not convinced that profiling each Capability separately 
will give results that are useful, because the assignment of work to 
Capabilities is quite arbitrary and will change from run to run.  The 
only way to get meaningful results would be to use forkOnIO; this won't 
be very useful for par/pseq or Strategies.


So the question really is: what information do you want out?  I can see 
a use for just doing ordinary profiling on parallel programs, although 
the overheads of profiling may well get in the way of getting useful 
information out.


One thing you could do is to use HPC for profiling.  The idea would be 
to record the last tick, and sample the thread/tick at each interval. 
Then you get a per-thread profile, but without the stack information of 
the cost-centre profiler.  Perhaps the overhead of this might be too 
high though.


Cheers,
Simon



Re: parallel cost-centre profiling

2010-06-14 Thread Henrique Ferreiro
Sorry for the late reply, I thought I made it work but I am still
fighting with it.

 Do you really need to do this?  Why not share the stacks and use a mutex to
 protect the operations?

My idea is to allow for profiling of each capability, so I need to
keep the stacks independent of each other. Otherwise, we would get the
same output as if it had been run sequentially.

 Cheers,
        Simon



Re: parallel cost-centre profiling

2010-06-10 Thread Henrique Ferreiro
Hi!

I managed to make it work with one capability. That is, now the CCCS is
stored per capability but I can't enable parallel profiling because
the stacks are still shared.

Now I wanted to have one instance of each stack and cost centre per
capability, so that I can track the behaviour of each capability
independently. Doing that is easy in the runtime system but I found
out that I also have to change the code generator in the same way. I
need emitCostCentreDecl and costCentreStackDecl to declare an
array of structs instead of a single struct. Can someone help me with
this?

2010/6/4 Henrique Ferreiro hferre...@udc.es:
 Hi again!

 I need some help with this.

 2010/3/17 Simon Marlow marlo...@gmail.com:
 On 16/03/2010 19:34, Henrique Ferreiro wrote:

 Hello!

 I am trying to make cost centre profiling work in the threaded rts
 build in order to use that information to better understand parallel
 behaviour.

 Currently I am learning about the internals of GHC and I am thinking
 about how this could be done. The main blocker is that the current
 cost centre stack is a shared global variable. The simplest solution I
 came up with is to convert it to a thread local variable. The problem
 would be how to access it from the global timer.

 Yes, basically what you want to do is put CCCS into the StgRegs structure,
 which will make it thread-local.  In the timer signal you want to bump the
 counters for the CCCS on each Capability - so just iterate through the array
 of Capabilities and bump each one.

 I tried adding a new register CCCS to StgRegTable in stg/Regs.h but
 there is too much hidden knowledge in the code and it isn't working.

 I got to the point where I have changed every reference to the global
 variable to this register. The problem is that it isn't getting
 updated. Debugging a bit it seems that the register is used as it
 should (the calls to PushCostCentre and AppendCCS from the generated
 code use the correct value in CCCS) but it isn't getting stored in the
 register table because every time the timer is called, the value
 stored is CCS_SYSTEM, the one used in initialization.

 I tried to mimic how the other registers are implemented but there is
 no documentation at all so I wasn't sure what was exactly required.

 Could someone tell me where exactly I have to change things or should I
 post my changes and ask about specific details?




Re: parallel cost-centre profiling

2010-06-10 Thread Simon Marlow

On 10/06/10 18:06, Henrique Ferreiro wrote:

Hi!

I managed to make it work with one capability. That is, now the CCCS is
stored per capability but I can't enable parallel profiling because
the stacks are still shared.

Now I wanted to have one instance of each stack and cost centre per
capability, so that I can track the behaviour of each capability
independently. Doing that is easy in the runtime system but I found
out that I also have to change the code generator in the same way. I
need emitCostCentreDecl and costCentreStackDecl to declare an
array of structs instead of a single struct. Can someone help me with
this?


Do you really need to do this?  Why not share the stacks and use a mutex 
to protect the operations?


Cheers,
Simon



2010/6/4 Henrique Ferreiro hferre...@udc.es:

Hi again!

I need some help with this.

2010/3/17 Simon Marlow marlo...@gmail.com:

On 16/03/2010 19:34, Henrique Ferreiro wrote:


Hello!

I am trying to make cost centre profiling work in the threaded rts
build in order to use that information to better understand parallel
behaviour.

Currently I am learning about the internals of GHC and I am thinking
about how this could be done. The main blocker is that the current
cost centre stack is a shared global variable. The simplest solution I
came up with is to convert it to a thread local variable. The problem
would be how to access it from the global timer.


Yes, basically what you want to do is put CCCS into the StgRegs structure,
which will make it thread-local.  In the timer signal you want to bump the
counters for the CCCS on each Capability - so just iterate through the array
of Capabilities and bump each one.


I tried adding a new register CCCS to StgRegTable in stg/Regs.h but
there is too much hidden knowledge in the code and it isn't working.

I got to the point where I have changed every reference to the global
variable to this register. The problem is that it isn't getting
updated. Debugging a bit it seems that the register is used as it
should (the calls to PushCostCentre and AppendCCS from the generated
code use the correct value in CCCS) but it isn't getting stored in the
register table because every time the timer is called, the value
stored is CCS_SYSTEM, the one used in initialization.

I tried to mimic how the other registers are implemented but there is
no documentation at all so I wasn't sure what was exactly required.

Could someone tell where exactly I have to change things or should I
post my changes and ask about specific details?





Re: parallel cost-centre profiling

2010-06-04 Thread Henrique Ferreiro
Hi again!

I need some help with this.

2010/3/17 Simon Marlow marlo...@gmail.com:
 On 16/03/2010 19:34, Henrique Ferreiro wrote:

 Hello!

 I am trying to make cost centre profiling work in the threaded rts
 build in order to use that information to better understand parallel
 behaviour.

 Currently I am learning about the internals of GHC and I am thinking
 about how this could be done. The main blocker is that the current
 cost centre stack is a shared global variable. The simplest solution I
 came up with is to convert it to a thread local variable. The problem
 would be how to access it from the global timer.

 Yes, basically what you want to do is put CCCS into the StgRegs structure,
 which will make it thread-local.  In the timer signal you want to bump the
 counters for the CCCS on each Capability - so just iterate through the array
 of Capabilities and bump each one.

I tried adding a new register CCCS to StgRegTable in stg/Regs.h but
there is too much hidden knowledge in the code and it isn't working.

I got to the point where I have changed every reference to the global
variable to this register. The problem is that it isn't getting
updated. Debugging a bit it seems that the register is used as it
should (the calls to PushCostCentre and AppendCCS from the generated
code use the correct value in CCCS) but it isn't getting stored in the
register table because every time the timer is called, the value
stored is CCS_SYSTEM, the one used in initialization.

I tried to mimic how the other registers are implemented but there is
no documentation at all so I wasn't sure what was exactly required.

Could someone tell me where exactly I have to change things or should I
post my changes and ask about specific details?



Re: parallel cost-centre profiling

2010-03-17 Thread Simon Marlow

On 16/03/2010 19:34, Henrique Ferreiro wrote:

Hello!

I am trying to make cost centre profiling work in the threaded rts
build in order to use that information to better understand parallel
behaviour.

Currently I am learning about the internals of GHC and I am thinking
about how this could be done. The main blocker is that the current
cost centre stack is a shared global variable. The simplest solution I
came up with is to convert it to a thread local variable. The problem
would be how to access it from the global timer.


Yes, basically what you want to do is put CCCS into the StgRegs 
structure, which will make it thread-local.  In the timer signal you 
want to bump the counters for the CCCS on each Capability - so just 
iterate through the array of Capabilities and bump each one.


I'm sure this isn't all that needs to be done, though.  The cost-center 
stack data structure probably needs some locks; perhaps one global lock 
will do for a start.  The danger here is that contention in the 
profiling subsystem will obscure the real profiling results you were 
trying to obtain.


Keep us posted!

Cheers,
Simon


As I don't have the full picture yet, I would greatly appreciate it if
some of you could give me some advice on how to tackle this problem.
Would it be possible to use thread local storage for this? Is there a
better design?

