RE: [Caml-list] SMP multithreading

2010-11-20 Thread Jon Harrop
 This is actually a quick way to use multiple cores with ocaml. Find an
 often-called function that takes considerable time and offload it to C

Or HLVM, F#, Scala, Clojure or any of the other languages that permit shared
memory parallelism. C is particularly poor in this regard so I would not
just restrict yourself to C...

Cheers,
Jon.


___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] SMP multithreading

2010-11-20 Thread Goswin von Brederlow
Jon Harrop jonathandeanhar...@googlemail.com writes:

 This is actually a quick way to use multiple cores with ocaml. Find an
 often-called function that takes considerable time and offload it to C

 Or HLVM, F#, Scala, Clojure or any of the other languages that permit shared
 memory parallelism. C is particularly poor in this regard so I would not
 just restrict yourself to C...

 Cheers,
 Jon.

I'm not talking about any shared memory parallelism here. The
parallelism is completely restricted to the ocaml side. You just find
some single-threaded job that takes a long time, rewrite it as an
external function, and release the ocaml lock while it is running.

For example, in my code I compute the sha256 sum of a block of
data. Since I use a C library for sha256 anyway, the function is already
external. All I had to do was switch the interface from string to
Bigarray and add enter/leave_blocking_section(). After that, multiple
threads can compute the sha256 sum for blocks of data in parallel, and
my code ran 3.7 times faster with 4 cores.
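
The pattern described here looks roughly like the following C stub (a sketch, not the actual code from the post: the function name `caml_sha256_ba` and the C-side `sha256()` are hypothetical, and the exact Bigarray macros vary slightly between OCaml versions). The key points are that the raw pointers are extracted while the runtime lock is still held, and that Bigarray data lives outside the OCaml heap, so the GC cannot move it while the lock is released:

```c
/* Hypothetical stub: hash a Bigarray without holding the OCaml runtime
   lock.  Assumes a C function
   void sha256(const unsigned char *src, long len, unsigned char *dst). */
#include <caml/mlvalues.h>
#include <caml/bigarray.h>
#include <caml/signals.h>

CAMLprim value caml_sha256_ba(value buf, value digest)
{
    /* Read everything we need from the OCaml values first,
       while we still hold the runtime lock. */
    unsigned char *src = Caml_ba_data_val(buf);
    long len = Caml_ba_array_val(buf)->dim[0];
    unsigned char *dst = Caml_ba_data_val(digest);

    caml_enter_blocking_section();   /* release the lock: other OCaml
                                        threads can now run */
    sha256(src, len, dst);           /* must not touch the OCaml heap here */
    caml_leave_blocking_section();   /* re-acquire the lock */
    return Val_unit;
}
```

On the OCaml side this would be declared along the lines of `external sha256 : data -> data -> unit = "caml_sha256_ba"`, where `data` is a char `Bigarray.Array1.t`.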

MfG
Goswin




Re: [Caml-list] SMP multithreading

2010-11-19 Thread Christophe TROESTLER
On Thu, 18 Nov 2010 00:08:19 +0100, Christophe Raffalli wrote:
 
 And OCaml on GPU? We just tested a recent GPU card with 480
 processors at 900MHz ... this is quite impressive ... and supported by
 MATLAB via cuda-lapack (http://www.culatools.com/) ...  I imagine we
 could at least use cuda-lapack from OCaml?

This is certainly possible since they say that the standard LAPACK
functions are available.  If you try, let us know!

Best,
C.



Re: [Caml-list] SMP multithreading

2010-11-19 Thread Eray Ozkural
There seem to be solutions in theory. I think a colleague had pointed out
one of the papers below, so there is indeed such a thing as a lock-free
garbage collector. Why, then, do we worry so much about synchronization
overhead? I don't quite understand.

Maurice Herlihy, J. Eliot B. Moss: Lock-Free Garbage Collection for
Multiprocessors. IEEE Trans. Parallel Distrib. Syst. 3(3): 304-311 (1992).
http://doi.ieeecomputersociety.org/10.1109/71.139204

Hui Gao, Jan Friso Groote, Wim H. Hesselink: Lock-free parallel and
concurrent garbage collection by mark&sweep. Sci. Comput. Program. 64(3):
341-374 (2007).
http://portal.acm.org/citation.cfm?id=1223239

Java's new garbage collector is lock-free? At any rate, we really needn't
fall behind a mega-lame language like Java :) The first paper is from 1992,
enough time for the knowledge to diffuse. The second 2007 paper is probably
what Jon was referring to earlier.

In my mind, you can use one of these, plus special pool-allocation
algorithms for small objects, plus static lifetime analysis to bypass
garbage collection in many cases. Since there are many runtime designers
here, I wonder: is there a language runtime that does all three of these?

Cheers,

Eray

On Wed, Nov 17, 2010 at 6:34 PM, David Allsopp dra-n...@metastack.com wrote:

 Edgar Friendly wrote:
  It looks like high-performance computing of the near future will be built
  out of many machines (message passing), each with many cores (SMP).  One
  could use message passing for all communication in such a system, but a
  hybrid approach might be best for this architecture, with use of shared
  memory within each box and message passing between.  Of course the best
  choice depends strongly on the particular task.

 Absolutely - and the problem in OCaml seems to be that shared memory
 parallelism is just branded as evil and ignored...

  In the long run, it'll likely be a combination of a few large, powerful
  cores (Intel-CPU style w/ the capability to run a single thread as fast
 as
  possible) with many many smaller compute engines (GPGPUs or the like,
  optimized for power and area, closely coupled with memory) that provides
  the highest performance density.

 I think the central thing that we can be utterly sure about is that
 desktops will always have >1 general-purpose CPU. Maybe not an
 ever-increasing number of cores, but definitely more than one.

  The question of how to program such an architecture seems as if it's
 being
  answered without the functional community's input. What can we
 contribute?

 It has often seemed to me when SMP has been discussed in the past on this
 list that it almost gets dismissed out of hand because it doesn't look
 future-proof or because we're worried about what's round the corner in
 technology terms.

 To me the principal question is not about whether a parallel/thread-safe GC
 will scale to 12, 16 or even the 2048 cores on something like
 http://www.hpc.cam.ac.uk/services/darwin.html but whether it will hurt a
 single-threaded application - i.e. whether you will still be able to
 implement message passing libraries and other scalable techniques without
 the parallel GC getting in the way of what you're doing. A
 parallel/thread-safe GC should be aiming to provide the same sort of
 contract as the present one - it just works for most things and in a few
 borderline cases (like HPC - yes, it's a borderline case) you'll need to
 tune your code or tweak GC parameters because it's causing some problems or
 because in your particular application squeezing every cycle out of the CPU
 is important. As long as the GC isn't (hugely) slower than the present one
 in OCaml then we can continue to use libraries, frameworks and
 technologies-still-to-come on top of a parallel/thread-safe GC which simply
 ignores shared memory thread-level parallelism just by not instantiating
 threads.

 The argument always seems to focus on utterly maxing out all possible
 available resources (CPU time, memory bandwidth, etc.) rather than on
 whether it's simply faster than what we're able to do at the moment on
 the same system. Of course, it may be that the only way to do that is to
 have different garbage collectors - one invoked when threads.cmxa is linked
 and then the normal one otherwise (that's so easy to type out as a sentence,
 summarising a vast amount of potential work!!)

 Multithreading in OCaml seems to be focused on jumping the entire width of
 the river of concurrency in one go, rather than coming up with stepping
 

Re: [Caml-list] SMP multithreading

2010-11-19 Thread Goswin von Brederlow
Christophe TROESTLER christophe.troestler+oc...@umh.ac.be writes:

 On Thu, 18 Nov 2010 00:08:19 +0100, Christophe Raffalli wrote:
 
 And OCaml on GPU? We just tested a recent GPU card with 480
 processors at 900MHz ... this is quite impressive ... and supported by
 MATLAB via cuda-lapack (http://www.culatools.com/) ...  I imagine we
 could at least use cuda-lapack from OCaml?

 This is certainly possible since they say that the standard LAPACK
 functions are available.  If you try, let us know!

 Best,
 C.

And the functions should call enter/leave_blocking_section() in the C
stubs so you can have 480 ocaml threads. All of them can run LAPACK code
while only one at a time can run ocaml code. If the LAPACK functions take
long enough, almost all threads will be running.


This is actually a quick way to use multiple cores with ocaml. Find an
often-called function that takes considerable time and offload it to C
with enter/leave_blocking_section() around it. This isn't always possible,
and you need to use Bigarray for the data or copy the arguments.

MfG
Goswin



Re: [Caml-list] SMP multithreading

2010-11-18 Thread Eray Ozkural
Yes, actually. :P

On Wed, Nov 17, 2010 at 11:15 PM, Jon Harrop 
jonathandeanhar...@googlemail.com wrote:

 Can you cite any papers from this century? ;-)



 Cheers,

 Jon.



 *From:* Eray Ozkural [mailto:examach...@gmail.com]
 *Sent:* 17 November 2010 13:41
 *To:* Eray Ozkural; Jon Harrop; caml-list@yquem.inria.fr

 *Subject:* Re: [Caml-list] SMP multithreading



 On Wed, Nov 17, 2010 at 8:50 AM, Gabriel Kerneis kern...@pps.jussieu.fr
 wrote:

 On Wed, Nov 17, 2010 at 06:27:14AM +0200, Eray Ozkural wrote:
  As I said even in C good results can be achieved, I've seen that, so I
  know it's doable with ocaml, just a difficult kind of compiler. The
  functional features would expose more concurrency.

 Could you share a pointer to a paper describing this compiler?


 I can't reveal much, but just to point out that there are indeed more
 sophisticated compilers than gcc:
 http://www.research.ibm.com/vliw/compiler.html

 So, uh, there are compilers that turn loops into threads, and also
 parallelize independent blocks. Both coarse-grain and fine-grain
 parallelization strategies in existing compiler research can be effectively
 applied to multi-core architectures. In fact, some of the more advanced
 compilers (like that of the RAW architecture) must be able to target them
 already, but who knows. :) Just consider that most of the parallelization
 technology is language-independent; it can be applied to any imperative
 language. So, would such a thing be able to work on ocaml-generated
 binaries? Most definitely, I believe: it is in principle possible to start
 from the sequential binary and emit parallel code!

 Best,



 --
 Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
 http://groups.yahoo.com/group/ai-philosophy
 http://myspace.com/arizanesil http://myspace.com/malfunct




-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


RE: [Caml-list] SMP multithreading

2010-11-18 Thread David Allsopp
Edgar Friendly wrote:
 It looks like high-performance computing of the near future will be built
 out of many machines (message passing), each with many cores (SMP).  One
 could use message passing for all communication in such a system, but a
 hybrid approach might be best for this architecture, with use of shared
 memory within each box and message passing between.  Of course the best
 choice depends strongly on the particular task.

Absolutely - and the problem in OCaml seems to be that shared memory 
parallelism is just branded as evil and ignored...

 In the long run, it'll likely be a combination of a few large, powerful
 cores (Intel-CPU style w/ the capability to run a single thread as fast as
 possible) with many many smaller compute engines (GPGPUs or the like,
 optimized for power and area, closely coupled with memory) that provides
 the highest performance density.

I think the central thing that we can be utterly sure about is that desktops 
will always have >1 general-purpose CPU. Maybe not an ever-increasing 
number of cores, but definitely more than one.

 The question of how to program such an architecture seems as if it's being
 answered without the functional community's input. What can we contribute?

It has often seemed to me when SMP has been discussed in the past on this list 
that it almost gets dismissed out of hand because it doesn't look future-proof 
or because we're worried about what's round the corner in technology terms.

To me the principal question is not about whether a parallel/thread-safe GC 
will scale to 12, 16 or even the 2048 cores on something like 
http://www.hpc.cam.ac.uk/services/darwin.html but whether it will hurt a 
single-threaded application - i.e. whether you will still be able to implement 
message passing libraries and other scalable techniques without the parallel GC 
getting in the way of what you're doing. A parallel/thread-safe GC should be 
aiming to provide the same sort of contract as the present one - it "just 
works" for most things and in a few borderline cases (like HPC - yes, it's a 
borderline case) you'll need to tune your code or tweak GC parameters because 
it's causing some problems or because in your particular application squeezing 
every cycle out of the CPU is important. As long as the GC isn't (hugely) 
slower than the present one in OCaml then we can continue to use libraries, 
frameworks and technologies-still-to-come on top of a parallel/thread-safe GC 
which simply ignores shared memory thread-level parallelism just by not 
instantiating threads.

The argument always seems to focus on utterly maxing out all possible available 
resources (CPU time, memory bandwidth, etc.) rather than on whether it's simply 
faster than what we're able to do at the moment on the same system. Of 
course, it may be that the only way to do that is to have different garbage 
collectors - one invoked when threads.cmxa is linked and then the normal one 
otherwise (that's so easy to type out as a sentence, summarising a vast amount 
of potential work!!)

Multithreading in OCaml seems to be focused on jumping the entire width of the 
river of concurrency in one go, rather than coming up with stepping stones to 
cross it in bits...


David



Re: [Caml-list] SMP multithreading

2010-11-18 Thread Christophe Raffalli

Hello,

And OCaml on GPU? We just tested a recent GPU card with 480 processors
at 900MHz ... this is quite impressive ... and supported by MATLAB via
cuda-lapack (http://www.culatools.com/) ...
I imagine we could at least use cuda-lapack from OCaml?

Cheers,
Christophe






RE: [Caml-list] SMP multithreading

2010-11-18 Thread Jon Harrop
Can you cite any papers from this century? ;-)

 

Cheers,

Jon.

 

From: Eray Ozkural [mailto:examach...@gmail.com] 
Sent: 17 November 2010 13:41
To: Eray Ozkural; Jon Harrop; caml-list@yquem.inria.fr
Subject: Re: [Caml-list] SMP multithreading

 

On Wed, Nov 17, 2010 at 8:50 AM, Gabriel Kerneis kern...@pps.jussieu.fr
wrote:

On Wed, Nov 17, 2010 at 06:27:14AM +0200, Eray Ozkural wrote:
 As I said even in C good results can be achieved, I've seen that, so I
 know it's doable with ocaml, just a difficult kind of compiler. The
 functional features would expose more concurrency.

Could you share a pointer to a paper describing this compiler?


I can't reveal much, but just to point out that there are indeed more
sophisticated compilers than gcc:
http://www.research.ibm.com/vliw/compiler.html  

So, uh, there are compilers that turn loops into threads, and also
parallelize independent blocks. Both coarse-grain and fine-grain
parallelization strategies in existing compiler research can be effectively
applied to multi-core architectures. In fact, some of the more advanced
compilers (like that of the RAW architecture) must be able to target them
already, but who knows. :) Just consider that most of the parallelization
technology is language-independent; it can be applied to any imperative
language. So, would such a thing be able to work on ocaml-generated
binaries? Most definitely, I believe: it is in principle possible to start
from the sequential binary and emit parallel code!

Best,



-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct



Re: [Caml-list] SMP multithreading

2010-11-18 Thread Eray Ozkural
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1650134

This is one of the more recent papers a quick search turns up, but you have
to keep in mind that thread extraction is only one problem among many for a
parallelizing compiler. I think the keyword you are looking for is thread
extraction. And here probably, it's the simplest kind of extraction... Food
for some thought: assume that you have a very good compiler pass that
extracts all possible threads in the sequential code, can you name any other
problems the compiler must solve to achieve good performance?

I can't talk at all about the project I worked on, but as I mentioned
previously, familiarize yourself with the RAW project; it was similar in
some respects to the project I worked on:
http://groups.csail.mit.edu/cag/raw/

This should, at least a bit, dispel the illusion that parallelizing
compilers are helpless when they confront C code. Reading the OS/400 book
had opened my mind about OS design, perhaps reading about recent computer
architecture research projects will open others' eyes about compilers, and
how useful they really can be!

Also, I believe there ought to be some textbooks about multi-core
architectures and relevant compilation strategies, let me post it if I find
a comprehensive reference.

The dream compiler would have all the cool linear algebra capabilities of
HPF + the more general/free-form kinds of parallelization strategies in
recent compilers.

Ok, so what you really want to do is parallelize applications that can
benefit from it. Not file utils or web browsers. If you are curious,
stuff like povray would be in the test suite. Sometimes the parallelizing
compiler parallelizes computations that a programmer wouldn't bother with
due to program complexity: here a basic block, there a basic block, some
pipelined communication/computation overlap there... I think it's a safe
bet to say that, with all the general lameness surrounding parallel
programming languages, parallelizing compilers will be very important in
the near future.

Cheers,

On Thu, Nov 18, 2010 at 2:28 AM, Eray Ozkural examach...@gmail.com wrote:

 Yes, actually. :P


 On Wed, Nov 17, 2010 at 11:15 PM, Jon Harrop 
 jonathandeanhar...@googlemail.com wrote:

 Can you cite any papers from this century? ;-)



 Cheers,

 Jon.



 *From:* Eray Ozkural [mailto:examach...@gmail.com]
 *Sent:* 17 November 2010 13:41
 *To:* Eray Ozkural; Jon Harrop; caml-list@yquem.inria.fr

 *Subject:* Re: [Caml-list] SMP multithreading



 On Wed, Nov 17, 2010 at 8:50 AM, Gabriel Kerneis kern...@pps.jussieu.fr
 wrote:

 On Wed, Nov 17, 2010 at 06:27:14AM +0200, Eray Ozkural wrote:
  As I said even in C good results can be achieved, I've seen that, so I
  know it's doable with ocaml, just a difficult kind of compiler. The
  functional features would expose more concurrency.

 Could you share a pointer to a paper describing this compiler?


 I can't reveal much, but just to point out that there are indeed more
 sophisticated compilers than gcc:
 http://www.research.ibm.com/vliw/compiler.html

 So, uh, there are compilers that turn loops into threads, and also
 parallelize independent blocks. Both coarse-grain and fine-grain
 parallelization strategies in existing compiler research can be effectively
 applied to multi-core architectures. In fact, some of the more advanced
 compilers (like that of the RAW architecture) must be able to target them
 already, but who knows. :) Just consider that most of the parallelization
 technology is language-independent; it can be applied to any imperative
 language. So, would such a thing be able to work on ocaml-generated
 binaries? Most definitely, I believe: it is in principle possible to start
 from the sequential binary and emit parallel code!

 Best,



 --
 Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
 http://groups.yahoo.com/group/ai-philosophy
 http://myspace.com/arizanesil http://myspace.com/malfunct




 --
 Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
 http://groups.yahoo.com/group/ai-philosophy
 http://myspace.com/arizanesil http://myspace.com/malfunct




-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


Re: [Caml-list] SMP multithreading

2010-11-17 Thread Eray Ozkural
On Wed, Nov 17, 2010 at 8:50 AM, Gabriel Kerneis kern...@pps.jussieu.fr wrote:

 On Wed, Nov 17, 2010 at 06:27:14AM +0200, Eray Ozkural wrote:
  As I said even in C good results can be achieved, I've seen that, so I
  know it's doable with ocaml, just a difficult kind of compiler. The
  functional features would expose more concurrency.

 Could you share a pointer to a paper describing this compiler?


I can't reveal much, but just to point out that there are indeed more
sophisticated compilers than gcc:
http://www.research.ibm.com/vliw/compiler.html

So, uh, there are compilers that turn loops into threads, and also
parallelize independent blocks. Both coarse-grain and fine-grain
parallelization strategies in existing compiler research can be effectively
applied to multi-core architectures. In fact, some of the more advanced
compilers (like that of the RAW architecture) must be able to target them
already, but who knows. :) Just consider that most of the parallelization
technology is language-independent; it can be applied to any imperative
language. So, would such a thing be able to work on ocaml-generated
binaries? Most definitely, I believe: it is in principle possible to start
from the sequential binary and emit parallel code!

Best,


-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


Re: [Caml-list] SMP multithreading

2010-11-17 Thread Wolfgang Draxinger
Am Mon, 15 Nov 2010 22:05:52 +0100
schrieb Philippe Wang m...@philippewang.info:

 Take the current Apple Mac Pro for instance (I take this reference
 because it's easy to find and it doesn't evolve very often), with the
 12-core configuration.
 - Two 2.93GHz 6-core Intel Xeon “Westmere” (12 cores)
 - 1333MHz DDR3 ECC SDRAM (whatever the capacity)
 => with HT, there are 24 logical units, which all share a tiny
 bandwidth for CPU-RAM communications.
 Let's say the bandwidth is about 2400MHz: 2400MHz / 24 threads =
 100MHz per thread. It's kind of ridiculous...

You're assuming that there'd be a lot of communication between the cores
and RAM, which is not (or should not be) the case in well-written
multithreaded programs.

 OCaml is not (at least not yet) a language for HPC (high-performance
 computing); it is very efficient (compared to so many other languages)
 and yet doesn't take advantage of SMP. Well, sooner or later it will
 probably need to support SMP. (But somehow it already does, via C code
 boxed in blocking sections.)

Which is a pity, since functional languages especially could
parallelize tasks implicitly much better.

 Well, if you take casual OCaml programs, and put them on SMP
 architectures (on which indeed they often already are) while giving
 them capacity to take advantage of SMP (via POSIX-C threads in
 blocking sections, message-passing style, or OCaml-for-multicore, or
 whatever else), they quickly become less efficient because there is a
 bottleneck on the CPU-RAM bus.

Suppose you were to implement a convolution in n dimensions on a large
data set. This is a prime example of where multithreading can help and
where main-memory bandwidth is not the limiting factor. One can split
up the whole task into small tasklets, dispatching them to individual
cores. As long as the working set (the input, the convolution kernel,
the output buffer if not in-place, plus the code) fits into the L1
cache, everything will be executed on-cache. On current Intel CPUs that
is 32kB; on AMD it's even 64kB -- per core!

And all the cores on the same die share the L2 cache, which has far more
bandwidth, about an order of magnitude, than system memory. Modern OS
schedulers thus try to keep threads of the same process together on the
same dies in the system, and further group them by NUMA domain.

 I want to believe you're right to ask for SMP support, even if by now I'm
 pretty convinced that the current state of OCaml is not compatible with "I
 want to write HPC programs in pure OCaml". (One should maybe implement a
 brand new compiler??)

This is not just about HPC but about resource utilization. A single
core running at full speed consumes far more power than 4 cores clocked
down to minimal frequency. Even worse, only the most recent CPU
generations can clock cores individually. So a single core running at
full speed will significantly increase power consumption (and thermal
output).
 
 There are people studying how to do HPC with OCaml, but it has quite
 little to do with SMP matters. Instead, it's more about (static or
 dynamic) specialized code generation for GPUs etc. We'll see in some
 time what it produces...

For the time being I'm more interested in what's actually preventing
proper SMP in OCaml right now. I've read something about issues with
the garbage collector, which surprises me, as I switched over to using
Boehm-GC in my C programs to resolve memory-deallocation problems in
multithreaded programs -- this of course was possible only after
Boehm-GC became thread-safe.


Wolfgang



Re: [Caml-list] SMP multithreading

2010-11-16 Thread Gerd Stolpmann
Am Montag, den 15.11.2010, 22:46 -0800 schrieb Edgar Friendly:
 On 11/15/2010 09:27 AM, Wolfgang Draxinger wrote:
  Hi,
 
  I've just read
  http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html
  in particular this paragraph:
  | What about hyperthreading?  Well, I believe it's the last convulsive
  | movement of SMP's corpse :-)  We'll see how it goes market-wise.  At
  | any rate, the speedups announced for hyperthreading in the Pentium 4
  | are below a factor of 1.5; probably not enough to offset the overhead
  | of making the OCaml runtime system thread-safe.
 
  This reads just like "640k ought to be enough for everyone". Multicore
  systems are the standard today. Even the cheapest consumer machines
  come with at least two cores. One can easily get 6-core machines today.
 
  Still thinking SMP was a niche and was dying?
 
  So, what're the developments regarding SMP multithreading OCaml?
 
 
  Cheers
 
  Wolfgang
 
 At the risk of feeding a (possibly unintentional) troll, I'd like to 
 share some possibly new thoughts on this ever-living topic.
 
 It looks like high-performance computing of the near future will be 
 built out of many machines (message passing), each with many cores 
 (SMP).  One could use message passing for all communication in such a 
 system, but a hybrid approach might be best for this architecture, with 
 use of shared memory within each box and message passing between.  Of 
 course the best choice depends strongly on the particular task.
 
 In the long run, it'll likely be a combination of a few large, powerful 
 cores (Intel-CPU style w/ the capability to run a single thread as fast 
 as possible) with many many smaller compute engines (GPGPUs or the like, 
 optimized for power and area, closely coupled with memory) that provides 
 the highest performance density.
 
 The question of how to program such an architecture seems as if it's 
 being answered without the functional community's input. What can we 
 contribute?

Yes, that's generally the right question. Current hardware is a kind of
experiment - vendors have only taken the multicore path because it is
right now the easiest way of improving the performance potential,
although it is questionable whether (non-server) applications can really
benefit from it (excluding here server apps because for these
parallelization is relatively easy to get). Future hardware will
probably be even more different - however, it is still unclear which
design paths will be taken. Could be manycores (many CPUs with
non-uniform RAM), could be specialized compute units. Maybe we'll see
again a separation of consumer and datacenter markets - the former
optimizing for numeric simulation applications (i.e. games), the latter
for high-throughput data paths and parallel CPU power. The problem here
is that this is all speculation.

There are some things we can do to improve the situation (and some ideas
are not realistic):

  * A probably not-so-difficult improvement would be better message
    passing between independent but local processes. I've started an
    experiment for such a mechanism
    (http://projects.camlcity.org/projects/dl/ocamlnet-3.0.3/doc/html-main/Netcamlbox.html),
    which tries to exploit that GC-managed memory has a well-known
    structure. With more help from the GC this could be made even
    better (safer, fewer corner cases).
  * We need more frameworks for parallel programming. I'm currently
developing Plasma, a Map/Reduce framework. Using a framework has
the big advantage that the whole program is structured so it
profits from parallelization, and that it is possible to train
developers for it that have no idea about parallelization. There
are probably more algorithm schemes where this is possible.
  * I have a lot of doubts whether FP languages will ever run well on
SMP with a larger number of cores. The problem is the relatively
high memory allocation rate - the GC has to work a lot harder
than in imperative languages. The OC4MC project uses
thread-local minor heaps because of this. Probably this is not
enough, and one even needs thread-local major heaps (plus a
third generation for values accessed by several threads). All in
all you could get the same effect by instantiating the ocaml
runtime several times (if this were possible), let each runtime
run in its own thread, and provide some extra functionality for
passing values between threads and for sharing values. This
would not be exactly the SMP model, but would allow a number of
parallelization techniques, and is probably future-proof as it
encourages message passing over sharing. This is certainly worth
experimentation.
  * One can also tackle the problem from the multi-processing side:
Provide better mechanisms for message passing (see above) and
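
Gerd's first suggestion (message passing between independent local
processes) can be pictured with only the standard library: fork a worker
and exchange OCaml values over a pipe via Marshal. This is a hedged
sketch, not Netcamlbox itself; the channel type and helper names below
are invented for illustration, values are copied rather than placed in
shared memory, and each process keeps its own heap and GC.

```ocaml
(* A minimal typed message channel between a parent and a forked child.
   Values cross the process boundary as Marshal bytes over a pipe, so
   they must not contain values Marshal cannot handle (e.g. custom
   blocks). Netcamlbox avoids the copy by using shared memory. *)
type 'a channel = { ic : in_channel; oc : out_channel }

let fork_worker (worker : 'a channel -> unit) : 'a channel =
  let p2c_r, p2c_w = Unix.pipe () in   (* parent -> child *)
  let c2p_r, c2p_w = Unix.pipe () in   (* child -> parent *)
  match Unix.fork () with
  | 0 ->
      (* child: close unused ends, serve, exit *)
      Unix.close p2c_w; Unix.close c2p_r;
      worker { ic = Unix.in_channel_of_descr p2c_r;
               oc = Unix.out_channel_of_descr c2p_w };
      exit 0
  | _pid ->
      (* parent keeps the other ends; the child is not reaped here,
         a real implementation would waitpid on shutdown *)
      Unix.close p2c_r; Unix.close c2p_w;
      { ic = Unix.in_channel_of_descr c2p_r;
        oc = Unix.out_channel_of_descr p2c_w }

let send (ch : 'a channel) (v : 'a) =
  Marshal.to_channel ch.oc v []; flush ch.oc

let recv (ch : 'a channel) : 'a = Marshal.from_channel ch.ic

let () =
  (* worker squares whatever integer it receives *)
  let ch = fork_worker (fun ch ->
    let x : int = recv ch in
    send ch (x * x)) in
  send ch 7;
  Printf.printf "%d\n" (recv ch : int)
```

Because no heap is shared, there is no runtime lock to fight over; the
cost is the Marshal copy on every message.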

Re: [Caml-list] SMP multithreading

2010-11-16 Thread Norman Hardy

On 2010 Nov 15, at 22:46 , Edgar Friendly wrote:

 It looks like high-performance computing of the near future will be built out 
 of many machines (message passing), each with many cores (SMP).  One could 
 use message passing for all communication in such a system, but a hybrid 
 approach might be best for this architecture, with use of shared memory 
 within each box and message passing between.  Of course the best choice 
 depends strongly on the particular task.
 
 In the long run, it'll likely be a combination of a few large, powerful cores 
 (Intel-CPU style w/ the capability to run a single thread as fast as 
 possible) with many many smaller compute engines (GPGPUs or the like, 
 optimized for power and area, closely coupled with memory) that provides the 
 highest performance density.

OCaml code should be able to share immutable OCaml data with other processes 
just as it shares libraries.
See http://cap-lore.com/Software/pch.html .
Some of the ideas there might be improved with hardware support.

Admission: If I had read all of the interesting pointers given on this thread I 
would never finish sending this e-mail.
___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] SMP multithreading

2010-11-16 Thread Eray Ozkural
On Tue, Nov 16, 2010 at 7:04 PM, Gerd Stolpmann i...@gerd-stolpmann.de wrote:

 Am Montag, den 15.11.2010, 22:46 -0800 schrieb Edgar Friendly:
  * As somebody mentioned implicit parallelization: Don't expect
anything from this. Even if a good compiler finds ways to
parallelize 20% of the code (which would be a lot), the runtime
effect would be marginal. 80% of the code is run at normal speed
(hopefully) and dominates the runtime behavior. The point is
that such compiler-driven code improvements are only local
optimizations. For getting good parallelization results you need
to restructure the design of the program - well, maybe
compiler2.0 can do this at some time, but this is not in sight.


I think you are underestimating parallelizing compilers.


-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


Re: [Caml-list] SMP multithreading

2010-11-16 Thread Gerd Stolpmann
Am Dienstag, den 16.11.2010, 22:35 +0200 schrieb Eray Ozkural:
 
 
 On Tue, Nov 16, 2010 at 7:04 PM, Gerd Stolpmann
 i...@gerd-stolpmann.de wrote:
  Am Montag, den 15.11.2010, 22:46 -0800 schrieb Edgar Friendly:
   * As somebody mentioned implicit parallelization: Don't expect
     anything from this. Even if a good compiler finds ways to
     parallelize 20% of the code (which would be a lot), the runtime
     effect would be marginal. 80% of the code is run at normal speed
     (hopefully) and dominates the runtime behavior. The point is
     that such compiler-driven code improvements are only local
     optimizations. For getting good parallelization results you need
     to restructure the design of the program - well, maybe
     compiler2.0 can do this at some time, but this is not in sight.
 
 I think you are underestimating parallelizing compilers.

I was more citing Amdahl's law, and did not want to criticize any effort
in this area. It's more about the usefulness for the majority of
problems. How useful is the best parallelizing compiler if only a small
part of the program _can_ actually benefit from it? Think about it. If
you are not working in an area where many subroutines can be sped up,
you will consider this way of parallelizing a waste of time. And this is
still true for the majority of problems. Also, for many problems that
can be tackled, the scalability is very limited.
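
Gerd's Amdahl argument can be made concrete. With a fraction p of the
runtime parallelizable over n cores, overall speedup is bounded by
1 / ((1 - p) + p / n); with p = 0.2 the bound is 1.25 no matter how many
cores are available. A quick sketch (the function name is mine):

```ocaml
(* Amdahl's law: bound on overall speedup when fraction [p] of the
   serial runtime is parallelized across [n] cores. *)
let amdahl p n = 1.0 /. ((1.0 -. p) +. p /. float_of_int n)

let () =
  Printf.printf "p=0.2,  4 cores: %.3f\n" (amdahl 0.2 4);
  Printf.printf "p=0.2, 64 cores: %.3f\n" (amdahl 0.2 64)
```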

Gerd 
-- 

Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
g...@gerd-stolpmann.de  http://www.gerd-stolpmann.de
Phone: +49-6151-153855  Fax: +49-6151-997714




Re: [Caml-list] SMP multithreading

2010-11-16 Thread Eray Ozkural
On Wed, Nov 17, 2010 at 12:13 AM, Gerd Stolpmann i...@gerd-stolpmann.de wrote:

 Am Dienstag, den 16.11.2010, 22:35 +0200 schrieb Eray Ozkural:
  On Tue, Nov 16, 2010 at 7:04 PM, Gerd Stolpmann
  i...@gerd-stolpmann.de wrote:
   Am Montag, den 15.11.2010, 22:46 -0800 schrieb Edgar Friendly:
    * As somebody mentioned implicit parallelization: Don't expect
      anything from this. Even if a good compiler finds ways to
      parallelize 20% of the code (which would be a lot), the runtime
      effect would be marginal. 80% of the code is run at normal speed
      (hopefully) and dominates the runtime behavior. The point is
      that such compiler-driven code improvements are only local
      optimizations. For getting good parallelization results you need
      to restructure the design of the program - well, maybe
      compiler2.0 can do this at some time, but this is not in sight.
 
  I think you are underestimating parallelizing compilers.

 I was more citing Amdahl's law, and did not want to criticize any effort
 in this area. It's more the usefulness for the majority of problems. How
 useful is the best parallelizing compiler if only a small part of the
 program _can_ actually benefit from it? Think about it. If you are not
 working in an area where many subroutines can be sped up, you consider
 this way of parallelizing as a waste of time. And this is still true for
 the majority of problems. Also, for many problems that can be tackled,
 the scalability is very limited.



What makes you think only 20% of the code can be parallelized? I've worked
on such a compiler project, and there were plenty of opportunities for
parallelization in ordinary C code, let alone a functional language;
implicit parallelism would work wonders there. Of course you may think
otherwise, but I know that a high degree of parallelism can be achieved
through a functional language. I can't tell you much more though. If you
think that only a very small portion of the code can be parallelized, you
probably do not appreciate the kinds of static and dynamic analysis those
compilers perform. If you are thinking of applying it to some office or
e-mail application, this might not be the case, but automatic
parallelization strategies work best when applied to a computationally
intensive program.

The really limiting factor for current functional languages would be their
reliance on inherently sequential primitives like list processing, which may
in some cases limit the compiler to pipeline parallelism only (something
non-trivial to do by hand, actually). Instead they would have to get their
basic forms from high-level parallel PLs (which might mean rewriting a lot
of things). The programs could then look like something out of a category
theory textbook. But I think even without modification you could get a lot
of speedup from OCaml code by applying state-of-the-art automatic
parallelization techniques. By the way, what matters is how much of the
serial work is parallelized, not how many lines of code are. Even if the
parallelism is not obvious, the analysis can find out which iterations are
parallelizable and which variables have which kinds of dependencies
(memory dependencies, etc.). There is a public perception that automatic
parallelization does not work, which is wrong; it is true, though, that
the popular compiler vendors do not understand much in this regard - all
I've seen in popular products are lame attempts to generate vector
instructions.
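
The iteration-dependence analysis Eray describes can be illustrated with
two small loops; the example is mine, not taken from any particular
compiler:

```ocaml
(* Two loops over the same array. In the first, each iteration writes
   a.(i) from b.(i) only, so the iteration space is independent and a
   parallelizing compiler can split it across cores. The second is a
   prefix sum: a.(i) reads a.(i-1), a loop-carried dependence that
   forces sequential execution unless rewritten as a parallel scan. *)
let n = 8
let b = Array.init n (fun i -> i)
let a = Array.make n 0

let () =
  (* independent iterations: parallelizable as-is *)
  for i = 0 to n - 1 do
    a.(i) <- 2 * b.(i)
  done;
  (* loop-carried dependence: sequential as written *)
  for i = 1 to n - 1 do
    a.(i) <- a.(i) + a.(i - 1)
  done;
  Array.iter (Printf.printf "%d ") a;
  print_newline ()
```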

That is not to say that implicit parallelization of current code can
replace parallel algorithm design (which is AI-complete). Rather, I think
implicit parallelization is one of the things that will help parallel
computing people: by having them work at a more abstract level, exposing
more concurrency through functional forms, yet avoiding writing low-level
comms code. High-level explicit parallelism is also quite favorable, as
there may be many situations where, say, dynamic load-balancing approaches
are suitable. The approach of HPF might be relevant here, with the
programmer making annotations to guide the distribution of data structures,
and the rest inferred from code.

So whoever says that isn't possible probably hasn't read up much on the
computer architecture community's work. Probably even expert PL
researchers may be misled here, as they have made the quoted remark or
similar remarks about multi-core/SMP architectures. It was known for a
very long time (20+ years) that clock speeds would hit a wall and then
we'd have to expend more area. This is true regardless of the underlying
architecture/technology. Mother nature is parallel. It is the sequence
that is an abstraction. And I wonder what is more natural than 

Re: [Caml-list] SMP multithreading

2010-11-16 Thread Wolfgang Draxinger
On Wed, 17 Nov 2010 01:04:54 +0200
Eray Ozkural examach...@gmail.com wrote:

 [readworthy text]

I'd like to point out how the big competitor to OCaml deals with it.
The GHC Haskell system has had SMP parallelization built in for some
time, and it does it quite well.


Wolfgang



Re: [Caml-list] SMP multithreading

2010-11-16 Thread Eray Ozkural
On Wed, Nov 17, 2010 at 1:52 AM, Wolfgang Draxinger 
wdraxinger.maill...@draxit.de wrote:

 On Wed, 17 Nov 2010 01:04:54 +0200
 Eray Ozkural examach...@gmail.com wrote:

  [readworthy text]

 I'd like to point out how the big competitor to OCaml deals with it.
 The GHC Haskell system has had SMP parallelization built in for some
 time, and it does it quite well.


I think I tested the parallel features just once, in the distant past;
something like that would be so useful for OCaml :) Explicit threading
that is suited to functional programming, with a syntax independent of
the actual thread implementation. The par combinator looks like fun to
use. Wishlist item, definitely :)
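
In the absence of GHC-style par, something with the same shape can be
approximated today by using processes instead of threads, sidestepping
the OCaml runtime lock. This is a hedged sketch of my own design, not a
GHC-compatible primitive: the name par, and the use of fork plus Marshal
over a pipe, are illustrative, and results must be Marshal-friendly.

```ocaml
(* par f g: run [f] in a forked child and [g] in the parent, truly in
   parallel on a multicore machine, and return both results. The
   child's result travels back as Marshal bytes over a pipe. *)
let par (f : unit -> 'a) (g : unit -> 'b) : 'a * 'b =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* child: compute f, ship the result, exit *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (f ()) [];
      close_out oc;
      exit 0
  | pid ->
      (* parent: compute g while the child runs *)
      Unix.close wr;
      let b = g () in
      let ic = Unix.in_channel_of_descr rd in
      let a = (Marshal.from_channel ic : 'a) in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      (a, b)

let () =
  (* sum of squares below n, done twice in parallel *)
  let squares n =
    let s = ref 0 in
    for i = 0 to n - 1 do s := !s + i * i done;
    !s
  in
  let a, b = par (fun () -> squares 1_000) (fun () -> squares 2_000) in
  Printf.printf "%d %d\n" a b
```

Unlike par in GHC, granularity is entirely the caller's problem here:
each call pays a fork and a Marshal copy, so it only pays off for
coarse-grained work.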

Cheers,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


RE: [Caml-list] SMP multithreading

2010-11-16 Thread Jon Harrop
Wolfgang wrote:
 I'd like to point out how the big competitor to OCaml deals with it.
 The GHC Haskell system has had SMP parallelization built in for some
 time, and it does it quite well.

I beg to differ. Upon trying to reproduce many of the Haskell community's
results, I found that even their own parallel Haskell programs often exhibit
huge slowdowns. This is because Haskell's unpredictable performance leads to
unpredictable granularity and, consequently, more time can be spent
administering the tiny parallel computations than is gained by doing so.

The results I found here are typical:

 
http://flyingfrogblog.blogspot.com/2010/01/naive-parallelism-with-hlvm.html

Note that the absolute performance peaks at an unpredictable number of cores
only in the case of Haskell. This is because the GC does not scale beyond
about 4 cores for any Haskell programs doing significant amounts of
allocation, which is basically all Haskell programs because allocations are
everywhere in Haskell.

Ultimately, running on all cores attains no speedup at all with Haskell in
that case. This was branded the "last core slowdown", but the slowdown
clearly started well before all 8 cores. There was a significant development
towards improving this situation, but it won't fix the granularity problem:

  http://hackage.haskell.org/trac/ghc/blog/new-gc-preview

The paper "Regular, shape-polymorphic, parallel arrays in Haskell" cites
2.5x speedups when existing techniques were not only already getting 7x
speedups but better absolute performance as well. Cache complexity is the
problem, as I explained here:

 
http://flyingfrogblog.blogspot.com/2010/06/regular-shape-polymorphic-parallel.html

Probably the best solution for multicore programming is Cilk. This technique
has already been adopted both in Intel's TBB and Microsoft's .NET 4 but,
AFAIK, the only functional language with access to it is F#. There are some
great papers on multicore-friendly cache oblivious algorithms written in
Cilk:

  http://www.fftw.org/~athena/papers/tocs08.pdf

Note, in particular, that Cilk is not only much faster but also much easier
to use than explicit message passing.
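
The granularity control Jon credits to Cilk can be shown in shape, if
not in actual parallelism, with a sequential OCaml sketch: below a
cutoff the recursion runs serially, so tasks never get small enough for
scheduling overhead to dominate. The cutoff constant and the comments
marking spawn points are illustrative assumptions, not Cilk itself.

```ocaml
(* Cilk-style divide and conquer with a serial cutoff. Both branches
   are direct calls here; in Cilk the first recursive call would be a
   cilk_spawn and the addition would follow a sync. *)
let cutoff = 20

let rec fib n =
  if n < 2 then n
  else if n < cutoff then
    (* serial leaf: too small to be worth a task of its own *)
    fib (n - 1) + fib (n - 2)
  else
    (* coarse grain: this is where cilk_spawn would fork fib (n-1) *)
    let a = fib (n - 1) in
    let b = fib (n - 2) in
    a + b

let () = Printf.printf "%d\n" (fib 25)
```

The point is that the program text fixes the grain size once, instead of
handing the scheduler an unbounded number of microscopic sparks.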

To do something like this, threads need to be able to run in parallel and
mutate the same shared heap. Although that is objectively easy (I did it in
HLVM), OCaml's reliance upon very high allocation rates, efficient
collection of young garbage and a ridiculous density of pointers in the heap
make it a *lot* harder.

Cheers,
Jon.




RE: [Caml-list] SMP multithreading

2010-11-16 Thread Jon Harrop
Granularity and cache complexity are the reasons why not. If you find
anything and everything that can be done in parallel and parallelize it,
then you generally obtain only slowdowns. An essential trick is to exploit
locality via mutation but, of course, purely functional programming sucks
at that by design, not least because it is striving to abstract that very
concept away.

 

I share your dream but I doubt it will ever be realized.

 

Cheers,

Jon.

 


Re: [Caml-list] SMP multithreading

2010-11-16 Thread Eray Ozkural
Oh well, I'm not so surprised that the fine-grain task parallelism with (?)
dynamic load-balancing strategy doesn't get much speedup. Doing HPC with
Haskell is a bit like using Java for writing parallel programs; you might
as well use a C-64 and Commodore BASIC. And yes, some people do use Java
with MPI. Java people have benchmarks too :) But for some reason I had
difficulty using Java and Haskell even with medium-size problems.

On the other hand, a lot more can be achieved with a parallelizing compiler
that uses profiling and static analysis. As I said, good results can be
achieved even in C - I've seen that - so I know it's doable with OCaml,
though it takes a difficult kind of compiler. The functional features would
expose more concurrency.

At any rate, implicit parallelism isn't the same as a parallelizing
compiler; it's better, because you would be using primitives that the
compiler knows inside out. That's like combining the best of both worlds,
I think, because obviously parallelizing compilers work best on the
easier kinds of parallelism. Such a compiler can pull more tricks than
many assume, but it would still not replace a parallel algorithm
designer. You don't really expect to give quicksort as input and get
hypercube quicksort as output from these parallelizing compilers that
apply a number of heuristic transformations to the code, but on many
problems they can be made to generate pretty good code. The best part is
that once you have it, you can apply it to every program; it's one of
the cheapest ways to get a speedup, so I'd say it's worthwhile for OCaml
right now. Just not the way GHC does it.

On Wed, Nov 17, 2010 at 5:47 AM, Jon Harrop 
jonathandeanhar...@googlemail.com wrote:

 Granularity and cache complexity are the reasons why not. If you find
 anything and everything that can be done in parallel and parallelize it then
 you generally obtain only slowdowns. An essential trick is to exploit
 locality via mutation but, of course, purely functional programming sucks at
 that by design not least because it is striving to abstract that concept
 away.



 I share your dream but I doubt it will ever be realized.



 Cheers,

 Jon.




Re: [Caml-list] SMP multithreading

2010-11-16 Thread Gabriel Kerneis
On Wed, Nov 17, 2010 at 06:27:14AM +0200, Eray Ozkural wrote:
 As I said even in C good results can be achieved, I've seen that, so I
 know it's doable with ocaml, just a difficult kind of compiler. The
 functional features would expose more concurrency.

Could you share a pointer to a paper describing this compiler?

Thanks,
-- 
Gabriel Kerneis



Re: [Caml-list] SMP multithreading

2010-11-15 Thread Edgar Friendly

On 11/15/2010 09:27 AM, Wolfgang Draxinger wrote:

Hi,

I've just read
http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html
in particular this paragraph:
| What about hyperthreading?  Well, I believe it's the last convulsive
| movement of SMP's corpse :-)  We'll see how it goes market-wise.  At
| any rate, the speedups announced for hyperthreading in the Pentium 4
| are below a factor of 1.5; probably not enough to offset the overhead
| of making the OCaml runtime system thread-safe.

This reads just like "640K ought to be enough for everyone". Multicore
systems are the standard today. Even the cheapest consumer machines come
with at least two cores, and one can easily get 6-core machines today.

Still thinking SMP was a niche and was dying?

So, what're the developments regarding SMP multithreading OCaml?


Cheers

Wolfgang

At the risk of feeding a (possibly unintentional) troll, I'd like to 
share some possibly new thoughts on this ever-living topic.


It looks like high-performance computing of the near future will be 
built out of many machines (message passing), each with many cores 
(SMP).  One could use message passing for all communication in such a 
system, but a hybrid approach might be best for this architecture, with 
use of shared memory within each box and message passing between.  Of 
course the best choice depends strongly on the particular task.


In the long run, it'll likely be a combination of a few large, powerful 
cores (Intel-CPU style w/ the capability to run a single thread as fast 
as possible) with many many smaller compute engines (GPGPUs or the like, 
optimized for power and area, closely coupled with memory) that provides 
the highest performance density.


The question of how to program such an architecture seems as if it's 
being answered without the functional community's input. What can we 
contribute?


E.
