Hi all
I would like to point out that these days there is huge hype around multicore systems. As a result, one sees stupid parallel demonstrations, such as the Mandelbrot one. This is a pure graphics demo with no other utility, remote from any realistic parallel application. Moreover, this example is one of the few that can be done efficiently on present-day multicore systems. So, if the FPC team is really intent on bringing parallel constructs into the language, it has to look at things from a broader perspective. It is necessary to think about what realistic parallel applications would look like.

Unfortunately, the utility of multicore systems has been largely exaggerated by their manufacturers. The main problem is that multiple cores share the same memory bandwidth. As a result, it is highly unlikely that one can have COMPLEX programs running concurrently on a multicore system without clogging the memory bus and using up all the cache. Multiple cores are useful if there is little memory transfer (which does not happen often, except of course if you compute fractals), or if memory transfer is done in a predictable fashion. About the only examples of the latter are linear algebra subroutines (scientific computing) and certain multimedia applications (concurrent MPEG decoders, for example).

Now, it is true that a dot product or vector sum can be done very elegantly with a parallel loop. However, these are very low-level operations, ones that (at least in the scientific computing community) are typically optimized for each particular architecture and provided as a user API. Consequently, no one would go around writing matrix-vector multiplication in a high-level language. Linear algebra is the usual bottleneck, and if you do real applications it has already been written and optimized. Consequently, parallel loops look beautiful, but they are of little practical utility. In summary, the programming style that led to assembly-level loop unrolling for superscalar processors is likely to be the same programming style that will be used for multicore machines.
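For concreteness, such an elegant parallel dot product might look like the following sketch. The `parallel for` keyword and the reduction semantics are invented syntax for illustration; no such construct exists in FPC today:

```pascal
function DotProduct(const A, B: array of Double): Double;
var
  i: Integer;
  Sum: Double;
begin
  Sum := 0.0;
  { Hypothetical construct: iterations are distributed over the
    available cores, with Sum treated as a reduction variable. }
  parallel for i := 0 to High(A) do
    Sum := Sum + A[i] * B[i];
  Result := Sum;
end;
```

Pretty as it is, this is exactly the kind of kernel that an optimized BLAS routine already provides, which is the point above.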

So typical parallel code revolves around higher-level algorithms. For example, if you want to compute the heat distribution in an automobile engine, you would first partition your engine into many smaller components. Then you would perform complex, memory-intensive computations on each piece, and finally patch the results together. It is questionable whether multicore systems are useful in such a scenario, as it involves large memory transfers. However, if they are (or you have a real multi-processor shared-memory machine), then what you would need from the language is a nice encapsulation of threads. This implies (local) parallel procedures, arrays of (local) parallel procedures, parallel class methods, semaphores and critical sections.
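As a sketch of what such encapsulation could mean, here is one possible shape for a local parallel procedure. Again, the `parallel` modifier and the implicit join are invented syntax, purely to make the wish list concrete:

```pascal
procedure SolveEngine;
var
  n: Integer;

  { Hypothetical syntax: a local procedure marked "parallel", so each
    call spawns a thread that shares the enclosing scope. }
  parallel procedure SolvePart(Part: Integer);
  begin
    { memory-intensive computation on partition number Part }
  end;

begin
  { partition the engine, then solve each piece concurrently }
  for n := 1 to NumParts do
    SolvePart(n);      { each call runs in its own thread }
  { implicit join here: all SolvePart threads finish before we
    patch the pieces back together }
end;
```

The compiler, not the programmer, would then be responsible for generating the thread plumbing.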

The present obstacle with Object Pascal is that for each (class) method that implements a parallel algorithm, one has to separately implement a thread object. Moreover, parallel algorithms typically need global variables (the ones they operate on in parallel), so you need to move the local method variables to the thread object too. In the end, the implementation of your algorithm is split between the method and the thread object. Finally, synchronization is provided by classes (TEvent, TCriticalSection) which have to be constructed and destroyed explicitly, with the necessary resource-protection (try..finally) overhead. This is not convenient.
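To illustrate the boilerplate, here is a minimal sketch of what running one slice of an algorithm on a thread looks like today. The class and field names (TSolverThread, FLo, FHi, FData) are mine; only TThread, TCriticalSection and the try..finally pattern come from the RTL:

```pascal
uses
  Classes, SyncObjs;

type
  { The algorithm's local variables must be promoted to fields
    of a dedicated thread class. }
  TSolverThread = class(TThread)
  private
    FLo, FHi: Integer;   // slice of the data this thread works on
    FData: PDouble;      // the shared data all threads operate on
  protected
    procedure Execute; override;
  public
    constructor Create(AData: PDouble; ALo, AHi: Integer);
  end;

constructor TSolverThread.Create(AData: PDouble; ALo, AHi: Integer);
begin
  inherited Create(True);   // create suspended
  FData := AData;
  FLo := ALo;
  FHi := AHi;
end;

procedure TSolverThread.Execute;
begin
  { ...the actual algorithm, split off from the method that owns it... }
end;

{ And synchronization objects must be managed by hand: }
procedure RunSolver;
var
  Lock: TCriticalSection;
begin
  Lock := TCriticalSection.Create;
  try
    { spawn TSolverThread instances, Start, WaitFor... }
  finally
    Lock.Free;
  end;
end;
```

All of this is scaffolding around what is, conceptually, a single method.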

I hope this helps the discussion.

Peter Popov
_______________________________________________
fpc-devel maillist  -  [email protected]
http://lists.freepascal.org/mailman/listinfo/fpc-devel
