Recently I thought it would be a good idea to try out the new concurrency system once again. Some time back, when 'shared' was still new, I tried it several times, but since it was practically unusable then, I gave up on it (and, as it seems, so did many others).

Now, however, after TDPL has been released and there is some documentation plus std.concurrency, the system should be in a state where it is actually useful, with only bugs left to fix - nothing that requires inherent changes to the system. The reality is quite different as soon as you step anywhere off the already-walked path (defined by the book examples and similar things).

Just for the record, I have done a lot with most kinds of threading schemes (even if the only lock-free thing I implemented was a simple SharedPtr/WeakPtr implementation *shiver*). This may very well mean that some patterns are burned into my head that clash with some of the ideas behind the current system. But for most of the points I am quite sure that there is no viable alternative if performance and memory consumption are to be anywhere near the optimum.

I apologize for the length of this post, even though I have already tried to make it as short as possible and left out a lot of details. It is also quite possible that I assume some false things about the concurrency implementation, because my knowledge is based mostly on the NG and the book chapter.

The following problems are those I found during a one-day endeavor to convert some parts of my code base to spawn/shared (not really successfully, partly because of the very viral nature of shared).


1. spawn and objects

spawn() only supports a 'function' plus some bound parameters. Since taking the address of an object method in D always yields a delegate, it is not possible to call class members without a static wrapper function. This can be quite disturbing when working object-oriented (C++ obviously has the same problem).
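To illustrate, here is a minimal sketch of the wrapper pattern that is currently required - Worker and runWrapper are just made-up example names:

---
	import std.concurrency;

	synchronized class Worker {
		void run() { /* the actual work */ }
	}

	// &Worker.run yields a delegate, which spawn() rejects, so a free
	// function has to forward the call explicitly:
	void runWrapper(shared(Worker) w) { w.run(); }

	void startWorker(shared(Worker) w) {
		spawn(&runWrapper, w); // function pointer + bound parameter
	}
---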


2. error messages

Right now, error messages just state that there is a shared/unshared mismatch somewhere. For a non-shared-expert this can be a real bummer. You have to know a lot about the implications of 'shared' to correctly interpret these messages and track down the cause. Not very good for a feature that is meant to make threading easier.


3. everything is implicit

This may seem kind of counter-intuitive, but using 'synchronized' classes and features like setSameMutex - which are absolutely necessary; it would be foolish to neglect the importance of lock-based threading in an object-oriented environment - creates a feeling of climbing without a safety rope. Not stating how you really want to synchronize/lock, and not being able to read directly from the code how this is actually done, leaves a black-box feeling. This in turn means threading newcomers will not be educated; they just use the system somehow and it magically works. But as soon as problems such as deadlocks appear, you suddenly have to understand the details, and at that moment you have to read up on and remember everything that is going on in the background - plus everything you would have to know about threading/synchronization in C. I'm not sure whether this is the right course here or whether there is a better one.
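A small made-up example of what I mean by the black-box feeling:

---
	synchronized class Account {
		private double amount = 0;

		// the lock/unlock of the hidden object monitor is completely
		// implicit here - nothing in the source shows what is locked,
		// when, or how it interacts with other mutexes
		void deposit(double v) { amount = amount + v; }
	}
---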


4. steep learning curve - more of a high learning wall to climb

As a result of the first points, my feeling is that a newcomer who has not followed the discussions and thoughts about the system here will find himself standing before a very high barrier of material to learn before he can actually put any of it to use. I also imagine this to be a very painful process, because of all the things you discover are not possible and the error messages that potentially make you bang your head against the wall.

        
5. advanced synchronization primitives need to be considered

Primitives such as core.sync.condition (the most important one) need to be taken into account by the 'shared' system. This means there needs to be a condition variable that takes a shared object instead of a mutex, or you have to be able to query an object's mutex.
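For illustration, this is how it looks today in the non-shared world, plus (in the comment) the kind of API that would be needed - the latter is purely hypothetical:

---
	import core.sync.condition;
	import core.sync.mutex;

	void example()
	{
		// today: a condition variable is tied to an explicit mutex
		auto m = new Mutex;
		auto c = new Condition(m);

		// what the shared world would need (hypothetical API):
		// a Condition constructed from a shared object, or a way to
		// query the hidden monitor of a synchronized object, e.g.
		//   auto c2 = new Condition(mySharedObject.mutex);
	}
---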

        
6. temporary unlock

There are often situations in lock-based programming in which you need to temporarily unlock your mutex, perform some time-consuming external task (disk i/o, ...) and then reacquire the mutex. This feature needs language support - which is important not least because it is really difficult and dirty to work around - and it could be something like the inverse of a synchronized {} block, or the possibility to define a special kind of private member function that unlocks the mutex. Inside such blocks the compiler of course has to make sure that the appropriate access rules are not broken (this could be as conservative as disallowing access to any class member).
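A sketch of what I mean - the 'unsynchronized' syntax is of course made up and therefore kept inside a comment:

---
	synchronized class Loader {
		void load(string path)
		{
			// ... inspect internal state under the implicit lock ...

			// hypothetical inverse of a synchronized {} block; the
			// mutex would be released for the duration of the block
			// and the compiler would forbid member access inside it:
			//
			//     unsynchronized {
			//         performSlowDiskIO(path); // long external task
			//     }
			//
			// afterwards the mutex would be re-acquired automatically
		}
	}
---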

        
7. optimization of pseudo-shared objects

Since the shareability of an object ('synchronized') is already decided at class definition time, for performance reasons it should be possible to somehow disable the mutex for those instances that are only used thread-locally. Maybe it should be necessary to declare objects as "shared C c;" even if the class is defined as "synchronized class C {}", and otherwise you would get an object without a mutex that is not shared?
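In code, the scheme I have in mind would look something like this (the semantics in the comments are hypothetical, not how it works today):

---
	synchronized class C { void foo() {} }

	void fnc()
	{
		shared C c1 = new shared(C); // shared instance: carries a
		                             // mutex, foo() locks it as usual
		C c2 = new C;                // hypothetical: thread-local
		                             // instance, mutex and the locking
		                             // in foo() elided
	}
---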

        
8. very strong split of shared and non-shared worlds

For container classes in particular it is really nasty that you have to define two versions of the container - one shared and one non-shared - if you want to be able to use it in both contexts and to put non-shared objects into it in a non-shared context. There should also really be a way to declare a class to be hygienic, in a way similar to pure, so that it could be used in a synchronized context and store shared objects even though it is not shared itself.
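A sketch of the duplication this causes (Queue is a made-up example):

---
	// the thread-local version:
	class Queue(T) {
		private T[] items;
		void put(T item) { items ~= item; }
	}

	// and a second, shared/synchronized variant with essentially the
	// same body would have to be written and maintained in parallel:
	// synchronized class SharedQueue(T) { ... }
---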

        
9. unique

Unique objects or chunks of data are really important, not only to be able to verify that a cast to 'immutable' is correct, but also to allow passing objects to another thread for computation without making a superfluous copy or doing superfluous work.
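A small example of the kind of cast that is safe in practice but unverifiable for the compiler (publish and the buffer setup are made up):

---
	import std.concurrency;

	void publish(Tid consumer)
	{
		int[] buf = new int[](1024);
		buf[] = 42; // fill the freshly allocated buffer

		// 'buf' is in fact unique here, so this cast is safe - but
		// the compiler cannot verify that, and the only checkable
		// alternative is a superfluous copy:
		immutable data = cast(immutable(int)[])buf;
		send(consumer, data);
	}
---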

        
10. recursive locking

The only option right now is to have mutexes behave recursively. This makes it easy to avoid deadlocks within the same thread. However, in my experience recursive mutexes are very dangerous, because typically no one takes into account what happens when an algorithm is re-entered from the middle of its computation. This can happen easily in a threaded environment where you often use signals/slots or message passing. In 90% of such situations a deadlock, or at least an assertion in debug mode, is a good indicator that something just happened that should not have. Objects with shared mutexes are of course a different matter - in that case you actually need an ownership relation to do anything useful with non-recursive mutexes.
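A sketch of the kind of re-entrancy I mean (Model and notifyObservers are made-up names):

---
	synchronized class Model {
		private int[] data;

		void update()
		{
			// ... state is temporarily inconsistent here ...
			notifyObservers(); // if a slot calls back into update(),
			                   // a recursive mutex silently allows
			                   // re-entry on the half-updated state
			                   // instead of deadlocking/asserting
			// ... state becomes consistent again ...
		}

		private void notifyObservers() { /* fire signals/slots */ }
	}
---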

        
11. holes in the system

It seems like there are still a lot of ways to slip non-shared data into a shared context.

        One example is that you can pass a shared array to a function taking an unshared one - the following code
        ---
                import std.concurrency;

                void fnc(int[] arr) {}

                void fnc2()
                {
                        shared int[] arr;
                        spawn(&fnc, arr); // hands shared data to an
                                          // unshared parameter
                }
        ---
        
        compiles. This is probably just a bug and easy to fix, but what about:
        
        ---
                import std.concurrency;

                class C {
                        private void method() {}
                        private void method2()
                        {
                                // passes the unshared 'this' reference
                                // to another thread:
                                spawn(function void(C inst){ inst.method(); },
                                        this);
                        }
                }
        ---
        
Unless private functions take part in the recursive locking (which in turn is usually useless overhead), method() will be invoked in a completely unprotected context. This one has to be fixed somehow in the language. I'm sure there are other things like these.

        
12. more practical examples need to be considered

It seems that right now all the examples used to explore the features needed in the system are of a rather academic nature: either the most simple i/o, or pure functional computation, maybe a network protocol. However, when it comes to practical high-performance computation on real systems, where memory consumption and low-level performance really matter, there seems to be quite some no-man's-land.
        
        Here are some simple examples where I immediately came to a grinding halt:
        
        I. An object loader with background processing
        
You have a shared class Loader which uses multiple threads to load objects on demand and then fires a signal or returns from its loadObject(x) method.
                
The problem is that the actual loading of an object must happen outside of a synchronized region of the loader, or you get no parallelism out of this. Also, because of 'spawn', you have to use an external function instead of being able to use a member function directly. Fortunately, in this case that is also the workaround: define an external function that takes the arguments needed to load the object, loads it, and then passes the result back to the class (sketched below). Waiting for finished objects can be implemented with message passing without worry here, because the MP overhead is probably low enough.
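A sketch of this workaround - loadWorker and the exact message layout are just one possible way to do it:

---
	import std.concurrency;
	import std.file : read;

	// the worker is a free function because spawn cannot take a method
	void loadWorker(Tid owner, string path)
	{
		// slow disk i/o happens outside of any lock of the Loader
		auto data = cast(immutable(ubyte)[])read(path);
		send(owner, path, data); // hand the finished object back
	}

	synchronized class Loader {
		void loadObject(string path)
		{
			spawn(&loadWorker, thisTid, path);
		}
	}
---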
                
                Features missing:
                        - spawn with methods
                        - temporary unlock
                        
        II. Implementation of a ThreadPool
        
The majority of applications can very well be broken up into small chunks of work that can be processed in parallel. Instead of using a costly thread-create, run-task, thread-destroy cycle, it would be wise to reuse the threads for later tasks. The implementation of a thread pool that does this is of course a low-level thing, and you could argue that it is OK to use some casts and such here. Still, quite a few things are missing; a sketch of the classic mutex/condition core of such a pool follows after the list.
                
                Features missing:
                        - spawn with methods
                        - temporary unlock
                        - condition variables (message passing too slow +
                          you need to manage destinations)
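For reference, a sketch of that classic mutex + condition core (ThreadPool is my own minimal made-up version, not an existing API) - note that it bypasses shared/synchronized entirely, which is part of the problem:

---
	import core.sync.condition;
	import core.sync.mutex;
	import core.thread;

	class ThreadPool {
		private Mutex m;
		private Condition c;
		private void delegate()[] tasks;
		private bool done;

		this(size_t threadCount)
		{
			m = new Mutex;
			c = new Condition(m);
			foreach (i; 0 .. threadCount) {
				auto t = new Thread(&workerLoop);
				t.start();
			}
		}

		void put(void delegate() task)
		{
			synchronized (m) {
				tasks ~= task;
				c.notify(); // wake one waiting worker
			}
		}

		void finish()
		{
			synchronized (m) {
				done = true;
				c.notifyAll(); // wake everyone so they can exit
			}
		}

		private void workerLoop()
		{
			while (true) {
				void delegate() task;
				synchronized (m) {
					while (tasks.length == 0 && !done)
						c.wait(); // releases m while waiting
					if (done && tasks.length == 0) return;
					task = tasks[0];
					tasks = tasks[1 .. $];
				}
				task(); // run the task outside of the lock
			}
		}
	}
---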

        III. multiple threads computing separate parts of an array
                
Probably the simplest form of parallelism is to perform similar operations on each element of an array (or similar operations on regions of it) in separate threads. The good news is that this works in the current implementation. The bad news is that it is really slow, because you have to use atomic operations on the elements - otherwise it is unsafe and prone to low-level races. Right now the compiler checks almost nothing.
                The alternative would be to pass unique, non-overlapping slices of the array to the threads, but there is currently no way to express that.
                
                To illustrate the current situation, this compiles and runs:

                ---
                        import std.concurrency;
                        import std.stdio;

                        void doCompute(size_t offset, int[] arr) // arr should be shared
                        {
                                foreach (i, ref el; arr) {
                                        // should be an atomic operation, which would make
                                        // this useless because of the performance penalty
                                        el *= 2;
                                        writefln("Thread %s computed element %d: %d",
                                                thisTid(), i + offset, cast(int)el);
                                }
                        }

                        void waitForThread(Tid thread)
                        {
                                // TODO: implement in some complex way using messages,
                                // or maybe there is a simple function for this
                        }

                        void main()
                        {
                                shared int[] myarray = [1, 2, 3, 4];
                                Tid[2] threads;
                                foreach (i, ref t; threads)
                                        // should error out because the slice is not shared
                                        t = spawn(&doCompute, i, myarray[i .. i+3]);
                                foreach (t; threads)
                                        waitForThread(t);
                        }
                ---
                
                Features missing:
                        - unique
- some way to safely partition/slice an array and get a set of still unique slices


- Sönke
