== Quote from Dan (dsstruth...@yahoo.com)'s article
> I have a question regarding performance issue I am seeing on multicore Windows systems. I am creating many threads to do parallel tasks, and on multicore Windows systems the performance is abysmal. If I use task manager to set the processor affinity to a single CPU, the program runs as I would expect. Without that, it takes about 10 times as long to complete.
> Am I doing something wrong? I have tried DMD 2.0.37 and DMD 1.0.53 with the same results, running the binary on both a dual-core P4 and a newer Core2 duo.
> Any help is greatly appreciated!

I've seen this happen before. Without knowing the details of your code, my best guess is that you're getting a lot of contention for the GC lock. (It could also be some other lock, but if it were, there's a good chance you'd already know it because it wouldn't be hidden.)

The current GC design isn't very multithreading-friendly yet. It requires a lock on every allocation. Furthermore, the array append operator (~=) currently takes the GC lock on **every append** to query the GC for info about the memory block that the array points to. There's been plenty of talk about what should be done to eliminate this, but nothing has been implemented so far.

Assuming I am right about why your code is so slow, here's how to deal with it:

1. Cut down on unnecessary memory allocations. Use structs instead of classes where it makes sense.

2. Try to stack allocate stuff. alloca is your friend.

3. Pre-allocate arrays if you know ahead of time how long they're supposed to be. If you don't know how long they're supposed to be, use std.array.Appender (in D2) for now until a better solution gets implemented. Never use ~= in multithreaded code that gets executed a lot.
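
To make point 3 concrete, here is a minimal sketch of both cases. The function names (squaresPreallocated, evensBelow) are made up for illustration, and the Appender calls follow the current Phobos interface, so the Appender that shipped with older D2 releases may need slightly different construction.

```d
import std.array : appender;
import std.stdio : writeln;

// Length known up front: allocate once, then fill in place.
// Hypothetical helper, not taken from the original poster's code.
int[] squaresPreallocated(size_t n)
{
    auto result = new int[n];          // single GC allocation
    foreach (i, ref x; result)
        x = cast(int)(i * i);          // plain stores, no append
    return result;
}

// Length not known up front: collect through an Appender
// instead of using ~= inside the loop.
int[] evensBelow(int limit)
{
    auto app = appender!(int[])();
    foreach (i; 0 .. limit)
        if (i % 2 == 0)
            app.put(i);
    return app.data;                   // the accumulated array
}

void main()
{
    writeln(squaresPreallocated(4));
    writeln(evensBelow(7));
}
```

Appender keeps track of its own capacity, so it should not have to ask the GC about the memory block on each put the way ~= does. It still allocates when it grows, so pre-allocating is the better choice whenever you know the length.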