May I suggest that being able to choose the number of cores at runtime should actually be the default? Remember that most Linux distributions do not compile the source on the local machine; they distribute binaries instead.
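To make the runtime idea concrete, here is a minimal sketch of how the thread count could be picked when the interpreter starts rather than at ./configure time. The helper names (choose_thread_count, apply_thread_count) are hypothetical and not part of GNU APL, and the rule of never exceeding the detected core count is my assumption. The only OpenMP call used is omp_set_num_threads, guarded by the standard _OPENMP macro so that the code also builds without OpenMP.

```cpp
#include <cassert>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not GNU APL code): pick the worker-thread count at
// run time. requested == 0 means "auto", i.e. use the detected core count.
unsigned choose_thread_count(unsigned requested, unsigned detected)
{
    if (detected == 0) detected = 1;   // hardware_concurrency() may return 0 if unknown
    unsigned n = (requested == 0) ? detected : requested;
    if (n > detected) n = detected;    // assumption: never oversubscribe the cores
    assert(n >= 1);
    return n;
}

// Apply the choice on the running machine. Without OpenMP the value is only
// returned and execution stays single-threaded.
unsigned apply_thread_count(unsigned requested)
{
    const unsigned n =
        choose_thread_count(requested, std::thread::hardware_concurrency());
#ifdef _OPENMP
    omp_set_num_threads(static_cast<int>(n));
#endif
    return n;
}
```

A distribution binary built this way would call apply_thread_count(0) at startup to use all detected cores, or pass a user-supplied number to override it.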
Having some #ifdefs would be good, and a runtime user-selected (or automatically chosen, based on the available cores) number of threads should be the default for this reason.

Regards,
Elias

On 11 March 2014 23:07, Juergen Sauermann <[email protected]> wrote:

> Hi David,
>
> looks good! Some comments, though.
>
> 1. You could adapt src/testcases/Performance.pt with some longer
> scalar functions in order to get some performance figures. You can
> start it like this:
>
> ./apl -T testcases/Performance.pt
>
> 2. I believe we should not bother the user with specifying
> parallelization parameters in ⎕SYL. I would rather use
> ./configure CORES=n, with n=1 meaning no parallel execution,
> CORES=auto meaning the number of cores on the build machine, and
> explicit numbers n>1 meaning that n cores shall be used. This would
> generate slightly faster code than computing array bounds at runtime.
> It's a bit more hassle for the user, but may pay off soon.
>
> 3. Yes, GNU APL throws many exceptions (almost every APL error is
> thrown from somewhere), and I was expecting that we have to catch
> them on the throwing processor. Not too difficult if we do it on the
> top level.
>
> 4. It would be good to understand how the OpenMP loops work. I could
> imagine one of two strategies:
>
> - in loop(j, MAX) thread j executes iterations j, j+CORES, ...
> - thread j executes iterations j*MAX/CORES ... (j+1)*MAX/CORES
>
> The first strategy interleaves the data and is more intuitive, while
> the second uses blocks of data, is more cache-friendly, and therefore
> probably gives better performance.
>
> 5. Not sure if your earlier comment on letting the scheduler decide is
> correct. I have been doing pthread programming in the past and I have
> seen cases where the scheduler fooled itself and led to cases where
> the same problem took more than double the capacity compared to
> explicit affinity on a 4-core CPU. I would expect that APL generates
> very fine-grained and short-lived pieces of execution, and the
> scheduler may not be optimized for that. I guess we have to try that
> out.
>
> /// Jürgen
>
> On 03/11/2014 08:02 AM, David B. Lamkins wrote:
>
>> Juergen's suggestion prompted me to attempt an implementation using
>> OpenMP rather than the by-hand coding that I had been anticipating.
>> Attached is a quick-and-dirty patch to enable GNU APL to be built
>> with OpenMP support.
>>
>> ./configure --with-openmp
>>
>> There are many rough edges, both in the Makefile and the code.
>>
>> --with-openmp would ideally check to see whether the compiler
>> supports OpenMP. It may be necessary to check the compiler version,
>> as different compilers support different versions of OpenMP. Also,
>> I've assumed compilation on/for Linux despite the fact that GNU APL
>> and OpenMP should be buildable with the right Windows compiler.
>>
>> As one might expect, OpenMP requires that any throw from a worker
>> thread must be caught by the same thread. I'm almost certain that
>> this restriction could be violated by GNU APL code as currently
>> written.
>>
>> The good news, though, is that the changes are benign; in the absence
>> of --with-openmp, GNU APL's behavior is unchanged.
>>
>> With OpenMP support, ⎕syl is extended to access some of OpenMP's
>> parameters.
>>
>> I've done only trivial testing at this point; just enough to verify
>> that compiling OpenMP support doesn't obviously break GNU APL.
>>
>> I haven't confirmed that the OpenMP #pragmas on the key loops in
>> SkalarFunction.cc have any effect on execution time or processor core
>> utilization. I hope to do more testing later this week.
>>
>> Best wishes,
>> David
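On the two loop strategies in Juergen's point 4, the following sketch spells out which iterations each thread would get under the interleaved and the block scheme. The function names are made up for illustration and this is plain C++ that only models the index assignment; it is not how OpenMP is implemented. As far as I know, OpenMP's schedule(static, 1) corresponds to the first scheme and schedule(static) without a chunk size is close to the second, though the exact block bounds are left to the implementation.

```cpp
#include <cassert>
#include <vector>

// Interleaved: thread j executes iterations j, j+cores, j+2*cores, ...
std::vector<int> interleaved_iterations(int j, int cores, int max)
{
    assert(cores > 0 && j >= 0 && j < cores);
    std::vector<int> out;
    for (int i = j; i < max; i += cores) out.push_back(i);
    return out;
}

// Block: thread j executes the half-open range
// [j*max/cores, (j+1)*max/cores), as in the formula quoted above.
std::vector<int> block_iterations(int j, int cores, int max)
{
    assert(cores > 0 && j >= 0 && j < cores);
    const long long begin = static_cast<long long>(j) * max / cores;
    const long long end = static_cast<long long>(j + 1) * max / cores;
    std::vector<int> out;
    for (long long i = begin; i < end; ++i) out.push_back(static_cast<int>(i));
    return out;
}
```

With 4 cores and MAX=10, thread 1 gets iterations 1, 5, 9 in the interleaved scheme and 2, 3, 4 in the block scheme, so the block scheme touches adjacent array elements, which is the cache-friendliness argument.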

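On the restriction that a throw from an OpenMP worker must be caught by the same thread, one common pattern is to catch inside the loop body, remember the exception, and rethrow it on the calling thread after the parallel loop has finished. The sketch below is only an illustration under that assumption: reciprocal_or_throw and parallel_reciprocal are hypothetical stand-ins, not GNU APL functions, and std::domain_error merely plays the role of an APL error. Without OpenMP the pragmas are ignored and the loop runs sequentially with the same result.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a scalar primitive that can raise an APL error.
double reciprocal_or_throw(double x)
{
    if (x == 0.0) throw std::domain_error("DOMAIN ERROR");
    return 1.0 / x;
}

// Applies the primitive to every element. Each exception is caught inside
// the iteration (and therefore by the thread that threw it); one of the
// captured exceptions is rethrown on the calling thread after the loop.
void parallel_reciprocal(const std::vector<double>& in, std::vector<double>& out)
{
    assert(in.size() == out.size());
    std::exception_ptr captured;   // shared by all threads
    const long n = static_cast<long>(in.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        try {
            out[i] = reciprocal_or_throw(in[i]);
        } catch (...) {
#pragma omp critical
            {
                if (!captured) captured = std::current_exception();
            }
        }
    }
    if (captured) std::rethrow_exception(captured);
}
```

The caller sees an ordinary C++ exception, so the existing top-level error handling would not need to know whether the loop ran in parallel. Which exception is reported when several iterations fail is not deterministic in this sketch.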