Oh and one more thing: Have you given any thought to my comments re. the
coalescing of certain functions to reduce thread dispatch effort? (also,
add some more functions to the no-copy optimisation?)

Regards,
Elias


On 11 March 2014 23:22, Elias Mårtenson <[email protected]> wrote:

> I agree. I just wanted to point out that without a runtime option,
> delivering binary versions will be hard, forcing the package maintainers to
> choose a default that will surely be wrong for the majority of users.
>
> That said, being able to choose a compile-time value is good too.
>
> Regards,
> Elias
>
>
> On 11 March 2014 23:20, Juergen Sauermann 
> <[email protected]> wrote:
>
>>  Hi,
>>
>> we could do it similar to the LOG macro where you can choose between
>> more efficient compile-time settings and less efficient run-time settings.
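
A minimal sketch of that compile-time/run-time split, purely as an illustration. The macro name CORES_FIXED and the function core_count() are hypothetical; they are not taken from GNU APL or from the patch.

```cpp
#include <cassert>
#include <thread>

// Hypothetical macro: build with -DCORES_FIXED=4 (for example) to hard-wire
// the thread count; leave it undefined to decide at run time.
#ifdef CORES_FIXED
constexpr bool cores_fixed_at_build = true;

// Compile-time variant: the request is ignored and the constant is returned,
// so the compiler can fold it into loop bounds.
inline unsigned core_count(unsigned /*requested*/)
{
    return CORES_FIXED;
}
#else
constexpr bool cores_fixed_at_build = false;

// Run-time variant: honour an explicit request, otherwise ask the machine.
inline unsigned core_count(unsigned requested)
{
    if (requested > 0) return requested;            // explicit user choice
    const unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? hw : 1;                         // 0 means "unknown"
}
#endif
```

A binary package would presumably be built without the macro, so that core_count(0) picks up the core count of the machine it runs on.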
>>
>> It is important that we do these things properly from the outset to avoid
>> too many changes later on.
>>
>> /// Jürgen
>>
>>
>>
>> On 03/11/2014 04:10 PM, Elias Mårtenson wrote:
>>
>> May I suggest that being able to choose the number of cores at runtime
>> should actually be the default? Remember that most Linux distributions will
>> not compile the source on the local machine and instead distribute
>> binaries.
>>
>>  Having some #ifdefs would be good, but having the number of threads
>> chosen at runtime (by the user, or automatically from the number of
>> cores) as the default is important for this reason.
>>
>>  Regards,
>> Elias
>>
>>
>> On 11 March 2014 23:07, Juergen Sauermann 
>> <[email protected]> wrote:
>>
>>> Hi David,
>>>
>>> looks good! Some comments, though.
>>>
>>> 1. You could adapt src/testcases/Performance.pt with some longer
>>> scalar functions in order to get some performance figures. You can start
>>> it like this:
>>>
>>> ./apl -T testcases/Performance.pt
>>>
>>> 2. I believe we should not bother the user with specifying
>>> parallelization parameters in ⎕SYL. I would rather use
>>> ./configure CORES=n, with n=1 meaning no parallel execution, CORES=auto
>>> meaning the number of cores on the build machine, and explicit numbers
>>> n>1 meaning that n cores shall be used. This would generate slightly
>>> faster code than computing array bounds at runtime. It's a bit more
>>> hassle for the user, but may pay off soon.
>>>
>>> 3. Yes, GNU APL throws many exceptions (almost every APL error is thrown
>>> from somewhere), and I was expecting that we have to catch them on the
>>> throwing processor. Not too difficult if we do it on the top level.
>>>
>>> 4. It would be good to understand how the OpenMP loops work. I could
>>> imagine one of two strategies:
>>>
>>> - in loop(j, MAX)   thread j executes iteration j, j+CORES, ...
>>> - thread j executes iterations j*MAX/CORES ... (j+1)*MAX/CORES
>>>
>>> The first strategy interleaves the data and is more intuitive, while
>>> the second uses blocks of data, is more cache-friendly, and therefore
>>> probably gives better performance.
>>>
>>> 5. Not sure if your earlier comment on letting the scheduler decide is
>>> correct. I have been doing pthread programming in the past and I have
>>> seen cases where the scheduler fooled itself, so that the same problem
>>> took more than double the capacity compared to explicit affinity on a
>>> 4-core CPU. I would expect that APL generates very fine-grained and
>>> short-lived pieces of execution and the scheduler may not be optimized
>>> for that. I guess we have to try that out.
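
The two strategies from point 4 can be written down as plain index arithmetic. This is only a sketch; the function names interleaved() and blocked() are made up for illustration. As far as the OpenMP specification goes, schedule(static, 1) hands out iterations round-robin like the first strategy, while schedule(static) without a chunk size gives each thread one contiguous chunk of roughly equal size like the second, although the exact chunk boundaries an implementation picks may differ from the formula below.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Strategy 1 (interleaved): thread j gets iterations j, j+cores, j+2*cores, ...
std::vector<std::size_t> interleaved(std::size_t j, std::size_t cores,
                                     std::size_t max)
{
    assert(cores > 0 && j < cores);
    std::vector<std::size_t> out;
    for (std::size_t i = j; i < max; i += cores) out.push_back(i);
    return out;
}

// Strategy 2 (blocked): thread j gets the half-open range
// [j*max/cores, (j+1)*max/cores), so neighbouring iterations stay together.
std::pair<std::size_t, std::size_t> blocked(std::size_t j, std::size_t cores,
                                            std::size_t max)
{
    assert(cores > 0 && j < cores);
    return { j * max / cores, (j + 1) * max / cores };
}
```

With 10 iterations on 4 cores, thread 1 gets iterations 1, 5, 9 under the first strategy and the range [2, 5) under the second.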
>>>
>>> /// Jürgen
>>>
>>>
>>>
>>>
>>> On 03/11/2014 08:02 AM, David B. Lamkins wrote:
>>>
>>>> Juergen's suggestion prompted me to attempt an implementation using
>>>> OpenMP rather than the by-hand coding that I had been anticipating.
>>>> Attached is a quick-and-dirty patch to enable GNU APL to be built with
>>>> OpenMP support.
>>>>
>>>> ./configure --with-openmp
>>>>
>>>> There are many rough edges, both in the Makefile and the code.
>>>>
>>>> --with-openmp would ideally check to see whether the compiler supports
>>>> OpenMP. It may be necessary to check the compiler version, as different
>>>> compilers support different versions of OpenMP. Also, I've assumed
>>>> compilation on/for Linux despite the fact that GNU APL and OpenMP should
>>>> be buildable with the right Windows compiler.
>>>>
>>>> As one might expect, OpenMP requires that any throw from a worker thread
>>>> must be caught by the same thread. I'm almost certain that this
>>>> restriction could be violated by GNU APL code as currently written.
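
One common way to respect that restriction is to catch inside the loop body and rethrow after the parallel region has ended. The sketch below assumes this pattern; it is not what the patch does, and checked_reciprocal() and reciprocal_all() are hypothetical stand-ins for a scalar function that can raise an APL error.

```cpp
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical per-element operation that may throw, standing in for an
// APL scalar function that raises something like a DOMAIN ERROR.
inline double checked_reciprocal(double x)
{
    if (x == 0.0) throw std::domain_error("DOMAIN ERROR");
    return 1.0 / x;
}

// Apply checked_reciprocal to every element. Each iteration catches its own
// exception, so nothing propagates out of the parallel region; the first
// captured exception is rethrown on the calling thread afterwards.
void reciprocal_all(std::vector<double>& v)
{
    std::exception_ptr first_error;
    const long n = static_cast<long>(v.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        try {
            v[i] = checked_reciprocal(v[i]);
        } catch (...) {
            // Only one thread at a time may record the error.
            #pragma omp critical
            {
                if (!first_error) first_error = std::current_exception();
            }
        }
    }

    if (first_error) std::rethrow_exception(first_error);
}
```

Without -fopenmp the pragmas are ignored and the loop runs serially with the same result. Note that the remaining iterations still run after an error has been recorded; stopping early would need extra logic.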
>>>>
>>>> The good news, though, is that the changes are benign; in the absence of
>>>> --with-openmp, GNU APL's behavior is unchanged.
>>>>
>>>> With OpenMP support, ⎕syl is extended to access some of OpenMP's
>>>> parameters.
>>>>
>>>> I've done only trivial testing at this point; just enough to verify that
>>>> compiling OpenMP support doesn't obviously break GNU APL.
>>>>
>>>> I haven't confirmed that the OpenMP #pragmas on the key loops in
>>>> SkalarFunction.cc have any effect on execution time or processor core
>>>> utilization. I hope to do more testing later this week.
>>>>
>>>> Best wishes,
>>>>    David
>>>>
>>>>
>>>
>>>
>>
>>
>
