OK, I started some tests on my 80-core machine. At first I decided to run
exactly the same thing that you ran above.

As you can see, before I set the dyadic threshold I got the expected
results. After setting it, the same command hangs with 200% CPU usage. At
the time I'm writing this mail, it has been sitting like that for about 30
minutes.

Here's the log of what I did. GNU APL was compiled with CORE_COUNT_WANTED=-3.

      *∇Z ← NCPU time LEN;T;X;tmp*
[1]       *⎕SYL[26;2] ← NCPU*
[2]       *X ← LEN⍴2J2*
[3]       *T ← ⎕TS*
[4]       *tmp ← X⋆X*
[5]       *Z←1 1 1 24 60 60 1000⊥⎕TS - T*
[6] *∇*
*      (⍳8) ∘.time 10⋆⍳7*
0 0 1 7 40 414 4180
0 0 0 3 38 409 4178
0 0 0 4 39 412 4212
0 0 1 4 38 416 4204
0 0 0 5 39 417 4225
0 0 0 4 39 417 4232
0 0 0 4 38 417 4245
0 0 0 4 38 417 4241
      *)COPY 5 FILE_IO*
loading )DUMP file /home/emartenson/src/apl/wslib5/FILE_IO.apl...

      *1 FIO∆set_dyadic_threshold  '⋆'*
8888888888888888888
      *(⍳8) ∘.time 10⋆⍳7*
*(Hangs here)*
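
To read the tables above: each row corresponds to one core count (NCPU = 1
to 8), each column to a vector length (10⋆1 through 10⋆7), and the entries
are elapsed milliseconds. The mixed-radix decode in the time function is
what turns the ⎕TS difference into milliseconds; a small worked example,
independent of the benchmark itself:

      ⍝ an elapsed ⎕TS difference of 2 seconds and 340 milliseconds
      1 1 1 24 60 60 1000 ⊥ 0 0 0 0 0 2 340
2340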

Regards,
Elias

On 26 September 2014 20:04, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:

>  Hi Elias,
>
> if you are using a recent SVN version, then you need to set the thresholds
> (the vector size above which parallel execution is performed):
>
> *      (⍳4) ∘.time 10⋆⍳7*
> *0 0 1 3 29 254 2593*
> *0 0 1 2 25 252 2618*
> *0 0 1 2 26 258 2682*
> *0 0 1 2 26 263 2866*
>
> *      )COPY 5 FILE_IO*
> *loading )DUMP file /usr/local/lib/apl/wslib5/FILE_IO.apl...*
>
> *      1 FIO∆set_dyadic_threshold  '⋆'   ⍝ returns the previous threshold for dyadic ⋆*
> *8070450532247928832*
>
> *      (⍳4) ∘.time 10⋆⍳7*
> *0 0 0 2 30 250 2590*
> *0 0 0 1 15 149 1580*
> *0 0 0 1 11 113 1225*
> *0 3 0 0 12 103 1120*
>
> I am currently working on a benchmark workspace that determines the
> optimal thresholds for the different scalar functions (and those
> thresholds will become the future defaults). Right now the default
> thresholds are so high that you will always have sequential execution.
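>
> A rough sketch of the idea (not the actual benchmark workspace), reusing
> the time function from the quoted mail below; the variable names are only
> illustrative, and it assumes the dyadic ⋆ threshold has already been
> lowered so that parallel execution can kick in:
>
>       ⍝ 1-core vs. 4-core times for dyadic ⋆ over growing lengths
>       SEQ ← 1 time¨ LENS ← 10⋆⍳7
>       PAR ← 4 time¨ LENS
>       ⍝ smallest tested length at which the 4-core run wins
>       LENS[(PAR<SEQ)⍳1]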
>
> /// Jürgen
>
>
>  On 09/26/2014 07:22 AM, Elias Mårtenson wrote:
>
> I've tested this code, and I don't see much of an improvement as I
> increase the core count:
>
>  Given the following function:
>
>      ∇Z ← NCPU time LEN;T;X;tmp
>       ⎕SYL[26;2] ← NCPU               ⍝ number of cores to use
>       X ← LEN⍴2J2                     ⍝ complex vector of length LEN
>       T ← ⎕TS                         ⍝ start timestamp
>       tmp ← X⋆X                       ⍝ the scalar function being timed
>       Z←1 1 1 24 60 60 1000⊥⎕TS - T   ⍝ elapsed time in milliseconds
>      ∇
>
>  I'm running this command on my 8-core workstation:
>
>  *      (⍳8) ∘.time 10⋆⍳7*
> 0 0 0 2 19 188 2139
> 0 0 1 2 19 189 2147
> 0 0 1 2 19 210 2256
> 0 0 0 2 19 194 2427
> 0 0 0 3 28 284 3581
> 0 0 0 3 27 280 3510
> 0 0 0 3 27 284 3754
> 0 0 0 3 27 279 3637
>
>  Regards,
> Elias
>
> On 26 September 2014 13:05, Elias Mårtenson <loke...@gmail.com> wrote:
>
>> Thanks, I have merged the necessary changes.
>>
>>  Regards,
>> Elias
>>
>> On 22 September 2014 23:50, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:
>>
>>>  Hi,
>>>
>>> I have finished a first shot at parallel (i.e. multicore) GNU APL: SVN
>>> 480.
>>>
>>> This version computes all scalar functions in parallel if the ravel
>>> length of the result exceeds 100.
>>> This can make the computation of small (but still > 100) vectors slower
>>> than if they were computed sequentially.
>>> Therefore parallel execution is not yet the default. To enable it:
>>>
>>> *    ./configure*
>>> *    make parallel*
>>> *    make*
>>> *    sudo make install*
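>>>
>>> As an aside on the 100-element criterion: the ravel length of a result Z
>>> is just its total element count, ×/⍴Z, so with the parallel build a
>>> 20 20 matrix would be computed in parallel while a 10 10 one would still
>>> be computed sequentially:
>>>
>>>       ×/⍴ 10 10⍴2J2   ⍝ ravel length 100, does not exceed 100
>>> 100
>>>       ×/⍴ 20 20⍴2J2   ⍝ ravel length 400, exceeds 100
>>> 400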
>>>
>>> The current version uses some Linux-specific features, which will be
>>> ported to other platforms later on (if possible). ./configure is supposed
>>> to detect this.
>>>
>>> Some simple benchmarks are promising:
>>>
>>> *      X←1000000⍴2J2   ⍝ 1 million complex numbers*
>>>
>>> *      ⎕SYL[26;2]←1   ⍝ 1 core*
>>> *      T←⎕TS ◊ ⊣X⋆X ◊ 1 1 1 24 60 60 1000⊥⎕TS - T*
>>> *246*
>>>
>>> *      ⎕SYL[26;2]←2   ⍝ 2 cores*
>>> *      T←⎕TS ◊ ⊣X⋆X ◊ 1 1 1 24 60 60 1000⊥⎕TS - T*
>>> *136*
>>>
>>> *      ⎕SYL[26;2]←3   ⍝ 3 cores*
>>> *      T←⎕TS ◊ ⊣X⋆X ◊ 1 1 1 24 60 60 1000⊥⎕TS - T*
>>> *102*
>>>
>>> *      ⎕SYL[26;2]←4   ⍝ 4 cores*
>>> *      T←⎕TS ◊ ⊣X⋆X ◊ 1 1 1 24 60 60 1000⊥⎕TS - T*
>>> 91
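>>>
>>> Relative to the 1-core run that is a speedup of roughly 1.8, 2.4 and 2.7
>>> with 2, 3 and 4 cores respectively:
>>>
>>>       246÷136 102 91   ⍝ ≈ 1.81 2.41 2.70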
>>>
>>> The next step will be to find the break-even points of all scalar
>>> functions, so that parallel execution is
>>> only done when it promises some speedup.
>>>
>>> Elias, the *PointerCell* constructor has one more argument. I have
>>> updated *emacs-mode* and *sql* accordingly - you may want to sync back.
>>>
>>> /// Jürgen
