I finally ran the full test case again, and I want to focus on one piece
of the profiling data:

The constructor IntCell::IntCell(long) alone accounts for 19.65% of CPU
time. What's most revealing is that it is called 6210421276 times. For
those who don't enjoy counting digits, that's over 6 billion calls.

We also see the pair IntCell::get_cell_type() and IntCell::get_int_value()
each being called exactly the same number of times.

My takeaway is that we perform an excessive number of array duplications,
creating arrays that are traversed a single time and then dropped.

Anyone interested in the details concerning my original experimentation
with this can search back in the mailing list archives, but here's what
happens:

Let's take a simple expression:

    foo ← 1+2×⍳N

This performs the following operations internally:

   1. Allocate array of size N and fill it with the sequence 1, 2, 3, …
   2. Create a new array of size N and copy the values from the previous
   point
   3. Multiply all values by 2
   4. Create a new array of size N and copy the values from the previous
   point
   5. Add 1 to all values
   6. Create a new array of size N and copy…
   7. Store this array in variable foo

My original work attempted to mark arrays as "mutable". For example, the
value "⍳N" would be mutable, since it is safe to modify in-place, while
the value of "foo" would not be, since a named variable must not be
changed in-place. Thus, 1+⍳N should be able to update its argument array
directly, while 1+foo should copy the array before the addition. The
implementation was similar in concept to a vanilla copy-on-write
algorithm.

My solution worked for simple cases, but it had some very serious bugs
that caused arrays to be modified when they shouldn't have been, which is
why it was never integrated. However, for the cases where it did work, the
performance improvement was massive.

I have no current plans to resume this, but it would be interesting to
discuss it if someone else wants to play around with it.

Regards,
Elias

On 31 August 2015 at 00:36, Mike Duvos <[email protected]> wrote:

> Hi Elias,
>
> It appears I screwed up my previous benchmark of SIEVE.  I apparently
> forgot I had the XTerm on my desktop logged into an instance of Ubuntu
> Server LTS on Amazon Web Services, and ended up comparing APL2 on my
> desktop with GNU APL on a much faster processor.
>
> I've built SVN 669 on my Dell Dimension 2400, an older slower 32 bit
> machine,
> and now get the following results on that box for SIEVE 100000.  4.219
> seconds for APL2, and 812.4 seconds for GNU APL.
>
> So GNU APL is about 192.557 times as slow.  There's a bit of variability
> in the time for GNU APL.  If I put my ear to the box, I can hear the disk
> being flogged on, and Task Manager shows the memory bouncing around between
> 8 and 10 meg rapidly, so it's not necessarily true that the processor time
> is equal to the wall clock time.
>
> I'm not sure how much time you'll get back by reducing redundant array
> copying.  Any commercial APL will special-case "reshape", "and", and
> "index of" on packed boolean vectors, and this is always going to beat GNU
> APL's object-oriented approach of boxing everything and making no
> assumptions about the types of values.
>
> Looking back over the bug list, I see only two unresolved issues.
> Building shared libraries under Cygwin/Windows, and specification sometimes
> giving the wrong answer when you combine mixed structural primitives with
> bracket selection.
>
> The shared library thing is important, because with AP210 not being
> completely implemented, Windows users currently have no way to get ASCII
> files in and out of their workspace.
>
> Regards,
>
> Mike
>
>
> On Sun, Aug 30, 2015 at 4:20 AM, Elias Mårtenson <[email protected]>
> wrote:
>
>> All right, I have re-run the Sieve benchmark on GNU APL as of svn id 669.
>>
>> I do see differences, but they are not great. I also ran it on a reduced
>> case in order to have it finish quicker, so the breakdown could be
>> incorrect. I will re-run this on the full (SIEVE 100000) test case and
>> report back.
>>
>> I can still see Clone taking ridiculous amounts of time though.
>>
>> Regards,
>> Elias
>>
>> On 29 August 2015 at 11:42, Elias Mårtenson <[email protected]> wrote:
>>
>>> I never analysed the Sieve benchmark. I'll take a look at it tonight.
>>>
>>> Regards,
>>> Elias
>>> On 29 Aug 2015 06:51, "Mike Duvos" <[email protected]> wrote:
>>>
>>>> Hi Fred,
>>>>
>>>> I just built SVN 664 on the Dell Dimension I did the original factorial
>>>> benchmark on.  The time went from 52 seconds to 23.  So that is indeed a
>>>> dramatic improvement.
>>>>
>>>> I wonder why my Prime Sieve didn't speed up much.
>>>>
>>>> Regards,
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>> On Fri, Aug 28, 2015 at 11:28 AM, Fred Weigel <[email protected]>
>>>> wrote:
>>>>
>>>>> Juergen
>>>>>
>>>>> I just built 664 -- the performance of the factorial 300 that Mike
>>>>> Duvos
>>>>> posted has indeed improved enormously - now running 8 seconds instead
>>>>> of
>>>>> 20 on my reference platform.
>>>>>
>>>>> In line with expectations from Elias' profiling results.
>>>>>
>>>>> Very well done!
>>>>>
>>>>> FredW
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>
