I think this is a big issue.  It would be great if Juergen could get
involved with your project!

Blake McBride


On Mon, Aug 31, 2015 at 3:19 AM, Elias Mårtenson <[email protected]> wrote:

> I finally ran the full test case again, and I will concentrate on a piece
> of data:
>
> The constructor IntCell::IntCell(long) alone is responsible for 19.65% of
> CPU time. What's most revealing is that it's being called 6210421276 times.
> For those who don't feel like counting digits, that's over 6 billion times.
>
> We also have the pair IntCell::get_cell_type() and
> IntCell::get_int_value() being called the exact same number of times.
>
> My takeaway from that is that we perform an excessive number of array
> duplications, creating arrays that are traversed a single time and then
> dropped.
>
> Anyone interested in the details concerning my original experimentation
> with this can search back in the mailing list archives, but here's what
> happens:
>
> Let's take a simple expression:
>
> *    foo ← 1+2×⍳N*
>
> This performs the following operations internally:
>
>    1. Allocate array of size N and fill it with the sequence 1, 2, 3, …
>    2. Create a new array of size N and copy the values from the previous
>    point
>    3. Multiply all values by 2
>    4. Create a new array of size N and copy the values from the previous
>    point
>    5. Add 1 to all values
>    6. Create a new array of size N and copy…
>    7. Store this array in variable foo
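>
> The copying pipeline above can be sketched as follows (a hypothetical
> Python model of the evaluation strategy described, not GNU APL's actual
> code):

```python
# Hypothetical sketch of the copying evaluation of  foo ← 1+2×⍳N.
# Each primitive allocates a fresh result array, so the data is copied
# at every step even though each intermediate is traversed only once.

def iota(n):
    # Step 1: allocate an array of size n with the sequence 1, 2, 3, ...
    return list(range(1, n + 1))

def times_scalar(k, arr):
    # Steps 2-3: a new array of size n holding k × each value
    return [k * v for v in arr]

def plus_scalar(k, arr):
    # Steps 4-5: another new array of size n holding k + each value
    return [k + v for v in arr]

N = 5
foo = plus_scalar(1, times_scalar(2, iota(N)))  # three arrays allocated
print(foo)  # [3, 5, 7, 9, 11]
```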
>
> My original work attempted to mark arrays as "mutable". For example, the
> value "⍳N" would be mutable, since it is safe to modify in place, while
> the result of "foo" would not be, since it should not be changed
> in place. Thus, 1+⍳N should be able to create an array and update it, while
> 1+foo should copy the array before the addition. The implementation was
> similar in concept to your vanilla copy-on-write algorithm.
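>
> As a rough illustration of that mutable/copy-on-write idea (the class and
> method names here are hypothetical, not GNU APL's API), a reference count
> of one can stand in for "safe to modify in place":

```python
# Minimal copy-on-write sketch: a temporary (one owner) is updated in
# place; a value bound to a variable (shared) is copied before writing.

class CowArray:
    def __init__(self, values):
        self.values = values
        self.refs = 1          # one owner: safe to mutate in place

    def retain(self):
        # Binding to a variable (e.g. foo) adds a reference, making the
        # array shared and no longer safe to modify in place.
        self.refs += 1
        return self

    def add_scalar(self, k):
        if self.refs == 1:
            # Temporary like 1+⍳N: update in place, no copy.
            for i in range(len(self.values)):
                self.values[i] += k
            return self
        # Shared like 1+foo: copy first, leave the original intact.
        return CowArray([v + k for v in self.values])

tmp = CowArray([1, 2, 3]).add_scalar(1)   # mutated in place
foo = CowArray([1, 2, 3]).retain()
bar = foo.add_scalar(1)                   # copied; foo unchanged
print(tmp.values, foo.values, bar.values)
```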
>
> My solution worked for simple cases, but it had some very serious bugs
> that caused arrays to be modified that should have been left untouched,
> which is why it was never integrated. However, for the cases where it
> worked, the performance improvement was massive.
>
> I have no current plans to resume this, but it would be interesting to
> discuss it if someone else wants to play around with it.
>
> Regards,
> Elias
>
> On 31 August 2015 at 00:36, Mike Duvos <[email protected]> wrote:
>
>> Hi Elias,
>>
>> It appears I screwed up my previous benchmark of SIEVE.  I apparently
>> forgot I had the XTerm on my desktop logged into an instance of Ubuntu
>> Server LTS on Amazon Web Services, and ended up comparing APL2 on my
>> desktop with GNU APL on a much faster processor.
>>
>> I've built SVN 669 on my Dell Dimension 2400, an older, slower 32-bit
>> machine, and now get the following results on that box for SIEVE 100000:
>> 4.219 seconds for APL2, and 812.4 seconds for GNU APL.
>>
>> So GNU APL is about 192.6 times slower.  There's a bit of variability
>> in the time for GNU APL.  If I put my ear to the box, I can hear the disk
>> being flogged, and Task Manager shows the memory bouncing rapidly between
>> 8 and 10 meg, so it's not necessarily true that the processor time
>> is equal to the wall clock time.
>>
>> I'm not sure how much time you'll get back by reducing redundant array
>> copying.  Any commercial APL will special-case "reshape", "and", and
>> "index of" on packed boolean vectors, and this is always going to beat GNU
>> APL's object-oriented approach of boxing everything and making no
>> assumptions about the types of values.
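>>
>> To illustrate what packed booleans buy (a sketch of the general
>> technique, not APL2's implementation): with 8 elements per byte, an
>> elementwise "and" touches one byte per 8 elements instead of one boxed
>> cell per element.

```python
# Bit-packed boolean vectors: 8 elements per byte, so elementwise "and"
# is one byte-wide operation per 8 elements.

def pack_bools(bits):
    # Pack a list of 0/1 values into a bytearray, 8 per byte.
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return out

def packed_and(a, b):
    # Elementwise "and" on packed vectors, one byte at a time.
    return bytearray(x & y for x, y in zip(a, b))

def unpack_bools(packed, n):
    return [(packed[i // 8] >> (i % 8)) & 1 for i in range(n)]

x = [1, 0, 1, 1, 0, 0, 1, 0, 1]
y = [1, 1, 0, 1, 0, 1, 1, 0, 0]
result = unpack_bools(packed_and(pack_bools(x), pack_bools(y)), len(x))
print(result)  # [1, 0, 0, 1, 0, 0, 1, 0, 0]
```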
>>
>> Looking back over the bug list, I see only two unresolved issues:
>> building shared libraries under Cygwin/Windows, and specification sometimes
>> giving the wrong answer when you combine mixed structural primitives with
>> bracket selection.
>>
>> The shared library thing is important, because with AP210 not being
>> completely implemented, Windows users currently have no way to get ASCII
>> files in and out of their workspace.
>>
>> Regards,
>>
>> Mike
>>
>>
>> On Sun, Aug 30, 2015 at 4:20 AM, Elias Mårtenson <[email protected]>
>> wrote:
>>
>>> All right, I have re-run the Sieve benchmark on GNU APL as of svn id 669.
>>>
>>> I do see differences, but they are not great. I also ran it on a reduced
>>> case in order to have it finish quicker, so the breakdown could be
>>> incorrect. I will re-run this on the full (SIEVE 100000) test case and
>>> report back.
>>>
>>> I can still see Clone taking ridiculous amounts of time though.
>>>
>>> Regards,
>>> Elias
>>>
>>> On 29 August 2015 at 11:42, Elias Mårtenson <[email protected]> wrote:
>>>
>>>> I never analysed the Sieve benchmark. I'll take a look at it tonight.
>>>>
>>>> Regards,
>>>> Elias
>>>> On 29 Aug 2015 06:51, "Mike Duvos" <[email protected]> wrote:
>>>>
>>>>> Hi Fred,
>>>>>
>>>>> I just built SVN 664 on the Dell Dimension I did the original
>>>>> factorial benchmark on.  The time went from 52 seconds to 23.  So that is
>>>>> indeed a dramatic improvement.
>>>>>
>>>>> I wonder why my Prime Sieve didn't speed up much.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 28, 2015 at 11:28 AM, Fred Weigel <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Juergen
>>>>>>
>>>>>> I just built 664 -- the performance of the factorial 300 that Mike
>>>>>> Duvos
>>>>>> posted has indeed improved enormously - now running 8 seconds instead
>>>>>> of
>>>>>> 20 on my reference platform.
>>>>>>
>>>>>> In line with expectations from Elias' profiling results.
>>>>>>
>>>>>> Very well done!
>>>>>>
>>>>>> FredW
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>
