I think this is a big issue. It would be great if Juergen could get involved with your project!
Blake McBride

On Mon, Aug 31, 2015 at 3:19 AM, Elias Mårtenson <[email protected]> wrote:

> I finally ran the full test case again, and I will concentrate on one
> piece of data:
>
> The constructor IntCell::IntCell(long) alone is responsible for 19.65%
> of CPU time. What's most revealing is that it's being called
> 6210421276 times. For those who don't like counting digits, that's
> over 6 billion times.
>
> We also have the pair IntCell::get_cell_type() and
> IntCell::get_int_value() being called the exact same number of times.
>
> My takeaway from that is that we perform an excessive number of array
> duplications, creating arrays that are traversed a single time and
> then dropped.
>
> Anyone interested in the details of my original experimentation with
> this can search back in the mailing list archives, but here's what
> happens:
>
> Let's take a simple expression:
>
>     foo ← 1+2×⍳N
>
> This performs the following operations internally:
>
>    1. Allocate an array of size N and fill it with the sequence 1, 2, 3, …
>    2. Create a new array of size N and copy the values from the
>       previous step
>    3. Multiply all values by 2
>    4. Create a new array of size N and copy the values from the
>       previous step
>    5. Add 1 to all values
>    6. Create a new array of size N and copy it again
>    7. Store this array in the variable foo
>
> My original work attempted to mark arrays as "mutable". For example,
> the value ⍳N would be mutable, since it is safe to modify in place,
> while the value of foo would not be, since it must not be changed in
> place. Thus, 1+⍳N should be able to create an array and update it,
> while 1+foo should copy the array before the addition. The
> implementation was similar in concept to your vanilla copy-on-write
> algorithm.
>
> My solution worked for simple cases, but had some very serious bugs
> that caused arrays to be changed that shouldn't have been, which is
> why it was never integrated.
> However, for the cases where it worked, the performance improvement
> was massive.
>
> I have no current plans to resume this, but it would be interesting
> to discuss it if someone else wants to play around with it.
>
> Regards,
> Elias
>
> On 31 August 2015 at 00:36, Mike Duvos <[email protected]> wrote:
>
>> Hi Elias,
>>
>> It appears I screwed up my previous benchmark of SIEVE. I apparently
>> forgot I had the XTerm on my desktop logged into an instance of
>> Ubuntu Server LTS on Amazon Web Services, and ended up comparing
>> APL2 on my desktop with GNU APL on a much faster processor.
>>
>> I've built SVN 669 on my Dell Dimension 2400, an older, slower
>> 32-bit machine, and now get the following results on that box for
>> SIEVE 100000: 4.219 seconds for APL2, and 812.4 seconds for GNU APL.
>>
>> So GNU APL is about 193 times slower. There's a bit of variability
>> in the time for GNU APL. If I put my ear to the box, I can hear the
>> disk being flogged, and Task Manager shows the memory bouncing
>> rapidly between 8 and 10 meg, so it's not necessarily true that the
>> processor time is equal to the wall-clock time.
>>
>> I'm not sure how much time you'll get back by reducing redundant
>> array copying. Any commercial APL will special-case "reshape",
>> "and", and "index of" on packed boolean vectors, and this is always
>> going to beat GNU APL's object-oriented approach of boxing
>> everything and making no assumptions about the types of values.
>>
>> Looking back over the bug list, I see only two unresolved issues:
>> building shared libraries under Cygwin/Windows, and specification
>> sometimes giving the wrong answer when you combine mixed structural
>> primitives with bracket selection.
>>
>> The shared-library issue is important, because with AP210 not being
>> completely implemented, Windows users currently have no way to get
>> ASCII files in and out of their workspace.
>>
>> Regards,
>>
>> Mike
>>
>> On Sun, Aug 30, 2015 at 4:20 AM, Elias Mårtenson <[email protected]> wrote:
>>
>>> All right, I have re-run the Sieve benchmark on GNU APL as of svn
>>> id 669.
>>>
>>> I do see differences, but they are not great. I also ran it on a
>>> reduced case in order to have it finish quicker, so the breakdown
>>> could be incorrect. I will re-run this on the full (SIEVE 100000)
>>> test case and report back.
>>>
>>> I can still see Clone taking ridiculous amounts of time, though.
>>>
>>> Regards,
>>> Elias
>>>
>>> On 29 August 2015 at 11:42, Elias Mårtenson <[email protected]> wrote:
>>>
>>>> I never analysed the Sieve benchmark. I'll take a look at it
>>>> tonight.
>>>>
>>>> Regards,
>>>> Elias
>>>>
>>>> On 29 Aug 2015 06:51, "Mike Duvos" <[email protected]> wrote:
>>>>
>>>>> Hi Fred,
>>>>>
>>>>> I just built SVN 664 on the Dell Dimension I did the original
>>>>> factorial benchmark on. The time went from 52 seconds to 23, so
>>>>> that is indeed a dramatic improvement.
>>>>>
>>>>> I wonder why my prime sieve didn't speed up much.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Mike
>>>>>
>>>>> On Fri, Aug 28, 2015 at 11:28 AM, Fred Weigel <[email protected]> wrote:
>>>>>
>>>>>> Juergen,
>>>>>>
>>>>>> I just built 664 -- the performance of the factorial 300 that
>>>>>> Mike Duvos posted has indeed improved enormously: it now runs
>>>>>> in 8 seconds instead of 20 on my reference platform.
>>>>>>
>>>>>> This is in line with expectations from Elias' profiling results.
>>>>>>
>>>>>> Very well done!
>>>>>>
>>>>>> FredW
