Nice!  I think I mentioned it before, but I have read in several places that 
the speed of CPUs is quickly approaching its practical limits and that in the 
future the big speed gains will supposedly come from adding more cores/CPUs. 
If we can get 2 CPUs with 8 cores right now for a couple thousand dollars, it 
seems that the trend is pretty well underway already. It sounds like TJ, or any 
developers who want to take advantage of faster computers in the future, 
will actually *have to* write software that takes advantage of parallel 
processing.  Do you think so?

Steve
  ----- Original Message ----- 
  From: Fred Tonetti 
  To: [email protected] 
  Sent: Tuesday, June 17, 2008 7:10 PM
  Subject: RE: [amibroker] Multi Core Optimization, L2 Cache & Optimization Run 
Times


  Here are some results I got with my new toy.

  This is using a reasonably complex system on ~500 symbols over 10 years i.e. 
~2500 bars ...

   

  Cores    Time    Percent of 1-core time
  1        218     100.00%
  2        114      52.29%
  3         79      36.24%
  4         62      28.44%
  5         52      23.85%
  6         46      21.10%
  7         41      18.81%
  8         37      16.97%

   

  As expected, the higher you go the more overhead there is, but improvements 
like this are still well worth the effort, especially on a single box.
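A quick way to see the overhead in those numbers is to turn them into speedup and per-core efficiency. This is only a sketch, not something from the original post: the `timings` dictionary is copied from the table above, and the helper name `scaling` is made up for illustration. The time units are not stated in the post, but they cancel out in the ratios.

```python
# Timings copied from the table above (cores -> run time; units not stated).
timings = {1: 218, 2: 114, 3: 79, 4: 62, 5: 52, 6: 46, 7: 41, 8: 37}


def scaling(timings):
    """Return (cores, speedup, efficiency) tuples relative to the 1-core run.

    speedup    = 1-core time / n-core time
    efficiency = speedup / n  (1.0 would be perfect linear scaling)
    """
    base = timings[1]
    rows = []
    for cores in sorted(timings):
        speedup = base / timings[cores]
        efficiency = speedup / cores
        rows.append((cores, speedup, efficiency))
    return rows


for cores, speedup, efficiency in scaling(timings):
    print(f"{cores} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Efficiency drops steadily as cores are added, which is the growing overhead mentioned above, while the absolute run time keeps falling.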

   

   


------------------------------------------------------------------------------

  From: [email protected] [mailto:[EMAIL PROTECTED] On Behalf Of Steve 
Dugas
  Sent: Saturday, June 14, 2008 7:00 PM
  To: [email protected]
  Subject: Re: [amibroker] Multi Core Optimization, L2 Cache & Optimization Run 
Times

   

  Very interesting Fred, thanks!  This looks encouraging, at least for us EOD 
guys.

   

  One thing I notice - at 32 tickers, it looks like the curve has "recovered" 
to what you might expect to see even if there was no dent at 16. And also, 
after 32 the curve seems to get a second wind, i.e. it "inverts" and the time 
per symbol decreases *more* rapidly as more tickers are added. What do you 
think might account for that?  Is it just due to the log nature of the chart? 
Thanks!

   

  Steve

    ----- Original Message ----- 

    From: Fred Tonetti 

    To: [email protected] 

    Sent: Saturday, June 14, 2008 5:49 PM

    Subject: [amibroker] Multi Core Optimization, L2 Cache & Optimization Run 
Times

     

    Given TJ's comments about:

     

    -          The amount of memory utilized in processing symbols of data 

    -          Whether or not this would fit in the L2 cache 

    -          The effect it would have on optimizations when it didn't

     

    I finally got around to running a little benchmark for Multi Core 
Optimization using the program I wrote and posted (MCO), a new version of which 
I'll be posting shortly.

     

    These tests were run under the following conditions:

     

    -          A less than state of the art laptop with 

    o        Core 2 Duo 1.86 GHz processor

    o        2 MB of L2 Cache

     

    -          Watch Lists of symbols each of which 

    o        Contains twice as many symbols as the previous one, i.e. 1, 2, 4, 
8, 16, 32, 64, 128, 256

    o        Contains symbols with ~5000 bars of data each.

     

    Given the above:

     

    -          Each symbol should require 160,000 bytes i.e. ~5,000 bars * 32 
bytes per bar

    -          Loading more than 13 symbols should cause L2 cache misses to 
occur
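The arithmetic behind those two estimates can be checked with a few lines. This is a minimal sketch under stated assumptions: "2 MB" is taken to mean 2 * 1024 * 1024 bytes, the whole L2 cache is assumed to be available for price data (in practice code and other data compete for it), and the variable names are invented for illustration.

```python
BARS_PER_SYMBOL = 5_000            # ~5,000 bars per symbol, from the post
BYTES_PER_BAR = 32                 # 32 bytes per bar, from the post
L2_CACHE_BYTES = 2 * 1024 * 1024   # assumption: "2 MB" means 2 MiB

bytes_per_symbol = BARS_PER_SYMBOL * BYTES_PER_BAR

# Largest whole number of symbols whose data fits in the cache at once.
symbols_that_fit = L2_CACHE_BYTES // bytes_per_symbol

# Watch list sizes used in the benchmark: 1, 2, 4, ..., 256.
watch_lists = [2 ** k for k in range(9)]

# First watch list whose total data no longer fits in the L2 cache.
first_overflow = next(
    n for n in watch_lists if n * bytes_per_symbol > L2_CACHE_BYTES
)

print(bytes_per_symbol, symbols_that_fit, first_overflow)
```

Under these assumptions the 8-symbol list fits and the 16-symbol list is the first that does not.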

     

    Results:

     

    -          See the attached data & chart

     

    There are several interesting things I find regarding the results.

     

    -          The "dent" in the curve, looking left to right, occurs right 
where you'd think it would: between 8 symbols and 16 symbols, i.e. between the 
point at which all data can be loaded to and accessed from the L2 cache and the 
point where it no longer can.

    -          The "dent" occurs in the same place running either one or two 
instances of AB

    -          The "dent" while clearly visible is hardly traumatic in terms of 
run times

    -          Running two instances of AB instead of one gives a consistent 
40% savings in run time, regardless of the number of symbols.

    -          This is also in line with how much CPU is utilized when running 
one instance of AB, which on the test machine is typically in the 54 - 60% 
range.
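One way to read the link between those last two points, as a back-of-the-envelope sketch only: assume run time is inversely proportional to the total CPU actually used, and assume two instances together drive the machine to 100%. Neither assumption is stated in the post, and `expected_savings` is a hypothetical helper.

```python
def expected_savings(single_instance_cpu_share):
    """Estimate the fraction of run time saved by a second instance.

    Assumptions (not from the post): run time scales as 1 / CPU used,
    and two instances together use 100% of the machine.
    """
    return 1.0 - single_instance_cpu_share


# Observed CPU utilization for one instance on the test machine: 54% to 60%.
for share in (0.54, 0.60):
    print(f"one instance at {share:.0%} CPU -> "
          f"~{expected_savings(share):.0%} time saved with two instances")
```

With these assumptions the estimate lands at roughly 40% to 46%, the same ballpark as the measured 40% savings.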

     

    I have a new toy that I'll be trying these benchmarks on again shortly, 
i.e. a dual core 2 duo quad 3.0 GHz.


