Very interesting Fred, thanks!  This looks encouraging, at least for us EOD 
guys.

One thing I notice - at 32 tickers, it looks like the curve has "recovered" to 
what you might expect to see even if there were no dent at 16. Also, after 32 
the curve seems to get a second wind, i.e. it "inverts" and the time per symbol 
decreases *more* rapidly as more tickers are added. What do you think might 
account for that? Is it just due to the logarithmic scale of the chart? Thanks!

Steve
  ----- Original Message ----- 
  From: Fred Tonetti 
  To: [email protected] 
  Sent: Saturday, June 14, 2008 5:49 PM
  Subject: [amibroker] Multi Core Optimization, L2 Cache & Optimization Run 
Times


  Given TJ's comments about:

   

  -          The amount of memory utilized in processing symbols of data 

  -          Whether or not this would fit in the L2 cache 

  -          The effect it would have on optimizations when it didn't

   

  I finally got around to running a little benchmark for Multi Core 
Optimization using the program I wrote and posted (MCO), a new version of 
which I'll be posting shortly.

   

  These tests were run under the following conditions:

   

  -          A less than state of the art laptop with 

  o        Core 2 Duo 1.86 GHz processor

  o        2 MB of L2 Cache

   

  -          Watch Lists of symbols each of which 

  o        Contains twice as many symbols as the previous one, i.e. 1, 2, 4, 8, 
16, 32, 64, 128, 256

  o        Contains symbols with ~5000 bars of data each.

   

  Given the above:

   

  -          Each symbol should require 160,000 bytes i.e. ~5,000 bars * 32 
bytes per bar

  -          Loading more than 13 symbols should cause L2 cache misses to occur
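
The arithmetic behind those two figures can be sketched as below. This is only 
a rough check, assuming that 2 MB means 2 * 1024 * 1024 bytes and that the 
whole L2 cache is available for quote data (in practice code and other arrays 
compete for it). The function and constant names are made up for illustration 
and are not part of AB or MCO.

```python
# Rough check of the cache-fit estimate. All names here are hypothetical.
L2_CACHE_BYTES = 2 * 1024 * 1024   # assumption: 2 MB = 2 * 1024 * 1024 bytes
BARS_PER_SYMBOL = 5000             # ~5,000 bars per symbol, as stated above
BYTES_PER_BAR = 32                 # bytes per bar, as stated above


def bytes_per_symbol(bars=BARS_PER_SYMBOL, bytes_per_bar=BYTES_PER_BAR):
    """Memory needed to hold one symbol's bars."""
    return bars * bytes_per_bar


def symbols_fitting_in_cache(cache_bytes=L2_CACHE_BYTES):
    """Largest whole number of symbols whose data fits in the cache."""
    return cache_bytes // bytes_per_symbol()


def fits_in_cache(symbol_count, cache_bytes=L2_CACHE_BYTES):
    """True if a watch list of this size fits entirely in the cache."""
    return symbol_count * bytes_per_symbol() <= cache_bytes


if __name__ == "__main__":
    print("bytes per symbol:", bytes_per_symbol())
    print("symbols fitting in L2:", symbols_fitting_in_cache())
    for count in (1, 2, 4, 8, 16, 32, 64, 128, 256):
        print(count, "symbols fit" if fits_in_cache(count) else "symbols do not fit")
```

Under these assumptions the 8-symbol watch list fits and the 16-symbol one 
does not, which is where a cache effect would be expected to show up.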

   

  Results:

   

  -          See the attached data & chart

   

  There are several things I find interesting about the results.

   

  -          The "dent" in the curve, looking left to right, occurs right where 
you'd think it would: between 8 symbols and 16 symbols, i.e. between the point 
at which all data can be loaded to and accessed from the L2 cache and the point 
at which it no longer can.

  -          The "dent" occurs in the same place running either one or two 
instances of AB

  -          The "dent" while clearly visible is hardly traumatic in terms of 
run times

  -          Running two instances of AB consistently saves about 40% of the 
run time compared with running one instance, regardless of the number of 
symbols.

  -          This is also in line with how much CPU is utilized when running 
one instance of AB, which on the test machine is typically in the 54 - 60% 
range.
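
One way to see why those two observations line up is the back-of-envelope 
model below. It assumes, purely for illustration, that run time scales with 
the share of total CPU actually used and that two instances together push 
utilization to about 100%; neither assumption was measured here, and the 
function names are hypothetical.

```python
# Back-of-envelope model, not a measurement. All names are hypothetical.

def expected_saving(single_instance_cpu_fraction):
    """Fraction of run time saved if two instances reach ~100% CPU.

    Assumes run time is inversely proportional to CPU utilization, so the
    two-instance time is single_instance_cpu_fraction times the original.
    """
    return 1.0 - single_instance_cpu_fraction


def speedup_from_saving(saving):
    """Convert a fractional run-time saving into a speedup factor."""
    return 1.0 / (1.0 - saving)


if __name__ == "__main__":
    for cpu in (0.54, 0.60):
        saving = expected_saving(cpu)
        print(f"{cpu:.0%} CPU with one instance -> "
              f"~{saving:.0%} saving, {speedup_from_saving(saving):.2f}x speedup")
```

With one instance using 54 - 60% of the CPU, this model predicts a saving of 
roughly 40 - 46%, so the observed 40% sits at the lower end of that range.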

   

  I have a new toy that I'll be trying these benchmarks on again shortly, i.e. 
a dual core 2 duo quad 3.0 GHz.

   
