I don't think whether it is EOD or intraday is relevant.
With regard to your other comments: I don't think the curve ever fully recovers to where it would have been had there been, say, a gig of L2 cache. But I believe the per-symbol optimization time will continue to decrease at an ever slower rate ad infinitum. The reason is that there is probably some initialization time and some amount of overhead between each iteration as the AFL is reloaded or whatever.

_____

From: [email protected] [mailto:[EMAIL PROTECTED]] On Behalf Of Steve Dugas
Sent: Saturday, June 14, 2008 7:00 PM
To: [email protected]
Subject: Re: [amibroker] Multi Core Optimization, L2 Cache & Optimization Run Times

Very interesting Fred, thanks! This looks encouraging, at least for us EOD guys. One thing I notice - at 32 tickers, it looks like the curve has "recovered" to what you might expect to see even if there was no dent at 16. And also, after 32 the curve seems to get a second wind, i.e. it "inverts" and the time per symbol decreases *more* rapidly as more tickers are added. What do you think might account for that? Is it just due to the log nature of the chart?

Thanks!

Steve

----- Original Message -----
From: Fred Tonetti <mailto:[EMAIL PROTECTED]>
To: [EMAIL PROTECTED] <mailto:[email protected]>
Sent: Saturday, June 14, 2008 5:49 PM
Subject: [amibroker] Multi Core Optimization, L2 Cache & Optimization Run Times

Given TJ's comments about:

- The amount of memory utilized in processing symbols of data
- Whether or not this would fit in the L2 cache
- The effect it would have on optimizations when it didn't

I finally got around to running a little benchmark for Multi Core Optimization using the program I wrote and posted (MCO), which I'll be posting a new version of shortly.

These tests were run under the following conditions:

- A less than state of the art laptop with
  o Core 2 Duo 1.86 GHz processor
  o 2 MB of L2 cache
- Watch Lists of symbols, each of which
  o Contains the next power of two number of symbols relative to the previous one, i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256
  o Contains symbols with ~5,000 bars of data

Given the above:

- Each symbol should require 160,000 bytes, i.e. ~5,000 bars * 32 bytes per bar
- Loading more than 13 symbols should cause L2 cache misses to occur

Results:

- See the attached data & chart

There are several interesting things I find regarding the results.

- The "dent" in the curve, looking left to right, occurs right where you'd think it would, between 8 symbols and 16 symbols, i.e. from the point at which all data can be loaded to and accessed from the L2 cache to the point where it no longer can.
- The "dent" occurs in the same place running either one or two instances of AB.
- The "dent", while clearly visible, is hardly traumatic in terms of run times.
- The relationship of run times between running one and two instances of AB is consistent at 40% savings in run time regardless of the number of symbols.
- This is also in line with how much CPU is utilized when running one instance of AB, which on the test machine is typically in the 54 - 60% range.

I have a new toy that I'll be trying these benchmarks on again shortly, i.e. a dual core 2 duo quad 3.0 GHz.
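For anyone who wants to check the cache-fit arithmetic in the original post, here is a rough Python sketch. The bar count, bytes per bar, cache size and watch list sizes come from the post. Everything else is an assumption made for illustration: the function names are made up, 2 MB is taken to mean 2 * 1024 * 1024 bytes, and the whole L2 cache is treated as available for quote data, which it will not be in practice. The last helper only converts the reported 40% run-time saving into the equivalent speedup factor.

```python
# Back-of-the-envelope check of the cache-fit estimate in the post.
# From the post: ~5,000 bars per symbol, 32 bytes per bar, 2 MB of L2 cache.
# Assumptions: 2 MB = 2 * 1024 * 1024 bytes, and the entire cache is
# available for quote data (in reality code and other data share it).

BARS_PER_SYMBOL = 5_000
BYTES_PER_BAR = 32
L2_CACHE_BYTES = 2 * 1024 * 1024

# Watch list sizes used in the benchmark (powers of two).
WATCH_LIST_SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 256]


def bytes_per_symbol(bars=BARS_PER_SYMBOL, bytes_per_bar=BYTES_PER_BAR):
    """Memory needed for one symbol's quote data."""
    return bars * bytes_per_bar


def max_symbols_in_cache(cache_bytes=L2_CACHE_BYTES):
    """Largest whole number of symbols whose data fits in the cache."""
    return cache_bytes // bytes_per_symbol()


def fits_in_cache(n_symbols, cache_bytes=L2_CACHE_BYTES):
    """True if the data for n_symbols fits entirely in the cache."""
    return n_symbols * bytes_per_symbol() <= cache_bytes


def speedup_from_savings(savings_fraction):
    """Convert a run-time saving (0.40 means 40%) into a speedup factor."""
    return 1.0 / (1.0 - savings_fraction)


if __name__ == "__main__":
    print(f"bytes per symbol: {bytes_per_symbol():,}")
    print(f"symbols that fit in L2: {max_symbols_in_cache()}")
    for n in WATCH_LIST_SIZES:
        status = "fits" if fits_in_cache(n) else "exceeds L2"
        total = n * bytes_per_symbol()
        print(f"{n:>3} symbols: {total:>10,} bytes ({status})")
    print(f"40% time saving = {speedup_from_savings(0.40):.2f}x speedup")
```

Under these assumptions the 8-symbol watch list is the last one that fits and the 16-symbol list is the first that does not, which is where the "dent" shows up in the chart.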
