yuva...@gmail.com writes:

> I have a C++ program that does an economic simulation. The program
> has one function with many nested loops (3-5 levels of nesting), each
> running on the order of 10-1000 times. All loop limits are constants.
> Inside the loops, simple calculations are done (exp is the heaviest),
> and some global matrices are accessed (both for reading and writing).
> The global matrices are very large ~10M.
>
> I'm using floats for most of my computations. Accuracy is not very
> important; speed matters most, because I later need to find a
> minimum of that function.
>
> Currently I'm using the following compilation line:
>
> g++ -Wall -fmove-all-movables -fmerge-all-constants -funroll-loops -O3
> -o min_estimation min_estimation.cpp -lgsl
>
> with the following g++:
>
> gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)
>
> Do you have any suggestion for other compilation flags?
> Would I benefit from upgrading to newer compiler version?

I think the best optimization you could make here is to access the
matrices locally.  Check that the indices are incremented so that
consecutive accesses differ as little as possible in address.  The
idea is to use data that is already in the L1 or L2 cache instead of
jumping all over RAM and getting a cache miss on each access.
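
A minimal sketch of the loop-order point, assuming the matrix is stored
row-major in one contiguous buffer (the actual layout in your program
is not known to me).  The names Matrix, sum_strided and sum_contiguous
are made up for illustration.  Both functions compute the same sum;
only the order in which memory is visited differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix type: row-major storage in a single buffer,
// so element (r, c) lives at data[r * cols + c].
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    float& at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
    float at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Cache-unfriendly order: the inner loop walks down a column, so
// consecutive accesses are cols floats apart in memory.
float sum_strided(const Matrix& m) {
    float total = 0.0f;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            total += m.at(r, c);
    return total;
}

// Cache-friendly order: the inner loop walks along a row, so
// consecutive accesses are adjacent in memory.
float sum_contiguous(const Matrix& m) {
    float total = 0.0f;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            total += m.at(r, c);
    return total;
}
```

With small matrices the two orders perform about the same; the gap
should only show up once a matrix no longer fits in cache, so it is
worth timing both on your real sizes.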

Next, if you can process two halves or four quarters of the matrices
independently, then you can easily create threads to process them in
parallel and benefit from the multi-core processors that are
becoming common.
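
A rough sketch of the splitting idea, under two assumptions: that each
element can be updated without reading or writing any other part of
the data, and that a C++11 compiler with std::thread is available
(gcc 3.4.6 predates it; there you would use pthreads directly).  The
functions process_range and process_in_parallel are hypothetical, and
exp stands in for the real per-element work.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-element work: replace each value by exp(value).
// A call touches only [begin, end), so disjoint ranges can safely
// run at the same time.
void process_range(std::vector<float>& data, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        data[i] = std::exp(data[i]);
}

// Split data into at most `parts` contiguous chunks and process each
// chunk on its own thread.
void process_in_parallel(std::vector<float>& data, unsigned parts) {
    assert(parts > 0);
    const std::size_t chunk = (data.size() + parts - 1) / parts;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned p = 0; p < parts; ++p) {
        const std::size_t begin = std::min(data.size(), p * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back(process_range, std::ref(data), begin, end);
    }
    // Wait for every chunk to finish before returning.
    for (std::thread& w : workers) w.join();
}
```

Calling process_in_parallel(data, 4) corresponds to the "four
quarters" case.  With g++ this would typically be built with
-std=c++11 -pthread added to the compilation line.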

-- 
__Pascal Bourguignon__
_______________________________________________
help-gplusplus mailing list
help-gplusplus@gnu.org
http://lists.gnu.org/mailman/listinfo/help-gplusplus
