I have found that "old-fashioned" batch programming techniques can help improve
efficiency when working with large volumes of data. This approach evolved
when resources were scarce and we all bowed down before the altar of the CPU
in the temple of the Glass House.

The majority of "untuned" programs that I have profiled spend most of their
time in memory management. That style of processing can also require multiple
passes over the data: read, write, read, write, write, read (oops, a bug), ad
infinitum, ad nauseam.
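
To make the memory-management point concrete, here is a minimal sketch in C.
The record size and the process_record() stand-in are my own inventions for
illustration; the idea is simply one stable work buffer reused for every
record instead of an allocate/free per record:

    /* Minimal sketch: reuse one stable work buffer for every record
     * instead of allocating and freeing per record.  REC_SIZE and
     * process_record() are placeholders, not from any real program. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define REC_SIZE 256

    static void process_record(char *buf, size_t len) {
        /* stand-in for real work: uppercase the record in place */
        for (size_t i = 0; i < len; i++)
            if (buf[i] >= 'a' && buf[i] <= 'z')
                buf[i] -= ('a' - 'A');
    }

    int main(void) {
        char *buf = malloc(REC_SIZE);        /* one stable work area */
        if (buf == NULL)
            return 1;
        while (fgets(buf, REC_SIZE, stdin))  /* no malloc/free per record */
            process_record(buf, strlen(buf));
        free(buf);
        return 0;
    }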

The actual number of passes depends on the set of transformations to be
performed and on the composition (internal relationships) and sequence of the
data. If natural cycles can be identified, the number of passes can be
reduced, sometimes to a single one.

"Natural cycles" is a 50-cent name for BATCHING, or grouping your processing.
(I have to use those expensive phrases to prove to my parents that I actually
went to classes and that their money was well spent.) At times it may be
counter-intuitive (ooh, that was a good one), but it is not uncommon to
achieve a 10-to-1 or better reduction.
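
Here is a rough sketch, in C, of what I mean by a natural cycle. It assumes
the input is lines of the form "key value" already sorted by key (a format I
made up for this example); the break on the key lets a single pass emit each
group's total instead of re-reading the data once per key:

    /* Classic control-break batching: input assumed sorted by key,
     * so one pass over the data produces a total per key.
     * The "key value" line format is made up for this sketch. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char key[64], prev[64] = "";
        double val, total = 0.0;

        while (scanf("%63s %lf", key, &val) == 2) {
            if (prev[0] != '\0' && strcmp(key, prev) != 0) {
                printf("%s %.2f\n", prev, total);  /* key break: end of cycle */
                total = 0.0;
            }
            total += val;
            strcpy(prev, key);
        }
        if (prev[0] != '\0')
            printf("%s %.2f\n", prev, total);      /* flush the final group */
        return 0;
    }

The alternative, one pass per key (or per aggregate), is where all those
extra reads and writes come from.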

Efficiencies are gained by using stable work areas/buffers and simplifying the
processing at each stage. This has advantages at the lowest system levels,
i.e. data flow and instruction flow through the cache(s) and the system bus.
Any I/O should be in LARGE chunks, some multiple of the page, buffer and disk
segment size. You could use 4, 8, 16, 32, 64... K or even megabytes. Have some
fun, knock your brains out trying different sizes and different approaches.
Very quickly you will have a better understanding of what I call the Data
Physics of your environment.
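
As a starting point for that experiment, here is a rough sketch of a chunked
copy with a tunable block size. The 64K default and the plain stdin-to-stdout
loop are my own choices for illustration, not a recommendation:

    /* Copy stdin to stdout in large chunks; the block size comes from the
     * command line so you can try 4K, 64K, 1M, ... and compare timings. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        size_t blksz = (argc > 1) ? (size_t)atol(argv[1]) : 64 * 1024;
        char *buf = malloc(blksz);
        size_t n;

        if (buf == NULL || blksz == 0)
            return 1;
        while ((n = fread(buf, 1, blksz, stdin)) > 0)  /* one large read ... */
            fwrite(buf, 1, n, stdout);                 /* ... one large write */
        free(buf);
        return 0;
    }

Run it under time(1) with different block sizes against the same large file
and the Data Physics of your environment will start to show themselves.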

It would also be very helpful if you used some form of "instrumentation". It
could be a fully profiled build of your program(s), or roll-your-own timer and
resource-usage functions.
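
If you go the roll-your-own route, a pair of helpers along these lines (the
names timer_start and timer_report are just placeholders I picked) will give
you CPU and wall-clock numbers around any section of code:

    /* Tiny hand-rolled instrumentation: report CPU time and wall-clock
     * time for whatever runs between timer_start() and timer_report(). */
    #include <stdio.h>
    #include <time.h>

    static clock_t cpu_start;
    static time_t  wall_start;

    static void timer_start(void) {
        cpu_start  = clock();
        wall_start = time(NULL);
    }

    static void timer_report(const char *label) {
        double cpu  = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
        double wall = difftime(time(NULL), wall_start);
        fprintf(stderr, "%s: %.2fs CPU, %.0fs wall\n", label, cpu, wall);
    }

    int main(void) {
        timer_start();
        /* ... the pass you want to measure goes here ... */
        timer_report("pass 1");
        return 0;
    }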

If you have the data, you will understand.

Hope this helps.

-- 

David Ross

[EMAIL PROTECTED]
Toad Technologies

"I'll be good! I will, I will !"
