Thanks for the reply!

>Are you seeing a lot of disk activity after the data have been loaded
>into memory?  That would indicate
>excessive swapping.   Low CPU usage (CPU is waiting on I/O) is another
>indicator.   There are usually some OS-specific tools to gather
>statistics on vm usage and swapping.   Are the data on a local disk or
>a network server?

The entire thing is being run on a cluster, so I can't check disk activity 
directly, but the data are local to the program.
However, I can see that the program loads the first 60 or so files quickly 
and then slows down. As soon as the slowdown occurs I also see virtual memory 
usage increase, so I assume data are being paged out to virtual memory rather 
than held in physical RAM.
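For what it's worth, one way I thought of confirming this (a minimal sketch, 
assuming a Linux node; the helper name is just for illustration) is to log the 
peak resident set size after each file load and watch where it plateaus:

/* Sketch: log peak resident memory after each file load.  On Linux,
 * ru_maxrss is the peak resident set size in kilobytes; once it stops
 * growing while virtual memory keeps climbing, we're swapping. */
#include <stdio.h>
#include <sys/resource.h>

static void log_memory(int file_index)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("after file %d: peak RSS = %ld KB\n",
               file_index, ru.ru_maxrss);
}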

>You need to tell us more about how the data are used. One common
>example is where the calculation is repeated for each (i,j) coord. all
>100+ files, so there is no need to store complete arrays, but you want
>parts of all arrays to be stored at the same time.  Another is a
>calculation that uses data from one array at a time, so there is no
>need to store more than one array at a time.

Yes, I'm doing the former - processing each (i,j) element individually. It 
is remote sensing data, with each file being a separate observation, so what 
I'm doing is processing a time series on a per-pixel basis.
As you say, there's no need to store the complete arrays, but my attempts at 
loading only a small hyperslab (corresponding to one row of the input images) 
have not been successful.
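For reference, here's roughly the kind of per-row read I've been attempting 
(a sketch using the HDF5 C API; the dataset name, element type, and 
dimensions are placeholders for my actual data):

/* Sketch: read a single row (hyperslab) of a 2-D dataset, so only one
 * row per file is ever resident in memory at a time. */
#include <hdf5.h>

herr_t read_row(const char *filename, const char *dset_name,
                hsize_t row, hsize_t ncols, float *buf)
{
    hid_t file   = H5Fopen(filename, H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(file, dset_name, H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);

    /* Select one row: start at (row, 0), count (1, ncols). */
    hsize_t start[2] = { row, 0 };
    hsize_t count[2] = { 1, ncols };
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

    /* Matching 1 x ncols dataspace in memory. */
    hid_t mspace = H5Screate_simple(2, count, NULL);

    herr_t status = H5Dread(dset, H5T_NATIVE_FLOAT, mspace, fspace,
                            H5P_DEFAULT, buf);

    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return status;
}

Calling this once per file for a given row should keep only 
(number of files) x ncols values in memory at a time, but as I said, my 
attempts along these lines haven't worked as expected so far.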

Hope that makes sense, and thanks again.
Simon.
