Thanks for the reply!

> Are you seeing a lot of disk activity after the data have been loaded
> into memory? That would indicate excessive swapping. Low CPU usage
> (CPU is waiting on I/O) is another indicator. There are usually some
> OS-specific tools to gather statistics on vm usage and swapping. Are
> the data on a local disk or a network server?
The entire thing is being run on a cluster, so I can't check disk
activity directly - but the data are local to the program. I can see,
though, that the program loads the first 60 or so files quickly and
then slows down. As soon as the slowdown occurs I also see virtual
memory usage increase, so I assume it is paging data out to swap
rather than keeping it in physical RAM.

> You need to tell us more about how the data are used. One common
> example is where the calculation is repeated for each (i,j) coord. in
> all 100+ files, so there is no need to store complete arrays, but you
> want parts of all arrays to be stored at the same time. Another is a
> calculation that uses data from one array at a time, so there is no
> need to store more than one array at a time.

Yes, I'm doing the former - processing each (i,j) element
individually. It is remote sensing data, with each file being a
separate observation, so what I'm doing is processing a time series on
a per-pixel basis. As you say, there's no need to store the complete
arrays, but my attempts at loading only a small hyperslab
(corresponding to one row of the input images) have not been
successful.

Hope that makes sense, and thanks again.

Simon.
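P.S. For concreteness, below is a minimal sketch of the row-at-a-time
hyperslab read I have been trying to get working, using the HDF5 C
API. The file name "obs_000.h5" and dataset path "/image" are
placeholders for one of the real observation files, I'm assuming 2-D
float data, and error checking is omitted.

    #include <hdf5.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Placeholder names: "obs_000.h5" and "/image" stand in for
           one of the real observation files and its 2-D dataset. */
        hid_t file   = H5Fopen("obs_000.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t dset   = H5Dopen2(file, "/image", H5P_DEFAULT);
        hid_t fspace = H5Dget_space(dset);

        hsize_t dims[2];
        H5Sget_simple_extent_dims(fspace, dims, NULL);

        float *row_buf = malloc(dims[1] * sizeof *row_buf);

        /* Memory dataspace describing a single row. */
        hsize_t count[2] = { 1, dims[1] };
        hid_t mspace = H5Screate_simple(2, count, NULL);

        for (hsize_t row = 0; row < dims[0]; row++) {
            /* Select one row of the file dataset: offset (row, 0),
               extent (1, ncols). */
            hsize_t start[2] = { row, 0 };
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL,
                                count, NULL);
            H5Dread(dset, H5T_NATIVE_FLOAT, mspace, fspace,
                    H5P_DEFAULT, row_buf);
            /* ... feed this row's pixels into the per-pixel
               time series ... */
        }

        free(row_buf);
        H5Sclose(mspace);
        H5Sclose(fspace);
        H5Dclose(dset);
        H5Fclose(file);
        return 0;
    }

The idea is to reuse a single one-row memory dataspace and just
reselect the file hyperslab on each pass, so only one row is ever
resident in memory at a time.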
