On Thu, Jan 27, 2011 at 4:43 PM, Simon R. Proud <[email protected]> wrote:

> Thanks for the reply!
>
>>Are you seeing a lot of disk activity after the data have been loaded
>>into memory? That would indicate
>>excessive swapping. Low CPU usage (CPU is waiting on I/O) is another
>>indicator. There are usually some OS-specific tools to gather
>>statistics on vm usage and swapping. Are the data on a local disk or
>>a network server?
>
> The entire thing is being run on a cluster, so I can't check disk activity -
> but the data is local to the program.
> However, I can see that the program is fast at loading the first 60ish
> files, and then slows down. As soon as that slowdown occurs I also see
> virtual memory usage increase, so I assume it's loading data into VM rather
> than physical RAM.
>
>>You need to tell us more about how the data are used. One common
>>example is where the calculation is repeated for each (i,j) coord. across
>>all 100+ files, so there is no need to store complete arrays, but you want
>>parts of all arrays to be stored at the same time. Another is a
>>calculation that uses data from one array at a time, so there is no
>>need to store more than one array at a time.
>
> Yes, I'm performing the former - processing each i,j element individually.
> It is remote sensing data, with each file being a separate observation, so
> what I'm doing is processing a timeseries on a per-pixel basis.
> As you say, there's no need to store the complete arrays, but my attempts at
> loading only a small hyperslab (corresponding to one row of the input
> images) have not been successful.
>
> Hope that makes sense, and thanks again.
> Simon.

Ger van Diepen's suggestions make sense to me. I know that some other
sites that offer time-series views of RS data create a separate copy
of the data organized as he suggests. What I don't know is whether
it is still possible, on a modern cluster and using HDF5, to take
advantage of memory-mapped I/O for this use case. Real life is more
complicated, as we want to do this with "binned" (integerized
sinusoidal grid) data, so we don't have arrays.
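
To make the idea of a separate, time-series-friendly copy concrete, here
is a rough sketch. It is not HDF5 code: it uses plain float32 binary
files and only the Python standard library, so the file layout, the tiny
dimensions, and every function name are assumptions made up for
illustration. It reads one image row from each observation at a time
(the analogue of a one-row hyperslab), writes a copy in which each
pixel's time series is contiguous, and then memory-maps that copy to
pull out a single pixel's series.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical, deliberately tiny dimensions; real RS images are far larger.
ROWS, COLS, N_OBS = 4, 5, 3
ITEM = struct.calcsize("<f")  # bytes per little-endian float32 sample


def write_observation(path, t):
    """Write one synthetic ROWS x COLS image whose values encode (t, i, j)."""
    with open(path, "wb") as f:
        for i in range(ROWS):
            row = [float(100 * t + 10 * i + j) for j in range(COLS)]
            f.write(struct.pack(f"<{COLS}f", *row))


def read_row(path, i):
    """Read only row i of one observation, leaving the rest on disk."""
    with open(path, "rb") as f:
        f.seek(i * COLS * ITEM)
        return struct.unpack(f"<{COLS}f", f.read(COLS * ITEM))


def build_pixel_major_copy(obs_paths, out_path):
    """Write a copy laid out as [row][col][time], one image row at a time."""
    n_obs = len(obs_paths)
    with open(out_path, "wb") as out:
        for i in range(ROWS):
            # Only n_obs rows are held in memory here, never whole images.
            rows = [read_row(p, i) for p in obs_paths]
            for j in range(COLS):
                series = [rows[t][j] for t in range(n_obs)]
                out.write(struct.pack(f"<{n_obs}f", *series))


def pixel_series(mm, i, j, n_obs):
    """Return the time series of pixel (i, j) from the memory-mapped copy."""
    offset = (i * COLS + j) * n_obs * ITEM
    return struct.unpack_from(f"<{n_obs}f", mm, offset)


def demo():
    """Build synthetic observations, reorganize them, and read one pixel."""
    with tempfile.TemporaryDirectory() as tmp:
        obs_paths = []
        for t in range(N_OBS):
            path = os.path.join(tmp, f"obs_{t}.bin")
            write_observation(path, t)
            obs_paths.append(path)
        cube = os.path.join(tmp, "pixel_major.bin")
        build_pixel_major_copy(obs_paths, cube)
        with open(cube, "rb") as f:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                return pixel_series(mm, 2, 3, N_OBS)
            finally:
                mm.close()


if __name__ == "__main__":
    print(demo())  # (23.0, 123.0, 223.0)
```

With this layout a per-pixel read touches one small contiguous region,
so the OS only pages in what is actually used. Whether the same benefit
carries over to HDF5 files on a cluster file system is exactly the open
question above; this sketch does not answer it.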

-- 
George N. White III <[email protected]>
Head of St. Margarets Bay, Nova Scotia

