On Thu, Jan 27, 2011 at 11:47 AM, Simon R. Proud <[email protected]> wrote:

> Hi all,
>
> I'm working on a project to read data from multiple HDF5 files for analysis.
> Each file consists of 5 floating point datasets (each being 2000x2200 in
> size) and there's between 100 and 120 files to read.
> At the moment my code reads all the data from all the files into memory at
> once, which is nice and simple but because of memory constraints I end up
> using a lot of virtual memory....which is rather slow.
>
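For scale, a quick back-of-envelope check (assuming the datasets are stored as 8-byte doubles; halve the numbers for single precision):

```python
# Back-of-envelope memory footprint (assuming 8-byte doubles; use 4
# bytes per element if the datasets are single precision).
rows, cols = 2000, 2200
datasets_per_file = 5
n_files = 120                  # upper end of the 100-120 range

per_array = rows * cols * 8    # bytes per dataset
total = per_array * datasets_per_file * n_files

print(per_array / 2**20)       # ~33.6 MiB per array
print(total / 2**30)           # ~19.7 GiB for all files
```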
So about 32 MBytes each for over 500 arrays, or over 16 GBytes. At one time I was running a calculation on O(50) such arrays stored in HDF4 on a system with 0.5 GB RAM (i.e., much smaller than the data). We had a version of the HDF4 library that used memory mapping, and each calculation only needed a small part of each array, so by raising the limit on the maximum virtual memory that could be allocated, the calculation ran with very modest physical I/O and modest run times.

Are you seeing a lot of disk activity after the data have been loaded into memory? That would indicate excessive swapping. Low CPU usage (the CPU waiting on I/O) is another indicator. There are usually OS-specific tools to gather statistics on VM usage and swapping. Are the data on a local disk or a network server?

> I tried reading a hyperslab of each dataset (corresponding to 2000 elements)
> from each file, but that turned out to be even slower than reading all the
> data at once.
>
> So, do you have any suggestions as to the best way to read this data? Aside
> from getting more memory for the computer!

If your data are larger than real memory, you need to arrange things so memory accesses don't jump around too much. You need to tell us more about how the data are used. One common example is a calculation that is repeated for each (i,j) coordinate across all 100+ files, so there is no need to store complete arrays, but you do want parts of all the arrays in memory at the same time. Another is a calculation that uses data from one array at a time, so there is no need to store more than one array at once.

-- 
George N. White III <[email protected]>
Head of St. Margarets Bay, Nova Scotia

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
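P.S. In case it helps, a minimal sketch of the first pattern (all files open at once, one block of rows read from each per step). This assumes Python with h5py; the "band1" dataset name, the block size, and the per-block mean are placeholders for the real layout and calculation:

```python
# Sketch: stream matching blocks of rows from every file instead of
# loading whole arrays. Assumes h5py is available; "band1" and the
# per-block mean are placeholders for the real dataset and analysis.
from contextlib import ExitStack


def iter_blocks(n_rows, block):
    """Yield (start, stop) row ranges covering rows 0..n_rows."""
    for start in range(0, n_rows, block):
        yield start, min(start + block, n_rows)


def process_all(paths, dset="band1", n_rows=2000, block=200):
    """Read one (block x ncols) hyperslab per file per step."""
    import h5py   # deferred imports so the sketch parses without
    import numpy  # HDF5/NumPy installed

    out = []
    with ExitStack() as stack:
        files = [stack.enter_context(h5py.File(p, "r")) for p in paths]
        for start, stop in iter_blocks(n_rows, block):
            # Larger reads than one row at a time, far smaller than
            # whole arrays: ~n_files x block x ncols doubles resident.
            slabs = numpy.stack([f[dset][start:stop, :] for f in files])
            out.append(slabs.mean(axis=0))  # placeholder calculation
    return numpy.concatenate(out, axis=0)
```

Raising `block` trades memory for fewer, larger reads: with 120 files, block=200 keeps 120 x 200 x 2200 doubles (~0.4 GB) resident. Conversely, block=1 means 100+ tiny reads per step, which may be why the one-row hyperslab attempt was even slower than reading everything.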
