Dear Eric,
Thank you for the insight and suggestion. Reading between the lines, I
developed the suspicion that the problem might be in the extension
function 'unpack_vdr_data'. Previously the last part of that function was:
            Cmplx = PyComplex_FromCComplex(cmplx);
            /* PyList_SetItem steals the reference to Cmplx */
            if (PyList_SetItem(Clist, time_point_count, Cmplx)) {
                fprintf(stderr, "Could not set list item %d\n", time_point_count);
            }
            time_point_count++;
        }
    }
    /* "O" takes an additional reference to Clist; the reference acquired when
       Clist was created is never released, so the list outlives every call */
    return Py_BuildValue("O", Clist);
}
Your response made me realize that Clist was being retained as a Python
object on each call, because its reference count never dropped to zero.
The following works much better:
PyObject *result;
....
            Cmplx = PyComplex_FromCComplex(cmplx);
            if (diag) {
                printf("Created Python complex object %d\n", time_point_count);
            }
            time_point_count++;
        }
    }
    /* Py_BuildValue("O", ...) takes its own reference to Clist for the returned
       object; Py_CLEAR then drops this function's references and sets the
       pointers to NULL, leaving only the reference held through 'result'. */
    result = Py_BuildValue("O", Clist);
    Py_CLEAR(Cmplx);
    Py_CLEAR(Clist);
    return result;
}
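Just to convince myself about the bookkeeping, the same behaviour can be seen
from pure Python. This is only a small illustration using sys.getrefcount;
nothing in it is specific to the extension, and the variable names are made up
for the example:

import sys

clist = [complex(n, -n) for n in range(1000)]
print(sys.getrefcount(clist))   # typically 2 on CPython: the name 'clist' plus
                                # the temporary reference held by getrefcount()
extra = clist                   # analogous to the extra reference taken by
                                # Py_BuildValue("O", Clist)
print(sys.getrefcount(clist))   # 3; the list cannot be freed while 'extra' exists
del extra                       # analogous to the Py_CLEAR(Clist) above
print(sys.getrefcount(clist))   # back to 2; dropping the last reference frees it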
Memory still grows, but much more slowly and, for my purposes, it's no
longer a problem.
I will use the report_memory function to research this a little more.
Thanks so much,
Tom
Date: Sun, 06 Jun 2010 15:00:16 -1000
From: Eric Firing <efir...@hawaii.edu>
Subject: Re: [Numpy-discussion] memory usage question
To: numpy-discussion@scipy.org
On 06/06/2010 02:17 PM, Tom Kuiper wrote:
Greetings all.
I have a feeling that, coming at this with a background in FORTRAN and
C, I'm missing some subtlety, possibly of an OO nature. Basically, I'm
looping over very large data arrays and memory usage just keeps growing
even though I re-use the arrays. Below is a stripped-down version of
what I'm doing. You'll recognize it as gulping a great quantity of data
(1 million complex samples), Fourier transforming it in 1000-sample
blocks into spectra, co-adding the spectra, and doing this 255 times,
for a grand total 1000-point spectrum. At iteration 108 of the outer
loop, I get a memory error. By then, according to 'top', ipython (or
python) is using around 85% of 3.5 GB of memory.
fft_size = 1000
nsecs = 255
P = zeros(fft_size)
for i in range(nsecs):
    header, data = get_raw_record(fd_in)
    num_bytes = len(data)
    label, reclen, recver, softver, spcid, vsrid, schanid, bits_per_sample, \
        ksamps_per_sec, sdplr, prdx_dss_id, prdx_sc_id, prdx_pass_num, \
        prdx_uplink_band, prdx_downlink_band, trk_mode, uplink_dss_id, ddc_lo, \
        rf_to_if_lo, data_error, year, doy, sec, data_time_offset, frov, fro, \
        frr, sfro, rf_freq, schan_accum_phase, (scpp0, scpp1, scpp2, scpp3), \
        schan_label = header
    # ksamps_per_sec = 1e3, number of complex samples in 'data' = 1e6
    num_32bit_words = len(data)*8/BITS_PER_32BIT_WORD
    cmplx_samp_per_word = BITS_PER_32BIT_WORD/(2*bits_per_sample)
    cmplx_samples = unpack_vdr_data(num_32bit_words, cmplx_samp_per_word, data)
    del data            # This makes no difference
    for j in range(0, ksamps_per_sec*1000/fft_size):
        index = int(j*fft_size)
        S = fft(cmplx_samples[index:index+fft_size])
        P += S*conjugate(S)
    del cmplx_samples   # This makes no difference
    if (i % 20) == 0:
        gc.collect(0)   # This makes no difference
P /= nsecs
sample_period = 1./ksamps_per_sec  # kHz
f = fftfreq(fft_size, d=sample_period)
What am I missing?
I don't know, but I would suggest that you strip the example down
further: instead of reading data from a file, use numpy.random.randn to
generate fake data as needed. In other words, use only numpy
functions--no readers, no unpackers. Put this minimal script into a
file and run it from the command line, not in ipython. (Have you
verified that you get the same result running a standalone script from
the command line as running from ipython?) Put a memory-monitoring step
inside, maybe at each outer loop iteration. You can use the
matplotlib.cbook.report_memory function or similar:
def report_memory(i=0):   # argument may go away
    'return the memory consumed by process'
    import os, sys       # module-level imports in matplotlib.cbook
    from subprocess import Popen, PIPE
    pid = os.getpid()
    if sys.platform == 'sunos5':
        a2 = Popen('ps -p %d -o osz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[-1].strip())
    elif sys.platform.startswith('linux'):
        a2 = Popen('ps -p %d -o rss,sz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].split()[1])
    elif sys.platform.startswith('darwin'):
        a2 = Popen('ps -p %d -o rss,vsz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].split()[0])
    return mem
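For example, a self-contained test along those lines might look like the
sketch below. It is only a sketch, not your code: the fake data come from
numpy.random.randn, the sizes are copied from your script, report_memory is
assumed to be the function above pasted into the same file, and you could
shrink nsecs while experimenting.

import numpy as np
from numpy.fft import fft, fftfreq

fft_size = 1000
nsecs = 255
ksamps_per_sec = 1000            # 1e6 complex samples per "one-second record"

P = np.zeros(fft_size)
for i in range(nsecs):
    # fake one record of complex data instead of reading and unpacking a file
    n = ksamps_per_sec * 1000
    cmplx_samples = np.random.randn(n) + 1j * np.random.randn(n)
    for j in range(n // fft_size):
        index = j * fft_size
        S = fft(cmplx_samples[index:index + fft_size])
        P += (S * np.conjugate(S)).real     # co-add the power spectra
    del cmplx_samples
    # units depend on platform; see report_memory above
    print("iteration %d: memory %d" % (i, report_memory()))

P /= nsecs
f = fftfreq(fft_size, d=1. / ksamps_per_sec)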
I'm suspecting the problem may be in your data reader and/or unpacker,
not in the application of numpy functions. Also, ipython can confuse
the issue by keeping references to objects. In any case, with a simpler
test script and regular memory monitoring, it should be easier for you
to track down the problem.
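To illustrate the remark about ipython: its output history keeps a reference
to every result you display, so a large array can stay alive after you del
your own name for it. Roughly like this toy example, where a plain dict stands
in for ipython's Out[] cache (this is only an analogy, not how ipython is
implemented):

import numpy as np

output_history = {}             # stands in for ipython's Out[] cache
for n in range(5):
    a = np.zeros(10 ** 6)       # roughly 8 MB of float64
    output_history[n] = a       # a displayed result gets stored much like this
    del a                       # our name is gone, but the array is still referenced
print("%d arrays still alive via the history" % len(output_history))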
Eric