I found another problem, which I caused all by my sorry little self: I was caching all the dataframes, effectively forcing them *not* to be garbage collected. If facepalm had a face... :P Disabling that does a lot toward my goal - processing 10k traces (10% of my data) now takes a little less than 20 minutes, with no discernible memory management issues.
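For anyone hitting the same wall, here's a minimal sketch of the pitfall described above (all names are made up, not from the original code): memoizing every parsed trace in a global Dict keeps a live reference to each one, so the garbage collector can never reclaim them and memory grows with every file processed.

```julia
# Buggy pattern: a global cache pins every trace's data in memory forever.
const cache = Dict{Int,Vector{Float64}}()

read_trace(i) = rand(1000)   # stand-in for reading/parsing one trace file

# Every result stays reachable via `cache`, so none of it is ever collected.
cached_trace(i) = get!(() -> read_trace(i), cache, i)

# The fix: skip the cache, so each trace becomes garbage as soon as the
# loop iteration that used it finishes.
uncached_trace(i) = read_trace(i)
```

Dropping the cache (or calling `empty!(cache)` periodically, if some reuse is genuinely needed) is what lets the collector keep up.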
Any ideas on piping the awk output to a dataframe?

// T

On Wednesday, May 28, 2014 10:57:26 AM UTC+2, Tim Holy wrote:

> I doubt that there's going to be a way to modify running code on the fly from
> another process anytime soon. I suspect the solution will be a better
> garbage-collector (#5227).
>
> Since your process has now crashed (sorry to hear it), you could insert
>
>     (i % 1000 == 0) && gc()
>
> in your loop. It's just unfortunate that it will take so long to find out
> whether this works.
>
> --Tim
>
> On Tuesday, May 27, 2014 08:31:41 AM Tomas Lycken wrote:
> > I started a Julia script that processes a very large set of data, by
> > reading a large number (100k) of quite small text files, doing some
> > calculations, and aggregating the results. After running for a while I've
> > noticed that there seem to be some memory management issues, which I
> > suspect are just inefficient garbage collection. With some pseudo-elements,
> > my script does something like this:
> >
> >     function process_all_the_stuff()
> >         results1 = Float64[]
> >         results2 = Float64[]
> >         for i in 1:1e5
> >             thisdata = read_text_file_with_index(i)
> >             thisresult1 = do_calculation_1(thisdata)
> >             thisresult2 = do_calculation_2(thisdata)
> >             push!(results1, thisresult1)
> >             push!(results2, thisresult2)
> >         end
> >         results1, results2
> >     end
> >
> > I've come about halfway, and htop looks like this:
> >
> > <https://lh3.googleusercontent.com/-rFSwZ9UtvIg/U4SvG5EL4xI/AAAAAAAAAMY/QYYbNCv-6l0/s1600/htop.png>
> >
> > As you see, I'm about to run out of memory. Is there any way I can "inject"
> > a call to gc(), say, at the end of the loop body, without interrupting the
> > script and losing all the work done so far? Or will Julia do so, when (if)
> > she realizes memory is (too) scarce?
> >
> > If there isn't a way to do this, see this as the first step toward a
> > feature request :P
> >
> > // Tomas
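On the awk question at the top: one way to sketch it, using the current CSV.jl/DataFrames.jl APIs (the package API has changed a lot since this 2014 thread; back then DataFrames' readtable was the rough equivalent). The trace file name and its two-column layout here are made-up stand-ins for the real data, and the example assumes awk is on the PATH.

```julia
using CSV, DataFrames

# Write a tiny stand-in "trace" file so the example is self-contained.
write("trace_0001.txt", "t0 1.5\nt1 2.5\nt2 3.5\n")

# Run awk as an external process and stream its stdout straight into
# CSV.read -- no intermediate file needed.
cmd = `awk '{ print $1 "," $2 }' trace_0001.txt`
df = CSV.read(open(cmd), DataFrame; header = [:label, :value])
```

The same `open(::Cmd)` trick works for any external filter, so the awk step can stay exactly as it is in the shell pipeline.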
