Glad to hear you fixed it, and that there isn't a deeper problem. Re: awk, I've not done much with external processes (in fact, in Images I spent a lot of time wrapping ImageMagick directly to _avoid_ using external processes, because interacting with them can be awfully slow), so it would take me a while to figure that out, sorry.
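That said, a minimal sketch of one way it might work: run awk as an external process, capture its stdout, and parse the text into columns. The awk program, file path, and column choices below are just placeholders, and I'm using the stdlib readdlm rather than DataFrames to keep it self-contained.

```julia
using DelimitedFiles

# Sketch: capture awk's stdout and parse it into a numeric matrix.
# The awk program here ('{print $1, $2}') is a placeholder.
function read_via_awk(path::AbstractString)
    # raw"" keeps Julia from interpolating awk's $1/$2;
    # interpolating the string into the backtick command makes it
    # a single, safely-quoted argument.
    prog = raw"{print $1, $2}"
    raw_output = read(`awk $prog $path`, String)
    # Parse whitespace-delimited text into a Float64 matrix.
    readdlm(IOBuffer(raw_output))
end
```

From a matrix like this, a dataframe could presumably be built column by column (e.g. `DataFrame(a = M[:,1], b = M[:,2])` with DataFrames.jl), but I haven't timed whether the external-process round trip beats reading the file in pure Julia.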
--Tim

On Wednesday, May 28, 2014 02:07:39 AM Tomas Lycken wrote:
> I found another problem, which I caused all by my sorry little self: I was
> caching all the dataframes, effectively forcing them *not* to be garbage
> collected. If facepalm had a face... :P Disabling that does a lot toward my
> goal - processing 10k traces (10% of my data) now takes a little less than
> 20 minutes, with no discernible memory management issues.
>
> Any ideas on piping the awk output to a dataframe?
>
> // T
>
> On Wednesday, May 28, 2014 10:57:26 AM UTC+2, Tim Holy wrote:
> > I doubt that there's going to be a way to modify running code on the fly
> > from another process anytime soon. I suspect the solution will be a
> > better garbage collector (#5227).
> >
> > Since your process has now crashed (sorry to hear it), you could insert
> >
> >     (i % 1000 == 0) && gc()
> >
> > in your loop. It's just unfortunate that it will take so long to find
> > out whether this works.
> >
> > --Tim
> >
> > On Tuesday, May 27, 2014 08:31:41 AM Tomas Lycken wrote:
> > > I started a Julia script that processes a very large set of data, by
> > > reading a large number (100k) of quite small text files, doing some
> > > calculations, and aggregating the results. After running for a while
> > > I've noticed that there seem to be some memory management issues,
> > > which I suspect are just inefficient garbage collection.
> > > With some pseudo-elements, my script does something like this:
> > >
> > > function process_all_the_stuff()
> > >     results1 = Float64[]
> > >     results2 = Float64[]
> > >     for i in 1:1e5
> > >         thisdata = read_text_file_with_index(i)
> > >         thisresult1 = do_calculation_1(thisdata)
> > >         thisresult2 = do_calculation_2(thisdata)
> > >         push!(results1, thisresult1)
> > >         push!(results2, thisresult2)
> > >     end
> > >     results1, results2
> > > end
> > >
> > > I've come about halfway, and htop looks like this:
> > >
> > > <https://lh3.googleusercontent.com/-rFSwZ9UtvIg/U4SvG5EL4xI/AAAAAAAAAMY/QYYbNCv-6l0/s1600/htop.png>
> > >
> > > As you see, I'm about to run out of memory. Is there any way I can
> > > "inject" a call to gc(), say, at the end of the loop body, without
> > > interrupting the script and losing all the work done so far? Or will
> > > Julia do so, when (if) she realizes memory is (too) scarce?
> > >
> > > If there isn't a way to do this, see this as the first step toward a
> > > feature request :P
> > >
> > > // Tomas
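For the archives, here is a runnable sketch of the pseudocode above with the periodic-collection suggestion folded in. The two do_calculation functions and the file reader are stand-ins (sum/maximum over random data), and note that in current Julia the collector call is spelled GC.gc() rather than the 0.3-era gc().

```julia
# Sketch: aggregate per-file results while forcing a garbage
# collection every 1000 iterations, so memory from discarded
# per-iteration data is reclaimed promptly.
function process_all_the_stuff(n::Int)
    results1 = Float64[]
    results2 = Float64[]
    for i in 1:n
        thisdata = rand(100)               # stands in for read_text_file_with_index(i)
        push!(results1, sum(thisdata))     # stands in for do_calculation_1
        push!(results2, maximum(thisdata)) # stands in for do_calculation_2
        # Periodic collection, as suggested in the thread
        # (plain gc() in 0.3-era Julia).
        (i % 1000 == 0) && GC.gc()
    end
    results1, results2
end
```

Whether forcing collection actually helps depends on the workload; the cheaper fix here turned out to be not caching the dataframes in the first place.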
