I found another problem, which I caused all by my sorry little self: I was 
caching all the dataframes, effectively forcing them *not* to be garbage 
collected. If facepalm had a face... :P Disabling that does a lot toward my 
goal - processing 10k traces (10% of my data) now takes a little less than 
20 minutes, with no discernible memory management issues.

Any ideas on piping the awk output to a dataframe?
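One approach I've been toying with (a sketch only, in current Julia syntax; the file name, column layout, and column names below are made up): let awk print the columns you want, read the command's output into memory, parse it with readdlm, and wrap the resulting matrix in a DataFrame.

```julia
using DataFrames, DelimitedFiles

# Write a tiny sample "trace" file so the sketch is self-contained;
# in practice this would be one of the existing trace files.
path = tempname()
write(path, "0.0 1.5\n0.1 2.5\n0.2 3.5\n")

# Let awk do the column extraction, then parse its output straight
# from memory instead of going through a temporary file.
cmd = `awk '{ print $1, $2 }' $path`
raw = readdlm(IOBuffer(read(cmd, String)))   # Matrix{Float64}

# Wrap the columns in a DataFrame (column names are hypothetical).
df = DataFrame(t = raw[:, 1], v = raw[:, 2])
```

No idea if that's the idiomatic way, but it avoids writing awk's output to disk first.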

// T

On Wednesday, May 28, 2014 10:57:26 AM UTC+2, Tim Holy wrote:
>
> I doubt that there's going to be a way to modify running code on the fly 
> from another process anytime soon. I suspect the solution will be a 
> better garbage collector (#5227). 
>
> Since your process has now crashed (sorry to hear it), you could insert 
>    (i % 1000 == 0) && gc() 
> in your loop. It's just unfortunate that it will take so long to find out 
> whether this works. 
>
> --Tim 
>
> On Tuesday, May 27, 2014 08:31:41 AM Tomas Lycken wrote: 
> > I started a Julia script that processes a very large set of data, by 
> > reading a large number (100k) of quite small text files, doing some 
> > calculations, and aggregating the results. After running for a while 
> > I've noticed that there seem to be some memory management issues, which 
> > I suspect are just inefficient garbage collection. With some 
> > pseudo-elements, my script does something like this: 
> > 
> > function process_all_the_stuff() 
> >     results1 = Float64[] 
> >     results2 = Float64[] 
> >         for i in 1:100000 
> >         thisdata = read_text_file_with_index(i) 
> >         thisresult1 = do_calculation_1(thisdata) 
> >         thisresult2 = do_calculation_2(thisdata) 
> >         push!(results1, thisresult1) 
> >         push!(results2, thisresult2) 
> >     end 
> >     results1, results2 
> > end 
> > 
> > I've come about halfway, and htop looks like this: 
> > 
> > <https://lh3.googleusercontent.com/-rFSwZ9UtvIg/U4SvG5EL4xI/AAAAAAAAAMY/QYYbNCv-6l0/s1600/htop.png> 
> > 
> > As you see, I'm about to run out of memory. Is there any way I can 
> > "inject" a call to gc(), say, at the end of the loop body, without 
> > interrupting the script and losing all the work done so far? Or will 
> > Julia do so, when (if) she realizes memory is (too) scarce? 
> > 
> > If there isn't a way to do this, see this as the first step toward a 
> > feature request :P 
> > 
> > // Tomas 
>