Glad to hear you fixed it, and that there isn't a deeper problem.

Re awk, I've not done a lot with external processes (in fact, in Images I 
spent a lot of time wrapping ImageMagick directly to _avoid_ using external 
processes because interacting with them can be awfully slow). So it would take 
me a while to figure that out, sorry.
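That said, a rough sketch of the general pattern might look something like the following. Everything here is an assumption on my part: the awk program is a stand-in that just emits two whitespace-separated numeric columns (swap in the real one), and it uses current Julia's `read` on a backtick command to capture the process's stdout without a temporary file:

```julia
# Hedged sketch: capture an external awk process's stdout directly.
# The BEGIN-block program below is a placeholder that prints two
# whitespace-separated numeric columns.
output = read(`awk 'BEGIN { print 1.5, 10.0; print 2.5, 20.0 }'`, String)

# Parse each line into Float64s; the resulting column vectors could then
# be handed to a DataFrame constructor.
rows = [parse.(Float64, split(line)) for line in split(strip(output), '\n')]
col1 = [r[1] for r in rows]
col2 = [r[2] for r in rows]
```

No promises that this is the fastest route, though, given how slow talking to external processes can be.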

--Tim


On Wednesday, May 28, 2014 02:07:39 AM Tomas Lycken wrote:
> I found another problem, which I caused all by my sorry little self: I was
> caching all the dataframes, effectively forcing them *not* to be garbage
> collected. If facepalm had a face... :P Disabling that does a lot toward my
> goal - processing 10k traces (10% of my data) now takes a little less than
> 20 minutes, with no discernible memory management issues.
> 
> Any ideas on piping the awk output to a dataframe?
> 
> // T
> 
> On Wednesday, May 28, 2014 10:57:26 AM UTC+2, Tim Holy wrote:
> > I doubt that there's going to be a way to modify running code on the fly
> > from another process anytime soon. I suspect the solution will be a better
> > garbage collector (#5227).
> > 
> > Since your process has now crashed (sorry to hear it), you could insert
> > 
> >    (i % 1000 == 0) && gc()
> > 
> > in your loop. It's just unfortunate that it will take so long to find out
> > whether this works.
> > 
> > --Tim
> > 
> > On Tuesday, May 27, 2014 08:31:41 AM Tomas Lycken wrote:
> > > I started a Julia script that processes a very large set of data by
> > > reading a large number (100k) of quite small text files, doing some
> > > calculations, and aggregating the results. After running for a while,
> > > I've noticed that there seem to be some memory management issues, which
> > > I suspect are just inefficient garbage collection. With some
> > > pseudo-elements, my script does something like this:
> > > 
> > > function process_all_the_stuff()
> > >     results1 = Float64[]
> > >     results2 = Float64[]
> > >     for i in 1:100_000   # 1:1e5 would iterate over floats; use an integer range
> > >         thisdata = read_text_file_with_index(i)
> > >         thisresult1 = do_calculation_1(thisdata)
> > >         thisresult2 = do_calculation_2(thisdata)
> > >         push!(results1, thisresult1)
> > >         push!(results2, thisresult2)
> > >     end
> > >     results1, results2
> > > end
> > > 
> > > I've come about halfway, and htop looks like this:
> > > 
> > > <https://lh3.googleusercontent.com/-rFSwZ9UtvIg/U4SvG5EL4xI/AAAAAAAAAMY/QYYbNCv-6l0/s1600/htop.png>
> > > 
> > > As you can see, I'm about to run out of memory. Is there any way I can
> > > "inject" a call to gc(), say, at the end of the loop body, without
> > > interrupting the script and losing all the work done so far? Or will
> > > Julia do so, when (if) she realizes memory is (too) scarce?
> > > 
> > > If there isn't a way to do this, see this as the first step toward a
> > > feature request :P
> > > 
> > > // Tomas
