I started a Julia script that processes a very large data set: it reads a 
large number (~100k) of quite small text files, does some calculations on 
each, and aggregates the results. After it had been running for a while I 
noticed what looks like a memory management issue, which I suspect is just 
inefficient garbage collection. In pseudocode, my script does something 
like this:

function process_all_the_stuff()
    results1 = Float64[]
    results2 = Float64[]
    for i in 1:100_000   # one iteration per input file (1:1e5 would give a float range)
        thisdata = read_text_file_with_index(i)
        thisresult1 = do_calculation_1(thisdata)
        thisresult2 = do_calculation_2(thisdata)
        push!(results1, thisresult1)
        push!(results2, thisresult2)
    end
    return results1, results2
end
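For reference, here is a self-contained toy version of the loop, with dummy stand-ins for `read_text_file_with_index` and the two calculation functions (the real ones read from disk). It also pre-reserves capacity for the result vectors with `sizehint!`, which avoids the repeated reallocation that `push!` otherwise causes, though that alone shouldn't explain memory growth of this magnitude:

```julia
# Hypothetical stand-ins for the real helpers; the real ones read
# ~100k small text files from disk and compute on their contents.
read_text_file_with_index(i) = string(i, " ", 2i)        # fake file contents
do_calculation_1(data) = parse(Float64, split(data)[1])
do_calculation_2(data) = parse(Float64, split(data)[2])

function process_all_the_stuff(n = 1_000)
    results1 = Float64[]
    results2 = Float64[]
    # Reserve capacity up front so push! never has to reallocate/copy.
    sizehint!(results1, n)
    sizehint!(results2, n)
    for i in 1:n
        thisdata = read_text_file_with_index(i)
        push!(results1, do_calculation_1(thisdata))
        push!(results2, do_calculation_2(thisdata))
    end
    return results1, results2
end
```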

I'm about halfway through, and htop looks like this:

<https://lh3.googleusercontent.com/-rFSwZ9UtvIg/U4SvG5EL4xI/AAAAAAAAAMY/QYYbNCv-6l0/s1600/htop.png>

As you can see, I'm about to run out of memory. Is there any way I can 
"inject" a call to gc(), say at the end of the loop body, without 
interrupting the script and losing all the work done so far? Or will Julia 
do so herself, when (if) she realizes memory is getting too scarce?
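To be clear about what I mean by "inject": if I were willing to restart and lose the work so far, I could of course make the collection explicit in the loop myself, as in the sketch below (the per-file work is replaced by a hypothetical stand-in; the call is spelled GC.gc() since Julia 1.0, plain gc() in older versions):

```julia
# Sketch: force a full collection every 1_000 iterations of the loop.
function process_with_explicit_gc(n)
    results = Float64[]
    for i in 1:n
        push!(results, sqrt(i))     # stand-in for reading one file and computing
        i % 1_000 == 0 && GC.gc()   # explicit collection (gc() pre-1.0)
    end
    return results
end
```

What I'm asking is whether the same effect can be triggered in the already-running process.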

If there isn't a way to do this, consider this the first step toward a 
feature request :P

// Tomas
