Re: [julia-users] 900mb csv loading in Julia failed: memory comparison vs python pandas and R

Yichao Yu Tue, 13 Oct 2015 14:13:16 -0700

On Tue, Oct 13, 2015 at 4:21 PM, Grey Marsh <[email protected]> wrote:
> I was using 0.3.10. How would gc come into the picture? I mean gc would be
> called only after the file has been read into memory completely. Anyway, let
> me try v0.4.


Well, there are also intermediate objects that need to be allocated.

See https://github.com/JuliaLang/julia/issues/10428
and https://github.com/JuliaLang/julia/pull/12632
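
To illustrate the point about intermediate objects, here is a rough sketch in
Python using only the standard library. It is not what `readtable` does
internally; the names `make_csv`, `load_columns` and `measure` are made up for
this example, and the table is a small synthetic one rather than the Kaggle
file. It parses a CSV into packed float columns and compares the memory that
is still held afterwards with the peak reached while parsing.

```python
import csv
import io
import tracemalloc
from array import array


def make_csv(rows, cols):
    """Build a small synthetic numeric CSV in memory (stand-in for a real file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for r in range(rows):
        writer.writerow([r * cols + c + 0.5 for c in range(cols)])
    return buf.getvalue()


def load_columns(text, cols):
    """Parse CSV text into one packed float64 array per column."""
    columns = [array("d") for _ in range(cols)]
    # Intermediate objects: every field exists as a separate str inside a
    # list of row lists before it is converted and stored compactly.
    rows = list(csv.reader(io.StringIO(text)))
    for row in rows:
        for col, field in zip(columns, row):
            col.append(float(field))
    return columns


def measure(rows=2000, cols=50):
    """Return the parsed columns plus retained and peak traced bytes."""
    text = make_csv(rows, cols)
    tracemalloc.start()
    columns = load_columns(text, cols)
    retained, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return columns, retained, peak


if __name__ == "__main__":
    _, retained, peak = measure()
    print(f"retained after load: {retained / 1e6:.1f} MB")
    print(f"peak during load:    {peak / 1e6:.1f} MB")
```

The peak should come out several times larger than what is retained, because
the temporary strings and row lists all have to exist before the final columns
are complete. A reader that holds on to such temporaries longer (or whose gc
frees them late) needs correspondingly more memory than the final table.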

>
> On Wednesday, October 14, 2015 at 1:17:58 AM UTC+5:30, Yichao Yu wrote:
>>
>>
>> On Oct 13, 2015 2:47 PM, "Grey Marsh" <[email protected]> wrote:
>>
>> Which Julia version are you using? There's some gc tweak on 0.4 for that.
>>
>> >
>> > I was trying to load the training dataset from the Springleaf Marketing
>> > Response competition on Kaggle. The csv is 921 mb and has 145321 rows and
>> > 1934 columns. My machine has 8 gb of ram, and Julia had eaten 5.8gb+ of
>> > memory when I stopped it, as there was barely any memory left for the OS
>> > to function properly. By then the incomplete operation had been running
>> > for about 5-6 minutes. I'm on Windows 8 64-bit.
>> > I used the following code to read the csv into Julia:
>> >
>> > using DataFrames
>> > train = readtable("C:\\train.csv")
>> >
>> > Next I tried to load the same file in Python:
>> >
>> > import pandas as pd
>> > train = pd.read_csv("C:\\train.csv")
>> >
>> > This took ~2.4gb of memory and about a minute.
>> >
>> > Checking the same in R again:
>> > df = read.csv('E:/Libraries/train.csv', as.is = T)
>> >
>> > This took 2-3 minutes and consumed 3.5gb of memory on the same machine.
>> >
>> > Why such a discrepancy, and why does Julia fail to load the csv before
>> > running out of memory? Is there a better way to get the file loaded in
>> > Julia?
>> >
>> >
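
For scale, a back-of-the-envelope estimate of the table size may help when
comparing the numbers above. This is only a sketch under one loud assumption:
every cell is stored as an 8-byte number, which will not hold for string
columns. The row and column counts are the ones quoted above; the variable
names are just for illustration.

```python
# Dimensions quoted in the original message.
rows = 145321
cols = 1934

# Assumption: each cell is stored as one 8-byte value (e.g. Float64/Int64).
bytes_per_cell = 8

cells = rows * cols
dense_bytes = cells * bytes_per_cell

print(f"cells:      {cells}")
print(f"dense size: {dense_bytes / 1e9:.2f} GB ({dense_bytes / 2**30:.2f} GiB)")
```

Under that assumption the finished table alone is a bit over 2 GB, so a
reader needs at least that much, plus whatever temporaries it allocates
while parsing.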
