I was finally able to load it, but the process consumes a ton of memory:
julia> @time train = readtable("./test.csv");
124.575362 seconds (376.11 M allocations: 13.438 GB, 10.77% gc time)
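
Pre-declaring the column types might cut those allocations. A rough, untested sketch (this assumes readtable's eltypes and nrows keyword arguments, and it guesses Float64 for every column purely for illustration; the real columns are mixed):

using DataFrames

# Guessing Float64 for all 1934 columns just for illustration;
# substitute the file's actual column types.
coltypes = fill(Float64, 1934)

# eltypes skips per-column type inference; nrows = 145321 (the row
# count from the original post) caps the rows read and may let the
# parser size its buffers up front.
train = readtable("./test.csv", eltypes = coltypes, nrows = 145321)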
On Tuesday, October 13, 2015 at 4:34:05 PM UTC-4, feza wrote:
>
> Same here on a 12 GB RAM machine.
>
>                _
>    _       _ _(_)_     |  A fresh approach to technical computing
>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
>    _ _   _| |_  __ _   |  Type "?help" for help.
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.5.0-dev+429 (2015-09-29 09:47 UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit f71e449 (14 days old master)
> |__/                   |  x86_64-w64-mingw32
>
> julia> using DataFrames
>
> julia> train = readtable("./test.csv");
>
> ERROR: OutOfMemoryError()
>
>  in resize! at array.jl:452
>  in readnrows! at C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:164
>  in readtable! at C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:767
>  in readtable at C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:847
>  in readtable at C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:893
>
> On Tuesday, October 13, 2015 at 3:47:58 PM UTC-4, Yichao Yu wrote:
>>
>> On Oct 13, 2015 2:47 PM, "Grey Marsh" <[email protected]> wrote:
>>
>> Which Julia version are you using? There's some GC tweak on 0.4 for that.
>>
>> >
>> > I was trying to load the training dataset from the Springleaf Marketing
>> Response competition on Kaggle. The CSV is 921 MB, with 145321 rows and 1934
>> columns. My machine has 8 GB of RAM, and Julia had eaten 5.8 GB+ of memory
>> before I stopped it, as there was barely any memory left for the OS to
>> function properly. The incomplete operation took about 5-6 minutes. I'm on
>> Windows 8 64-bit. I used the following code to read the CSV into Julia:
>> >
>> > using DataFrames
>> > train = readtable("C:\\train.csv")
>> >
>> > Next I tried to load the same file in Python:
>> >
>> > import pandas as pd
>> > train = pd.read_csv("C:\\train.csv")
>> >
>> > This took ~2.4 GB of memory and about a minute.
>> >
>> > Checking the same in R:
>> > df = read.csv('E:/Libraries/train.csv', as.is = T)
>> >
>> > This took 2-3 minutes and consumed 3.5 GB of memory on the same machine.
>> >
>> > Why such a discrepancy, and why does Julia even fail by running out of
>> memory before loading the CSV? Is there a better way to get the file loaded
>> into Julia?
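
If memory stays this tight, one thing I might try is reading the file in
bounded chunks instead of all at once. A rough, untested sketch (assuming
readtable's header, names, skipstart, and nrows keywords behave as documented;
a file that's an exact multiple of chunksize might need an extra guard):

using DataFrames

chunksize = 10000
path = "C:\\train.csv"

# First chunk reads the header and fixes the column names.
chunk = readtable(path, nrows = chunksize)
cols = names(chunk)
offset = 1 + nrow(chunk)    # header line + data rows consumed so far

while true
    # process(chunk) here instead of accumulating the whole table in memory
    nrow(chunk) < chunksize && break    # short chunk means end of file
    chunk = readtable(path, header = false, names = cols,
                      skipstart = offset, nrows = chunksize)
    offset += nrow(chunk)
end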