On Oct 13, 2015 2:47 PM, "Grey Marsh" <[email protected]> wrote:
Which Julia version are you using? There's some GC tweak in 0.4 for that.
>
> I was trying to load the training dataset from the Springleaf Marketing
> Response competition on Kaggle. The CSV is 921 MB and has 145321 rows and
> 1934 columns. My machine has 8 GB of RAM, and Julia ate 5.8 GB+ of memory
> before I stopped it, as there was barely any memory left for the OS to
> function properly. The incomplete operation had taken about 5-6 minutes by
> then. I'm on Windows 8 64-bit. I used the following code to read the CSV
> into Julia:
>
> using DataFrames
> train = readtable("C:\\train.csv")
>
> Next I tried to load the same file in Python:
>
> import pandas as pd
> train = pd.read_csv("C:\\train.csv")
>
> This took ~2.4 GB of memory and about a minute.
>
> Checking the same in R again:
> df = read.csv('E:/Libraries/train.csv', as.is = T)
>
> This took 2-3 minutes and consumed 3.5 GB of memory on the same machine.
>
> Why such a discrepancy, and why does Julia fail to load the CSV before
> running out of memory? Is there a better way to get the file loaded in
> Julia?
>
>
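For a rough sense of scale, here is a back-of-the-envelope sketch in Python using the row and column counts quoted above. It assumes every cell is stored as an 8-byte number, which is only an assumption: the thread does not say what types the columns hold, and string columns would take more. The function name is made up for illustration.

```python
# Figures taken from the original message.
ROWS = 145321
COLS = 1934

# Assumption (not from the thread): every cell is held as a 64-bit value.
BYTES_PER_CELL = 8


def dense_numeric_bytes(rows, cols, bytes_per_cell=BYTES_PER_CELL):
    """Lower-bound estimate of in-memory size for a fully numeric table."""
    return rows * cols * bytes_per_cell


cells = ROWS * COLS
total_bytes = dense_numeric_bytes(ROWS, COLS)

print(f"{cells} cells, about {total_bytes / 1e9:.2f} GB as 8-byte values")
```

Under that assumption the table alone needs a bit over 2.2 GB, which is in the same range as the ~2.4 GB reported for pandas. Anything well above that is overhead from parsing or from how the columns are represented.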