I was using 0.3.10. How would gc come into the picture? I mean, gc would be called only after the file has been read into memory completely. Anyway, let me try the v0.

On Wednesday, October 14, 2015 at 1:17:58 AM UTC+5:30, Yichao Yu wrote:
>
> On Oct 13, 2015 2:47 PM, "Grey Marsh" <[email protected]> wrote:
>
> Which julia version are you using? There's some gc tweak on 0.4 for that.
>
> >
> > I was trying to load the training dataset from the Springleaf Marketing
> > Response competition on Kaggle. The csv is 921 MB and has 145321 rows and
> > 1934 columns. My machine has 8 GB of RAM, and julia ate 5.8 GB+ of memory
> > before I stopped it, as there was barely any memory left for the OS to
> > function properly. The incomplete operation had been running for about
> > 5-6 minutes by then. I'm on Windows 8 64-bit. I used the following code
> > to read the csv into Julia:
> >
> > using DataFrames
> > train = readtable("C:\\train.csv")
> >
> > Next I tried to load the same file in Python:
> >
> > import pandas as pd
> > train = pd.read_csv("C:\\train.csv")
> >
> > This took ~2.4 GB of memory and about a minute.
> >
> > Checking the same in R:
> >
> > df = read.csv('E:/Libraries/train.csv', as.is = T)
> >
> > This took 2-3 minutes and consumed 3.5 GB of memory on the same machine.
> >
> > Why such a discrepancy, and why does Julia fail to even load the csv
> > before running out of memory? Is there any better way to get the file
> > loaded in Julia?
> >
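
For anyone hitting the same wall, here is a rough sketch (not something anyone in this thread actually ran) of a way to get a feel for a wide csv before committing to a full load. It streams the file one row at a time using only the Python standard library and records a coarse kind for each column. The names `cell_kind` and `scan_csv` are made up for illustration, and the int/float/str classification is an assumption; it is far cruder than what pandas, R or DataFrames infer.

```python
import csv

# Widening order used when merging kinds seen in one column (an assumption
# of this sketch, not how any particular csv reader infers types).
_ORDER = {"empty": 0, "int": 1, "float": 2, "str": 3}


def cell_kind(value):
    """Classify a single csv cell as 'empty', 'int', 'float' or 'str'."""
    if value == "":
        return "empty"
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "str"


def scan_csv(path):
    """Stream a csv once; return (row_count, {column_name: widest_kind})."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        kinds = ["empty"] * len(header)
        rows = 0
        for row in reader:
            rows += 1
            # Only one row is held in memory at a time.
            for i, value in enumerate(row[:len(header)]):
                kind = cell_kind(value)
                if _ORDER[kind] > _ORDER[kinds[i]]:
                    kinds[i] = kind
    return rows, dict(zip(header, kinds))


# Hypothetical usage with the path from the quoted message:
# rows, kinds = scan_csv("C:\\train.csv")
# print(rows, sum(1 for k in kinds.values() if k == "str"), "string columns")
```

Since only the current row is kept around, peak memory stays small regardless of file size. The count of string columns is a rough hint at how expensive a full in-memory load might be.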
