I was trying to load the training dataset from the Springleaf Marketing Response
competition <https://www.kaggle.com/c/springleaf-marketing-response> on Kaggle.
The CSV is 921 MB and has 145321 rows and 1934 columns. My machine has 8 GB of
RAM and runs 64-bit Windows 8. Julia used more than 5.8 GB of memory, at which
point I stopped it because there was barely any memory left for the OS to
function properly. The operation had been running for about 5-6 minutes by then
and never completed. I used the following code to read the CSV into Julia:
using DataFrames
train = readtable("C:\\train.csv")
Next I tried to load the same file in Python:
import pandas as pd
train = pd.read_csv("C:\\train.csv")
This took ~2.4 GB of memory and about a minute.
Checking the same in R:
df = read.csv('E:/Libraries/train.csv', as.is = T)
This took 2-3 minutes and consumed 3.5 GB of memory on the same machine.
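For scale, here is a back-of-the-envelope estimate of what the table should cost once it is in memory. It is only a sketch: it assumes every cell is stored as a single 8-byte value (a float64 or int64), which is not true for string columns, and the helper name `estimate_bytes` is just made up for illustration.

```python
# Rough lower bound on the in-memory size of the table.
# Assumption: every cell is one 8-byte value (float64/int64).
# String columns would need more than this.
ROWS = 145321
COLS = 1934
BYTES_PER_CELL = 8  # assumed, not measured


def estimate_bytes(rows, cols, bytes_per_cell=BYTES_PER_CELL):
    """Return rows * cols * bytes_per_cell."""
    return rows * cols * bytes_per_cell


cells = ROWS * COLS
total = estimate_bytes(ROWS, COLS)
print(f"{cells:,} cells, about {total / 1e9:.2f} GB")
```

That comes to roughly 2.25 GB for about 281 million cells, which is in the same range as the pandas figure and well below what Julia was using when I stopped it.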
Why is there such a discrepancy, and why does Julia fail to load the CSV before
running out of memory? Is there a better way to get the file loaded in
Julia?