Hello all!
I am trying to do some work with DataFrames and I've found that subsetting
them in a loop is too slow. A workaround I found is to pull each column
into an array and work with the arrays in the following manner:
using DataFrames
using NumericExtensions
function testMe(df)
    # Pull each column into its own array once, outside the loop.
    a1 = df[1]
    a2 = df[2]
    a3 = df[3]
    b1 = df[4]
    b2 = df[5]
    b3 = df[6]
    b4 = df[7]
    c1 = df[8]
    c2 = df[9]
    c3 = df[10]
    theSum = Array(Float64, 1000)
    output = Float64[]
    # i, j, k index within the A, B, and C groups, so the ranges must
    # match the branches below (1:3, 1:4, 1:3), not the column positions.
    for i = 1:3, j = 1:4, k = 1:3
        if i == 1
            sum!(theSum, a1)
        elseif i == 2
            sum!(theSum, a2)
        elseif i == 3
            sum!(theSum, a3)
        end
        if j == 1
            sum!(theSum, b1)
        elseif j == 2
            sum!(theSum, b2)
        elseif j == 3
            sum!(theSum, b3)
        elseif j == 4
            sum!(theSum, b4)
        end
        if k == 1
            sum!(theSum, c1)
        elseif k == 2
            sum!(theSum, c2)
        elseif k == 3
            sum!(theSum, c3)
        end
        # Collect one mean per (i, j, k) combination.
        push!(output, mean(theSum))
    end
    return output
end
df = DataFrame(
    A1 = randn(1000),
    A2 = randn(1000),
    A3 = randn(1000),
    B1 = randn(1000),
    B2 = randn(1000),
    B3 = randn(1000),
    B4 = randn(1000),
    C1 = randn(1000),
    C2 = randn(1000),
    C3 = randn(1000)
)
output = testMe(df)
The problem is that the code becomes pretty lengthy when working with many
columns and testing their combinations. I'm wondering if there are any tricks
in Julia that I'm not thinking of to make a task like this a little easier
from a code perspective.
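To make the goal concrete, here is a rough sketch of the shape such code might take, assuming the intent is to add one column from each group elementwise and record the mean of the result. It works on plain arrays that have already been pulled out of the DataFrame (how to extract them depends on the DataFrames version). The names combo_means, acols, bcols, and ccols are hypothetical, not part of any package.

```julia
# Hypothetical helper: each argument is a list of equal-length Float64
# arrays, one list per column group (A, B, and C in the example above).
function combo_means(acols, bcols, ccols)
    n = length(acols[1])
    total = zeros(n)          # buffer reused across all combinations
    out = Float64[]
    # Loop over the arrays themselves instead of over integer indices,
    # so no if/elseif chain is needed to pick a column.
    for a in acols, b in bcols, c in ccols
        for r in 1:n
            total[r] = a[r] + b[r] + c[r]
        end
        push!(out, sum(total) / n)   # mean of the elementwise sum
    end
    return out
end

acols = [randn(1000) for i in 1:3]
bcols = [randn(1000) for i in 1:4]
ccols = [randn(1000) for i in 1:3]
result = combo_means(acols, bcols, ccols)
```

With this layout, adding a column only means adding an array to one of the lists; the loop body stays the same. The result holds one mean per combination, ordered with the last group varying fastest.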
Thank you for your time!
Jason