[julia-users] Some DataFrames questions

Nils Gudat Wed, 20 May 2015 08:18:08 -0700

I have two questions about using DataFrames:

1. How can I subset a DataFrame based on multiple criteria (similar to 
using np.logical_and with pandas)?
Consider:


df = DataFrame(A = 1:3, B = 1:3)

How do I get the subset of the DataFrame for which (for simplicity) both A 
and B equal 1? df[:A].==1 and df[:B].==1 each give me a boolean array, but 
I can't find a way to combine them into a single boolean mask. Things like 
df[df[:A].==1 & df[:B].==1] don't work, and my first idea for a workaround, 
df[ (df[:A].==1 + df[:B].==1)==2 ], fails as well: for some reason adding 
the two boolean arrays gives me false even for the first entry (which 
should be true+true).

2. How do I deal with NAs when indexing? Consider:

df = DataFrame(A = 1:3, B = 1:3, C = @data([1,2,NA]))

Here, df[df[:C].==1, :] fails with NAException("cannot index an array with 
a DataArray containing NA values"). One way around this would be 
df[array(df[:C].==1, false), :] - is this the "correct" way of doing it or 
are there other indexing methods that automatically deal with NAs?
