I have two questions about using DataFrames:
1. How can I subset a DataFrame based on multiple criteria (similar to
using NumPy's np.logical_and with pandas)?
Consider:
df = DataFrame(A = 1:3, B = 1:3)
How do I get the subset of the DataFrame for which (for simplicity) A and B
are 1? df[:A].==1 and df[:B].==1 each give me a boolean array, but I can't
find any way of combining them into a single boolean mask. Things like
df[df[:A].==1 & df[:B].==1] won't work, and my first idea of a workaround,
df[ (df[:A].==1 + df[:B].==1)==2 ], fails as well: for some reason, adding
the two boolean arrays gives me false even for the first entry (which
should be true+true).
2. How do I deal with NAs when indexing? Consider:
df = DataFrame(A = 1:3, B = 1:3, C = @data([1,2,NA]))
Here, df[df[:C].==1, :] fails with NAException("cannot index an array with
a DataArray containing NA values"). One way around this would be
df[array(df[:C].==1, false), :] - is this the "correct" way of doing it or
are there other indexing methods that automatically deal with NAs?
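
For reference, here is a minimal sketch of how both cases might look in
current Julia syntax. It rests on several assumptions that are not part of
the setup above: plain vectors stand in for the DataFrame columns, `missing`
stands in for NA, and the dotted operators `.&`, `.+` and `coalesce.` are
used for elementwise work. The names mask, mask_sum, shifted and mask_c are
made up for illustration. The point for question 1 is that `&` and `+` bind
more tightly than the comparison, so each comparison needs its own
parentheses.

```julia
# Plain vectors stand in for the columns df[:A], df[:B] and df[:C];
# `missing` stands in for NA (an assumption about the Julia version).
A = [1, 2, 3]
B = [1, 2, 3]
C = [1, 2, missing]

# Question 1: parenthesize each comparison before combining.
# Without parentheses, & and + bind tighter than the comparison,
# so `1 & B` or `1 + B` would be evaluated first.
mask = (A .== 1) .& (B .== 1)
mask_sum = ((A .== 1) .+ (B .== 1)) .== 2

# What the unparenthesized sum ends up comparing A against: 1 .+ B,
# which never equals A, hence false even for the first entry.
shifted = A .== (1 .+ B)

# Question 2: C .== 1 contains a missing entry, so replace it with
# false before using the result as a mask.
mask_c = coalesce.(C .== 1, false)

println(A[mask])
println(A[mask_c])
```

With a DataFrame, the same masks would be passed as the row index, for
example df[mask, :]. In the older syntax used above, the parenthesized form
would presumably read (df[:A].==1) & (df[:B].==1).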