Re #1: Have you looked into the DataFramesMeta.jl experimental package? https://github.com/JuliaStats/DataFramesMeta.jl
It may be able to help you, though I'm not sure. See in particular this issue: https://github.com/JuliaStats/DataFramesMeta.jl/issues/13.

On Wednesday, May 20, 2015 at 11:17:28 AM UTC-4, Nils Gudat wrote:
> I have two questions regarding the usage of DataFrame:
>
> 1. How can I subset a DataFrame based on multiple criteria (similar to
> np.logical_and in pandas)? Consider:
>
>     df = DataFrame(A = 1:3, B = 1:3)
>
> How do I get the subset of the DataFrame for which (for simplicity) A and
> B are 1? df[:A].==1 and df[:B].==1 give me boolean arrays, but I can't find
> any way of combining them into a single boolean mask. Things like
> df[df[:A].==1 & df[:B].==1] won't work, and my first idea for a workaround,
> df[ (df[:A].==1 + df[:B].==1)==2 ], fails as well, because for some reason
> adding the two boolean arrays gives me false even for the first entry (which
> should be true+true).
>
> 2. How do I deal with NAs when indexing? Consider:
>
>     df = DataFrame(A = 1:3, B = 1:3, C = @data([1,2,NA]))
>
> Here, df[df[:C].==1, :] fails with NAException("cannot index an array with
> a DataArray containing NA values"). One way around this would be
> df[array(df[:C].==1, false), :]. Is this the "correct" way of doing it, or
> are there other indexing methods that automatically deal with NAs?
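A minimal sketch of both fixes, assuming current Julia (1.x), where Base's `missing` has replaced the DataArrays `NA` used above. It uses plain vectors as stand-ins for the DataFrame columns so it runs without any packages; the variable names are illustrative only. The failures in question 1 come from operator precedence: `&` and `+` bind more tightly than `==`, so `df[:A].==1 + df[:B].==1` is parsed as the chained comparison `df[:A] .== (1 + df[:B]) .== 1`. For the first row that compares A = 1 with 1 + B = 2, which is false. Parenthesizing each comparison fixes it.

```julia
# Plain vectors standing in for the columns of
# DataFrame(A = 1:3, B = 1:3, C = [1, 2, missing])  (assumes Julia 1.x).
A = [1, 2, 3]
B = [1, 2, 3]
C = [1, 2, missing]

# Question 1: give each comparison its own parentheses, then combine the
# two Bool masks elementwise with the broadcast `.&`.
mask_and = (A .== 1) .& (B .== 1)      # true only for the first row

# The "sum of booleans" workaround also works once parenthesized,
# since true + true == 2 in Julia.
mask_sum = ((A .== 1) .+ (B .== 1)) .== 2

rows = findall(mask_and)               # indices of matching rows

# Question 2: comparing with `missing` yields `missing`, which cannot be
# used as an index. `coalesce.(..., false)` turns it into `false`, the
# analogue of the old `array(df[:C] .== 1, false)` idiom.
raw = C .== 1                          # [true, false, missing]
mask_c = coalesce.(raw, false)         # [true, false, false]

# `isequal` treats missing as "not equal" and returns plain Bools directly.
mask_isequal = isequal.(C, 1)          # [true, false, false]

println(A[mask_and])
println(A[mask_c])
```

With a current DataFrames.jl, the same masks should index rows directly, e.g. `df[(df.A .== 1) .& (df.B .== 1), :]` or `df[coalesce.(df.C .== 1, false), :]`.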
