Tony, it's not possible to manipulate a subdataframe like that with the current code. You can build something that comes close, though. In Julia v0.4, you can create a DataFrame whose columns are views into the original columns using `sub`. Here is an example:

    df = DataFrame(a = rand(1:3, 10), x = rand(10), y = rand(10))
    idx = find(df[:a] .== 1)
    sd = DataFrame(Any[sub(df[i], idx) for i in 1:size(df, 2)], names(df))

Try `dump(sd)` to view the structure of the result. Now you can use `sd` like a SubDataFrame, but you can also do any normal DataFrame manipulation on it.

If the starting point is a SubDataFrame, you can convert it as follows:

    sdf = sub(df, df[:a] .== 1)
    sd2 = DataFrame(Any[sub(sdf.parent[i], idx) for i in 1:size(sdf, 2)], names(sdf))

One huge improvement of this approach is that `sd[:colA]` doesn't allocate anything. With the existing SubDataFrame implementation, a column indexing operation like that does a row indexing operation into the parent DataFrame, so it allocates memory. That differs from a DataFrame, where column indexing never allocates anything.

I've been thinking that this might be a good change in general for `sub`. If Base Arrays switch to views by default, this might match better with that. In fact, we could get rid of `sub` and do this for all row indexing operations. It could use some more thought.

This approach doesn't work in Julia v0.3 because `sub` did not support arbitrary indexes in v0.3.

On Fri, Dec 12, 2014 at 1:58 AM, Tony Fong <[email protected]> wrote:
>
> In a vanilla dataframe, I can do this
> df[ :a ] = mydarray
>
> It doesn't seem to allow me to do the same for a subdataframe returned by
> a groupby.
>
> I'm trying to compute a new column based on the subdataframe, attach this
> column to it and then do further groupby.
>
> Is there a way to do that?
>
> Tony
>
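
P.S. For anyone reading this on a newer Julia: the view-versus-copy point can be seen with plain vectors, no DataFrames needed. This is only a rough sketch, and it assumes Julia 1.x, where `sub` has been renamed `view` and `find` has become `findall`. The vectors `a` and `x` are toy stand-ins for two columns and are not from the example above.

```julia
# Sketch for Julia 1.x (assumption): `view` replaces v0.4's `sub`,
# and `findall` replaces `find`.
a = [1, 2, 1, 3, 1]              # toy grouping column (hypothetical data)
x = [0.1, 0.2, 0.3, 0.4, 0.5]    # toy value column (hypothetical data)

idx = findall(a .== 1)           # row indexes where a == 1

xv = view(x, idx)                # view: shares memory with x
xc = x[idx]                      # plain indexing: allocates a new vector

xv[1] = 10.0                     # writes through to the parent vector x
```

After this runs, `x[1]` is `10.0` because `xv` is a window onto `x`, while `xc[1]` is still `0.1` because `xc` is an independent copy. A DataFrame built from such views behaves the same way column by column.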
