Re: [julia-users] How do I add a column to a subdataframe?

Tom Short Fri, 12 Dec 2014 05:17:49 -0800

Tony, it's not possible to manipulate a subdataframe like that with the
current code. You can make something that comes close, though. In Julia
v0.4, you can create a DataFrame whose columns are views into the
original columns using `sub`. Here is an example:

df = DataFrame(a = rand(1:3, 10), x = rand(10), y = rand(10))
idx = find(df[:a] .== 1)
sd = DataFrame(Any[sub(df[i], idx) for i in 1:size(df, 2)], names(df))

Try `dump(sd)` to view the structure of the result.

Now you can use `sd` like a SubDataFrame, and because it is an ordinary
DataFrame you can also do any normal DataFrame manipulation, including
adding a column. If the starting point is a SubDataFrame, you can convert
it as follows:

sdf = sub(df, df[:a] .== 1)
sd2 = DataFrame(Any[sub(sdf.parent[i], sdf.rows) for i in 1:size(sdf, 2)],
                names(sdf))

One huge improvement of this approach is that `sd[:colA]` doesn't
allocate anything. With the existing SubDataFrame implementation, a column
indexing operation like that does a row indexing operation into the parent
DataFrame, so it allocates memory. That differs from a DataFrame, where
column indexing never allocates anything.
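
The difference between a copying row index and a view can be seen with
plain vectors, without DataFrames. This is only a sketch under one loud
assumption: it targets a current Julia release, where `findall` and `view`
play the role of the `find` and `sub` used in this thread. The vectors `a`
and `x` are made-up illustration data, not the columns of `df`.

```julia
# Plain-vector sketch; no DataFrames needed.
# Assumption: a current Julia release, where `findall` and `view` stand in
# for the v0.4 `find` and `sub` used in this thread.
a = [1, 2, 1, 3, 1]
x = [0.1, 0.2, 0.3, 0.4, 0.5]

idx = findall(a .== 1)   # row numbers where a == 1

copied = x[idx]          # indexing with a vector builds a new array
viewed = view(x, idx)    # a view only records the parent and the indices

copied[1] = -1.0         # changes the copy only; x is untouched
viewed[2] = 99.0         # writes through to x[idx[2]], i.e. x[3]
```

The copy is what a row indexing operation into the parent has to produce
on every access, while the view keeps pointing at the original storage.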

I've been thinking that this might be a good change in general for `sub`.
If Base Arrays switch to views by default, this might match better with
that. In fact, we could get rid of `sub` and do this for all row indexing
operations. It could use some more thought.

This approach doesn't work in Julia v0.3 because `sub` did not support
arbitrary indexes in v0.3.

On Fri, Dec 12, 2014 at 1:58 AM, Tony Fong wrote:
>
>
> In a vanilla dataframe, I can do this
> df[ :a ] = mydarray
>
> It doesn't seem to allow me to do the same for a subdataframe returned by
> a groupby.
>
> I'm trying to compute a new column based on the subdataframe, attach this
> column to it and then do further groupby.
>
> Is there a way to do that?
>
> Tony
>
