Matthew Dowle wrote
Hi Berto,
I think we may be experiencing a language barrier here. This is
datatable-help; i.e.,
not r-help. You *can* write in another language on this list, if
you'd
like to, in
case someone else here understands better. The rules are less strict
here. Nobody has yet
done so, but there is no rule against it. Why not?
Proceeding in English for now ...
Sorry for my loose English writing, I'll try to be more precise this
time.
If this does not work, I'll switch the language.
Matthew Dowle wrote
I like the lack of spaces, but what do the * mean? In other words,
you've presented
a line of code :
DT[y>=3&*v<=7&w<=7*,sum(y), by=x]
but that doesn't actually evaluate to anything, does it? So that's
pseudo-code. I don't even need to copy and paste that into R to know
it's invalid. That cannot possible give the expected result,
because of
the "*" characters.
I don't have these asteriks in my message, but this part of the code
was
highlighted in bold as *here*.
I've run all code and copy-pasted in the post.
Matthew Dowle wrote
Might you be looking for something like :
sapply(.SD, `<`, 7)
? Dunno. Guessing.
I've tried the sapply suggestion but the result is a logical matrix,
not
what I am looking for:
DT3 = data.table( x=rep ( c("a","b","c"), each=3), y = c(1,3,6), u =
2:10,
v = 1:9, w = 3:11)
DT3[y>=3, sapply(.SD, `<`, 7),]
x y u v w
[1,] FALSE TRUE TRUE TRUE TRUE
[2,] FALSE TRUE TRUE TRUE TRUE
[3,] FALSE TRUE TRUE TRUE FALSE
[4,] FALSE TRUE FALSE TRUE FALSE
[5,] FALSE TRUE FALSE FALSE FALSE
[6,] FALSE TRUE FALSE FALSE FALSE
Matthew Dowle wrote
But, focussing on this part of your email :
But if the number of columns grows, I can't specify all columns
anymore,
maybe should I use column names?
You actually do, really, honestly, need to show us, physically, in
email, what you mean. Columns
of what? Growing how? Show us 2,3,4,5 columns. Show us the manual
way.
Show us the input and
show us the output.
Your email can be very long. It can contain very little English.
But
you
actually need to show what the output is you would like, for me (at
least)
to understand.
What I am certain of is that whatever you want to do is possible.
And
if it isn't, then
we will likely enhance data.table to do it.
I see the verb "grow" is misleading here, I meant "if the number of
columns
is high".
I want this code for a data.table with 1000s of rows and 10s - 100s
of
"other numeric columns".
Find below 2 examples: DT3 and DT5 (input, subset, output).
#Example data table (DT3) with 1 character column (x), 1 numeric
column of
interest (y) and 3 other numeric columns (u, v, w) - input
DT3 = data.table( x=rep ( c("a","b","c"), each=3), y = c(1,3,6), u =
2:10,
v = 1:9, w = 3:11)
DT3
x y u v w
1: a 1 2 1 3
2: a 3 3 2 4
3: a 6 4 3 5
4: b 1 5 4 6
5: b 3 6 5 7
6: b 6 7 6 8
7: c 1 8 7 9
8: c 3 9 8 10
9: c 6 10 9 11
# What I want to subset from DT3:
DT3[y>=3 & u <=7 & v <=7 & w <=7, ] # but this way, I need to write
all
column names
x y u v w
1: a 3 3 2 4
2: a 6 4 3 5
3: b 3 6 5 7
# Sum(y) calculation in this subset from DT3 - output
DT3[y>=3 & u <=7 & v <=7 & w <=7, sum(y), by=x]
x V1
1: a 9
2: b 3
# Mean(y) calculation in this subset from DT3 - output
DT3[y>=3 & u <=7 & v <=7 & w <=7, mean(y), by=x]
x V1
1: a 4.5
2: b 3.0
# Max(y) in this subset from DT3 - output
DT3[y>=3 & u <=7 & v <=7 & w <=7, max(y), by=x]
x V1
1: a 6
2: b 3
#Another example data table (DT5) with 1 character column (x), 1
numeric
column of interest (y) and 5 other numeric columns (s, t, u, v, w) -
input
DT5 = data.table( x=rep ( c("a","b","c"), each=3), y = c(1,3,6), s =
4:12, t
= 0:8, u = 2:10, v = 1:9, w = 3:11)
DT5
x y s t u v w
1: a 1 4 0 2 1 3
2: a 3 5 1 3 2 4
3: a 6 6 2 4 3 5
4: b 1 7 3 5 4 6
5: b 3 8 4 6 5 7
6: b 6 9 5 7 6 8
7: c 1 10 6 8 7 9
8: c 3 11 7 9 8 10
9: c 6 12 8 10 9 11
# Same subset from DT5 as used in DT3:
DT5[y>=3 & u <=7 & v <=7 & w <=7,]
x y s t u v w
1: a 3 5 1 3 2 4
2: a 6 6 2 4 3 5
3: b 3 8 4 6 5 7
# Now filtering by s and t columns as well, excludes "3: b 3 8 4 6 5
7" as
s=8.
DT5[y>=3 & s <=7 & t <=7 & u <=7 & v <=7 & w <=7,] ## it works, but
inconvenient for 10s-100s of columns
x y s t u v w
1: a 3 5 1 3 2 4
2: a 6 6 2 4 3 5
# Doing a sum(y) calculation over this subset from DT5 - output
DT5[y>=3 & s <=7 & t <=7 & u <=7 & v <=7 & w <=7, sum(y), by=x]
x V1
1: a 9
# Doing a mean(y) calculation over this subset from DT5 - output
DT5[y>=3 & s <=7 & t <=7 & u <=7 & v <=7 & w <=7, mean(y), by=x]
x V1
1: a 4.5
All the outputs shown here match the expectations, but it's not
useful for
10s-100s of columns.
I hope the desired output is clear now; ignore my past attempts to
avoid
confusion.
Thanks again!
B
--
View this message in context:
http://r.789695.n4.nabble.com/Subsetting-columns-in-data-table-tp4649736p4649841.html
Sent from the datatable-help mailing list archive at Nabble.com.
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help