Re: [R] combine filter() and select()

2020-08-20 Thread Hadley Wickham
On Wed, Aug 19, 2020 at 10:03 AM Ivan Calandra wrote:
>
> Dear useRs,
>
> I'm new to the tidyverse world and I need some help on basic things.
>
> I have the following tibble:
> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
> 1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
>
> I want to subset the rows with "a" in the column "files", and keep only
> that column.
>
> So I did:
> myfile <- mytbl %>%
>   filter(grepl("a", files)) %>%
>   select(files)
>
> It works, but I believe there must be an easier way to combine filter()
> and select(), right?

Not in the tidyverse. As others have mentioned, both [ and subset() in
base R allow you to simultaneously subset rows and columns, but
there's no single verb in the tidyverse that does both. This is
somewhat informed by the observation that in data frames, unlike
matrices, rows and columns are not exchangeable, and you typically
want to express subsetting in rather different ways.
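
A minimal base R sketch of the point about [ and subset(), for readers who want to try it. It assumes a plain data.frame called mydf (a name made up for this illustration) rather than the tibble from the question, so that it runs without any packages. With a plain data.frame, drop = FALSE is needed to keep a one-column result as a data frame.

```r
# Same values as the example in the thread, as a plain data.frame
# (an assumption made so that no packages are required).
mydf <- data.frame(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6,
                   stringsAsFactors = FALSE)

# `[` takes a row index and a column index in a single call; drop = FALSE
# keeps the result a data frame even though only one column is kept.
by_bracket <- mydf[grepl("a", mydf$files), "files", drop = FALSE]

# subset() does both steps too, looking up unquoted column names in mydf.
by_subset <- subset(mydf, grepl("a", files), select = files)

print(by_bracket)
print(identical(by_bracket, by_subset))
```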

Hadley

-- 
http://hadley.nz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] combine filter() and select()

2020-08-20 Thread Martin Morgan
A kind of hybrid answer is to use base::subset(), which supports non-standard 
evaluation (it looks up unquoted symbols like 'files' in the code line below 
in the object that is its first argument; %>% puts 'mytbl' in that first 
position) and does both row (filter) and column (select) subsetting:

> mytbl %>% subset(files %in% "a", files)
# A tibble: 1 x 1
  files
  <chr>
1 a

Or subset(grepl("a", files), files) if that was what you meant.
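
The two conditions are not interchangeable: %in% "a" tests for values that are exactly "a", while grepl("a", ...) matches any value that contains an "a". A small sketch of the difference, using made-up file names (not from the thread) chosen so that the two tests disagree:

```r
# Hypothetical file names, chosen so the two tests give different answers.
files <- c("a", "b", "data", "d")

exact <- files %in% "a"        # TRUE only where the value is exactly "a"
contains <- grepl("a", files)  # TRUE wherever "a" occurs anywhere in the value

print(files[exact])
print(files[contains])
```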

One important idea that the tidyverse implements is, in my opinion, 
'endomorphism' -- you get back the same type of object as you put in -- so I 
wouldn't use a base R idiom that returned a vector unless that were somehow 
essential for the next step in the analysis. 
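
To illustrate what returning a different type looks like, here is a sketch that assumes a plain data.frame (the hypothetical mydf below), where selecting a single column with [ drops to a vector by default. A tibble's [ method does not drop in this way, so the issue mostly arises with base data frames.

```r
# Plain data.frame stand-in for the tibble in the thread (an assumption).
mydf <- data.frame(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6,
                   stringsAsFactors = FALSE)

keep <- grepl("a", mydf$files)

# Default behaviour: a single column is simplified to a character vector.
as_vector <- mydf[keep, "files"]

# drop = FALSE keeps the type: data frame in, data frame out.
as_frame <- mydf[keep, "files", drop = FALSE]

print(class(as_vector))
print(class(as_frame))
```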

There is value in having separate functions for filter() and select(), and 
there are probably edge cases where filter(), select(), and subset() behave 
differently, but for what it's worth subset() can also perform each of these 
operations on its own:

> mytbl %>% subset(, files)
# A tibble: 6 x 1
  files
  <chr>
1 a
2 b
3 c
4 d
5 e
6 f
> mytbl %>% subset(grepl("a", files), )
# A tibble: 1 x 2
  files  prop
  <chr> <int>
1 a         1

Martin Morgan



Re: [R] combine filter() and select()

2020-08-20 Thread Ivan Calandra
Hi Jeff,

The code you show is exactly what I usually do, in base R; but I wanted
to play with tidyverse to learn it (and also understand when it makes
sense and when it doesn't).

And yes, of course, in the example I gave, I end up with a 1-cell
tibble, which could be better extracted as a length-1 vector. But my
real goal is not to end up with a single value or even a single column.
I just thought that simplifying my example was the best approach to ask
for advice.

But thank you for letting me know that what I'm doing is pointless!

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra



Re: [R] combine filter() and select()

2020-08-20 Thread Ivan Calandra
Dear Chris,

I didn't think about having the assignment at the end as you showed; it
indeed fits the pipe workflow better.

By "easy", I actually meant shorter. As you said, in base R, I usually
do that in 1 line, so I was hoping to do the same in tidyverse. But I'm
glad to hear that I'm using tidyverse the proper way :)

Best regards,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra



Re: [R] combine filter() and select()

2020-08-19 Thread Jeff Newmiller
The whole point of dplyr primitives is to support data frames... that is, lists 
of columns. When you pare your data frame down to one column you are almost 
certainly using the wrong tool for the job.

So, sure, your code works... and it even does what you wanted in the dplyr 
style, but what a pointless exercise.

grep( "a", mytbl$files, value=TRUE )
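
For reference, a short sketch of what grep() returns with and without value = TRUE, and how grepl() differs. It uses a standalone character vector with the same values as the files column. Note also that $ on a tibble does not partial-match column names, so the column has to be spelled out in full.

```r
# Same values as the files column in the thread, as a bare character vector.
files <- c("a", "b", "c", "d", "e", "f")

idx <- grep("a", files)                  # positions of the matching elements
vals <- grep("a", files, value = TRUE)   # the matching elements themselves
flags <- grepl("a", files)               # one TRUE/FALSE per element

print(idx)
print(vals)
print(flags)
```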

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] combine filter() and select()

2020-08-19 Thread Chris Evans
Inline

- Original Message -
> From: "Ivan Calandra" 
> To: "R-help" 
> Sent: Wednesday, 19 August, 2020 16:56:32
> Subject: [R] combine filter() and select()

> Dear useRs,
> 
> I'm new to the tidyverse world and I need some help on basic things.
> 
> I have the following tibble:
> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
> 1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
> 
> I want to subset the rows with "a" in the column "files", and keep only
> that column.
> 
> So I did:
> myfile <- mytbl %>%
>  filter(grepl("a", files)) %>%
>  select(files)
> 
> It works, but I believe there must be an easier way to combine filter()
> and select(), right?

I would write 

mytbl %>%
  filter(grepl("a", files)) %>%
  select(files) -> myfile

as I like to keep a sort of "top to bottom and left to right" flow when writing 
in the tidyverse dialect of R but that's really not important.
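
The right-hand assignment is purely a matter of style: -> stores the same value that <- would. A small sketch, assuming a plain data.frame (the hypothetical mydf) and base subset() in place of the dplyr pipeline so that it runs without packages:

```r
# Plain data.frame stand-in for the tibble in the thread (an assumption).
mydf <- data.frame(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6,
                   stringsAsFactors = FALSE)

# Conventional assignment: name first, expression second.
left <- subset(mydf, grepl("a", files), select = files)

# Right-hand assignment: expression first, name last.
subset(mydf, grepl("a", files), select = files) -> right

print(identical(left, right))
```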

Apart from that I think what you've done is "proper tidyverse". To me another 
difference between the dialects is that classical R often seems to put value 
on, and make it easy to, do things with incredibly few characters.  I think that 
for the people who are brilliant at that sort of coding, and there are many on 
this list, that sort of code is also easy to read.  I know that Chinese is easy 
to read if you grew up on it but to a bear of little brain like me, the much 
more verbose style of tidyverse repays typing time with readability when I come 
back to my code and, though I have little experience of this yet, when I read 
other people's code.

What did you think wasn't "easy" about what you wrote?

Very best (all),

Chris

> 
> Thank you!
> Ivan
> 

-- 

Chris Evans  Visiting Professor, University of Sheffield 




[R] combine filter() and select()

2020-08-19 Thread Ivan Calandra
Dear useRs,

I'm new to the tidyverse world and I need some help on basic things.

I have the following tibble:
mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

I want to subset the rows with "a" in the column "files", and keep only
that column.

So I did:
myfile <- mytbl %>%
  filter(grepl("a", files)) %>%
  select(files)

It works, but I believe there must be an easier way to combine filter()
and select(), right?

Thank you!
Ivan

-- 
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
