Re: [R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread Roy Mendelssohn - NOAA Federal via R-help
Personally I liked two workshops Thomas Lin Pedersen gave:

https://www.youtube.com/watch?v=h29g21z0a68
https://www.youtube.com/watch?v=0m4yywqNPVY&t=5219s

-Roy


**
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new street address***
110 McAllister Way
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: https://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread John via R-help
On Tue, 17 Nov 2020 12:43:21 -0500
C W  wrote:

> Dear R list,
> 
> I am an old-school R user. I use apply(), with(), and which() in base
> package instead of filter(), select(), separate() in Tidyverse. The
> idea of pipeline (i.e. %>%) my code was foreign to me for a while. It
> makes the code shorter, but sometimes less readable?
> 
> With ggplot2, I just don't understand how it is organized. Take this
> code:
> 
>   ggplot(diamonds, aes(x=carat, y=price)) +
>     geom_point(aes(color=cut)) +
>     geom_smooth()
> 
> There are three plus signs. How do you know when to "add" and what to
> "add"? I've seen more plus signs.
> 
> To me, aes() stands for aesthetic, meaning looks. So, anything
> related to looks like points and smooth should be in aes().
> Apparently, it's not the case.
> 
> So, how does ggplot2 work? Could someone explain this for an
> old-school R user?
> 
> Thank you!
> 
The really short version is that ggplot2 syntax defines an object, and
each additional call simply adds to it, which is what all the plus
signs are for.  Ideally, you start a ggplot call by assigning the
result to a name:

Instead of:
ggplot(diamonds, aes(x=carat, y=price)) + ...

use something like:

fig1 <- ggplot(diamonds, aes(x=carat, y=price)) + ...

This creates a plot object that can then be further modified.
Learning the syntax is a chore, but the output tends to be fine,
especially for publications and final graphics. On the other hand, it's
slower and fussier than some of the more traditional approaches, which
are what I would prefer for EDA.
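
A minimal sketch of that idea, assuming ggplot2 is installed. The name
fig1 comes from the text above; fig2 and fig3 are illustrative names
only, and diamonds is the example dataset shipped with ggplot2.

```r
library(ggplot2)

# Store the base plot: the data plus the mapping of columns to x and y.
# Nothing is drawn yet; fig1 is an ordinary R object.
fig1 <- ggplot(diamonds, aes(x = carat, y = price))

# Each "+" returns a new plot object with one more component attached.
fig2 <- fig1 + geom_point(aes(color = cut))  # add a layer of points
fig3 <- fig2 + geom_smooth()                 # add a smoothed trend line

# Drawing happens only when the object is printed.
print(fig3)
```

Because every intermediate result is a complete plot object, you can
print fig1, fig2, or fig3 at any stage to see what each "+" contributed.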

JWDougherty



Re: [R] Tutorial/vignette on modified Kneser Ney smoothing

2020-11-18 Thread Jim Lemon
Hi Gayathri,
Maybe the cmscu package?

https://github.com/jasonkdavis/r-cmscu

Jim




Re: [R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread Duncan Murdoch

On 17/11/2020 12:43 p.m., C W wrote:
> Dear R list,
>
> I am an old-school R user. I use apply(), with(), and which() in base
> package instead of filter(), select(), separate() in Tidyverse. The idea of
> pipeline (i.e. %>%) my code was foreign to me for a while. It makes the
> code shorter, but sometimes less readable?


Think of the pipe as pure syntactic sugar.  It doesn't really do 
anything, it just lets you write "f(g(x))" as "x %>% g() %>% f()" (where 
the parens "()" are optional).  Read it as "Take x and pass it to g(); 
take the result and pass it to f()", which is exactly how you'd read 
"f(g(x))".  The pipe  presents it in the same order as in English, which 
sometimes makes it a bit easier to read than the mathematical notation.
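
A small sketch of that equivalence, assuming the magrittr package
(which provides %>% and is what dplyr re-exports). The vector x and the
choice of sqrt() and sum() as g and f are illustrative only.

```r
library(magrittr)  # provides the %>% pipe

x <- c(1, 4, 9, 16)

nested <- sum(sqrt(x))             # f(g(x)): read from the inside out
piped  <- x %>% sqrt() %>% sum()   # the same calls, read left to right
bare   <- x %>% sqrt %>% sum       # magrittr lets you drop the "()"

identical(nested, piped)  # TRUE
identical(nested, bare)   # TRUE
```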


There's a lot more to tidyverse ideas besides the pipe.  The overview is 
in the "Tidyverse Manifesto" (a vignette in the tidyverse package), and 
details are in Grolemund and Wickham's book "R for Data Science".




> With ggplot2, I just don't understand how it is organized. Take this code:


ggplot2 is much harder to understand, but Wickham's book "ggplot2: 
Elegant Graphics for Data Analysis" gives a really readable yet thorough 
description.





>   ggplot(diamonds, aes(x=carat, y=price)) + geom_point(aes(color=cut)) +
>     geom_smooth()
>
> There are three plus signs. How do you know when to "add" and what to
> "add"? I've seen more plus signs.
>
> To me, aes() stands for aesthetic, meaning looks. So, anything related to
> looks like points and smooth should be in aes(). Apparently, it's not the
> case.


Yes "aesthetic" was a really bad choice of word.
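
One way to read aes(), sketched here under the assumption that ggplot2
is installed: it maps columns of the data to visual properties, while
fixed appearance settings go outside it. The object names mapped and
fixed and the colour "steelblue" are illustrative choices, not from the
original post.

```r
library(ggplot2)

# Inside aes(): colour is *mapped* from the data, so each level of `cut`
# gets its own colour and a legend is generated.
mapped <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut))

# Outside aes(): colour is *set* to one fixed value for every point,
# and no legend is needed.
fixed <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "steelblue", alpha = 0.3)
```

So geom_point() and geom_smooth() are not themselves "looks"; they are
layers, and aes() only says which data columns drive each layer's
position, colour, size, and so on.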


> So, how does ggplot2 work? Could someone explain this for an old-school R
> user?


Not in one email, but hopefully the references (which are both available 
online for free, or in a bookstore at some cost) can help.


Duncan Murdoch



Re: [R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread Hadley Wickham
I'd recommend two places to get started:

* https://r4ds.had.co.nz/data-visualisation.html for a quick intro to
ggplot2 (and the rest of the book explains the general tidyverse
philosophy)

* https://ggplot2-book.org for the full details of ggplot2.

Hadley




-- 
http://hadley.nz



Re: [R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread Bert Gunter
I should have said: Have you worked through the Vignettes and examples??

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )





Re: [R] Tutorial/vignette on modified Kneser Ney smoothing

2020-11-18 Thread Bert Gunter
Wrong list!

Google "kneser Ney smoothing algorithm" for possibilities.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )





[R] Tutorial/vignette on modified Kneser Ney smoothing

2020-11-18 Thread Gayathri Nagarajan
Hi Team

I am a new learner trying to build n-gram models from a text corpus, and I
am trying to understand the modified Kneser-Ney smoothing algorithm so that
I can code it and build my word prediction model.

Can someone point me to a vignette or tutorial that will help me learn this?

Thanks in advance for your help

Regards
Gayathri



Re: [R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread el
RTFM, perhaps?

Or even worse, buy his book?

el

—
Sent from Dr Lisse’s iPad Mini 5



Re: [R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread Ben Tupper
Hi,

I feel your pain.  As you have likely discovered yourself, there are
just about 10^14 tutorials/posts/tips out there on ggplot2.  See
https://rseek.org/?q=+ggplot2+tutorial for example.   Yikes!

One resource I found most helpful when I started is
https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1.
This is a terrific resource for getting the feel of layering-up.

Hope you find it helpful.

Cheers,
Ben




-- 
Ben Tupper
Bigelow Laboratory for Ocean Science
East Boothbay, Maine
http://www.bigelow.org/
https://eco.bigelow.org



Re: [R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread Bert Gunter
This is not the place for tutorials (although I recognize that many
responses and discussions do intersect tutoriality).
If you do a web search on ggplot tutorials you will find many good ones. Or
go to the RStudio website, which links to resources including Hadley
Wickham's book, probably the most authoritative. Incidentally, ggplot is
based on Leland Wilkinson's book "The Grammar of Graphics", which provided
the blueprint for Wickham's software (his PhD project at Iowa State, I
believe).

Cheers,

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )





[R] How to understand the mentality behind tidyverse and ggplot2?

2020-11-18 Thread C W
Dear R list,

I am an old-school R user. I use apply(), with(), and which() from base R
instead of filter(), select(), and separate() from the tidyverse. The idea
of piping my code (i.e. %>%) was foreign to me for a while. It makes the
code shorter, but sometimes less readable?

With ggplot2, I just don't understand how it is organized. Take this code:

  ggplot(diamonds, aes(x=carat, y=price)) +
    geom_point(aes(color=cut)) +
    geom_smooth()

There are three plus signs. How do you know when to "add" and what to
"add"? I've seen more plus signs.

To me, aes() stands for aesthetic, meaning looks. So, anything related to
looks like points and smooth should be in aes(). Apparently, it's not the
case.

So, how does ggplot2 work? Could someone explain this for an old-school R
user?

Thank you!



Re: [R] - Trying to replicate VLOOKUP in R - help needed

2020-11-18 Thread Gregg via R-help
I will do that...

Thanks again Jeff.

r/
Gregg Powell





Re: [R] - Trying to replicate VLOOKUP in R - help needed

2020-11-18 Thread Jeff Newmiller
Instead, learn how to use the merge function, or perhaps the dplyr::left_join 
function. VLOOKUP is really not necessary.
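
A minimal sketch of the merge approach in base R. The two data frames
below are made up for illustration (the row "FORT UNKNOWN" is a
hypothetical entry added to show what happens to an unmatched key), and
the sketch assumes the lookup table is keyed on the full NAME value
rather than on a keyword.

```r
# Main table: the rows that need a company filled in.
sites <- data.frame(
  NAME = c("ABERDEEN PROVING GROUND", "ADELPHI LABORATORY CENTER",
           "FORT BELVOIR", "FORT UNKNOWN"),
  TOTALAUTH = 1,
  stringsAsFactors = FALSE
)

# Lookup table: one row per key, like the range a VLOOKUP searches.
lookup <- data.frame(
  NAME = c("ABERDEEN PROVING GROUND", "ADELPHI LABORATORY CENTER",
           "FORT BELVOIR"),
  ASSIGNED_COMPANY = c("NEC Aberdeen", "NEC Adelphi", "NEC Belvoir"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of `sites`; keys with no match get NA.
result <- merge(sites, lookup, by = "NAME", all.x = TRUE)
result

# The dplyr equivalent would be:
#   dplyr::left_join(sites, lookup, by = "NAME")
```

Keeping the mapping in its own table means a new site is one more row
in lookup, not one more line of code.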

On November 18, 2020 7:11:49 AM PST, Gregg via R-help wrote:
>Thanks Andrew and Mitch for your help.
>
>With your assistance, I was able to sort this out.
>
>Since I have to do this type of thing often, and since there is no
>existing package/function (yet) that makes this easy, if ever I get to
>the point where I develop enough skill to build and submit a new
>package, a simple little VLOOKUP(like) function contained in a package
>would be of great use.
>
>r/
>Gregg
>
>‐‐‐ Original Message ‐‐‐
>On Monday, November 16, 2020 1:56 PM, Gregg via R-help wrote:
>
>> PROBLEM: I am trying to replicate something like a VLOOKUP in R but
>> am having no success - need a bit of help.
>>
>> GIVEN DATA SET (data.table): (looks something like this, but much
>> bigger)
>>
>> NAME                       TOTALAUTH  ASSIGNED_COMPANY
>> ABERDEEN PROVING GROUND    1          NA
>> ADELPHI LABORATORY CENTER  1          NA
>> CARLISLE BARRACKS          1          NA
>> DETROIT ARSENAL            1          NA
>> DUGWAY PROVING GROUND      1          NA
>> FORT A P HILL              1          NA
>> FORT BELVOIR               1          NA
>> FORT BENNING               1          NA
>> FORT BLISS                 1          NA
>> FORT BRAGG                 1          NA
>> FORT BUCHANAN              1          NA
>>
>> I am trying to update the values in the ASSIGNED_COMPANY column from
>> NAs to a value that matches based on the "key" word like below.
>>
>> NAME                       TOTALAUTH  ASSIGNED_COMPANY
>> ABERDEEN PROVING GROUND    1          NEC Aberdeen
>> ADELPHI LABORATORY CENTER  1          NEC Adelphi
>> CARLISLE BARRACKS          1          NEC Carlise
>> DETROIT ARSENAL            1          NEC Detroit
>> DUGWAY PROVING GROUND      1          NEC Dugway
>> FORT A P HILL              1          NEC AP Hill
>> FORT BELVOIR               1          NEC Belvoir
>> FORT BENNING               1          NEC Benning
>> FORT BLISS                 1          NEC Bliss
>> FORT BRAGG                 1          NEC Bragg
>> FORT BUCHANAN              1          NEC Buchanon
>>
>> In a nutshell, for instance...
>>
>> I want to search for the keyword "ABERDEEN" in the NAME column, and
>> for every row where it exists, I want to update the NA in the
>> ASSIGNED_COMPANY column to "NEC Aberdeen"
>>
>> I want to search for the keyword "ADELPHI" in the NAME column, and
>> for every row where it exists, I want to update the NA in the
>> ASSIGNED_COMPANY column to "NEC ADELPHI"
>>
>> ... and so on for every value in the NAME column - so in the end
>> I have matching names in the ASSIGNED_COMPANY column.
>>
>> I can't use an if statement because it is not vectorized.
>>
>> If I use an ifelse statement, the "else" rewrites any changes with ""
>>
>> Something so simple should not be difficult.
>>
>> Some of the methods I attempted to use are below along with the
>> errors I get...
>>
>> ###CODE###
>> 
>
>> library(data.table)
>> library(dplyr)
>> library(stringr)
>> 
>
>> VLOOKUP_inR <- data.table::fread("DATASET_TESTINGONLY.csv")
>> 
>
>> #METHOD 1 FAILS
>> VLOOKUP_inR %>% dplyr::rename_if(grepl("ADELPHI", VLOOKUP_inR$NAME,
>useBytes = TRUE), "NEC Adelphi")
>> 
>
>> Error in get(.x, .env, mode = "function") :
>> 
>
>> object 'NEC Adelphi' of mode 'function' was not found
>> 
>
>> #METHOD 2 FAILS
>> if(stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) {
>> VLOOKUP_inR$ASSIGNED_COMPANY == "NEC Adelphi"
>> }
>> 
>
>> Warning message:
>> In if (stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) { :
>> the condition has length > 1 and only the first element will be used
>> 
>
>> #METHOD 3 FAILS
>> ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME,
>"ADELPHI"), ASIP_combined_location_tally$ASSIGNED_COMPANY ==
>ASIP_combined_location_tally$ASSIGNED_COMPANY)
>> 
>
>> Error in
>ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, :
>> 
>
>> argument "no" is missing, with no default
>> 
>
>> #METHOD4 FAILS
>> VLOOKUP_inR_matching <- VLOOKUP_inR %>% mutate(ASSIGNED_COMPANY =
>ifelse(grepl(pattern = 'ABERDEEN', x = NAME), 'NEC Aberdeen', ''))
>> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>%
>mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'ADELPHI', x = NAME),
>'NEC Adelphi', ''))
>> 
>
>> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>%
>mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'CARLISLE', x = NAME),
>'NEC Carlisle Barracks', ''))
>> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>%
>mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'DETROIT', x = NAME),
>'NEC Detroit Arsenal', ''))
>> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>%
>mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'BELVOIR', x = NAME),
>'NEC Fort Belvoir', ''))
>> 
>
>> ---the 4th method just overwrites all previous changes back to ""
>> 
>
>>
>##
>> 
>
>> Any help offered would be so very greatly appreciated.
>> 
>
>> Thank you.
>> 
>
>> r/
>> gregg powell
>> AZ
>> 
>
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, 

Re: [R] - Trying to replicate VLOOKUP in R - help needed

2020-11-18 Thread Gregg via R-help
Thanks Andrew and Mitch for your help.

With your assistance, I was able to sort this out.

Since I have to do this type of thing often, and since there is no existing 
package/function (yet) that makes this easy, if I ever get to the point where I 
develop enough skill to build and submit a new package, a simple little 
VLOOKUP-like function contained in a package would be of great use.

r/
Gregg
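
For anyone finding this thread later, here is a minimal base-R sketch of such a VLOOKUP-like helper. It is not the solution Andrew or Mitch posted and not an existing package function: keyword_lookup() and the small lookup table are hypothetical names made up for illustration, and the keys are assumed to be plain substrings of NAME. The fourth attempt in the original post overwrote earlier results because each ifelse() supplied '' as its "no" value; the sketch avoids that by only filling rows that are still unassigned.

```r
# Hypothetical helper (not from any package): for each element of x, return
# the value paired with the first keyword found in it, or `default` if none.
keyword_lookup <- function(x, keys, values, default = NA_character_) {
  stopifnot(length(keys) == length(values))
  out <- rep(default, length(x))
  assigned <- rep(FALSE, length(x))
  for (i in seq_along(keys)) {
    # fill only rows not matched by an earlier keyword, so nothing is overwritten
    hit <- !assigned & grepl(keys[i], x, fixed = TRUE)
    out[hit] <- values[i]
    assigned <- assigned | hit
  }
  out
}

# A few rows in the shape of the data set from the original post
dat <- data.frame(
  NAME = c("ABERDEEN PROVING GROUND", "ADELPHI LABORATORY CENTER",
           "CARLISLE BARRACKS", "FORT BLISS"),
  TOTALAUTH = 1,
  ASSIGNED_COMPANY = NA_character_,
  stringsAsFactors = FALSE
)

# Illustrative lookup table: one keyword and one replacement value per row
lookup <- data.frame(
  key   = c("ABERDEEN", "ADELPHI", "CARLISLE"),
  value = c("NEC Aberdeen", "NEC Adelphi", "NEC Carlisle"),
  stringsAsFactors = FALSE
)

dat$ASSIGNED_COMPANY <- keyword_lookup(dat$NAME, lookup$key, lookup$value)
print(dat)
```

Rows with no matching keyword (FORT BLISS here) stay NA, which makes missing lookup entries easy to spot with is.na().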




‐‐‐ Original Message ‐‐‐
On Monday, November 16, 2020 1:56 PM, Gregg via R-help  
wrote:

> PROBLEM: I am trying to replicate something like a VLOOKUP in R but am having 
> no success - need a bit of help.
> 

> GIVEN DATA SET (data.table): (looks something like this, but much bigger)
> 

> NAME                       TOTALAUTH  ASSIGNED_COMPANY
> ABERDEEN PROVING GROUND    1          NA
> ADELPHI LABORATORY CENTER  1          NA
> CARLISLE BARRACKS          1          NA
> DETROIT ARSENAL            1          NA
> DUGWAY PROVING GROUND      1          NA
> FORT A P HILL              1          NA
> FORT BELVOIR               1          NA
> FORT BENNING               1          NA
> FORT BLISS                 1          NA
> FORT BRAGG                 1          NA
> FORT BUCHANAN              1          NA
> 

> I am trying to update the values in the ASSIGNED_COMPANY column from NAs to a 
> value that matches based on the "key" word like below.
> 

> NAME                       TOTALAUTH  ASSIGNED_COMPANY
> ABERDEEN PROVING GROUND    1          NEC Aberdeen
> ADELPHI LABORATORY CENTER  1          NEC Adelphi
> CARLISLE BARRACKS          1          NEC Carlisle
> DETROIT ARSENAL            1          NEC Detroit
> DUGWAY PROVING GROUND      1          NEC Dugway
> FORT A P HILL              1          NEC AP Hill
> FORT BELVOIR               1          NEC Belvoir
> FORT BENNING               1          NEC Benning
> FORT BLISS                 1          NEC Bliss
> FORT BRAGG                 1          NEC Bragg
> FORT BUCHANAN              1          NEC Buchanan
> 

> In a nutshell, for instance...
> 

> I want to search for the keyword "ABERDEEN" in the NAME column, and for every 
> row where it exists, I want to update the NA in the ASSIGNED_COMPANY column 
> to "NEC Aberdeen"
> 

> I want to search for the keyword "ADELPHI" in the NAME column, and for every 
> row where it exists, I want to update the NA in the ASSIGNED_COMPANY column 
> to "NEC Adelphi"
> 

> ... and so on for every value in the NAME column - so in the end I have 
> matching names in the ASSIGNED_COMPANY column.
> 

> I can't use an if statement because it is not vectorized.
> 

> If I use an ifelse statement, the "else" rewrites any changes with ""
> 

> Something so simple should not be difficult.
> 

> Some of the methods I attempted to use are below along with the errors I 
> get...
> 

> ###CODE###
> 

> library(data.table)
> library(dplyr)
> library(stringr)
> 

> VLOOKUP_inR <- data.table::fread("DATASET_TESTINGONLY.csv")
> 

> #METHOD 1 FAILS
> VLOOKUP_inR %>% dplyr::rename_if(grepl("ADELPHI", VLOOKUP_inR$NAME, useBytes 
> = TRUE), "NEC Adelphi")
> 

> Error in get(.x, .env, mode = "function") :
> 

> object 'NEC Adelphi' of mode 'function' was not found
> 

> #METHOD 2 FAILS
> if(stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) {
> VLOOKUP_inR$ASSIGNED_COMPANY == "NEC Adelphi"
> }
> 

> Warning message:
> In if (stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) { :
> the condition has length > 1 and only the first element will be used
> 

> #METHOD 3 FAILS
> ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, "ADELPHI"), 
> ASIP_combined_location_tally$ASSIGNED_COMPANY == 
> ASIP_combined_location_tally$ASSIGNED_COMPANY)
> 

> Error in ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, :
> 

> argument "no" is missing, with no default
> 

> #METHOD4 FAILS
> VLOOKUP_inR_matching <- VLOOKUP_inR %>% mutate(ASSIGNED_COMPANY = 
> ifelse(grepl(pattern = 'ABERDEEN', x = NAME), 'NEC Aberdeen', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = 
> ifelse(grepl(pattern = 'ADELPHI', x = NAME), 'NEC Adelphi', ''))
> 

> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = 
> ifelse(grepl(pattern = 'CARLISLE', x = NAME), 'NEC Carlisle Barracks', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = 
> ifelse(grepl(pattern = 'DETROIT', x = NAME), 'NEC Detroit Arsenal', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = 
> ifelse(grepl(pattern = 'BELVOIR', x = NAME), 'NEC Fort Belvoir', ''))
> 

> ---the 4th method just overwrites all previous changes back to ""
> 

> ##
> 

> Any help offered would be so very greatly appreciated.
> 

> Thank you.
> 

> r/
> gregg powell
> AZ
> 

> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




Re: [R] analyzing results from Tuesday's US elections

2020-11-18 Thread Marc Roos
 
Maybe this could be interesting to verify against found anomalies?

"A second memory card with uncounted votes was found during an audit in 
Fayette County, Georgia, containing 2,755 votes"
https://www.zerohedge.com/political/second-memory-card-2755-votes-found-during-georgia-election-audit-decreasing-biden-lead



Re: [R] counting duplicate items that occur in multiple groups

2020-11-18 Thread Tom Woolman

Thanks, everyone!



Quoting Jim Lemon :


Oops, I sent this to Tom earlier today and forgot to copy to the list:

VendorID=rep(paste0("V",1:10),each=5)
AcctID=paste0("A",sample(1:5,50,TRUE))
Data<-data.frame(VendorID,AcctID)
table(Data)
# get multiple vendors for each account
dupAcctID<-colSums(table(Data)>0)
Data$dupAcct<-NA
# fill in the new column
for(i in 1:length(dupAcctID))
 Data$dupAcct[Data$AcctID == names(dupAcctID[i])]<-dupAcctID[i]

Jim

On Wed, Nov 18, 2020 at 8:20 AM Tom Woolman 
wrote:


Hi everyone.  I have a dataframe that is a collection of Vendor IDs
plus a bank account number for each vendor. I'm trying to find a way
to count the number of duplicate bank accounts that occur in more than
one unique Vendor_ID, and then assign the count value for each row in
the dataframe in a new variable.

I can do a count of bank accounts that occur within the same vendor
using dplyr and group_by and count, but I can't figure out a way to
count duplicates among multiple Vendor_IDs.


Dataframe example code:


#Create a sample data frame:

set.seed(1)

Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID =
sample(1:1))




Thanks in advance for any help.



Re: [R] counting duplicate items that occur in multiple groups

2020-11-18 Thread Jim Lemon
Oops, I sent this to Tom earlier today and forgot to copy to the list:

VendorID=rep(paste0("V",1:10),each=5)
AcctID=paste0("A",sample(1:5,50,TRUE))
Data<-data.frame(VendorID,AcctID)
table(Data)
# get multiple vendors for each account
dupAcctID<-colSums(table(Data)>0)
Data$dupAcct<-NA
# fill in the new column
for(i in 1:length(dupAcctID))
 Data$dupAcct[Data$AcctID == names(dupAcctID[i])]<-dupAcctID[i]

Jim
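
The same idea can be written without the loop. This is only a sketch of the technique above, run on the small Data2 example Bill Dunlap posted in this thread (with the columns renamed to VendorID/AcctID to match the code above) rather than on random data, so the result is easy to check by eye. The name vendorsPerAcct is made up for illustration.

```r
# Bill Dunlap's Data2 example, using the column names from the code above
Data <- data.frame(
  VendorID = c("V1", "V2", "V3", "V1", "V4", "V2"),
  AcctID   = c("A1", "A2", "A2", "A2", "A3", "A4"),
  stringsAsFactors = FALSE
)

# Rows of the table are vendors, columns are accounts; "> 0" marks each
# vendor/account pair that occurs at least once, so the column sums count
# distinct vendors per account.
vendorsPerAcct <- colSums(table(Data) > 0)

# Index the named vector by account ID: one lookup per row, no loop
Data$dupAcct <- unname(vendorsPerAcct[Data$AcctID])
print(Data)
```

Account A2 is used by V2, V3 and V1, so those three rows get 3 and the remaining rows get 1.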

On Wed, Nov 18, 2020 at 8:20 AM Tom Woolman 
wrote:

> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor. I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID, and then assign the count value for each row in
> the dataframe in a new variable.
>
> I can do a count of bank accounts that occur within the same vendor
> using dplyr and group_by and count, but I can't figure out a way to
> count duplicates among multiple Vendor_IDs.
>
>
> Dataframe example code:
>
>
> #Create a sample data frame:
>
> set.seed(1)
>
> Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID =
> sample(1:1))
>
>
>
>
> Thanks in advance for any help.
>


Re: [R] counting duplicate items that occur in multiple groups

2020-11-18 Thread Deepayan Sarkar
On Wed, Nov 18, 2020 at 5:40 AM Bert Gunter  wrote:
>
> z <- with(Data2, tapply(Vendor,Account, I))
> n <- vapply(z,length,1)
> data.frame (Vendor = unlist(z),
>             Account = rep(names(z),n),
>             NumVen = rep(n,n)
> )
>
> ## which gives:
>
>     Vendor Account NumVen
> A1      V1      A1      1
> A21     V2      A2      3
> A22     V3      A2      3
> A23     V1      A2      3
> A3      V4      A3      1
> A4      V2      A4      1
>
> Of course this also works for Data1
>
> Bill may be able to come up with a slicker version, however.

Perhaps

transform(Data2, nshare = as.vector(table(Account)[Account]))

(or dplyr::mutate() instead of transform(), if you prefer.)

-Deepayan
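
A self-contained run of the one-liner above on Bill Dunlap's Data2, as a rough sketch. One caveat, stated as an assumption about the intent: table(Account) counts rows per account, so if the same vendor/account pair could appear more than once in the real data, it is probably safer to drop duplicate rows before counting. Data3 and perAcct are names made up for this illustration.

```r
Data2 <- data.frame(
  Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
  Account = c("A1", "A2", "A2", "A2", "A3", "A4"),
  stringsAsFactors = FALSE
)

# Look up each row's account in the per-account row counts
res <- transform(Data2, nshare = as.vector(table(Account)[Account]))
print(res)

# Assumption: a repeated vendor/account row should not be counted twice.
# Data3 is Data2 plus one such repeated row (V2 / A2).
Data3 <- rbind(Data2,
               data.frame(Vendor = "V2", Account = "A2",
                          stringsAsFactors = FALSE))
perAcct <- table(unique(Data3)$Account)      # count de-duplicated pairs
Data3$nshare <- as.vector(perAcct[Data3$Account])
print(Data3)
```

Without the unique() step, the extra V2/A2 row would push the count for account A2 from 3 to 4.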

>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman 
> wrote:
>
> > Yes, good catch. Thanks
> >
> >
> > Quoting Bert Gunter :
> >
> > > Why 0's in the data frame? Shouldn't that be 1 (vendor with that
> > account)?
> > >
> > > Bert
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep coming along
> > and
> > > sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > >
> > >
> > > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman 
> > > wrote:
> > >
> > >> Hi Bill. Sorry to be so obtuse with the example data, I was trying
> > >> (too hard) not to share any actual values so I just created randomized
> > >> values for my example; of course I should have specified that the
> > >> random values would not provide the expected problem pattern. I should
> > >> have just used simple dummy codes as Bill Dunlap did.
> > >>
> > >> So per Bill's example data for Data1, the expected (hoped for) output
> > >> should be:
> > >>
> > >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> > >> 1 V1     A1      0
> > >> 2 V2     A2      3
> > >> 3 V3     A2      3
> > >> 4 V4     A2      3
> > >>
> > >>
> > >> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> > >> The value is 3 for V2, V3 and V4 because they each share bank account
> > >> A2.
> > >>
> > >>
> > >> Likewise, in the Data2 frame, the same logic applies:
> > >>
> > >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> > >> 1 V1     A1      0
> > >> 2 V2     A2      3
> > >> 3 V3     A2      3
> > >> 4 V1     A2      3
> > >> 5 V4     A3      0
> > >> 6 V2     A4      0
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Thanks!
> > >>
> > >>
> > >> Quoting Bill Dunlap :
> > >>
> > >> > What should the result be for
> > >> >   Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> > >> > Account=c("A1","A2","A2","A2"))
> > >> > ?
> > >> >
> > >> > Must each vendor have only one account?  If not, what should the
> > >> > result be for
> > >> >   Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> > >> >                       Account=c("A1","A2","A2","A2","A3","A4"))
> > >> > ?
> > >> >
> > >> > -Bill
> > >> >
> > >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman wrote:
> > >> >
> > >> >> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> > >> >> plus a bank account number for each vendor. I'm trying to find a way
> > >> >> to count the number of duplicate bank accounts that occur in more
> > than
> > >> >> one unique Vendor_ID, and then assign the count value for each row in
> > >> >> the dataframe in a new variable.
> > >> >>
> > >> >> I can do a count of bank accounts that occur within the same vendor
> > >> >> using dplyr and group_by and count, but I can't figure out a way to
> > >> >> count duplicates among multiple Vendor_IDs.
> > >> >>
> > >> >>
> > >> >> Dataframe example code:
> > >> >>
> > >> >>
> > >> >> #Create a sample data frame:
> > >> >>
> > >> >> set.seed(1)
> > >> >>
> > >> >> Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID =
> > >> >> sample(1:1))
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> Thanks in advance for any help.
> > >> >>
> > >> >>
> > >>
> > >>
> >
> >
> >
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To