[R] Text search
Hi all,

From one of the columns of the data frame I want to search for and extract the text Tall or Short, and create a new column that contains the extracted text in the corresponding row. My example data and the desired output are shown below.

dat <- read.table(text="obs Year char
1 2001 Tall156
2 2002 12565Tall
3 2003 all54
4 2004 Short
5 2005 54all
6 2006 7Short12
",header=TRUE,stringsAsFactors=F)
dat$new <- " "

Desired output

obs Year char      new
1   2001 Tall156   Tall
2   2002 12565Tall Tall
3   2003 all54
5   2004 Short     Short
6   2005 Shall54
7   2006 7Short12  Short

How do I get my desired output?

Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
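One possible approach, sketched in base R and assuming each value contains at most one of the two words: regexpr() locates the first match of Tall or Short and regmatches() pulls it out. Rows without a match keep an empty string. The data frame is the one from the post; nothing else is assumed.

```r
dat <- read.table(text = "obs Year char
1 2001 Tall156
2 2002 12565Tall
3 2003 all54
4 2004 Short
5 2005 54all
6 2006 7Short12", header = TRUE, stringsAsFactors = FALSE)

# Position of the first "Tall" or "Short" in each string (-1 when absent)
m <- regexpr("Tall|Short", dat$char)

# Start with empty strings, then fill in only the rows that matched;
# regmatches() returns the matched substrings for exactly those rows
dat$new <- ""
dat$new[m > 0] <- regmatches(dat$char, m)

dat
```

The match is case-sensitive, so "all54" and "54all" are left empty because they contain neither word.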
Re: [R] remove a row
Thank you so much Bert.

Is it possible to split varx into three parts (area code, region and the numeric part), each as a separate variable?

On Thu, Nov 28, 2019 at 7:31 PM Bert Gunter wrote:
>
> Use regular expressions.
>
> See ?regexp and ?grep
>
> Using your example:
>
> > grep("^[[:digit:]]{1,3}[[:alpha:]]{1,2}[[:digit:]]{1,5}$",dat$varx,value = TRUE)
> [1] "9F209" "2F250" "121FL50"
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Thu, Nov 28, 2019 at 3:17 PM Ashta wrote:
>>
>> Hi all, I want to remove a row based on a condition in one of the
>> variables from a data frame.
>> [...]
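The split asked about above could be sketched by adding capture groups to the same pattern and reading them back with regexec() and regmatches(). This is only an illustration on the three values that passed the filter; the column names area, region and num are made up for the example.

```r
varx <- c("9F209", "2F250", "121FL50")

# Same pattern as in the quoted reply, with one capture group per part
pat <- "^([[:digit:]]{1,3})([[:alpha:]]{1,2})([[:digit:]]{1,5})$"

# regexec() records the full match plus each group; regmatches() extracts them
parts <- do.call(rbind, regmatches(varx, regexec(pat, varx)))

# Column 1 is the full match, columns 2 to 4 are the groups
res <- data.frame(varx   = parts[, 1],
                  area   = parts[, 2],
                  region = parts[, 3],
                  num    = parts[, 4],
                  stringsAsFactors = FALSE)
res
```

The parts come back as character strings; wrap area and num in as.integer() if numbers are needed.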
[R] remove a row
Hi all,

I want to remove rows from a data frame based on a condition on one of the variables. When we split this string it should follow a 3-2-5 format (up to 3 numeric digits, up to 2 characters and up to 5 numeric digits), like area code - region - numeric. The max length of the area code should be 3, the max length of the region should be 2, followed by a max of 5 numeric digits. The area code can be 1, 2 or 3 digits, but not more than three. So the max length of this variable is 10. Anything outside of this pattern should be excluded.

As an example

dat <- read.table(text=" rown varx
1 9F209
2 FL250
3 2F250
4 102250
5 102FL
6 102
7 1212FL250
8 121FL50",header=TRUE,stringsAsFactors=F)

1 9F209      # keep
2 FL250      # remove, no area code
3 2F250      # keep
4 102250     # remove, no region code
5 102FL      # remove, no numeric after region code
6 102        # remove, no region code and numeric
7 1212FL250  # remove, area code is more than three digits
8 121FL50    # keep

The desired output should be

1 9F209
3 2F250
8 121FL50

How do I do this in an efficient way?

Thank you in advance
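A minimal sketch of the row filter described above, assuming the rule is exactly "1 to 3 digits, then 1 to 2 letters, then 1 to 5 digits, and nothing else". grepl() returns a logical vector, so it can index the data frame directly and keep the other columns.

```r
dat <- read.table(text = "rown varx
1 9F209
2 FL250
3 2F250
4 102250
5 102FL
6 102
7 1212FL250
8 121FL50", header = TRUE, stringsAsFactors = FALSE)

# ^ and $ anchor the pattern so the whole string has to match
pat <- "^[[:digit:]]{1,3}[[:alpha:]]{1,2}[[:digit:]]{1,5}$"

# Keep only the rows whose varx matches the pattern
kept <- dat[grepl(pat, dat$varx), ]
kept
```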
Re: [R] Remove
Thank you all! Now I have plenty of options to choose from.

On Sat, Dec 9, 2017 at 1:21 PM, William Dunlap <wdun...@tibco.com> wrote:
> You could make numeric vectors, named by the group identifier, of the
> constraints and subscript them by group name:
>
> > DM <- read.table( text='GR x y
> + A 25 125
> + A 23 135
> + A 14 145
> + A 35 230
> + B 45 321
> + B 47 512
> + B 53 123
> + B 55 451
> + C 61 521
> + C 68 235
> + C 85 258
> + C 80 654',header = TRUE, stringsAsFactors = FALSE)
> >
> > GRmin <- c(A=15, B=40, C=60)
> > GRmax <- c(A=30, B=50, C=75)
> > subset(DM, x>=GRmin[GR] & x <=GRmax[GR])
>    GR  x   y
> 1   A 25 125
> 2   A 23 135
> 5   B 45 321
> 6   B 47 512
> 9   C 61 521
> 10  C 68 235
>
> Or, if you want to completely avoid non-standard evaluation:
> > DM[ DM$x >= GRmin[DM$GR] & DM$x <= GRmax[DM$GR], ]
>    GR  x   y
> 1   A 25 125
> 2   A 23 135
> 5   B 45 321
> 6   B 47 512
> 9   C 61 521
> 10  C 68 235
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sat, Dec 9, 2017 at 9:38 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>>
>> > On Dec 8, 2017, at 6:16 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>> >
>> >> On Dec 8, 2017, at 4:48 PM, Ashta <sewa...@gmail.com> wrote:
>> >>
>> >> Hi David, Ista and all,
>> >>
>> >> I have one related question. Within one group I want to keep records
>> >> conditionally.
>> >> [...]
>> >
>> > When you have a problem where there are multiple "parallel" parameters,
>> > the function to "reach for" is `mapply`.
>> >
>> >    mapply( your_selection_func, group_vec, min_vec, max_vec)
>> >
>> > ... and this will probably return the values as a list (of dataframes if
>> > you build the function correctly), so you may need to then do:
>> >
>> >    do.call(rbind, ...)
>>
>> do.call( rbind,
>>     mapply( function(dat, grp, minx, maxx) {dat[ dat$GR==grp & dat$x >= minx & dat$x <= maxx, ]},
>>             grp=LETTERS[1:3], minx=c(15,40,60), maxx=c(30,50,75) ,
>>             MoreArgs=list(dat=DM),
>>             SIMPLIFY=FALSE))
>>      GR  x   y
>> A.1   A 25 125
>> A.2   A 23 135
>> B.5   B 45 321
>> B.6   B 47 512
>> C.9   C 61 521
>> C.10  C 68 235
>>
>> > --
>> > David.
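Another way to express the same per-group limits, offered only as a sketch and not taken from the replies above, is to keep the bounds in a small lookup table and merge() it onto the data. The names limits, lo and hi are made up for this example.

```r
DM <- read.table(text = "GR x y
A 25 125
A 23 135
A 14 145
A 35 230
B 45 321
B 47 512
B 53 123
B 55 451
C 61 521
C 68 235
C 85 258
C 80 654", header = TRUE, stringsAsFactors = FALSE)

# One row of bounds per group (hypothetical helper table)
limits <- data.frame(GR = c("A", "B", "C"),
                     lo = c(15, 40, 60),
                     hi = c(30, 50, 75),
                     stringsAsFactors = FALSE)

# Attach the bounds to every row, then filter and drop the helper columns
m <- merge(DM, limits, by = "GR")
res <- m[m$x >= m$lo & m$x <= m$hi, c("GR", "x", "y")]
res
```

A lookup table may be easier to maintain when the limits come from a file, but note that merge() does not promise to keep the original row order or row names.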
Re: [R] Remove
Hi David, Ista and all,

I have one related question. Within each group I want to keep records conditionally. For example:

within group A I want to keep rows that have "x" values between 15 and 30,
within group B I want to keep rows that have "x" values between 40 and 50,
within group C I want to keep rows that have "x" values between 60 and 75.

DM <- read.table( text='GR x y
A 25 125
A 23 135
A 14 145
A 35 230
B 45 321
B 47 512
B 53 123
B 55 451
C 61 521
C 68 235
C 85 258
C 80 654',header = TRUE, stringsAsFactors = FALSE)

The end result will be

A 25 125
A 23 135
B 45 321
B 47 512
C 61 521
C 68 235

Thank you

On Wed, Dec 6, 2017 at 10:34 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Dec 6, 2017, at 4:27 PM, Ashta <sewa...@gmail.com> wrote:
>>
>> Thank you Ista! Worked fine.
>
> Here's another (possibly more direct in its logic?):
>
> DM[ !ave(DM$x, DM$GR, FUN= function(x) {!length(unique(x))==1}), ]
>   GR  x   y
> 5  B 25 321
> 6  B 25 512
> 7  B 25 123
> 8  B 25 451
>
> --
> David
>
>> [...]
Re: [R] Remove
Thank you Ista! Worked fine.

On Wed, Dec 6, 2017 at 5:59 PM, Ista Zahn <istaz...@gmail.com> wrote:
> Hi Ashta,
>
> There are many ways to do it. Here is one:
>
> vars <- sapply(split(DM$x, DM$GR), var)
> DM[DM$GR %in% names(vars[vars > 0]), ]
>
> Best
> Ista
>
> On Wed, Dec 6, 2017 at 6:58 PM, Ashta <sewa...@gmail.com> wrote:
>> Thank you Jeff,
>> [...]
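For comparison, a sketch of the same filter written with ave(), which counts the distinct x values per group and returns that count on every row. It is a variation for illustration, not the method posted above; the helper name n_distinct_x is invented here.

```r
DM <- read.table(text = "GR x y
A 25 125
A 23 135
A 14 145
A 12 230
B 25 321
B 25 512
B 25 123
B 25 451
C 11 521
C 14 235
C 15 258
C 10 654", header = TRUE, stringsAsFactors = FALSE)

# For every row: how many distinct x values does its group have?
n_distinct_x <- ave(DM$x, DM$GR, FUN = function(v) length(unique(v)))

# Keep rows from groups where x is not constant
res <- DM[n_distinct_x > 1, ]
res
```

Counting distinct values rather than testing the variance would also work for a non-numeric column, provided the first argument to ave() is adapted accordingly.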
Re: [R] Remove
Thank you Jeff,

subset( DM, "B" != x ) works only if I already know the group. But if I don't know the group (in this case "B"), how do I identify the group(s) in which all elements of x have the same value?

On Wed, Dec 6, 2017 at 5:48 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> subset( DM, "B" != x )
>
> This is covered in the Introduction to R document that comes with R.
> --
> Sent from my phone. Please excuse my brevity.
>
> On December 6, 2017 3:21:12 PM PST, David Winsemius <dwinsem...@comcast.net> wrote:
>> [...]
Re: [R] Remove
Thank you David.

This will not work. It removes only duplicate records.

DM[ !duplicated(DM$x) , ]

My goal is to remove the group if all elements of x in that group have the same value.

On Wed, Dec 6, 2017 at 5:21 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Dec 6, 2017, at 3:15 PM, Ashta <sewa...@gmail.com> wrote:
>>
>> Hi all,
>> In a data set I have group(GR) and two variables x and y. I want to
>> remove a group that have the same record for the x variable in each
>> row.
>> [...]
>
> Try:
>
> DM[ !duplicated(DM$x) , ]
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
>    -Gehm's Corollary to Clarke's Third Law
[R] Remove
Hi all,

In a data set I have a group (GR) and two variables x and y. I want to remove any group that has the same value of the x variable in every row.

DM <- read.table( text='GR x y
A 25 125
A 23 135
A 14 145
A 12 230
B 25 321
B 25 512
B 25 123
B 25 451
C 11 521
C 14 235
C 15 258
C 10 654',header = TRUE, stringsAsFactors = FALSE)

In this example the output should contain groups A and C, as group B has the same value of x in every row.

The result will be

A 25 125
A 23 135
A 14 145
A 12 230
C 11 521
C 14 235
C 15 258
C 10 654

How do I do it in R?
Thank you.
Re: [R] reading data
Hi Jim,

With a little digging on my side, I have found why the script is skipping that file. The file is reported as "ISO-8859 text, with CRLF line terminators". The file should be ASCII, so I converted it using dos2unix. The CRLF line terminators are gone, but I still cannot read it.

How can I read those files with "ISO-8859 text"?

On Tue, Jun 13, 2017 at 7:20 PM, jim holtman <jholt...@gmail.com> wrote:
> You need to provide reproducible data. What does the file contain? Why are
> you using 'sep=' when reading fixed format. You might be able to attach the
> '.txt' to your email to help with the problem. Also you did not state what
> the differences that you are seeing. So help us out here.
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Tue, Jun 13, 2017 at 5:09 PM, Ashta <sewa...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am using R to extract data on a regular basis.
>> [...]
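One way to approach the encoding question above, sketched on a tiny made-up file because the real Adress1.txt is not available: read the lines as they are, convert them from ISO-8859-1 to UTF-8 with iconv(), and only then cut the fixed-width fields. The field positions (3 characters, a space, 4 characters) are assumptions for the demo. read.fwf() also accepts a fileEncoding argument, which may be the shorter route for the real file.

```r
# Build a one-line demo file: "AB", then byte 0xE9 (e-acute in ISO-8859-1),
# a space, "1234" and a CRLF line ending
tmp <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("AB"), as.raw(0xE9), charToRaw(" 1234\r\n")), tmp)

# readLines() accepts LF, CRLF or CR line endings, so CRLF is not a problem
raw_lines <- readLines(tmp, warn = FALSE)

# Declare what the bytes really are and convert them to UTF-8
lines <- iconv(raw_lines, from = "ISO-8859-1", to = "UTF-8")

# Cut the fixed-width fields by character position (assumed layout)
name <- substr(lines, 1, 3)
code <- substr(lines, 5, 8)
code

unlink(tmp)
```

If iconv() returns NA for some lines, the file probably contains bytes that are not valid in the declared source encoding, which would be worth checking before blaming the reader function.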
[R] reading data
Hi all,

I am using R to extract data on a regular basis. However, sometimes using the same script and the same data I am getting a different number of observations. The library I am using and how I am reading the file are as follows.

library(stringr)
namelist <- file("Adress1.txt",encoding="ISO-8859-1")
Name <- read.fwf(namelist,
   colClasses="character", skip=2,sep="\t",fill=T,
   width =c(2,8,1,1,1,1,1,1,9,5)+1,col.names=ccol)

Can someone suggest how to track down the issue?
Is it a library issue or a Java issue?
May I read the file as free format instead of fixed format?

Thank you in advance
Re: [R] Non date value
Jeff, I am sorry for that.

On Sat, Apr 15, 2017 at 12:04 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> You don't follow instructions very well. Read the Posting Guide more
> carefully.
> --
> Sent from my phone. Please excuse my brevity.
>
> On April 14, 2017 9:39:30 PM PDT, Ashta <sewa...@gmail.com> wrote:
>> DF1 is a data frame. I am suspecting there might be non date value
>> in that column.
>> [...]
Re: [R] Non date value
DF1 is a data frame. I suspect there might be a non-date value in that column. My question is how to remove non-date values from that field. For example, if Alex152 has 12253, that value is not in a date format.

On Fri, Apr 14, 2017 at 11:24 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Show us str(DF1) . It is not a data frame.
>
> -- Bert
>
> On Fri, Apr 14, 2017 at 9:02 PM, Ashta <sewa...@gmail.com> wrote:
>> Hi all,
>> I am reading a field data that contains several variables.
>> [...]
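A sketch of how non-date rows could be dropped, assuming DF1 really is a data frame and Rdate holds day/month/year strings. The sample rows come from the thread, with the Alex152 value 12253 added as the example of a bad entry. as.Date() returns NA for anything that does not fit the format, and that NA is what the filter uses.

```r
DF1 <- data.frame(
  Name  = c("Alex1", "Alex2", "Alex3", "Alex4", "Alex150", "Alex151", "Alex152"),
  Rdate = c("01/03/2015", "01/03/2014", "31/12/2012", "15/01/2011",
            "22/01/2010", "15/02/2011", "12253"),
  stringsAsFactors = FALSE)

# Check the object first: $ fails on atomic vectors, as in the reported error
stopifnot(is.data.frame(DF1))

# Parse as day/month/year; entries that do not fit become NA
parsed <- as.Date(DF1$Rdate, format = "%d/%m/%Y")

# Drop the rows that failed to parse and store real Date values
clean <- DF1[!is.na(parsed), ]
clean$Rdate <- parsed[!is.na(parsed)]

range(clean$Rdate)
```

This catches values of the wrong shape; a string such as "31/02/2015" is also turned into NA, but a wrong yet valid date would pass through unnoticed.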
[R] Non date value
Hi all,

I am reading field data that contains several variables. A sample of the data with the first two variables is shown below. I wanted to know the minimum and maximum recording date. However, I have a problem.

Name     Rdate       V1 to V20
Alex1    01/03/2015
Alex2    01/03/2014
Alex3    31/12/2012
Alex4    15/01/2011
Alex150  22/01/2010
Alex151  15/02/2011

DF1=DF1[!is.na(DF1$Rdate),]
range(DF1$Rdate, na.rm=TRUE)

Warning message:
In is.na(DF1$Rdate) :
  is.na() applied to non-(list or vector) of type 'NULL'
Error in DF1$Rdate : $ operator is invalid for atomic vectors
Execution halted

I expect the Rdate field to contain recording dates. I suspect there might be a non-date value in that column. How do I remove a row if its value is not in a date format?

Thank you.
[R] combine
Hi all,

I have more than two files and want to merge them by a single column while preserving the other columns. Here is an example with two files

dat1 <- read.table(header=TRUE, text=' ID T1 T2
ID1 125 245
ID2 141 264
ID3 133 281')

dat2 <- read.table(header=TRUE, text=' ID G1 G2
ID2 25 46
ID4 41 64
ID5 33 81')

How do I get the following output?

ID  T1  T2  G1 G2
ID1 125 245  0  0
ID2 141 264 25 46
ID3 133 281  0  0
ID4   0   0 41 64
ID5   0   0 33 81

Thank you.
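A possible sketch for the merge asked about above: merge() with all = TRUE keeps IDs that appear in only one file, Reduce() applies it across a list so the same code works for more than two files, and the resulting NA values are then replaced by 0 as in the desired output. The variable name out is arbitrary.

```r
dat1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID T1 T2
ID1 125 245
ID2 141 264
ID3 133 281")

dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID G1 G2
ID2 25 46
ID4 41 64
ID5 33 81")

# Full outer join on ID, folded over however many data frames are in the list
out <- Reduce(function(a, b) merge(a, b, by = "ID", all = TRUE),
              list(dat1, dat2))

# IDs missing from a file come back as NA; the desired output shows 0 instead
out[is.na(out)] <- 0
out
```

Replacing NA by 0 hides the difference between "measured as zero" and "not present in that file", so it may be worth keeping the NA values if that distinction matters.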
Re: [R] find and
Thank you Rui and Ulrik.

Rui, your option worked for the small data set, but when I applied it to the big data set it took so long that it never finished and I had to kill it. I don't know why.

Ulrik's option worked fine for the big data set (> 1.5M records) and took less than 2 minutes.

These two give me the same results.

# Counting unique
DF4 %>% group_by(city) %>% filter(length(unique(var)) == 1)

# Counting not duplicated
DF4 %>% group_by(city) %>% filter(sum(!duplicated(var)) == 1)

Thank you again.

On Sat, Mar 18, 2017 at 10:40 AM, Ulrik Stervbo <ulrik.ster...@gmail.com> wrote:
> Using dplyr:
>
> library(dplyr)
>
> # Counting unique
> DF4 %>%
>   group_by(city) %>%
>   filter(length(unique(var)) == 1)
>
> # Counting not duplicated
> DF4 %>%
>   group_by(city) %>%
>   filter(sum(!duplicated(var)) == 1)
>
> HTH
> Ulrik
>
> On Sat, 18 Mar 2017 at 15:17 Rui Barradas <ruipbarra...@sapo.pt> wrote:
>>
>> Hello,
>>
>> I believe this does it.
>>
>> sp <- split(DF4, DF4$city)
>> want <- do.call(rbind, lapply(sp, function(x)
>>     if(length(unique(x$var)) == 1) x else NULL))
>> rownames(want) <- NULL
>> want
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 18-03-2017 13:51, Ashta wrote:
>> > Hi all,
>> >
>> > I am trying to find a city that do not have the same "var" value.
>> > [...]
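A base R variation that avoids splitting and re-binding whole data frames, sketched here without any benchmark, so whether it helps on 1.5M records is only a guess: count the distinct var values per city with tapply(), then keep the rows whose city has a count of one.

```r
DF4 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "city wk var
city1 1 x
city1 2 -
city1 3 x
city2 1 x
city2 2 x
city2 3 x
city2 4 x
city3 1 x
city3 2 x
city3 3 x
city3 4 x
city4 1 x
city4 2 x
city4 3 y
city4 4 y
city5 3 -
city5 4 -")

# Number of distinct var values per city (one small number per city)
n <- tapply(DF4$var, DF4$city, function(v) length(unique(v)))

# Keep only rows from cities with a single distinct value
want <- DF4[DF4$city %in% names(n)[n == 1], ]
want
```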
[R] find and
Hi all,

I am trying to find cities that do not have the same "var" value in every row. Within a city the var should be the same; otherwise the city should be excluded from the final data set. Here is my sample data and my attempt. city1 and city4 should be excluded.

DF4 <- read.table(header=TRUE, text=' city wk var
city1 1 x
city1 2 -
city1 3 x
city2 1 x
city2 2 x
city2 3 x
city2 4 x
city3 1 x
city3 2 x
city3 3 x
city3 4 x
city4 1 x
city4 2 x
city4 3 y
city4 4 y
city5 3 -
city5 4 -')

My attempt:

test2 <- data.table(DF4, key="city,var")
ID1 <- test2[ !duplicated(test2),]
dps <- ID1$city[duplicated(ID1$city)]
Ddup <- which(test2$city %in% dps)

if(length(Ddup) != 0) {
  test2 <- test2[- Ddup,]
}

want <- data.frame(test2)

I want to get the following result, but I am not getting it.

city wk var
city2 1 x
city2 2 x
city2 3 x
city2 4 x
city3 1 x
city3 2 x
city3 3 x
city3 4 x
city5 3 -
city5 4 -

Can someone help me find out what the problem is?

Thank you.
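For reference, the filtering rule in the question (keep a city only when all of its "var" values agree) can be sketched in base R without extra packages. This is only a minimal illustration on the posted sample data; the names n_var and keep are made up for the example, and it does not try to debug the data.table attempt.

```r
# Sample data from the post above.
DF4 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "city wk var
city1 1 x
city1 2 -
city1 3 x
city2 1 x
city2 2 x
city2 3 x
city2 4 x
city3 1 x
city3 2 x
city3 3 x
city3 4 x
city4 1 x
city4 2 x
city4 3 y
city4 4 y
city5 3 -
city5 4 -")

# Number of distinct var values per city (one entry per city).
n_var <- tapply(DF4$var, DF4$city, function(v) length(unique(v)))

# Keep only the cities with exactly one distinct value.
keep <- names(n_var)[n_var == 1]
want <- DF4[DF4$city %in% keep, ]
rownames(want) <- NULL
want
```

Because the distinct-value count is computed once per city and the rows are then selected with a single %in%, this avoids building one data frame per city.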
Re: [R] Repeat
Thank you so much, David! But it did not work when all elements of a group are '-'. Year 2006 is an example of this case.

If all values of flag are '-' within a year, then I want to set them to N.

dat=read.table(text = "Year month flag
2001 1 Z
2001 2 -
2001 4 X
2002 1 Z
2002 2 -
2003 1 -
2003 2 Z
2004 2 Z
2005 3 Z
2005 2 -
2005 3 -
2006 1 -
2006 2 -
", header = TRUE)

dat$new <- with(dat, ave(flag, Year, FUN=function(s){ s[s=="-"] <- NA; zoo::na.locf(s) }) )
Error in `[<-.factor`(`*tmp*`, i, value = integer(0)) :
  replacement has length zero

On Sat, Feb 25, 2017 at 5:43 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Feb 25, 2017, at 10:45 AM, Ashta <sewa...@gmail.com> wrote:
>>
>> Thank you David.
>> Is it not possible to sort it by year and flag so that we can put '-'
>> in the second row, like this for that particular year?
>>
>> 2003 2 Z
>> 2003 1 -
>
> I was a bit surprised by the results of this since I had assumed that an
> initial NA in a group would remain so, but apparently not:
>
> dat$new <- with(dat, ave(flag, Year, FUN=function(s){ s[s=="-"] <- NA;
>                                                    zoo::na.locf(s) }) )
>
> > dat
>    Year month flag new
> 1  2001     1    Z   Z
> 2  2001     2    -   Z
> 3  2001     4    X   X
> 4  2002     1    Z   Z
> 5  2002     2    -   Z
> 6  2003     1    -   Z
> 7  2003     2    Z   Z
> 8  2004     2    Z   Z
> 9  2005     3    Z   Z
> 10 2005     2    -   Z
> 11 2005     3    -   Z
>
> David.
Re: [R] Repeat
Thank you, David.

Is it not possible to sort it by year and flag so that we can put '-' in the second row, like this for that particular year?

2003 2 Z
2003 1 -

On Sat, Feb 25, 2017 at 12:14 PM, David Winsemius <dwinsem...@comcast.net> wrote:
> Your example doesn't actually match your verbal description of the algorithm
> because you have not specified the rule that establishes values for instances
> where the first value in a year is "-".
>
> The `na.locf` function in the 'zoo' package would be useful for the task
> described in your verbal description when used in conjunction with the
> 'stats'-package's `ave` function.
>
> --
> David Winsemius
> Alameda, CA, USA
[R] Repeat
I have a data set and I want to repeat a column value based on another column value.

My data look like this:

read.table(text = "Year month flag
2001 1 Z
2001 2 -
2001 4 X
2002 1 Z
2002 2 -
2003 1 -
2003 2 Z
2004 2 Z
2005 3 Z
2005 2 -
2005 3 -", header = TRUE)

Within a year, if flag = '-' then I want to replace '-' with the previous row's value of flag. In this example, for year 2001 in month 2 the flag is '-' and I want to replace it with the previous value of flag (i.e., 'Z'):

2001 1 Z
2001 2 Z
2001 4 X

If all values of flag are '-' within a year, then I want to set them to N.

The complete output will be:

year month flag
2001 1 Z
2001 2 Z
2001 4 X
2002 1 Z
2002 2 Z
2003 1 Z
2003 2 Z
2004 2 Z
2005 3 Z
2005 2 N
2005 3 N

Thank you in advance
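A base-R sketch of the verbal rule may help here. It makes two assumptions that the post does not state: a '-' at the start of a year takes the first known value of that year (this matches the 2003 rows of the desired output), and a year made up entirely of '-' becomes "N" (the 2006 rows are borrowed from the follow-up message in this thread). The helper name fill_flag is hypothetical. Note that under the verbal rule year 2005 comes out as Z, Z, Z, not the N values shown in the posted output.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Year month flag
2001 1 Z
2001 2 -
2001 4 X
2002 1 Z
2002 2 -
2003 1 -
2003 2 Z
2004 2 Z
2005 3 Z
2005 2 -
2005 3 -
2006 1 -
2006 2 -")

# Fill '-' within one group of flags.
fill_flag <- function(s) {
  s <- as.character(s)
  known <- s != "-"
  if (!any(known)) return(rep("N", length(s)))  # whole group is '-'
  idx <- cumsum(known)        # position of the most recent known value
  s[known][pmax(idx, 1)]      # leading '-' fall back to the first known value
}

dat$new <- ave(as.character(dat$flag), dat$Year, FUN = fill_flag)
dat
```

Converting flag with as.character() before calling ave() sidesteps the factor assignment error reported in the follow-up, since the replacement values no longer have to be existing factor levels.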
[R] read
Hi all,

I have a script that reads a file (dat.csv) from several folders. However, in some folders the file is named dat, without the csv extension, and in other folders it is dat.csv. The format of the data is the same; only the file name differs (with or without "csv").

Is it possible to read these files in one go regardless of their name, with something like read.csv("dat.csv")? How can I read both types of file names?

Thank you in advance
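One way this could be handled is to test which of the two names exists in each folder before reading. The sketch below is only an illustration: the function name read_dat and the temporary demo folders are made up, and it assumes every folder holds at most one of dat.csv or dat in the same comma-separated layout.

```r
# Read "dat.csv" if present, otherwise "dat", from one folder.
read_dat <- function(folder) {
  candidates <- file.path(folder, c("dat.csv", "dat"))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) stop("no dat or dat.csv in ", folder)
  read.csv(found[1])
}

# Demo with two temporary folders, one per naming variant.
base <- tempfile("demo")
f1 <- file.path(base, "a")
f2 <- file.path(base, "b")
dir.create(f1, recursive = TRUE)
dir.create(f2, recursive = TRUE)
write.csv(data.frame(x = 1:2), file.path(f1, "dat.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(f2, "dat"), row.names = FALSE)

# Read every folder and stack the results.
all_dat <- do.call(rbind, lapply(c(f1, f2), read_dat))
all_dat
```

read.csv() does not care about the file extension, so the only work is choosing the path that actually exists.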
Re: [R] difference
Hi all,

Thank you very much for your help. It worked very well for that data set.

I just found out that one of the data sets has another level, and I want to do the same thing there: calculate the difference between each successive row value (num) and the first row value within city and year.

city, year, num
1, 2001, 25
1, 2001, 75
1, 2001, 150
1, 2002, 35
1, 2002, 65
1, 2002, 120
2, 2001, 25
2, 2001, 95
2, 2001, 150
2, 2002, 35
2, 2002, 110
2, 2002, 120

The result will be

city, year, num, Diff
1, 2001, 25, 0
1, 2001, 75, 50
1, 2001, 150, 125
1, 2002, 35, 0
1, 2002, 65, 30
1, 2002, 120, 85
2, 2001, 25, 0
2, 2001, 95, 70
2, 2001, 150, 125
2, 2002, 35, 0
2, 2002, 110, 75
2, 2002, 120, 85

Thank you again

On Fri, Oct 28, 2016 at 4:08 AM, P Tennant <philipt...@iinet.net.au> wrote:
> Hi,
>
> You could use an anonymous function to operate on each `year-block' of your
> dataset, then assign the result as a new column:
>
> d <- data.frame(year=c(rep(2001, 3), rep(2002, 3)),
>                 num=c(25,75,150,30,85,95))
>
> d$diff <- unlist(by(d$num, d$year, function(x) x - x[1]))
> d
>
>   year num diff
> 1 2001  25    0
> 2 2001  75   50
> 3 2001 150  125
> 4 2002  30    0
> 5 2002  85   55
> 6 2002  95   65
>
> Philip
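For the two-level case (difference from the first row within each city and year), a minimal sketch with ave() from the stats package might look like the following. It uses the data posted in the follow-up; ave() accepts several grouping variables and returns its result in the original row order, so it does not rely on the rows being sorted by group.

```r
d <- read.csv(text = "city,year,num
1,2001,25
1,2001,75
1,2001,150
1,2002,35
1,2002,65
1,2002,120
2,2001,25
2,2001,95
2,2001,150
2,2002,35
2,2002,110
2,2002,120")

# Subtract the first num of each city/year block from every row of that block.
d$Diff <- ave(d$num, d$city, d$year, FUN = function(x) x - x[1])
d
```

Dropping d$city from the call gives the year-only version asked for in the original question.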
[R] difference
Hi all,

I want to calculate the difference between each successive row value and the first row value within a year. How do I get that?

Here is a sample of the data:

Year Num
2001  25
2001  75
2001 150
2002  30
2002  85
2002  95

Desired output:

Year Num diff
2001  25    0
2001  75   50
2001 150  125
2002  30    0
2002  85   55
2002  95   65

Thank you.
[R] Subset and sumerize
Hi all,

I am trying to summarize a big data set by selecting rows conditionally, and I tried to do it in a loop. Here is a sample of my data and my attempt.

dat<-read.table(text=" ID,x1,x2,y
1,a,b,15
1,x,z,21
1,x,b,16
1,x,k,25
2,d,z,31
2,x,z,28
2,g,t,41
3,h,e,32
3,x,z,38
3,x,g,45
",sep=",",header=TRUE)

For each unique ID, I want to select the rows where x1 = "x" and x2 = "z". Here is the selected data (newdat):

ID,x1,x2,y
1,x,z,21
2,x,z,28
3,x,z,38

Then I want to summarize the y values and output them as follows:

summary(newdat[i])
##
ID   Min.   1st Qu.   Median   Mean   3rd Qu.   Max.
1
2
3
.
.
.
28

Here is my attempt, but it did not work:

trt=c(1:28)
for(i in 1:length (trt))
{
  day[i]= newdat[which(newdat$ID== trt[i] & newdat$x1 =="x" & newdat$x2 =="z"),]
  NR[i]=dim(day[i])[1]
  print(paste("Number of Records :", NR[i]))
  sm[i]=summary(day[i])
}

Thank you in advance
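The select-then-summarize step described above can probably be done without an explicit loop. The following is a minimal sketch on the posted sample, not a fix of the loop itself; the object names n_records and stats are invented for the example.

```r
dat <- read.table(text = "ID,x1,x2,y
1,a,b,15
1,x,z,21
1,x,b,16
1,x,k,25
2,d,z,31
2,x,z,28
2,g,t,41
3,h,e,32
3,x,z,38
3,x,g,45", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# Rows with x1 == "x" and x2 == "z", for all IDs at once.
newdat <- subset(dat, x1 == "x" & x2 == "z")

# Number of selected records per ID.
n_records <- table(newdat$ID)

# One row of summary statistics (Min., 1st Qu., Median, ...) per ID.
stats <- t(sapply(split(newdat$y, newdat$ID),
                  function(v) unclass(summary(v))))
stats
```

With the sample data each ID has a single matching row, so all six statistics in a row are equal; on the full data set each row would summarize all matching y values for that ID.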
Re: [R] create variable
Hi David and all,

I want to run the following script in a loop but ran into difficulty.

trt=c(1,2,2,4,5,6,7,8)
for(i in 1:length (trt))
{
  try[i] <- (select trt, date1, date2, datediff(date1,date2) as d12diff [i] from dateTable where trt=[i]")
}

I would appreciate it if you could point out the problem.

Thank you in advance

On Sun, Oct 9, 2016 at 11:16 AM, David Winsemius <dwinsem...@comcast.net> wrote:
> Sorry for the blank message earlier. My reading of the use of Hive queries is
> that you would need to use the `datediff` function. I further suspect you
> need to define a variable name to which to then apply your limits. I also read
> that Hive dates are actually string types represented as POSIX-style
> character values and might need a to_date function. This is all guesswork
> since I don't have a Hive cluster to run this against.
>
> So perhaps something like one of these:
>
> try1 <- dbGetQuery(hivecon,"select date1, date2,
>   datediff(TO_DATE(date1),TO_DATE(date2)) as d12diff from dateTable where
>   d12diff GT 1000 limit 10")
>
> try2 <- dbGetQuery(hivecon,"select date1, date2, datediff(date1,date2) as
>   d12diff from dateTable where d12diff GT 1000 limit 10")
>
> Obviously these are just guesses.
>
> --
> David Winsemius
> Alameda, CA, USA
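The loop above mixes SQL text and R syntax: the query is not a quoted string, and [i] is not substituted into it. One possible shape, sketched under the assumption that dateTable really has a numeric trt column (the post does not show it) and that hivecon is the poster's existing connection, is to build one query string per treatment with sprintf() and pass each to dbGetQuery(). Only the string building is run here, since no Hive connection is available.

```r
trt <- c(1, 2, 2, 4, 5, 6, 7, 8)

# One query per distinct treatment value; %d is replaced by the value.
queries <- sprintf(
  "select trt, date1, date2, datediff(date1, date2) as d12diff from dateTable where trt = %d",
  as.integer(unique(trt))
)
queries[1]

# With a live connection (assumed, not run here), each query could be sent as:
# results <- lapply(queries, function(q) dbGetQuery(hivecon, q))
```

The result would be a list with one data frame per treatment, instead of assignments into try[i].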
Re: [R] create variable
Thank you so much, David! Your suggestions worked for me.
[R] create variable
I am trying to query data from a Hive service and create a variable.

dbGetQuery(hivecon,"select date1, date2 from dateTable limit 10")

date1, date2, Diff
4/5/1999, 6/14/2000
7/2/1999, 6/26/2000
8/14/1999, 8/19/2000
11/10/1999, 9/18/2000
8/25/2000, 6/5/2001
3/14/2012, 3/15/2004

Here is what I wanted to do. While I am querying, I want to create a variable diff = date1 - date2. I may use this variable "diff" to do some statistics (mean, mode, etc.) and also in the where clause, as in the following:

test_date=dbGetQuery(hivecon,"select date1, date2 from dateTable where diff gt 1000 limit 10")

I would appreciate it if you could suggest how to do this.

Here is a sample of the data and the result:

date1, date2, Diff
4/5/1999, 6/14/2000, -436
7/2/1999, 6/26/2000, -360
8/14/1999, 8/19/2000, -371
11/10/1999, 9/18/2000, -313
8/25/2000, 6/5/2001, -284
3/14/2012, 3/15/2004, 2921

Thank you in advance
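If the dates are fetched as m/d/Y strings, as the sample suggests, the difference can also be computed on the R side after the query. This is a sketch under that assumption; it does not touch Hive, and the inline data frame simply stands in for the result of dbGetQuery().

```r
# Stand-in for the query result, using the sample rows from the post.
dates <- read.csv(stringsAsFactors = FALSE, text = "date1,date2
4/5/1999,6/14/2000
7/2/1999,6/26/2000
8/14/1999,8/19/2000
11/10/1999,9/18/2000
8/25/2000,6/5/2001
3/14/2012,3/15/2004")

# Parse month/day/year strings and take the difference in days.
d1 <- as.Date(dates$date1, format = "%m/%d/%Y")
d2 <- as.Date(dates$date2, format = "%m/%d/%Y")
dates$Diff <- as.numeric(difftime(d1, d2, units = "days"))

# The filter that the where clause was meant to express.
subset(dates, Diff > 1000)
```

Filtering in R means every row is transferred first, so for a large table doing the comparison inside the query is likely to be preferable.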
Re: [R] Matrix
Hi Denes, Duncan, Michael and all,

Thank you very much for the helpful suggestions. Some of my data sets were not square matrices; however, Denes's suggestion, as.data.frame.table(), handled those as well.

Thank you again.

On Sat, Jul 16, 2016 at 7:27 PM, Dénes Tóth <toth.de...@ttk.mta.hu> wrote:
>
> On 07/17/2016 01:39 AM, Duncan Murdoch wrote:
>>
>> Yes, use matrix indexing. I don't think the 3600 values are going to be
>> very easy to read, but here's how to produce them:
>>
>> m <- matrix(1:3600, 60, 60)
>> indices <- expand.grid(row = 1:60, col = 1:60)
>> cbind(indices$row, indices$col, m[as.matrix(indices)])
>>
>> Duncan Murdoch
>
> Or use as.data.frame.table():
>
> m <- matrix(1:9, 3, 3,
>             dimnames = list(dimA = letters[1:3],
>                             dimB = letters[1:3]))
> m
> as.data.frame.table(m, responseName = "value")
>
> ---
>
> I do not know what you mean by "visualize", but image() or heatmap() are
> good starting points if you need a plot of the values. If you really need to
> inspect the raw values, you can try interactive (scrollable) tables, e.g.:
>
> library(DT)
> m <- provideDimnames(matrix(1:3600, 60, 60))
> datatable(m, options = list(pageLength = 60))
>
> Cheers,
> Denes
[R] Matrix
Hi all,

I have a large square matrix (60 x 60) and found it hard to visualize. Is it possible to change it as shown below?

Sample example (3 x 3)

  A B C
A 3 4 5
B 4 7 8
C 5 8 9

Desired output

A A 3
A B 4
A C 5
B B 7
B C 8
C C 9

Thank you in advance
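The desired output lists only the diagonal and the upper triangle, which suggests the matrix is symmetric. Under that assumption, a minimal sketch using upper.tri() and matrix indexing could look like this (the column names row, col and value are arbitrary):

```r
m <- matrix(c(3, 4, 5,
              4, 7, 8,
              5, 8, 9), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Row/column positions of the diagonal and upper triangle.
idx <- which(upper.tri(m, diag = TRUE), arr.ind = TRUE)

# Order by row, then column, to match the layout in the question.
idx <- idx[order(idx[, "row"], idx[, "col"]), ]

long <- data.frame(row   = rownames(m)[idx[, "row"]],
                   col   = colnames(m)[idx[, "col"]],
                   value = m[idx])
long
```

For a 60 x 60 matrix this yields 1830 rows instead of 3600. It only applies to square matrices; for the rectangular case every cell is needed anyway.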
Re: [R] not common records
Thank you Jeff. Solved.

On Fri, Jun 3, 2016 at 12:47 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> ?merge
>
> Pay attention to the all-whatever parameters.
> --
> Sent from my phone. Please excuse my brevity.
[R] not common records
I have 2 data sets, File1 and File2. Some records are common to both data sets. For those common records I want to get the differences d_x1z1 = z1 - x1 and d_x2z2 = z2 - x2.

File1<- data.frame(var = c(561,752,800,900), x1= c(23,35,40,15), x2= c(125,284,280,347))
File2<- data.frame(var = c(561,752,800,1001), z1= c(43,45,40,65), z2= c(185,299,280,310))

Record  900 15 347 appears only in File1
Record 1001 65 310 appears only in File2

File3 should look as follows:

File3
 var x1  x2 z1  z2 d_x1z1 d_x2z2
 561 23 125 43 165     20     40
 752 35 284 45 299      8     15
 800 40 280 40 280      0      0
 900 15 347 NA  NA     NA     NA
1001 NA  NA 65 310     NA     NA

How do I also get the records that are not common to both data sets? merge(File1,File2) gave me only the rows with a common "var".
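A full outer join with merge(..., all = TRUE) keeps the records that appear in only one of the two data frames and fills the missing side with NA. The sketch below uses the data exactly as posted. Computed from those values, row 561 gives z2 = 185 and d_x2z2 = 60, and row 752 gives d_x1z1 = 10, which differs from the hand-written table in the post.

```r
File1 <- data.frame(var = c(561, 752, 800, 900),
                    x1 = c(23, 35, 40, 15),
                    x2 = c(125, 284, 280, 347))
File2 <- data.frame(var = c(561, 752, 800, 1001),
                    z1 = c(43, 45, 40, 65),
                    z2 = c(185, 299, 280, 310))

# all = TRUE keeps unmatched rows from both sides (NA where there is no match).
File3 <- merge(File1, File2, by = "var", all = TRUE)

# Differences are NA automatically for records present in only one file.
File3$d_x1z1 <- File3$z1 - File3$x1
File3$d_x2z2 <- File3$z2 - File3$x2
File3
```

Using all.x = TRUE or all.y = TRUE instead would keep the unmatched rows of only File1 or only File2.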
Re: [R] month and output
Thank you David!

On Sat, May 7, 2016 at 12:18 AM, David Winsemius <dwinsem...@comcast.net> wrote:
> Sorry;
>
> This works as intended:
>
> > today <- seq( from=as.Date("2008-1-01"), length=13, by="1 mo" )
> >
> > nextmo<- paste0( m <- month.abb[ as.numeric(format(today, format="%m")) %% 12+1] ,
> +                  as.numeric( format(today,"%Y") ) + (m=="Jan") ); nextmo
>  [1] "Feb2008" "Mar2008" "Apr2008" "May2008" "Jun2008" "Jul2008" "Aug2008" "Sep2008"
>  [9] "Oct2008" "Nov2008" "Dec2008" "Jan2009" "Feb2009"
>
> David Winsemius
> Alameda, CA, USA
Re: [R] month and output
Thank you very much, David.

So there is no general formula that works all year round.

The first one works only from Jan to Nov:

today <- Sys.Date()
nextmo<- paste0( month.abb[ as.numeric(format(today, format="%m"))+1] ,
                 format(today,"%Y") )
[1] "Jun2016"

The second one works only for the last month of the year:

today <- as.Date("2008-12-01")
nextmo<- paste0(m <- month.abb[(as.numeric(format(today, format="%m"))+1) %/% 12] ,
                as.numeric( format(today,"%Y") ) + (m == "Jan") )
nextmo

Many thanks

On Fri, May 6, 2016 at 6:40 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On May 6, 2016, at 4:30 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>>
>> today <- Sys.Date()
>> nextmo<- paste0( month.abb[ as.numeric(format(today, format="%m"))+1] ,
>>                  format(today,"%Y") )
>> [1] "Jun2016"
>
> It occurred to me that at the end of the year you would want to increment the
> year as well. This calculates the next month and increments the year value if
> needed:
>
> today <- as.Date("2008-12-01")
> nextmo<- paste0(m <- month.abb[(as.numeric(format(today, format="%m"))+1) %/% 12] ,
>                 as.numeric( format(today,"%Y") ) + (m == "Jan") )
> nextmo
> #[1] "Jan2009"
>
>> pdf() opens a graphics device, so you need a function that establishes a
>> coordinate system:
>>
>> x5 <- runif(15, 5.0, 7.5)
>> pdf(file=" test.pdf");
>> plot(1,1,type="n")
>> text(1, 1, paste(round(x5, 2), collapse="\n") )
>> dev.off()
>
> If you need to suppress the axes and their labels:
>
> pdf(file=" test.pdf"); plot(1,1, type="n", axes=FALSE, xlab="", ylab="")
> text(1, 1, paste(round(x5, 2), collapse="\n") )
> dev.off()
>
>> I doubt that this is what you really want, and suspect you really need to be
>> studying the capabilities supported by the knitr package. If I'm wrong about
>> that and you want a system that supports drawing and text on a blank page,
>> then first study:
>>
>> > library(grid)
>> > help(pac=grid)
>>
>> If you choose that route then the text "R Graphics" by Paul Murrell will be
>> indispensable.
>>
>> --
>> David Winsemius
>> Alameda, CA, USA
[R] month and output
Hi all,

I am trying to get the next month of the year.

today <- Sys.Date()
xx<- format(today, format="%B%Y")

I got "May2016", but I want "Jun2016". How do I do that?

My other question: I read some data and do some analysis, and I want to send all the results of the analysis to a pdf file.

Example:

x5 <- runif(15, 5.0, 7.5)
x5

I tried this:

pdf(file=" test.pdf")
x5
dev.off()

I found that the file is empty. I would appreciate it if you could help me out.

Thanks in advance
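For the first question, one approach that should work in any month, including December, is to let seq() do the calendar arithmetic on a Date. This is a small sketch; the function name next_month_label is made up, and month.abb is used instead of format(x, "%b") so that the label does not depend on the locale.

```r
# Label such as "Jun2016" for the month after the month containing d.
next_month_label <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))               # first day of d's month
  nxt <- seq(first, by = "month", length.out = 2)[2]    # first day of next month
  paste0(month.abb[as.integer(format(nxt, "%m"))], format(nxt, "%Y"))
}

next_month_label(as.Date("2016-05-06"))
next_month_label(as.Date("2008-12-31"))
next_month_label(Sys.Date())
```

Stepping from the first day of the month avoids the ambiguity of adding one month to a date such as January 31.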
Re: [R] flag a record
Thank you very much Jim! It is working fine!!

On Sun, Feb 28, 2016 at 1:46 AM, Jim Lemon <drjimle...@gmail.com> wrote:
> Hi Ashta,
> This does not seem too difficult:
>
> DF$flag<-"n"
> for(thisname in unique(DF$Name)) {
>  if(any(DF$year[DF$Name == thisname] %in% c(2014,2015) &
>   DF$tag[DF$Name == thisname]))
>   DF$flag[DF$Name == thisname]<-"y"
> }
>
> Jim
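
The same flagging rule can also be written without an explicit loop. The sketch below assumes, as Jim's reply does, that "the last two years" means 2014 and 2015; recent_years and hit are names introduced here for illustration.

```r
DF <- read.table(text = "Name year tag
Alex1 2011 0
Alex1 2012 1
Alex1 2013 0
Alex1 2014 1
Carla 2013 0
Carla 2014 0
Carla 2015 0
Carla 2012 0
Tom 2014 1
Tom 2015 1
Jon 2010 1
Jon 2011 1", header = TRUE, stringsAsFactors = FALSE)

recent_years <- c(2014, 2015)  # assumption: the two most recent years in the data

# 1 where a record is recent and has a positive tag, 0 otherwise
hit <- as.integer(DF$year %in% recent_years & DF$tag > 0)

# ave() spreads the per-name maximum back over every row of that name
DF$flag <- ifelse(ave(hit, DF$Name, FUN = max) > 0, "y", "n")
DF
```

ave() returns a vector as long as its input, so the group result lands on every row of the group without a merge step.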
[R] flag a record
Hi all,

I have a data set represented by the following sample.

I want to flag the records of an individual as "N" if the tag column of
that individual is equal to zero for the last two years. So in the
following example, Alex1 records are flagged as "y". On the other
hand, Carla's records are flagged as "N" because all values of tag for
Carla are zero. Another typical example is Jon: although the tag
values of Jon are greater than 0, he is flagged as "N" because his
records are more than two years old.

DF <- read.table(textConnection(" Name year tag
Alex1 2011 0
Alex1 2012 1
Alex1 2013 0
Alex1 2014 1

Carla 2013 0
Carla 2014 0
Carla 2015 0
Carla 2012 0

Tom 2014 1
Tom 2015 1

Jon 2010 1
Jon 2011 1"),header = TRUE)

I want to create another variable, "Flag", with value Y or N: if an
individual has a value greater than 0 in the tag column for the last
two years then the flag value will be y, otherwise n.

The outcome will be
name  year tag Flag
Alex1 2011  0   y
Alex1 2012  1   y
Alex1 2013  0   y
Alex1 2014  1   y

Carla 2013  0   n
Carla 2014  0   n
Carla 2015  0   n
Carla 2012  0   n

Tom   2014  1   y
Tom   2015  1   y

Jon   2010  1   n
Jon   2011  1   n

Thank you in advance
[R] Matrix summary
hi all,

I have a square matrix (1000 by 1000),
1. I want to calculate mean, min and max values for each column and row.
2. I want to pick the coordinate value of the matrix that has the max
   and min value for each row and column.

This is an example 4 by 4 square matrix

                                  Mean   Min  Max
        117     12     13    21   40.75   12  117
         21     32     11     1   16.25    1   32
         65     43     23     7   34.5     7   65
         58     61     78    95   73      58   95
Mean  65.25     37  31.25    31
Min      21     12     11     1
Max     117     61     78    95

Thank you
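
One way to get these summaries in base R is sketched below, using the 4 by 4 example as read from the message. The object names (row_stats, row_max_col and so on) are introduced here for illustration, and ties are resolved by taking the first position, which is an assumption about what is wanted.

```r
m <- matrix(c(117, 12, 13, 21,
               21, 32, 11,  1,
               65, 43, 23,  7,
               58, 61, 78, 95), nrow = 4, byrow = TRUE)

# Mean, min and max for each row (one line per row) and each column
row_stats <- cbind(Mean = rowMeans(m),
                   Min  = apply(m, 1, min),
                   Max  = apply(m, 1, max))
col_stats <- rbind(Mean = colMeans(m),
                   Min  = apply(m, 2, min),
                   Max  = apply(m, 2, max))

# Column position of the max and min within each row
row_max_col <- max.col(m, ties.method = "first")
row_min_col <- max.col(-m, ties.method = "first")

# Row position of the max and min within each column
col_max_row <- apply(m, 2, which.max)
col_min_row <- apply(m, 2, which.min)

# Row/column coordinates of the overall maximum
which(m == max(m), arr.ind = TRUE)
```

rowMeans() and colMeans() are usually much faster than apply(m, 1, mean) on a 1000 by 1000 matrix; the apply() calls for min and max are kept for readability.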
[R] LDheatmap
Hi all,

I am looking for an R package that calculates pairwise LD (linkage
disequilibrium). I came across library(LDheatmap). Has anyone used this
library? I would appreciate help with how to use this library for my
set of data.

My data set looks like

Geno file
Name1 1 1 2 2 2 2
Name2 2 2 2 2 2 2
Name3 2 2 2 2 2 2
Name4 2 2 2 2 2 2
Name5 1 1 2 2 2 2
NameN 1 1 1 2 2 2 2

The other file is a map file
Chromosome, SNP, Location (physical)

Thank you in advance
Re: [R] Conditional Random selection
Thank you David!

I reran your script and it is giving me the first three time periods.
Is it doing random sampling?

tab.fan
  time X1  X2
2    2  5 230
3    3  1 300
5    5  2  10

On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> Use dput() to send data to the list as it is more compact:
>
> > dput(tab)
> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = c("time",
> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>
> You can just remove the lines with X1 = 0 since you don't want to use them.
>
> > tab.sub <- tab[tab$X1>0, ]
>
> Then the following gives you a sample:
>
> > tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>
> Note, that your "solution" of times 6, 7, and 8 will never appear because the
> sum of the values is 586.
>
> David L. Carlson
> Department of Anthropology
> Texas A&M University
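
A likely reason the same leading periods keep coming back: in tab.sub[cumsum(sample(tab.sub$X2)) <= 500, ] only the X2 values are shuffled, and the resulting TRUE/FALSE vector is then applied to the rows of tab.sub in their original order. Because a cumulative sum only grows, the TRUE values always form a leading run. The sketch below shuffles the rows themselves first; idx, shuffled and picked are names introduced here, and the seed is arbitrary.

```r
tab <- data.frame(time = 1:8,
                  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
                  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L))
tab.sub <- tab[tab$X1 > 0, ]

set.seed(1)                          # arbitrary seed, only for repeatability
idx      <- sample(nrow(tab.sub))    # shuffled row positions
shuffled <- tab.sub[idx, ]           # rows in random order

# Keep shuffled periods while the running total stays at or below 500
picked <- shuffled[cumsum(shuffled$X2) <= 500, ]
picked
```

This variant stops before the total exceeds 500, so it yields at most 500 samples rather than at least 500.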
[R] Conditional Random selection
Hi all,

I have a data set that contains samples collected over time. In
each time period the total number of samples is given (X2). The goal
is to select 500 random samples. The selection should be based on
time (select time periods until I reach 500 samples). Also, the time
period should have a value greater than 0 for the X1 variable. X1 is an
indicator variable.

Select "time" until the sum of X2 is > 500, and only if X1 is > 0

tab <- read.table(textConnection(" time X1 X2
1 0 251
2 5 230
3 1 300
4 0 25
5 2 10
6 3 101
7 1 300
8 4 185 "),header = TRUE)

In the above example, samples from time 1 and 4 will not be selected
(X1 is zero).
So I could reach my target by selecting time 6, 7, and 8 or time 2 and
3 and so on.

Can anyone help me do that?
Re: [R] Conditional Random selection
Hi Bert and all,

I have a related question. In each time period there were different
locations where the samples were collected (S1). I want to count the
number of unique locations (S1) for each unique time period. So in
time 1 the samples were collected from two locations, in time 2 only
from one location, and in time 3 from three locations.

tab <- read.table(textConnection(" time S1 rep
1 1 1
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 2 1
3 3 1 "),header = TRUE)

What I want is

time S1
1    2
2    1
3    3

Thank you again.
Re: [R] Conditional Random selection
Hi Rui,

I tried that one before I sent out my original message.
It gave me only this:

tapply(tab$S1, tab$time, function(x) length(unique(x)))
1 2 3
2 1 3

I am expecting output like this

time S1
1    2
2    1
3    3

On Sat, Nov 21, 2015 at 2:38 PM, <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> Try
>
> tapply(tab$S1, tab$time, function(x) length(unique(x)))
>
> Hope this helps,
>
> Rui Barradas
Re: [R] Conditional Random selection
Thank you! I was also able to do it this way, too!

library(plyr)  # ddply(), .() and summarize come from the plyr package
hc <- ddply(tab1, .(time), summarize, S1 = length(unique(S1)))

On Sat, Nov 21, 2015 at 3:40 PM, <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> Is that a real doubt? Like Bert said, you should spend some time with an R
> tutorial. All you need is to know how to form a data.frame.
>
> tmp <- tapply(tab1$S1, tab1$time, function(x) length(unique(x)))
> data.frame(time = names(tmp), S1 = tmp)
>
> Rui Barradas
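
For completeness, the same count can be produced as a data frame in one step with base R's aggregate(), without an extra package. This is a sketch using the sample data from the question; n_loc is a name introduced here.

```r
tab <- read.table(text = "time S1 rep
1 1 1
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 2 1
3 3 1", header = TRUE)

# One row per time period; S1 holds the number of distinct locations
n_loc <- aggregate(S1 ~ time, data = tab,
                   FUN = function(x) length(unique(x)))
n_loc
```

Unlike the tapply() result, the grouping variable comes back as an ordinary column, so no names() step is needed.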
Re: [R] Conditional Random selection
Thank you Bert!

What I want is at least 500 samples based on random sampling of time
periods. This allows samples collected in the same time period to be
included together.

Your script is doing what I wanted to do!!

Many thanks

On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> David's "solution" is incorrect. It can also fail to give you times
> with a total of 500 items to sample from in the time periods.
>
> It is not entirely clear what you want. The solution below gives you a
> random sample of time periods in which X1>0 and the total number of
> samples among them is >= 500. It does not give you the fewest number
> of periods that can do this. Is this what you want?
>
> tab[with(tab,{
>    rownums<- sample(seq_len(nrow(tab))[X1>0])
>    sz <- cumsum(X2[rownums])
>    rownums[c(TRUE,sz<500)]
> }),]
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>    -- Clifford Stoll
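
The idea in Bert's reply can be wrapped in a small function with an explicit check that enough samples exist at all. This is a sketch, not Bert's code: sample_periods and its target argument are names invented here, and sample.int() is used so that a single eligible period is handled safely.

```r
tab <- data.frame(time = 1:8,
                  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
                  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L))

# Hypothetical helper: random periods with X1 > 0 until X2 sums to >= target
sample_periods <- function(tab, target = 500) {
  eligible <- which(tab$X1 > 0)
  # Fail early if the eligible periods cannot reach the target at all
  stopifnot(sum(tab$X2[eligible]) >= target)
  rownums <- eligible[sample.int(length(eligible))]  # eligible rows, shuffled
  sz <- cumsum(tab$X2[rownums])                      # running total of samples
  n_keep <- which(sz >= target)[1]                   # first point the target is met
  tab[rownums[seq_len(n_keep)], ]
}

set.seed(2015)  # arbitrary seed
sample_periods(tab)
```

As noted in the thread, this does not look for the smallest set of periods; it simply stops at the first period that takes the total to 500 or more.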
[R] Ranking
Hi all,

I have the following raw data; some records don't have the second variable.

test <- read.table(textConnection(" Country	STATUS
USA
USA	W
USA	W
GER
GER	W
GER	w
GER	W
UNK	W
UNK
UNK	W
FRA
FRA
FRA	W
FRA	W
FRA	W
SPA
SPA	W
SPA "),header = TRUE, sep= "\t")
test

It is not reading it correctly.

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 17 did not have 2 elements

After reading, I want to change the status column to numeric so that I
can use the table function

test$STATUS <- ifelse(is.na(test$STATUS), 0, 1)

At the end I want the following table (Country, Won, Lost, Number of
games played and % of score) and to pick the top 3 countries.

COUNTRY  Won  Lost  NG  %W
USA       2    1    3   (2/3)*100
GER       3    1    4   (3/4)*100
UNK       2    1    3   (2/3)*100
FRA       3    2    5   (3/5)*100
SPA       1    2    3   (1/3)*100

Thank you in advance
Re: [R] Ranking
Thank you David,

My intention was that if I change the status column to numeric (0 = Lost
and 1 = Won), then I can use this numeric variable to calculate the
percent of games won by each country.

How did you read the data first? That was my problem. The actual data
is in a file that has to be read or loaded.

Thank you!

On Sat, Nov 14, 2015 at 6:10 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> It is always good to read the manual page for a function, but especially when
> it is not working as you expected. In this case if you look at the arguments
> for read.table(), you will find one called fill=TRUE that is useful in this
> case.
>
> Based on your ifelse(), you seem to be assuming that a blank is not missing
> data but a lost game. You may also discover that in your example wins are
> coded as w and W. Since character variables get converted to factors by
> default, you could use something like:
>
> > levels(test$STATUS) <- c("L", "W", "W")
> > addmargins(xtabs(~Country+STATUS, test), 2)
>        STATUS
> Country L W Sum
>     FRA 2 3   5
>     GER 1 3   4
>     SPA 2 1   3
>     UNK 1 2   3
>     USA 1 2   3
>
> I'll let you figure out how to get the last column.
>
> David L. Carlson
> Department of Anthropology
> Texas A&M University
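
Putting the pieces of this thread together, a sketch of the full table might look like the following. It assumes the data are tab-separated with an empty second field for a lost game, as the replies suggest; raw, won and res are names introduced here, and the percentage column is called PctW because %W is not a syntactic R name.

```r
# Sample records as tab-separated lines; a missing second field means "lost"
raw <- c("Country\tSTATUS",
         "USA", "USA\tW", "USA\tW",
         "GER", "GER\tW", "GER\tw", "GER\tW",
         "UNK\tW", "UNK", "UNK\tW",
         "FRA", "FRA", "FRA\tW", "FRA\tW", "FRA\tW",
         "SPA", "SPA\tW", "SPA")

# fill = TRUE pads the short rows instead of stopping with an error
test <- read.table(text = raw, header = TRUE, sep = "\t", fill = TRUE,
                   stringsAsFactors = FALSE)

# Treat "W" and "w" as a win; blank or NA counts as a loss
won <- toupper(test$STATUS) %in% "W"

won_by <- tapply(won, test$Country, sum)
ng_by  <- tapply(won, test$Country, length)

res <- data.frame(COUNTRY = names(won_by),
                  Won  = as.vector(won_by),
                  Lost = as.vector(ng_by - won_by),
                  NG   = as.vector(ng_by),
                  stringsAsFactors = FALSE)
res$PctW <- round(100 * res$Won / res$NG, 1)

# Sort by winning percentage and keep the top three countries
res  <- res[order(-res$PctW), ]
top3 <- head(res, 3)
top3
```

For the real file, the text = raw argument would be replaced by the file path; how ties in the percentage should be broken is left open here.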
Re: [R] Cleaning
Sarah,

Thank you very much.

For the other variables I was trying to do the same job in a different
way because it is easier to list it.

Example

test <- which(dat$var1 !="BAA" | dat$var1 !="FAG" )
{ dat <- dat[-test,]}

and I did not get the right result. What am I missing here?

On Wed, Nov 11, 2015 at 7:54 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> On Wed, Nov 11, 2015 at 8:44 PM, Ashta <sewa...@gmail.com> wrote:
> > Hi Sarah,
> >
> > I used the following to clean my data, the program crashed several times.
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > What is the difference between these two
> >
> > test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]
>
> Besides that you're using %in% wrong? I told you how to proceed.
>
> myvalues <- c("YYZ", "MSN")
>
> test <- subset(dat, Var1 %in% myvalues)
>
> > subset(dat, Var1 %in% myvalues)
>   X Var1 Freq
> 3 3  MSN 1040
> 4 4  YYZ  300
>
> > On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.gos...@gmail.com>
> > wrote:
> >>
> >> Please keep replies on the list so others may participate in the
> >> conversation.
> >>
> >> If you have a character vector containing the potential values, you
> >> might look at %in% for one approach to subsetting your data.
> >>
> >> Var1 %in% myvalues
> >>
> >> Sarah
> >>
> >> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewa...@gmail.com> wrote:
> >> > Thank you Sarah for your prompt response!
> >> >
> >> > I have the list of values of the variable Var1 it is around 20.
> >> > How can I modify this one to include all the 20 valid values?
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >
> >> > Is there a way (efficient ) of doing it?
> >> >
> >> > Thank you again
> >> >
> >> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.gos...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewa...@gmail.com> wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > I have a data frame with huge rows and columns.
> >> >> >
> >> >> > When I looked at the data, it has several garbage values need to be
> >> >> > cleaned. For a sample I am showing you the frequency distribution
> >> >> > of one variables
> >> >> >
> >> >> >   Var1  Freq
> >> >> > 1 :        3
> >> >> > 2 ]        6
> >> >> > 3 MSN   1040
> >> >> > 4 YYZ    300
> >> >> > 5 \\       4
> >> >> > 6 +        3
> >> >> > 7 . ?>    15
> >> >>
> >> >> Please use dput() to provide your data. I made a guess at what you had
> >> >> in R, but could be wrong.
> >> >>
> >> >> > and continues.
> >> >> >
> >> >> > I want to keep those rows that contain only a valid variable value
> >> >> >
> >> >> > In this case MSN and YYZ. I tried the following
> >> >> >
> >> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >> >
> >> >> > but I am not getting the desired result.
> >> >>
> >> >> What are you getting? How does it differ from the desired result?
> >> >>
> >> >> > I have
> >> >> >
> >> >> > Any help or idea?
> >> >>
> >> >> I get:
> >> >>
> >> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "",
> >> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
> >> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >> >
> >> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >> > test
> >> >>   X Var1 Freq
> >> >> 3 3  MSN 1040
> >> >> 4 4  YYZ  300
> >> >>
> >> >> Which seems reasonable to me.
> >> >>
> >> >> > [[alternative HTML version deleted]]
> >> >>
> >> >> Please don't post in HTML either: it introduces all sorts of errors to
> >> >> your message.
> >> >>
> >> >> Sarah
> >
> > [[alternative HTML version deleted]]

[[alternative HTML version deleted]]
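
On the question of what is missing in the != version: a value can never equal both "BAA" and "FAG", so x != "BAA" | x != "FAG" is TRUE for every non-missing value and which() returns all rows. The sketch below uses a small made-up data frame to show this, together with a %in% form that keeps only the valid values; valid, always_true and clean are names introduced here.

```r
# Made-up sample; the real column is assumed to be character
dat <- data.frame(var1 = c("BAA", "FAG", ":", "]", "BAA"),
                  stringsAsFactors = FALSE)

# TRUE everywhere: each value differs from at least one of the two codes
always_true <- dat$var1 != "BAA" | dat$var1 != "FAG"

# Keep rows whose value is in the list of valid codes
valid <- c("BAA", "FAG")
clean <- dat[dat$var1 %in% valid, , drop = FALSE]
clean
```

Selecting the rows to keep also avoids dat[-test, ], which returns zero rows when test happens to be empty.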
[R] Cleaning
Hi all,

I have a data frame with a very large number of rows and columns. Looking at the data, I found several garbage values that need to be cleaned. As a sample, here is the frequency distribution of one variable:

      Var1  Freq
    1 :        3
    2 ]        6
    3 MSN   1040
    4 YYZ    300
    5 \\       4
    6 +        3
    7 . ?>    15

and it continues.

I want to keep only the rows that contain a valid value of the variable, in this case MSN and YYZ. I tried the following:

    test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

but I am not getting the desired result.

Any help or ideas?
Re: [R] Cleaning
Hi Sarah,

I used the following to clean my data, and the program crashed several times:

    test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

What is the difference between that and this?

    test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN", ]

On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Please keep replies on the list so others may participate in the
> conversation.
>
> If you have a character vector containing the potential values, you
> might look at %in% for one approach to subsetting your data.
>
> Var1 %in% myvalues
>
> Sarah
>
> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewa...@gmail.com> wrote:
> > Thank you Sarah for your prompt response!
> >
> > I have the list of values of the variable Var1 it is around 20.
> > How can I modify this one to include all the 20 valid values?
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > Is there a way (efficient ) of doing it?
> >
> > Thank you again
> >
> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.gos...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewa...@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I have a data frame with huge rows and columns.
> >> >
> >> > When I looked at the data, it has several garbage values need to be
> >> > cleaned. For a sample I am showing you the frequency distribution
> >> > of one variables
> >> >
> >> >   Var1  Freq
> >> > 1 :        3
> >> > 2 ]        6
> >> > 3 MSN   1040
> >> > 4 YYZ    300
> >> > 5 \\       4
> >> > 6 +        3
> >> > 7 . ?>    15
> >>
> >> Please use dput() to provide your data. I made a guess at what you had
> >> in R, but could be wrong.
> >>
> >> > and continues.
> >> >
> >> > I want to keep those rows that contain only a valid variable value
> >> > In this case MSN and YYZ. I tried the following
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >
> >> > but I am not getting the desired result.
> >>
> >> What are you getting? How does it differ from the desired result?
> >>
> >> > Any help or idea?
> >>
> >> I get:
> >>
> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "",
> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> > test
> >>   X Var1 Freq
> >> 3 3  MSN 1040
> >> 4 4  YYZ  300
> >>
> >> Which seems reasonable to me.
> >>
> >> Please don't post in HTML either: it introduces all sorts of errors to
> >> your message.
> >>
> >> Sarah
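On the `==` versus `%in%` question raised in this thread, here is a minimal sketch of one practical difference. The data frame is hypothetical and modelled on the posted sample; the `NA` entry is an added assumption, included only because missing values are where the two operators visibly diverge.

```r
# Hypothetical data modelled on the thread; the NA entry is an assumption.
dat <- data.frame(
  Var1 = c(":", "]", "MSN", "YYZ", NA, "+", "?>"),
  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L),
  stringsAsFactors = FALSE
)

# One vector holds every valid code; extend it to all ~20 values.
valid <- c("MSN", "YYZ")

# %in% never returns NA, so only matching rows are kept.
by_in <- dat[dat$Var1 %in% valid, ]

# == returns NA for a missing Var1, and an NA row index yields an all-NA row.
by_eq <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

nrow(by_in)
nrow(by_eq)
```

With a vector of valid codes, the `%in%` form also stays the same length no matter how many codes are allowed, which is probably the simpler route for the 20-value case.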
Re: [R] curve
Thanks Sarah,

> 1. to shade or color (blue) the curve using the criterion that any
> values greater than 11,000

I think I was not clear in the above point. I want to shade not the line but the area under the curve. Also, your last line of code,

    segments(x0=mean(test1), y0=0, y1=curveheight)

gave me the following error message:

    Error in segments(x0 = mean(test1), y0 = 0, y1 = curveheight) :
      element 3 is empty;
      the part of the args list of '.Internal' being evaluated was:
      (x0, y0, x1, y1, col = col, lty = lty, lwd = lwd, ...)

Could you check it please?

On Mon, Dec 13, 2010 at 2:01 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
> Here's one way to do what I think you want:
>
> test <- rnorm(5000,1000,100)
> test1 <- subset(test, subset=(test > 1100))
> d <- density(test)
>
> plot(d, main="Density of production", xlab="")
> lines(d$x[d$x > 1100], d$y[d$x > 1100], col="blue", lwd=2)
> curveheight <- d$y[abs((d$x - mean(test1))) == min(abs((d$x - mean(test1))))]
> segments(x0=mean(test1), y0=0, y1=curveheight)
>
> Sarah
>
> On Mon, Dec 13, 2010 at 1:44 PM, Val valkr...@gmail.com wrote:
> > Hi All,
> >
> > I generated 5000 samples using the following script
> >
> > test <- rnorm(5000,1000,100)
> > test1 <- subset(test, subset=(test > 1100))
> > d <- density(test)
> > plot(d, main="Density of production")
> > abline(v=mean(test1))
> >
> > I wanted to do the following but faced difficulties
> >
> > 1. to shade or color (blue) the curve using the criterion that any
> >    values greater than 11,000
> > 2. I drew a vertical line but I wanted the v-line within the curve,
> >    not to stick outside the curve
> > 3. to suppress the output produced at the bottom of the curve
> >    (N=5000 and bandwidth=16.22)
> >
> > Thanks in advance
> > Val
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
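A possible sketch for the two open points in this message, offered as one option rather than the definitive fix. The `segments()` error arises because `x1` was not supplied; passing the same value for `x0` and `x1` draws a vertical line. For the shaded area, `polygon()` is one base-graphics choice. The cutoff of 1100 follows the posted code (the prose says 11,000), and `set.seed()` is added only to make the sketch repeatable.

```r
set.seed(1)                       # added for repeatability (assumption)
test  <- rnorm(5000, 1000, 100)
test1 <- test[test > 1100]
d     <- density(test)

# xlab = "" suppresses the default "N = ...  Bandwidth = ..." label.
plot(d, main = "Density of production", xlab = "")

# Shade the area under the curve to the right of the cutoff.
keep <- d$x > 1100
xs   <- d$x[keep]
polygon(c(xs[1], xs, xs[length(xs)]), c(0, d$y[keep], 0),
        col = "blue", border = NA)

# Vertical line at mean(test1) that stops at the curve:
# interpolate the density height there and give segments() both x0 and x1.
m <- mean(test1)
h <- approx(d$x, d$y, xout = m)$y
segments(x0 = m, y0 = 0, x1 = m, y1 = h, col = "white", lwd = 2)
```

Interpolating with `approx()` avoids relying on an exact grid match for the curve height; the white line colour is only there so the segment shows up against the blue fill.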
[R] Survival
Hi All,

I was trying to find a function that handles the Partially Linear Single-Index model in survival analysis, but had no luck. Is there a function in R for this type of analysis?

Thanks
A
[R] likelihood
Hi all,

Does anyone know how to write the likelihood function for the Poisson distribution in R when P(x=0)? For the normal case, it can be written as follows:

    n * log(lambda) - lambda * n * mean(dat)

Any help is highly appreciated.

Ashta
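For reference, a minimal sketch of the ordinary Poisson log-likelihood, sum(x) * log(lambda) - n * lambda - sum(log(x!)). The expression posted above has the form of an exponential log-likelihood rather than a Poisson one. What "when P(x=0)" refers to is not clear from the message, so no zero-related adjustment is attempted here; `dat` is hypothetical.

```r
# Plain Poisson log-likelihood; lgamma(x + 1) is log(x!).
poisson_loglik <- function(lambda, dat) {
  sum(dat) * log(lambda) - length(dat) * lambda - sum(lgamma(dat + 1))
}

dat <- c(0, 2, 1, 3, 0, 1)        # hypothetical counts

# Cross-check against R's built-in density on the log scale.
check <- sum(dpois(dat, lambda = 1.5, log = TRUE))

# Maximising over lambda should land near the sample mean.
fit <- optimize(poisson_loglik, interval = c(1e-6, 10),
                dat = dat, maximum = TRUE)
fit$maximum
mean(dat)
```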
[R] Histogram color
In a histogram, is it possible to have different colors? Example: I generated

    x <- rnorm(100)
    hist(x)

I want the histogram to have different colors based on the following condition: beyond mean(x) + sd(x) in red, and below mean(x) - sd(x) in red as well. The middle part in blue. Is it possible to do that in R?

Thanks
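One way this might be done in base graphics, as a sketch: compute the histogram without drawing it, pick a colour per bar from the bar midpoints, then plot. Classifying bars by midpoint is an assumption; a bar that straddles a cutoff gets a single colour.

```r
set.seed(42)                      # added for repeatability (assumption)
x <- rnorm(100)

h  <- hist(x, plot = FALSE)       # bin first, draw later
lo <- mean(x) - sd(x)
hi <- mean(x) + sd(x)

# One colour per bar, chosen from the bar midpoint.
cols <- ifelse(h$mids < lo | h$mids > hi, "red", "blue")

plot(h, col = cols, main = "Histogram of x")
```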
[R] Step function
Hi All,

Does the step function work with this model? I tried to run the following model but obtained no result. The computer hangs, and I have killed the job several times. Below is the code.

    library(survival)
    m.fit = clogit(y~x1+x2+x3+x4, data=ftest)
    summary(m.fit)
    final <- step(m.fit)

Thanks in advance.
[R] output
Hi all,

I am trying to interpret the following output from lm:

    fit1 = lm(Feed_Intake ~ weight + season + weight*season)

Season has three classes (x, y, z). The results are:

                            Estimate
    (Intercept)             21.51559
    weight                   2.13051
    factor(season)y         10.59739
    factor(season)z          1.30421
    weight:factor(season)y  10.1
    weight:factor(season)z  21.70288

My questions are: what is the estimate for season x? And is it possible to change the output in the following way?

    factor(season)x
    factor(season)y
    weight:factor(season)x
    weight:factor(season)y

Thanks in advance
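A sketch of why season x does not appear, using simulated data (the data frame `DF` and its values are hypothetical). Under R's default treatment contrasts the first level, x, is the baseline: its intercept is `(Intercept)` and its slope is the `weight` coefficient. Dropping the global intercept and nesting the slope in `season` gives one intercept and one slope per season, which is the same fit written differently.

```r
set.seed(10)
DF <- data.frame(
  weight = runif(90, 50, 100),
  season = factor(rep(c("x", "y", "z"), each = 30))
)
DF$Feed_Intake <- 20 + 2 * DF$weight + rnorm(90)   # simulated response

# Default parametrisation: season x is absorbed into the baseline terms.
fit1 <- lm(Feed_Intake ~ weight * season, data = DF)

# Per-season intercepts and slopes; same fitted values, different labels.
fit2 <- lm(Feed_Intake ~ 0 + season + season:weight, data = DF)

names(coef(fit1))
names(coef(fit2))
```

In `fit1`, the y and z rows are differences from season x, not the season effects themselves.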
Re: [R] output
Hi all,

I have a data set in which the response variable, size, is binary (Short or Long). Color has two classes (red and green): red=1; green=0.

    Lm1 <- glm(size ~ color, data = test, family = binomial())

                 Estimate Std. Error z value
    (Intercept) 12.0523.11037-12.273
    color       0.78500.06624 3.952

How do I get the probability of the sizes for the two different colors (red and green)?

On Mon, Jan 18, 2010 at 11:15 AM, Henrique Dallazuanna www...@gmail.com wrote:
> Try this:
>
> DF$season <- relevel(DF$season, 'y')
> fit1 <- lm(Feed_Intake ~ weight + season + weight*season, data = DF)
>
> On Mon, Jan 18, 2010 at 2:00 PM, Ashta sewa...@gmail.com wrote:
> > Hi all,
> >
> > I am trying to interparete the result of the following output from lm;
> >
> > fit1 =lm(Feed _Intake ~ weight + season + weight*season)
> >
> > Season has three classes(x,y,z)
> >
> > Reults are
> >                         Estimate
> > (Intercept)             21.51559
> > weight                   2.13051
> > factor(season)y         10.59739
> > factor(season)z          1.30421
> > weight:factor(season)y  10.1
> > weight:factor(season)z  21.70288
> >
> > My question are what is the estimate of season x?
> > Could it be possible to change the output in the following way?
> >
> > factor(season)x
> > factor(season)y
> > weight:factor(season)x
> > weight:factor(season)y
> >
> > Thanks in adavance
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
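For the probability question, a minimal sketch on simulated data (the data frame `test` and its coefficients are hypothetical, not the poster's). `predict()` with `type = "response"` returns fitted probabilities, which equal `plogis()` applied to the linear predictor.

```r
set.seed(1)
test <- data.frame(color = rep(c(0, 1), each = 50))   # 0 = green, 1 = red
test$size <- rbinom(100, 1, plogis(-0.5 + 0.8 * test$color))

Lm1 <- glm(size ~ color, data = test, family = binomial())

# Fitted probability of size == 1 for each colour.
p <- predict(Lm1, newdata = data.frame(color = c(0, 1)), type = "response")

# The same thing by hand: inverse logit of intercept + slope * color.
p_manual <- plogis(coef(Lm1)[1] + c(0, 1) * coef(Lm1)[2])
```

Which outcome counts as "1" depends on how `size` is coded; with a factor response, glm models the probability of the second level.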
[R] Hazard ratio
Hi all,

I want to calculate the hazard ratio within each covariate. For example, one covariate has 3 classes (1, 2 and 3) and x2 has 2 classes. I want to compare the relative risk ratio within each class of the covariate. How do I get this result?

The other question is: how do I interpret the second column in the second panel (i.e., exp(-coef))?

I used the model

    coxfit1 <- coxph(Surv(sdat$time, sdat$cens) ~ y1+x2)

            coef  exp(coef)  se(coef)       z  Pr(>|z|)
    y1 -0.024084   0.976204  0.003077  -7.828  5.00e-15 ***
    x2  0.036161   1.036822  0.083921   0.431    0.6665

       exp(coef)  exp(-coef)  lower .95  upper .95
    y1    0.9762      1.0244     0.9703     0.9821
    x2    1.0368      0.9645     0.8796     1.

Thanks in advance
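On the exp(-coef) column: it is the reciprocal of the hazard ratio, i.e. the ratio read in the opposite direction. A small arithmetic sketch using only the coefficients posted above:

```r
coefs <- c(y1 = -0.024084, x2 = 0.036161)   # values from the posted output

hr     <- exp(coefs)    # hazard ratio per one-unit increase
hr_inv <- exp(-coefs)   # hazard ratio per one-unit decrease (reciprocal)

round(cbind("exp(coef)" = hr, "exp(-coef)" = hr_inv), 4)
```

So for y1, a one-unit increase multiplies the hazard by about 0.976, and a one-unit decrease multiplies it by about 1.024.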
Re: [R] Hazard ratio
David,

Thank you very much for your response. I fitted the covariates as factors instead of numeric.

    coxfit1 <- coxph(Surv(sdat$time, sdat$cens) ~ factor(y1)+factor(x2))

                      coef  exp(coef)  se(coef)       z  Pr(>|z|)
    factor(y1)2   0.036161   1.036822  0.083921   0.431    0.6665
    factor(y1)3  -0.510124   0.600421  0.088901  -5.738  9.57e-09 ***
    factor(x2)2  -0.510124   0.600421  0.088901  -5.738  9.57e-09 ***

What are those values? Are they comparisons against the first class of each covariate?

Thanks again.

On Thu, Dec 10, 2009 at 8:33 AM, Ashta sewa...@gmail.com wrote:
> Hi all,
>
> I want to calculate hazard ratio within each covariate
>
> Example, one covariate has 3 classes (1,2 and 3) and x2 has 2 classes
>
> I want to compare the relative risk ratio within each class of the
> covariate. How do I get this result ? .
>
> The other question is that how do I interpret the second column in the
> second panel (i.e., exp(-coef))
>
> I used the model
>
> coxfit1 <- coxph(Surv(sdat$time, sdat$cens)~ y1+x2)
>
>         coef  exp(coef)  se(coef)       z  Pr(>|z|)
> y1 -0.024084   0.976204  0.003077  -7.828  5.00e-15 ***
> x2  0.036161   1.036822  0.083921   0.431    0.6665
>
>    exp(coef)  exp(-coef)  lower .95  upper .95
> y1    0.9762      1.0244     0.9703     0.9821
> x2    1.0368      0.9645     0.8796     1.
>
> Thanks in advance
[R] PH Model assumption
Hi all,

I was trying to test the proportional hazards assumption. I used the cox.zph function:

    cox.zph(coxfit6)

Results are:

               rho   chisq         p
    x1     -0.0396   1.397  2.37e-01
    x2      0.1107   9.715  1.83e-03
    x3     -0.0885   7.743  5.39e-03
    x4      0.0366   1.092  2.96e-01
    x5      0.0242   0.455  5.00e-01
    GLOBAL      NA  30.952  9.57e-06

Do all these covariates fulfil the proportional hazards assumption?

Thanks again.
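A small sketch of how the posted table could be read, using only the p-values shown above. Each row tests the null hypothesis that the hazards are proportional for that term, so a small p-value is evidence against the assumption. The 0.05 cutoff is a conventional choice, not something stated in the thread.

```r
# p-values copied from the posted cox.zph output.
zph <- data.frame(
  term = c("x1", "x2", "x3", "x4", "x5", "GLOBAL"),
  p    = c(2.37e-01, 1.83e-03, 5.39e-03, 2.96e-01, 5.00e-01, 9.57e-06),
  stringsAsFactors = FALSE
)

# Terms with evidence against proportional hazards at the 5% level.
flagged <- zph$term[zph$p < 0.05]
flagged
```

On this reading x2 and x3, and the model as a whole, look questionable. Plotting the `cox.zph` object is the usual next step for judging how large the departure actually is.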
[R] stepAIC function
Hi All,

I am trying to run the following script but have a problem:

    coxm <- coxph(Surv(sdat$time, sdat$cens) ~ hd+nawtg+nwwg+ntpg+cy+nseas, data=sdat)
    coxm <- stepAIC(coxm, ~.^2)

The error message is

    Error: could not find function "stepAIC"

I tried to install the package but I could not find it. Where can I get it?

The other question is that I want to get the Kaplan-Meier estimate for each covariate in the model, like

    covariate   n  Events  Mean  S.E.(mean)  Median  95% LCL  95% UCL
    0          14      10  2.87         .03     2.2    1.938     infi
    1          11       9  1.06         .67     1.1     0.29     2.48

I used

    sdat.fit0 <- survfit(Surv(sdat$time, sdat$cens) ~ sdat$ntpg, data = sdat, type = "kaplan-meier", conf.type = "plain")
    sdat.fit0

Instead I got the following:

    Call: survfit(formula = Surv(sdat$time, sdat$cens) ~ sdat$ntpg, data = sdat,
        type = "kaplan-meier", conf.type = "plain")

                 records n.max n.start events median 0.95LCL 0.95UCL
    sdat$ntpg=0    3576  3576    3576    311     NA      NA      NA
    sdat$ntpg=1    4851  4851    4851    466     NA      NA      NA

I would appreciate it if someone could help me.

Thanks
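On the missing function: `stepAIC()` lives in the MASS package, which ships with standard R installations as a recommended package, so loading it with `library(MASS)` is normally enough. A minimal sketch on a built-in data set (the linear model here is only a stand-in for the Cox model in the message):

```r
library(MASS)   # provides stepAIC()

# Stand-in model on built-in data; the thread's model is a coxph fit.
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Stepwise selection by AIC; trace = FALSE silences the step-by-step log.
sel <- stepAIC(fit, direction = "both", trace = FALSE)
formula(sel)
```

As for the medians printed as NA in the survfit output: that is what one would expect when the estimated survival curve never drops to 0.5, which seems likely here given roughly 9% events per group.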
[R] look up and Missing
Hi R-Users,

Assume that I have a data frame 'temp' with several variables (v1, v2, v3, v4, v5, ...).

    v1 v2 v3 v4 v5 1 2 3 36 5 2 420 2 -9 5 43 6 2 1 34

1. I want to look at the entire row when v2 = -9, like

    2 -9 5 43

I wrote

    K <- list(if(temp$v2)==-9))

but it gave me the following, which is not correct:

    False false false false false

2. I want to set those values to missing if v2 = -9 (i.e., I want to exclude them from the analysis).

How do I do it in R?

Thanks in advance
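A minimal sketch of both steps with logical indexing. The posted data did not survive formatting, so the data frame below is hypothetical; only the -9 sentinel in `v2` is taken from the message.

```r
# Hypothetical data; -9 in v2 is the missing-value code from the message.
temp <- data.frame(
  v1 = c(1, 2, 6),
  v2 = c(2, -9, 2),
  v3 = c(3, 5, 1),
  v4 = c(36, 43, 34)
)

# 1. Show the complete rows where v2 is -9.
hits <- temp[temp$v2 == -9, ]

# 2. Recode -9 as NA so that modelling functions can drop or handle it.
temp$v2[temp$v2 == -9] <- NA

# Optional: keep only the rows where v2 is present.
complete <- temp[!is.na(temp$v2), ]
```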
Re: [R] Frequency
Thank you Jorge,

    res <- table(unlist(x))
    res[order(res, decreasing = TRUE)]
    # 10  4  6  3  5  7  9 18
    #  3  2  2  1  1  1  1  1

This one works fine for me. Is it possible to transpose it? I tried t(res[order(res, decreasing = TRUE)]), but it did not work! I want the result like this:

    10 2
    4  2
    6  2
    3  1
    .  .
    .  .

On Mon, Nov 2, 2009 at 1:45 PM, Jorge Ivan Velez jorgeivanve...@gmail.com wrote:
> Hi Val,
>
> Here is a suggestion:
>
> res <- table(unlist(x))
> res[order(res, decreasing = TRUE)]
> # 10  4  6  3  5  7  9 18
> #  3  2  2  1  1  1  1  1
>
> HTH,
> Jorge
>
> On Mon, Nov 2, 2009 at 1:35 PM, Val wrote:
> > BAYESIAN INFERENCES FOR MILKING TEMPERAMENT IN CANADIAN HOLSTEINS
> >
> > Hi All,
> >
> > I have a data set x with several variables. Sample of the data is
> > shown below
> >
> > V1 v2 v3 v4
> >  5  6  9 10
> >  3  4  7 10
> >  4  6 10 18
> >
> > I want the frequency of each data point sorted by their occurrence.
> > Below is the output that I want
> >
> > 10 = 3
> > 6 = 2
> > 4 = 2
> > 9 = 1
> > 5 = 1
> > 7 = 1
> > 3 = 1
> >
> > How do I do it in R?
> >
> > Thanks in advance
> > Val
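One possible way to get the two-column layout asked for here, as a sketch: convert the sorted table to a data frame instead of transposing it. The column names `value` and `freq` are made up for the illustration.

```r
# Data from the quoted message.
x <- data.frame(
  V1 = c(5, 3, 4),
  V2 = c(6, 4, 6),
  V3 = c(9, 7, 10),
  V4 = c(10, 10, 18)
)

res <- sort(table(unlist(x)), decreasing = TRUE)

# A one-dimensional table becomes a two-column data frame.
out <- as.data.frame(res, stringsAsFactors = FALSE)
names(out) <- c("value", "freq")   # hypothetical column names
out
```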
[R] Wavelets
Hi all,

I am trying to do a wavelet analysis and I got an error message saying "The length of data is not a power of 2". Is there a way of handling that, or must the data length be exactly a power of 2?

I am using R version 2.9.2 (2009-08-24). The library is wavethresh:

    library(wavethresh)
    wds <- wd(ds$v, filter.number=1)

Thanks
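If the data length has to be a power of 2, one common workaround is to pad the series up to the next power of 2 before the transform. The sketch below uses base R only and zero padding, which is just one option (truncating, or padding by reflection, are others) and can introduce edge effects; the vector `v` is hypothetical.

```r
set.seed(5)
v <- rnorm(100)                    # hypothetical series of length 100

# Next power of 2 at or above the current length.
n_pad  <- 2^ceiling(log2(length(v)))

# Append zeros to reach that length.
padded <- c(v, rep(0, n_pad - length(v)))

length(padded)
```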
[R] Inserting rows
Hi all,

I have the data set df with three variables:

    x1 x2 x3
     1  2  5
     2  4  1
     5  6  0
     1  1  2

I want to insert more rows (e.g., 3 rows filled with zeros):

     1  2  5
     2  4  1
     5  6  6
     1  1  2
     0  0  0
     0  0  0
     0  0  0

Can anybody help me out?

Thanks
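A minimal sketch of appending zero-filled rows with `rbind()`, using the data from the message. The helper name `zeros` is made up for the illustration.

```r
df <- data.frame(x1 = c(1, 2, 5, 1), x2 = c(2, 4, 6, 1), x3 = c(5, 1, 0, 2))

# Three rows of zeros with the same column names as df.
zeros <- as.data.frame(
  matrix(0, nrow = 3, ncol = ncol(df), dimnames = list(NULL, names(df)))
)

out <- rbind(df, zeros)
out
```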
[R] Counting
Hi All,

Assume that I have the following data set with two variables, and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

    x1 x2
     1  1
     1  0
     0  1
     0  1
     0  0
     1  1
     0  1

The output should be

    x1  changed
     0  3    # has changed 3 times
     1  1    # has changed 1 time

    x1  unchanged
     0  1    # has unchanged only 1 time
     1  2    # has unchanged 2 times

Can someone help me with how to do it in R?

Thanks in advance
Re: [R] Counting
Hi Bill and all,

On Tue, Oct 20, 2009 at 12:09 PM, William Dunlap wdun...@tibco.com wrote:
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > On Behalf Of Peter Ehlers
> > Sent: Tuesday, October 20, 2009 8:48 AM
> > To: Ashta
> > Cc: R help
> > Subject: Re: [R] Counting
> >
> > How about
> >
> > unch <- aggregate(x2==x1, by = list(x1=x1), FUN = sum)
> > chgd <- aggregate(x2!=x1, by = list(x1=x1), FUN = sum)
> >
> > -Peter Ehlers
>
> When I hear 'count' I think first of the table() function. E.g.,
>
> > d <- data.frame(x1=c(1,1,0,0,0,1,0), x2=c(1,0,1,1,0,1,1))
> > with(d, table(x1, x1==x2))
>
> x1  FALSE TRUE
>   0     3    1
>   1     1    2
>
> or
>
> > with(d, table(x1, factor(x1==x2, labels=c("Changed","Unchanged"))))
>
> x1  Changed Unchanged
>   0       3         1
>   1       1         2
>
> or use dimnames<- to change the labels on the table itself.

This works very well for numeric values. How about if the factors are character, such as F and M (female and male)?

> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > Ashta wrote:
> > > Hi All,
> > >
> > > Assume that I have the following data set with two variables and I
> > > want count the number of observation with identical values and
> > > number of time each factor changed from x1 to x2.
> > >
> > > x1 x2
> > >  1  1
> > >  1  0
> > >  0  1
> > >  0  1
> > >  0  0
> > >  1  1
> > >  0  1
> > >
> > > The output should be
> > >
> > > x1  changed
> > >  0  3    # has changed 3 times
> > >  1  1    # has changed 1 time
> > >
> > > x1  unchanged
> > >  0  1    # has unchanged only 1 time
> > >  1  2    # has unchanged 2 times
> > >
> > > Can someone help me how to do it in R?
> > >
> > > Thanks in advance
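On the follow-up about character values: the `table()` approach should carry over unchanged, since `==` compares character strings as well. A sketch, recoding the thread's data as 1 = "M" and 0 = "F" (that recoding is an assumption made for illustration):

```r
# Thread data recoded: 1 -> "M", 0 -> "F" (illustrative assumption).
d <- data.frame(
  x1 = c("M", "M", "F", "F", "F", "M", "F"),
  x2 = c("M", "F", "M", "M", "F", "M", "M"),
  stringsAsFactors = FALSE
)

# Cross-tabulate the starting value against whether it changed.
tab <- with(d, table(x1, status = ifelse(x1 == x2, "Unchanged", "Changed")))
tab
```

If the two columns are stored as factors with different level sets, `==` raises an error, so converting with `as.character()` first is the safer route.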
[R] Spline
Hi All,

I am using R version 2.9.2 (2009-08-24), Windows version, and I wanted to use library(spline):

    Error in library(spline) : there is no package called 'spline'

I tried to install the package as well, and it is not there either. Am I missing something? Where can I get this library?

Thanks
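The package is most likely the one named `splines` (with a trailing s), which is part of the base R distribution and needs no installation. A minimal sketch on a built-in data set, with `bs()` building a B-spline basis inside `lm()`:

```r
library(splines)   # note the plural; ships with R

# Cubic B-spline regression of stopping distance on speed (built-in data).
fit <- lm(dist ~ bs(speed, df = 5), data = cars)

length(coef(fit))  # intercept plus 5 basis coefficients
```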
[R] Survival and nonparametric
Hi all,

Does anybody have experience including a nonparametric component in a survival analysis using an R package? Can someone recommend some references?

Thanks a lot
Ashta
[R] Counting
Hi all,

Assume that I have the following data set with two variables, and I want to count the number of observations with identical values.

    x1 x2
     1  1
     1  0
     0  1
     0  1
     0  0
     1  1
     0  1

I want the following output:

    n1=3   # number of identical observations between the x1 and x2 variables
    n2=4   # number of different observations

How do I do it in R?

Thanks a lot
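A minimal sketch of the two counts: comparing the columns element-wise gives a logical vector, and `sum()` counts its TRUE values.

```r
x1 <- c(1, 1, 0, 0, 0, 1, 0)
x2 <- c(1, 0, 1, 1, 0, 1, 1)

n1 <- sum(x1 == x2)   # rows where the two variables agree
n2 <- sum(x1 != x2)   # rows where they differ

c(n1 = n1, n2 = n2)
```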
[R] Random number
Hi All,

I have a matrix called 'X' with 200 rows and 12 variables. I want to create 2 new variables (V1 and V2) based on a random number generator:

    p1 <- rnorm(200. mean=0, std=1)
    p2 <- rnorm(200. mean=0, std=1)
    x <- cbind(x, v1=ifelse(x[,'p1'] 0.4, 1, 0), v2=ifelse(x[,'p2'] 0.6, 0, 1))

I got the following error message:

    Error: unexpected symbol in "p1<-rnorm(200. mean"

Any help?
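A sketch of a corrected version, under stated assumptions. The "unexpected symbol" error comes from the period after 200 where a comma is needed; in addition, the standard deviation argument of `rnorm()` is `sd`, not `std`, and `p1`/`p2` are standalone vectors rather than columns of `x`. The comparison operators were lost from the posted code, so `<` is used here purely as a guess, and the matrix contents are made up.

```r
set.seed(123)
x <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)   # hypothetical 200 x 12 matrix

p1 <- rnorm(200, mean = 0, sd = 1)   # comma after 200; 'sd', not 'std'
p2 <- rnorm(200, mean = 0, sd = 1)

# Direction of the comparisons is assumed; adjust to the intended rule.
x <- cbind(x,
           v1 = ifelse(p1 < 0.4, 1, 0),
           v2 = ifelse(p2 < 0.6, 0, 1))

dim(x)
```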
[R] Tabulation
Hi all,

I have a data set

    x1 x2 x3
     1  2  1
     1  2  3
     2  1  2
     1  2  1
     3  1  1

I want to tabulate it in the following way:

        1  2  3
    x1  3  2  1
    x2  2  3  0
    x3  3  1  1

It is just like a frequency distribution.

Any help is highly appreciated.
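One way to build such a table, as a sketch: tabulate each column against a fixed set of levels so that values with zero count still get a cell, then transpose so variables become rows. The level set 1:3 is taken from the posted data.

```r
df <- data.frame(
  x1 = c(1, 1, 2, 1, 3),
  x2 = c(2, 2, 1, 2, 1),
  x3 = c(1, 3, 2, 1, 1)
)

# factor(..., levels = 1:3) keeps zero counts, e.g. value 3 in x2.
tab <- t(sapply(df, function(col) table(factor(col, levels = 1:3))))
tab
```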
[R] Creating new variables
Hi all,

I have a data set called x with 200 rows and 12 columns. I want to create two more columns based on probability, i.e.

    if p 0 .4 then v1 =1 else v1=0;
    if p 0 .6 then v2 =1 else v2=0;

Finally x will have 14 variables. Can anyone show me how to do that?

Thanks
Ashta
Re: [R] Creating new variables
Thanks. This helps. How do I generate P? Will this work?

    p1 <- pnorm(mean=0, std=1)
    p2 <- pnorm(mean=0, std=1)
    x <- cbind(x, v1=ifelse(x[,'p'] 0.4, 1, 0), v2=ifelse(x[,'2'] 0.6, 0, 1))

On Sat, Oct 10, 2009 at 6:32 PM, jim holtman jholt...@gmail.com wrote:
> If the 'data set' is a dataframe, the following will work:
>
> x$v1 <- ifelse(x$p 0.4, 1, 0)
> x$v2 <- ifelse(x$p 0.6, 1, 0)
>
> If it is matrix, try
>
> x <- cbind(x, v1=ifelse(x[,'p'] 0.4, 1, 0), v2=ifelse(x[,'p'] 0.6, 1, 2))
>
> It helps a lot if you follow the posting rules and provide commented,
> minimal, self-contained, reproducible code.
>
> On Sat, Oct 10, 2009 at 6:04 PM, Ashta sewa...@gmail.com wrote:
> > Hi all,
> >
> > I have a data set called x with200 rows and 12 columns. I want create
> > two more columns based on probability. ie
> >
> > if p 0 .4 then v1 =1 else v1=0;
> > if p 0 .6 then v2 =1 else v2=0;
> >
> > Finally x will have 14 variables. Can any one show me how to do that?
> >
> > Thanks
> > Ashta
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
Re: [R] row selection
Hi all,

Thank you for your help. Now I am able to select every 5th row of the data from the main data set (x) using

    sub1 <- x[seq(1, nrow(x), by=5), ]

So sub1 contains one fifth of the data set x. I also want to create another data set that contains the remaining rows of x (i.e., four fifths of the data set). Any help is highly appreciated.

> x[seq(1, nrow(x), by=5), ]
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of David Winsemius
> Sent: Thursday, October 08, 2009 4:19 PM
> To: Ashta
> Cc: R help
> Subject: Re: [R] row selection
>
> On Oct 8, 2009, at 4:14 PM, Ashta wrote:
> > Hi all,
> >
> > I have a matrix named x with N by C
> >
> > I want to select every 5 th rrow from matrix x
> >
> > I used the following code
> >
> > n <- nrow(x)
> > for(i in 1: n){
> >   b <- a[i+5,]
> >   b
> > }
> >
> > Error: subscript out of bounds
>
> What did you expect when i in your loop counter became one greater than
> the number of rows?
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
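A minimal sketch for the complement: negative indices drop the selected rows, so the same index vector gives both subsets. The matrix here is hypothetical; `drop = FALSE` is added so the result stays a matrix even if only one row remains.

```r
x <- matrix(1:60, nrow = 20)          # hypothetical 20 x 3 matrix

idx  <- seq(1, nrow(x), by = 5)       # rows 1, 6, 11, 16

sub1 <- x[idx, , drop = FALSE]        # every 5th row
sub2 <- x[-idx, , drop = FALSE]       # everything else

c(nrow(sub1), nrow(sub2))
```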
[R] row selection
Hi all,

I have a matrix named x with N rows and C columns. I want to select every 5th row from matrix x. I used the following code:

    n <- nrow(x)
    for(i in 1:n){
      b <- a[i+5,]
      b
    }

    Error: subscript out of bounds

Can anybody point out the problem?
[R] Plot
Hi All,

    Days <- matrix(c("Monday", "Tuesday", "Wed", "Thu", "Fri", "Sat", "Sun"),7,1)
    Hum <- matrix(c(56,57,60,75,62,67,70),
    Temp <- matrix(c(76,77,81,95,82,77,83),

Using the above information I want to plot humidity and temperature on the Y-axis and days on the X-axis.

Any help is appreciated!
Re: [R] Plot
Thanks Sarah,

Yes, I did try. I could not get the Days on the X-axis. Below is the error message:

    > plot(Temp,Days)
    Error in plot.window(...) : need finite 'ylim' values
    In addition: Warning messages:
    1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
    2: In min(x) : no non-missing arguments to min; returning Inf
    3: In max(x) : no non-missing arguments to max; returning -Inf

On Tue, Oct 6, 2009 at 10:19 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
> Did you try it? With, perhaps, plot() ? And lines() ?
>
> You might do better with Days as a factor with the day names in order.
> (And why are two full and five abbreviated?)
>
> I don't understand why Hum and Temp are matrices rather than vectors,
> and why then you didn't specify dimensions, and for that matter why you
> are missing a closing paren but do have a comma in its place.
>
> Generally this list is happy to help, but we like some evidence that the
> querent has *tried* before inquiring.
>
> Sarah
>
> On Tue, Oct 6, 2009 at 10:05 AM, Ashta sewa...@gmail.com wrote:
> > Hi All,
> >
> > Days <- matrix(c("Monday", "Tuesday", "Wed", "Thu", "Fri", "Sat", "Sun"),7,1)
> > Hum <- matrix(c(56,57,60,75,62,67,70),
> > Temp <- matrix(c(76,77,81,95,82,77,83),
> >
> > Using the above information I want plot humidity and temperature on
> > Y-axis and days on X-axis
> >
> > Any help is appreciated!
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
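The error above comes from `plot(Temp, Days)` placing the character day names on the y-axis, where they are coerced to NA. A sketch of one way to draw both series against the days, using plain vectors as suggested in the quoted reply: plot against day positions, suppress the default x-axis, and add the day names as labels. The abbreviated day labels and the colours are illustrative choices.

```r
Days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
Hum  <- c(56, 57, 60, 75, 62, 67, 70)
Temp <- c(76, 77, 81, 95, 82, 77, 83)

pos <- seq_along(Days)                 # numeric x positions 1..7

# ylim spans both series so neither is clipped; xaxt = "n" hides the 1..7 axis.
plot(pos, Temp, type = "b", col = "red", xaxt = "n",
     ylim = range(Hum, Temp), xlab = "Day", ylab = "Value")
lines(pos, Hum, type = "b", col = "blue")

axis(1, at = pos, labels = Days)       # day names on the x-axis
legend("topleft", legend = c("Temperature", "Humidity"),
       col = c("red", "blue"), lty = 1, pch = 1)
```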
[R] Legend
I have more than three lines in one plot and I want to add a legend entry for each line:

    abline( m1, col = 'red' )
    abline( m2, col = 'blue' )
    abline( m3, col = 'purple' )

How can I add a legend? Is it also possible to increase the thickness of the lines?

Thanks
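A sketch covering colour, thickness, and legend together, on simulated data (the three models are hypothetical stand-ins for `m1`, `m2`, `m3`). `col` sets the colour and `lwd` the line width in `abline()`; passing the same vectors to `legend()` keeps the key in step with the lines.

```r
set.seed(3)
x  <- 1:30
y1 <- 2 + 0.5 * x + rnorm(30)          # simulated responses
y2 <- 5 + 0.3 * x + rnorm(30)
y3 <- 1 + 0.8 * x + rnorm(30)

m1 <- lm(y1 ~ x)
m2 <- lm(y2 ~ x)
m3 <- lm(y3 ~ x)

cols <- c("red", "blue", "purple")

# Empty frame sized to hold all three series.
plot(x, y1, type = "n", ylim = range(y1, y2, y3), ylab = "y")
abline(m1, col = cols[1], lwd = 3)     # lwd > 1 thickens the line
abline(m2, col = cols[2], lwd = 3)
abline(m3, col = cols[3], lwd = 3)

legend("topleft", legend = c("m1", "m2", "m3"), col = cols, lwd = 3)
```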
[R] Color of graph
I am trying to plot a line graph for 3 or more regression lines:

    abline(m1)
    abline(m2)
    abline(m3)

Can I change the color of each line? If so, how?

Thanks in advance
Ashta
[R] Summary
My data is called xc and has more than 15 variables. When I used summary(xc) it gave me a detailed description of each variable.

    summary(xc)

          Y1               x1               x2               x3          ..
    Min.    :0.       Min.   : 1.000   Min.   : 1.000   Min.   : 1.000
    1st Qu. :0.       1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 2.000
    Median  :1.       Median : 1.000   Median : 1.000   Median : 3.000
    Mean    :0.6505   Mean   : 2.816   Mean   : 3.542   Mean   : 3.433
    3rd Qu. :1.       3rd Qu.: 4.000   3rd Qu.: 6.000   3rd Qu.: 5.000
    Max.    :1.       Max.   :10.000   Max.   :10.000   Max.   :10.000

But I want the output in the following way:

                 Y1       x1      x2      x3   ..
    Min.     :0.       1.000   1.000   1.000
    1st Qu.  :0.       1.000   1.000   2.000
    Median   :1.       1.000   1.000   3.000
    Mean     :0.6505   2.816   3.542   3.433
    3rd Qu.  :1.       4.000   6.000   5.000
    Max.     :1.      10.000  10.000  10.000

Is it possible to do it in R?

Thanks in advance
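One possible route to the compact layout, as a sketch: apply `summary()` column by column with `sapply()`, which returns a plain numeric matrix with the statistic names as row labels. This assumes all columns are numeric and have no missing values (with NAs the per-column summaries differ in length and no longer form a matrix). The data frame below is a stand-in built from a built-in data set.

```r
# Stand-in for xc: three numeric columns from a built-in data set.
xc <- mtcars[, c("mpg", "cyl", "disp")]

# One column of statistics per variable, statistic names as row labels.
out <- round(sapply(xc, summary), 3)
out
```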
[R] Binomial
Dear R-users,

Suppose I have the following sample of data,

0 1 2 4 3
1 2 1 3 1
1 3 3 4 1
0 1 2 1 2
1 4 1 4 2
1 2 2 1 1

The first variable is the response variable, where 0 is defective and 1 is normal. The other four are factors (x1, x2, x3, x4) that influence the outcome.

I want to fit a binomial model. How do I do that? I am guessing the response variable should be transformed, but I am not sure which family of transformation to use.

It is easy to do in SAS, but I just want to learn to do it in R.

Any help is highly appreciated
Ashta
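With glm() the 0/1 response does not need to be transformed by hand: family = binomial applies the logit link internally. The sketch below uses simulated data because six rows are too few to estimate four effects; the column names y and x1 to x4, the sample size and the true coefficients are all assumptions for illustration. The predictors are treated as numeric here; wrap them in factor() if they are categorical codes.

```r
# Simulated stand-in for the poster's data (names and effects are made up)
set.seed(42)
n   <- 200
dat <- data.frame(
  x1 = sample(1:4, n, replace = TRUE),
  x2 = sample(1:3, n, replace = TRUE),
  x3 = sample(1:4, n, replace = TRUE),
  x4 = sample(1:3, n, replace = TRUE)
)
# y = 1 means normal, y = 0 means defective
dat$y <- rbinom(n, 1, plogis(-1 + 0.6 * dat$x1 - 0.4 * dat$x3))

# Logistic regression: binomial family, logit link by default
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dat)
print(summary(fit))
```

The coefficients reported by summary() are on the log-odds scale; exp(coef(fit)) converts them to odds ratios.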
[R] SE and CI
How can I get the standard error and confidence interval for the prediction in a multiple regression model using an R command?

For a simple regression I used

predict(xc, newdata=data.frame(var1=10.),se=T)

where xc is the glm model using binomial and var1 is the variable.

I can get the upper and lower intervals of the prediction.

Any help is welcome.
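The same predict() call extends to several predictors as long as newdata supplies a value for each of them. A common approach, sketched here on simulated data, is to compute the interval on the link scale and back-transform it with plogis() so the limits stay between 0 and 1. The second predictor var2, the simulated coefficients and the 95% level are assumptions for this illustration.

```r
# Simulated data with two predictors (var2 is hypothetical)
set.seed(7)
n <- 150
d <- data.frame(var1 = runif(n, 0, 20), var2 = runif(n, 0, 5))
d$y <- rbinom(n, 1, plogis(-2 + 0.2 * d$var1 + 0.3 * d$var2))

xc <- glm(y ~ var1 + var2, family = binomial, data = d)

# newdata must contain every predictor used in the model
new <- data.frame(var1 = 10, var2 = 2.5)

# Prediction and standard error on the link (log-odds) scale
p <- predict(xc, newdata = new, type = "link", se.fit = TRUE)

# Approximate 95% Wald interval, back-transformed to probabilities
z  <- qnorm(0.975)
ci <- data.frame(
  fit     = plogis(p$fit),
  lower   = plogis(p$fit - z * p$se.fit),
  upper   = plogis(p$fit + z * p$se.fit),
  se_link = p$se.fit
)
print(ci)
```

Using type = "response" with se.fit = TRUE also returns a standard error, but an interval built directly on that scale can fall outside [0, 1].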
[R] Modelling
Dear R-users,

Suppose I have the following sample of data,

0 1 2 4 3
1 2 1 3 1
1 3 3 4 1
0 1 2 1 2
1 4 1 4 2
1 2 2 1 1

The first variable is the response variable, where 0 is defective and 1 is normal. The other four are factors (x1, x2, x3, x4) that influence the outcome.

I want to fit a binomial model in R. I also want to order the factors based on their degree of influence on the outcome.

How do I do this in R?

Thanks in advance
Ashta
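There is no single agreed measure of "degree of influence", so the following is only one rough option: fit the logistic model, then use drop1() to see how much the fit worsens when each term is removed, and sort by the likelihood-ratio statistic. The data are simulated and the names y and x1 to x4 are assumptions for illustration.

```r
# Simulated stand-in for the poster's data (names and effects are made up)
set.seed(123)
n   <- 300
dat <- data.frame(
  x1 = sample(1:4, n, replace = TRUE),
  x2 = sample(1:3, n, replace = TRUE),
  x3 = sample(1:4, n, replace = TRUE),
  x4 = sample(1:3, n, replace = TRUE)
)
dat$y <- rbinom(n, 1, plogis(-1 + 0.8 * dat$x1 - 0.3 * dat$x3))

fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dat)

# Likelihood-ratio test for dropping each term in turn
lrt <- drop1(fit, test = "Chisq")

# Largest LRT first; the "<none>" row has NA and sorts last
ranked <- lrt[order(lrt$LRT, decreasing = TRUE), ]
print(ranked)
```

A larger LRT means removing that term costs more in fit, which is one way to read "more influential"; the ranking can change if predictors are correlated or recoded.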
[R] Reading data
Dear R-users,

I am a new user of R and I am eager to learn about it.

I wanted to read a simple data file and summarize it. I used the following,

rel <- read.table("C:/Documents and Settings/ashta/My Documents/R_data/rel.dat", quote="", header=FALSE, sep="", col.names=c("id","orel","nrel"))
summary(rel)

Below is the error message,

> rel <- read.table("C:/Documents and Settings/ashta/My Documents/R_data/rel.dat", quote="", header=FALSE, sep="", col.names=
+ c("id","orel","nrel"))
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :
  cannot open file 'file=C:/Documents and Settings/sewalem/My Documents/R_data/rel.dat': Invalid argument
> summary(rel)
Error in summary(rel) : object 'rel' not found

Does it need a library? Where can I get the library?

Any help is highly appreciated
Ashta
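No extra library is needed: read.table() is in the utils package, which is attached by default. The warning shows R trying to open a file literally named 'file=C:/...', so the path string itself is the problem, and checking it with file.exists() before reading is a quick way to catch that. The sketch below writes a small made-up file to a temporary directory so that it runs anywhere; the three data rows are invented for illustration.

```r
# Write a small whitespace-separated file to a temporary location
path <- file.path(tempdir(), "rel.dat")
writeLines(c("1 0.52 0.61",
             "2 0.48 0.55",
             "3 0.70 0.66"), path)

# Confirm R can see the file before trying to read it
stopifnot(file.exists(path))

# The first argument is the path itself, with no "file=" inside the quotes
rel <- read.table(path, header = FALSE,
                  col.names = c("id", "orel", "nrel"))
print(summary(rel))
```

If file.exists() returns FALSE for a real path, file.choose() can be used interactively to pick the file and see the exact path R expects.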