[R] Text search

2020-01-25 Thread Ashta
Hi all,
From one of the columns of the data frame, I want to search for and
extract the text "Tall" or "Short" and create a new column that
contains the extracted text in the corresponding row.
My example data and the desired output are shown below.

dat<-read.table(text="obs Year char
1  2001 Tall156
2  2002 12565Tall
3  2003 all54
4  2004 Short
5  2005 54all
6  2006 7Short12",header=TRUE,stringsAsFactors=FALSE)
dat$new <- "  "

Desired output
obs Year char      new
  1 2001 Tall156   Tall
  2 2002 12565Tall Tall
  3 2003 all54
  4 2004 Short     Short
  5 2005 54all
  6 2006 7Short12  Short

How do I get my desired output?
Thank you.
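
A minimal sketch of one possible approach, assuming that only the first
occurrence of Tall or Short in each row is wanted and that the match is
case-sensitive. It uses base R's regexpr() and regmatches(), and leaves
the new column empty where nothing matches:

```r
dat <- read.table(text = "obs Year char
1 2001 Tall156
2 2002 12565Tall
3 2003 all54
4 2004 Short
5 2005 54all
6 2006 7Short12", header = TRUE, stringsAsFactors = FALSE)

# regexpr() returns the position of the first match, or -1 if none
m <- regexpr("Tall|Short", dat$char)

# start with an empty string, then fill in only the rows that matched
dat$new <- ""
dat$new[m > 0] <- regmatches(dat$char, m)

print(dat)
```

Rows such as all54 do not match because the pattern needs the capital T.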

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove a row

2019-11-28 Thread Ashta
Thank you so much Bert.

Is it possible to split varx into three separate variables (area code,
region, and the numeric part)?
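
A hedged sketch of one way this might be done, reusing the pattern from
the reply quoted below with capture groups added. The column names area,
region and num are made up for illustration, and the input is assumed to
have been filtered to valid values already:

```r
# values that already passed the 3-2-5 pattern check
varx <- c("9F209", "2F250", "121FL50")

# same pattern as in the grep() call, with one capture group per part
pat <- "^([[:digit:]]{1,3})([[:alpha:]]{1,2})([[:digit:]]{1,5})$"

# sub() replaces the whole string with the requested capture group
parts <- data.frame(
  varx   = varx,
  area   = sub(pat, "\\1", varx),
  region = sub(pat, "\\2", varx),
  num    = sub(pat, "\\3", varx),
  stringsAsFactors = FALSE
)

print(parts)
```

The pieces come back as character; as.integer() can be applied to area
and num if numeric values are needed.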

On Thu, Nov 28, 2019 at 7:31 PM Bert Gunter  wrote:
>
> Use regular expressions.
>
> See ?regexp  and ?grep
>
> Using your example:
>
> > grep("^[[:digit:]]{1,3}[[:alpha:]]{1,2}[[:digit:]]{1,5}$", dat$varx, value = TRUE)
> [1] "9F209"   "2F250"   "121FL50"
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Thu, Nov 28, 2019 at 3:17 PM Ashta  wrote:
>>
>> Hi all,  I want to remove rows from a data frame based on a condition
>> on one of the variables.
>> When we split this string, it should follow a 3-2-5 format (up to 3
>> numeric digits, up to 2 letters, and up to 5 numeric digits), i.e.
>> area code - region - numeric. The area code can be 1, 2, or 3 digits
>> but not more than three. The max length of the region should be 2,
>> followed by at most 5 numeric digits. So the max length of this
>> variable is 10.  Anything outside of this pattern should be excluded.
>> As an example
>> As an example
>>
>> dat <-read.table(text=" rown  varx
>> 1   9F209
>> 2  FL250
>> 3  2F250
>> 4  102250
>> 5  102FL
>> 6   102
>> 7  1212FL250
>> 8  121FL50",header=TRUE,stringsAsFactors=F)
>>
>> 1  9F209   # keep
>> 2  FL250   # remove, no area code
>> 3   2F250  # keep
>> 4  102250 # remove , no region code
>> 5  102FL   # remove , no numeric after region code
>> 6   102  # remove ,  no region code and numeric
>> 7  1212FL250  #remove, area code is more than three digits
>> 8  121FL50  # Keep
>>
>> The desired output should be
>> 1   9F209
>> 3   2F250
>> 8  121FL50
>>
>> How do I do this in an efficient way?
>>
>> Thank you in advance
>>



[R] remove a row

2019-11-28 Thread Ashta
Hi all,  I want to remove rows from a data frame based on a condition
on one of the variables.
When we split this string, it should follow a 3-2-5 format (up to 3
numeric digits, up to 2 letters, and up to 5 numeric digits), i.e.
area code - region - numeric. The area code can be 1, 2, or 3 digits
but not more than three. The max length of the region should be 2,
followed by at most 5 numeric digits. So the max length of this
variable is 10.  Anything outside of this pattern should be excluded.
As an example
As an example

dat <-read.table(text=" rown  varx
1   9F209
2  FL250
3  2F250
4  102250
5  102FL
6   102
7  1212FL250
8  121FL50",header=TRUE,stringsAsFactors=F)

1  9F209   # keep
2  FL250   # remove, no area code
3   2F250  # keep
4  102250 # remove , no region code
5  102FL   # remove , no numeric after region code
6   102  # remove ,  no region code and numeric
7  1212FL250  #remove, area code is more than three digits
8  121FL50  # Keep

The desired output should be
1   9F209
3   2F250
8  121FL50

How do I do this in an efficient way?

Thank you in advance



Re: [R] Remove

2017-12-09 Thread Ashta
Thank you All !!

Now, I have plenty of options to choose from.


On Sat, Dec 9, 2017 at 1:21 PM, William Dunlap <wdun...@tibco.com> wrote:
> You could make numeric vectors, named by the group identifier, of the
> constraints
> and subscript them by group name:
>
>> DM <- read.table( text='GR x y
> + A 25 125
> + A 23 135
> + A 14 145
> + A 35 230
> + B 45 321
> + B 47 512
> + B 53 123
> + B 55 451
> + C 61 521
> + C 68 235
> + C 85 258
> + C 80 654',header = TRUE, stringsAsFactors = FALSE)
>>
>> GRmin <- c(A=15, B=40, C=60)
>> GRmax <- c(A=30, B=50, C=75)
>> subset(DM, x>=GRmin[GR] & x <=GRmax[GR])
>    GR  x   y
> 1   A 25 125
> 2   A 23 135
> 5   B 45 321
> 6   B 47 512
> 9   C 61 521
> 10  C 68 235
>
> Or, if you want to completely avoid non-standard evaluation:
>> DM[ DM$x >= GRmin[DM$GR] & DM$x <= GRmax[DM$GR], ]
>    GR  x   y
> 1   A 25 125
> 2   A 23 135
> 5   B 45 321
> 6   B 47 512
> 9   C 61 521
> 10  C 68 235
>
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sat, Dec 9, 2017 at 9:38 AM, David Winsemius <dwinsem...@comcast.net>
> wrote:
>>
>>
>> > On Dec 8, 2017, at 6:16 PM, David Winsemius <dwinsem...@comcast.net>
>> > wrote:
>> >
>> >
>> >> On Dec 8, 2017, at 4:48 PM, Ashta <sewa...@gmail.com> wrote:
>> >>
>> >> Hi David, Ista and all,
>> >>
>> >> I have one related question. Within each group I want to keep
>> >> records conditionally. For example:
>> >> within group A, keep rows whose "x" values are between 15 and 30;
>> >> within group B, keep rows whose "x" values are between 40 and 50;
>> >> within group C, keep rows whose "x" values are between 60 and 75.
>> >
>> > When you have a problem where there are multiple "parallel" parameters,
>> > the function to "reach for" is `mapply`.
>> >
>> >    mapply( your_selection_func, group_vec, min_vec, max_vec)
>> >
>> > ... and this will probably return the values as a list (of dataframes,
>> > if you build the function correctly), so you may need to then do:
>> >
>> >    do.call(rbind, ...)
>>
>>  do.call( rbind,
>>    mapply( function(dat, grp, minx, maxx) {dat[ dat$GR==grp & dat$x >= minx & dat$x <= maxx, ]},
>>      grp=LETTERS[1:3], minx=c(15,40,60), maxx=c(30,50,75) ,
>>      MoreArgs=list(dat=DM),
>>      SIMPLIFY=FALSE))
>>  GR  x   y
>> A.1   A 25 125
>> A.2   A 23 135
>> B.5   B 45 321
>> B.6   B 47 512
>> C.9   C 61 521
>> C.10  C 68 235
>>
>> >
>> > --
>> > David.
>> >>
>> >>
>> >> DM <- read.table( text='GR x y
>> >> A 25 125
>> >> A 23 135
>> >> A 14 145
>> >> A 35 230
>> >> B 45 321
>> >> B 47 512
>> >> B 53 123
>> >> B 55 451
>> >> C 61 521
>> >> C 68 235
>> >> C 85 258
>> >> C 80 654',header = TRUE, stringsAsFactors = FALSE)
>> >>
>> >>
>> >> The end result will be
>> >> A 25 125
>> >> A 23 135
>> >> B 45 321
>> >> B 47 512
>> >> C 61 521
>> >> C 68 235
>> >>
>> >> Thank you
>> >>
>> >> On Wed, Dec 6, 2017 at 10:34 PM, David Winsemius
>> >> <dwinsem...@comcast.net> wrote:
>> >>>
>> >>>> On Dec 6, 2017, at 4:27 PM, Ashta <sewa...@gmail.com> wrote:
>> >>>>
>> >>>> Thank you Ista! Worked fine.
>> >>>
>> >>> Here's another (possibly more direct in its logic?):
>> >>>
>> >>> DM[ !ave(DM$x, DM$GR, FUN= function(x) {!length(unique(x))==1}), ]
>> >>> GR  x   y
>> >>> 5  B 25 321
>> >>> 6  B 25 512
>> >>> 7  B 25 123
>> >>> 8  B 25 451
>> >>>
>> >>> --
>> >>> David
>> >>>
>> >>>> On Wed, Dec 6, 2017 at 5:59 PM, Ista Zahn <istaz...@gmail.com> wrote:
>> >>>>> Hi Ashta,
>> >>>>>
>> >>>>> There are many ways to do it. Here is one:
>> >>>>>
>> >

Re: [R] Remove

2017-12-08 Thread Ashta
Hi David, Ista and all,

I have one related question. Within each group I want to keep records
conditionally. For example:
within group A, keep rows whose "x" values are between 15 and 30;
within group B, keep rows whose "x" values are between 40 and 50;
within group C, keep rows whose "x" values are between 60 and 75.


DM <- read.table( text='GR x y
A 25 125
A 23 135
A 14 145
A 35 230
B 45 321
B 47 512
B 53 123
B 55 451
C 61 521
C 68 235
C 85 258
C 80 654',header = TRUE, stringsAsFactors = FALSE)


The end result will be
A 25 125
A 23 135
B 45 321
B 47 512
C 61 521
C 68 235

Thank you
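
One possible sketch, assuming the per-group limits are kept in a small
lookup table (the names lim, lo and hi are hypothetical). The limits are
merged onto the data by GR and the rows are then filtered:

```r
DM <- read.table(text = "GR x y
A 25 125
A 23 135
A 14 145
A 35 230
B 45 321
B 47 512
B 53 123
B 55 451
C 61 521
C 68 235
C 85 258
C 80 654", header = TRUE, stringsAsFactors = FALSE)

# hypothetical lookup table: one row of limits per group
lim <- data.frame(GR = c("A", "B", "C"),
                  lo = c(15, 40, 60),
                  hi = c(30, 50, 75),
                  stringsAsFactors = FALSE)

# attach each row's limits, then keep rows inside the closed interval
m   <- merge(DM, lim, by = "GR")
res <- m[m$x >= m$lo & m$x <= m$hi, c("GR", "x", "y")]

print(res)
```

Keeping the limits in a table rather than in the code may be easier to
maintain when there are many groups.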

On Wed, Dec 6, 2017 at 10:34 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Dec 6, 2017, at 4:27 PM, Ashta <sewa...@gmail.com> wrote:
>>
>> Thank you Ista! Worked fine.
>
> Here's another (possibly more direct in its logic?):
>
>  DM[ !ave(DM$x, DM$GR, FUN= function(x) {!length(unique(x))==1}), ]
>   GR  x   y
> 5  B 25 321
> 6  B 25 512
> 7  B 25 123
> 8  B 25 451
>
> --
> David
>
>> On Wed, Dec 6, 2017 at 5:59 PM, Ista Zahn <istaz...@gmail.com> wrote:
>>> Hi Ashta,
>>>
>>> There are many ways to do it. Here is one:
>>>
>>> vars <- sapply(split(DM$x, DM$GR), var)
>>> DM[DM$GR %in% names(vars[vars > 0]), ]
>>>
>>> Best
>>> Ista
>>>
>>> On Wed, Dec 6, 2017 at 6:58 PM, Ashta <sewa...@gmail.com> wrote:
>>>> Thank you Jeff,
>>>>
>>>> subset( DM, "B" != GR ), this works only if I know the group.
>>>> But if I don't know the group in advance (in this case "B"), how do I
>>>> identify group(s) in which all elements of x have the same value?
>>>>
>>>> On Wed, Dec 6, 2017 at 5:48 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
>>>> wrote:
>>>>> subset( DM, "B" != GR )
>>>>>
>>>>> This is covered in the Introduction to R document that comes with R.
>>>>> --
>>>>> Sent from my phone. Please excuse my brevity.
>>>>>
>>>>> On December 6, 2017 3:21:12 PM PST, David Winsemius 
>>>>> <dwinsem...@comcast.net> wrote:
>>>>>>
>>>>>>> On Dec 6, 2017, at 3:15 PM, Ashta <sewa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>> In a data set I have group(GR) and two variables   x and y. I want to
>>>>>>> remove a  group that have  the same record for the x variable in each
>>>>>>> row.
>>>>>>>
>>>>>>> DM <- read.table( text='GR x y
>>>>>>> A 25 125
>>>>>>> A 23 135
>>>>>>> A 14 145
>>>>>>> A 12 230
>>>>>>> B 25 321
>>>>>>> B 25 512
>>>>>>> B 25 123
>>>>>>> B 25 451
>>>>>>> C 11 521
>>>>>>> C 14 235
>>>>>>> C 15 258
>>>>>>> C 10 654',header = TRUE, stringsAsFactors = FALSE)
>>>>>>>
>>>>>>> In this example the output should contain group A and C  as group B
>>>>>>> has   the same record  for the variable x .
>>>>>>>
>>>>>>> The result will be
>>>>>>> A 25 125
>>>>>>> A 23 135
>>>>>>> A 14 145
>>>>>>> A 12 230
>>>>>>> C 11 521
>>>>>>> C 14 235
>>>>>>> C 15 258
>>>>>>> C 10 654
>>>>>>
>>>>>> Try:
>>>>>>
>>>>>> DM[ !duplicated(DM$x) , ]
>>>>>>>
>>>>>>> How do I do it R?
>>>>>>> Thank you.
>>>>>>>
>>>>>>
>>>>>> David Winsemius
>>>>>> Alameda, CA, USA
>>>>>>
>>>>>> 'Any technology distinguishable from magic is insufficiently advanced.'
>>>>>> -Gehm's Corollary to Clarke's Third Law
>>>>>>
>>>>
>>
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'   
> -Gehm's Corollary to Clarke's Third Law
>
>
>
>
>



Re: [R] Remove

2017-12-06 Thread Ashta
Thank you Ista! Worked fine.

On Wed, Dec 6, 2017 at 5:59 PM, Ista Zahn <istaz...@gmail.com> wrote:
> Hi Ashta,
>
> There are many ways to do it. Here is one:
>
> vars <- sapply(split(DM$x, DM$GR), var)
> DM[DM$GR %in% names(vars[vars > 0]), ]
>
> Best
> Ista
>
> On Wed, Dec 6, 2017 at 6:58 PM, Ashta <sewa...@gmail.com> wrote:
>> Thank you Jeff,
>>
>> subset( DM, "B" != GR ), this works only if I know the group.
>> But if I don't know the group in advance (in this case "B"), how do I
>> identify group(s) in which all elements of x have the same value?
>>
>> On Wed, Dec 6, 2017 at 5:48 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
>> wrote:
>>> subset( DM, "B" != GR )
>>>
>>> This is covered in the Introduction to R document that comes with R.
>>> --
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On December 6, 2017 3:21:12 PM PST, David Winsemius 
>>> <dwinsem...@comcast.net> wrote:
>>>>
>>>>> On Dec 6, 2017, at 3:15 PM, Ashta <sewa...@gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>> In a data set I have group(GR) and two variables   x and y. I want to
>>>>> remove a  group that have  the same record for the x variable in each
>>>>> row.
>>>>>
>>>>> DM <- read.table( text='GR x y
>>>>> A 25 125
>>>>> A 23 135
>>>>> A 14 145
>>>>> A 12 230
>>>>> B 25 321
>>>>> B 25 512
>>>>> B 25 123
>>>>> B 25 451
>>>>> C 11 521
>>>>> C 14 235
>>>>> C 15 258
>>>>> C 10 654',header = TRUE, stringsAsFactors = FALSE)
>>>>>
>>>>> In this example the output should contain group A and C  as group B
>>>>> has   the same record  for the variable x .
>>>>>
>>>>> The result will be
>>>>> A 25 125
>>>>> A 23 135
>>>>> A 14 145
>>>>> A 12 230
>>>>> C 11 521
>>>>> C 14 235
>>>>> C 15 258
>>>>> C 10 654
>>>>
>>>>Try:
>>>>
>>>>DM[ !duplicated(DM$x) , ]
>>>>>
>>>>> How do I do it R?
>>>>> Thank you.
>>>>>
>>>>
>>>>David Winsemius
>>>>Alameda, CA, USA
>>>>
>>>>'Any technology distinguishable from magic is insufficiently advanced.'
>>>>  -Gehm's Corollary to Clarke's Third Law
>>>>
>>



Re: [R] Remove

2017-12-06 Thread Ashta
Thank you Jeff,

subset( DM, "B" != GR ), this works only if I know the group.
But if I don't know the group in advance (in this case "B"), how do I
identify group(s) in which all elements of x have the same value?

On Wed, Dec 6, 2017 at 5:48 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> subset( DM, "B" != GR )
>
> This is covered in the Introduction to R document that comes with R.
> --
> Sent from my phone. Please excuse my brevity.
>
> On December 6, 2017 3:21:12 PM PST, David Winsemius <dwinsem...@comcast.net> 
> wrote:
>>
>>> On Dec 6, 2017, at 3:15 PM, Ashta <sewa...@gmail.com> wrote:
>>>
>>> Hi all,
>>> In a data set I have group(GR) and two variables   x and y. I want to
>>> remove a  group that have  the same record for the x variable in each
>>> row.
>>>
>>> DM <- read.table( text='GR x y
>>> A 25 125
>>> A 23 135
>>> A 14 145
>>> A 12 230
>>> B 25 321
>>> B 25 512
>>> B 25 123
>>> B 25 451
>>> C 11 521
>>> C 14 235
>>> C 15 258
>>> C 10 654',header = TRUE, stringsAsFactors = FALSE)
>>>
>>> In this example the output should contain group A and C  as group B
>>> has   the same record  for the variable x .
>>>
>>> The result will be
>>> A 25 125
>>> A 23 135
>>> A 14 145
>>> A 12 230
>>> C 11 521
>>> C 14 235
>>> C 15 258
>>> C 10 654
>>
>>Try:
>>
>>DM[ !duplicated(DM$x) , ]
>>>
>>> How do I do it R?
>>> Thank you.
>>>
>>
>>David Winsemius
>>Alameda, CA, USA
>>
>>'Any technology distinguishable from magic is insufficiently advanced.'
>>  -Gehm's Corollary to Clarke's Third Law
>>



Re: [R] Remove

2017-12-06 Thread Ashta
Thank you David.
This will not work.  This removes only duplicate records.
DM[ !duplicated(DM$x) , ]

My goal is to remove the group if all elements of x in that group have
the same value.
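
A sketch of one way to express that goal in base R, assuming "the same
value" means exactly one distinct x within the group. ave() repeats the
per-group count of distinct x values on every row, so it can be used
directly as a row filter (the name n_distinct_x is made up here):

```r
DM <- read.table(text = "GR x y
A 25 125
A 23 135
A 14 145
A 12 230
B 25 321
B 25 512
B 25 123
B 25 451
C 11 521
C 14 235
C 15 258
C 10 654", header = TRUE, stringsAsFactors = FALSE)

# number of distinct x values in each row's group
n_distinct_x <- ave(DM$x, DM$GR, FUN = function(v) length(unique(v)))

# keep only rows whose group has more than one distinct x
res <- DM[n_distinct_x > 1, ]

print(res)
```

Unlike !duplicated(DM$x), this looks at each group as a whole, so a
value that repeats across different groups does not cause rows to drop.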


On Wed, Dec 6, 2017 at 5:21 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Dec 6, 2017, at 3:15 PM, Ashta <sewa...@gmail.com> wrote:
>>
>> Hi all,
>> In a data set I have group(GR) and two variables   x and y. I want to
>> remove a  group that have  the same record for the x variable in each
>> row.
>>
>> DM <- read.table( text='GR x y
>> A 25 125
>> A 23 135
>> A 14 145
>> A 12 230
>> B 25 321
>> B 25 512
>> B 25 123
>> B 25 451
>> C 11 521
>> C 14 235
>> C 15 258
>> C 10 654',header = TRUE, stringsAsFactors = FALSE)
>>
>> In this example the output should contain group A and C  as group B
>> has   the same record  for the variable x .
>>
>> The result will be
>> A 25 125
>> A 23 135
>> A 14 145
>> A 12 230
>> C 11 521
>> C 14 235
>> C 15 258
>> C 10 654
>
> Try:
>
> DM[ !duplicated(DM$x) , ]
>>
>> How do I do it R?
>> Thank you.
>>
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'   
> -Gehm's Corollary to Clarke's Third Law
>
>
>
>
>



[R] Remove

2017-12-06 Thread Ashta
Hi all,
In a data set I have a group variable (GR) and two variables, x and y. I
want to remove any group that has the same value of x in every
row.

DM <- read.table( text='GR x y
A 25 125
A 23 135
A 14 145
A 12 230
B 25 321
B 25 512
B 25 123
B 25 451
C 11 521
C 14 235
C 15 258
C 10 654',header = TRUE, stringsAsFactors = FALSE)

In this example the output should contain groups A and C, as group B
has the same value for the variable x in every row.

The result will be
A 25 125
A 23 135
A 14 145
A 12 230
C 11 521
C 14 235
C 15 258
C 10 654

How do I do it in R?
Thank you.



Re: [R] reading data

2017-06-14 Thread Ashta
Hi Jim,
After a little digging on my side, I have found why the script is
skipping that file. The file is reported as "ISO-8859 text, with CRLF
line terminators".

The file should be ASCII, so I converted it with dos2unix. The CRLF line
terminators are gone, but I still cannot read it. How can I
read files that are "ISO-8859 text"?

On Tue, Jun 13, 2017 at 7:20 PM, jim holtman <jholt...@gmail.com> wrote:
> You need to provide reproducible data.  What does the file contain?  Why are
> you using 'sep=' when reading fixed format?  You might be able to attach the
> '.txt' to your email to help with the problem.  Also, you did not state what
> differences you are seeing.  So help us out here.
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Tue, Jun 13, 2017 at 5:09 PM, Ashta <sewa...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am using R to extract  data on a regular basis.
>> However, sometimes using the same script and the same data I am
>> getting different observation.
>> The library I am using and how I am reading  it is as follows.
>>
>> library(stringr)
>> namelist <- file("Adress1.txt",encoding="ISO-8859-1")
>> Name <- read.fwf(namelist,
>> colClasses="character", skip=2,sep="\t",fill=T,
>>   width =c(2,8,1,1,1,1,1,1,9,5)+1,col.names=ccol)
>>
>> Can some one suggest me how track the issue?
>> Is it the library issue or Java issue?
>> May I read as free format instead of fixed format?
>>
>> Thank you in advance
>>
>
>



[R] reading data

2017-06-13 Thread Ashta
Hi all,

I am using R to extract data on a regular basis.
However, sometimes, using the same script and the same data, I am
getting different observations.
The library I am using and how I am reading the data are as follows.

library(stringr)
namelist <- file("Adress1.txt",encoding="ISO-8859-1")
Name <- read.fwf(namelist,
                 colClasses="character", skip=2, sep="\t", fill=TRUE,
                 widths=c(2,8,1,1,1,1,1,1,9,5)+1, col.names=ccol)

Can someone suggest how to track down the issue?
Is it a library issue or a Java issue?
May I read the file as free format instead of fixed format?

Thank you in advance
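
A small self-contained sketch of reading a fixed-width file while
declaring its encoding, offered only as something to try. The file
layout, widths and column names below are invented for the demo and are
not the real Adress1.txt; the sketch passes the file name to read.fwf()
with its fileEncoding argument rather than a pre-built connection:

```r
# build a tiny demo file with two header lines and CRLF line endings
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeLines(c("header 1", "header 2", "AB12345678", "CD87654321"),
           con, sep = "\r\n")
close(con)

# hypothetical layout: a 2-character code followed by an 8-character id
dat <- read.fwf(tmp,
                widths       = c(2, 8),
                skip         = 2,
                colClasses   = "character",
                col.names    = c("code", "id"),
                fileEncoding = "ISO-8859-1")

print(dat)
unlink(tmp)
```

Comparing nrow(dat) with length(readLines(tmp)) minus the skipped lines
is one way to check whether any rows are being dropped silently.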



Re: [R] Non date value

2017-04-15 Thread Ashta
Jeff,

I am sorry for that.


On Sat, Apr 15, 2017 at 12:04 AM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> You don't follow instructions very well. Read the Posting Guide more 
> carefully.
> --
> Sent from my phone. Please excuse my brevity.
>
> On April 14, 2017 9:39:30 PM PDT, Ashta <sewa...@gmail.com> wrote:
>>DF1 is a data frame.   I am suspecting there might be non date value
>>in that column. My question is how to  remove  a non date values  from
>> that field.
>>example if Alex152 has  12253,. This value is not a date format.
>>
>>
>>On Fri, Apr 14, 2017 at 11:24 PM, Bert Gunter <bgunter.4...@gmail.com>
>>wrote:
>>> Show us str(DF1) . It is not a data frame.
>>>
>>> -- Bert
>>>
>>>
>>>
>>>
>>> On Fri, Apr 14, 2017 at 9:02 PM, Ashta <sewa...@gmail.com> wrote:
>>>> Hi all,
>>>> I am reading  a field data that contains several variables. The
>>sample
>>>> of the data with the first two variables is shown below.  I wanted
>>to
>>>> know the minimum  and maximum recording date   However, I have some
>>>> problem.
>>>>
>>>>
>>>> Name     Rdate       V1 to V20
>>>> Alex1    01/03/2015
>>>> Alex2    01/03/2014
>>>> Alex3    31/12/2012
>>>> Alex4    15/01/2011
>>>> Alex150  22/01/2010
>>>> Alex151  15/02/2011
>>>>
>>>>
>>>>
>>>> DF1=DF1[!is.na(DF1$Rdate),]
>>>> range(DF1$Rdate, na.rm=TRUE)
>>>>
>>>> Warning message:
>>>> In is.na(DF1$Rdate) :
>>>>   is.na() applied to non-(list or vector) of type 'NULL'
>>>> Error in DF1$Rdate : $ operator is invalid for atomic vectors
>>>> Execution halted
>>>>
>>>> I am expecting the Rdate field should contain  recording dates. I
>>am
>>>> suspecting there might be a non date  value in that columns. How do
>>I
>>>> remove that row if it is not a date format?
>>>>
>>>>
>>>> Thank you.
>>>>
>>



Re: [R] Non date value

2017-04-14 Thread Ashta
DF1 is a data frame.   I suspect there might be a non-date value
in that column. My question is how to remove non-date values from
that field.
For example, if Alex152 has 12253, this value is not in a date format.


On Fri, Apr 14, 2017 at 11:24 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Show us str(DF1) . It is not a data frame.
>
> -- Bert
>
>
>
>
> On Fri, Apr 14, 2017 at 9:02 PM, Ashta <sewa...@gmail.com> wrote:
>> Hi all,
>> I am reading  a field data that contains several variables. The sample
>> of the data with the first two variables is shown below.  I wanted to
>> know the minimum  and maximum recording date   However, I have some
>> problem.
>>
>>
>> Name     Rdate       V1 to V20
>> Alex1    01/03/2015
>> Alex2    01/03/2014
>> Alex3    31/12/2012
>> Alex4    15/01/2011
>> Alex150  22/01/2010
>> Alex151  15/02/2011
>>
>>
>>
>> DF1=DF1[!is.na(DF1$Rdate),]
>> range(DF1$Rdate, na.rm=TRUE)
>>
>> Warning message:
>> In is.na(DF1$Rdate) :
>>   is.na() applied to non-(list or vector) of type 'NULL'
>> Error in DF1$Rdate : $ operator is invalid for atomic vectors
>> Execution halted
>>
>> I am expecting the Rdate field should contain  recording dates. I  am
>> suspecting there might be a non date  value in that columns. How do I
>> remove that row if it is not a date format?
>>
>>
>> Thank you.
>>



[R] Non date value

2017-04-14 Thread Ashta
Hi all,
I am reading field data that contains several variables. A sample
of the data with the first two variables is shown below.  I want to
know the minimum and maximum recording dates. However, I have run into a
problem.


Name     Rdate       V1 to V20
Alex1    01/03/2015
Alex2    01/03/2014
Alex3    31/12/2012
Alex4    15/01/2011
Alex150  22/01/2010
Alex151  15/02/2011



DF1=DF1[!is.na(DF1$Rdate),]
range(DF1$Rdate, na.rm=TRUE)

Warning message:
In is.na(DF1$Rdate) :
  is.na() applied to non-(list or vector) of type 'NULL'
Error in DF1$Rdate : $ operator is invalid for atomic vectors
Execution halted

I expect the Rdate field to contain recording dates. I
suspect there might be a non-date value in that column. How do I
remove a row if its value is not in a date format?


Thank you.
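
A minimal sketch of one way to drop non-date rows, assuming DF1 really
is a data frame and Rdate is stored as text in day/month/year form. The
small data frame below is made up for the demo, including the Alex152
row with the value 12253. as.Date() returns NA for anything that does
not parse with the given format, and those rows are removed:

```r
# toy data; the last row carries a value that is not a date
DF1 <- data.frame(
  Name  = c("Alex1", "Alex2", "Alex3", "Alex152"),
  Rdate = c("01/03/2015", "01/03/2014", "31/12/2012", "12253"),
  stringsAsFactors = FALSE
)

# values that do not match dd/mm/yyyy become NA
parsed <- as.Date(DF1$Rdate, format = "%d/%m/%Y")

# keep only rows whose Rdate parsed, and store the parsed dates
DF1 <- DF1[!is.na(parsed), ]
DF1$Rdate <- parsed[!is.na(parsed)]

# minimum and maximum recording date
rng <- range(DF1$Rdate)
print(rng)
```

Running str(DF1) first is still worthwhile, since the error message in
this post suggests DF1 may not be a data frame at that point.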



[R] combine

2017-03-25 Thread Ashta
Hi all,

I have more than two files that I want to merge by a single column while
preserving the other columns.
Here is an example of two files

dat1 <- read.table(header=TRUE, text=' ID  T1  T2
ID1  125  245
ID2  141  264
ID3  133  281')

dat2 <- read.table(header=TRUE, text=' ID  G1  G2
ID2  25  46
ID4  41  64
ID5  33  81')

How do I get the following output?

ID   T1   T2   G1   G2
ID1  125  245   0    0
ID2  141  264  25   46
ID3  133  281   0    0
ID4    0    0  41   64
ID5    0    0  33   81
Thank you.
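
A sketch of one possible approach, assuming every file has an ID column
and that missing combinations should become 0 as in the table above.
merge() with all = TRUE keeps IDs from both sides, and Reduce() repeats
the merge over a list, so the same code should extend to more than two
data frames:

```r
dat1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID T1 T2
ID1 125 245
ID2 141 264
ID3 133 281")

dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID G1 G2
ID2 25 46
ID4 41 64
ID5 33 81")

# full outer join on ID, applied pairwise across the whole list
out <- Reduce(function(a, b) merge(a, b, by = "ID", all = TRUE),
              list(dat1, dat2))

# IDs absent from one file come back as NA; replace them with 0
out[is.na(out)] <- 0

print(out)
```

Further data frames can be appended to the list passed to Reduce().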




Re: [R] find and

2017-03-18 Thread Ashta
Thank you Rui and Ulrik.

Rui, your option worked for the small data set, but when I applied it to
the big data set it took a long time and never finished, so I had to kill
it. I don't know why.


Ulrik's option worked fine for the big data set (> 1.5M records)
and took less than 2 minutes.

These two give me the same results.
# Counting unique
DF4 %>%group_by(city) %>% filter(length(unique(var)) == 1)
# Counting not duplicated
DF4 %>%group_by(city) %>%filter(sum(!duplicated(var)) == 1)

 Thank you again.
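
For comparison, a base R sketch of the same filter without dplyr, using
the sample data from the original question. It counts distinct var
values per city once with tapply() and then subsets by city name (the
names n_var and keep are made up here). How it scales to 1.5M records
has not been measured:

```r
DF4 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "city wk var
city1 1 x
city1 2 -
city1 3 x
city2 1 x
city2 2 x
city2 3 x
city2 4 x
city3 1 x
city3 2 x
city3 3 x
city3 4 x
city4 1 x
city4 2 x
city4 3 y
city4 4 y
city5 3 -
city5 4 -")

# number of distinct var values in each city
n_var <- tapply(DF4$var, DF4$city, function(v) length(unique(v)))

# cities where var never changes
keep <- names(n_var)[n_var == 1]

want <- DF4[DF4$city %in% keep, ]
print(want)
```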


On Sat, Mar 18, 2017 at 10:40 AM, Ulrik Stervbo <ulrik.ster...@gmail.com> wrote:
> Using dplyr:
>
> library(dplyr)
>
> # Counting unique
> DF4 %>%
>   group_by(city) %>%
>   filter(length(unique(var)) == 1)
>
> # Counting not duplicated
> DF4 %>%
>   group_by(city) %>%
>   filter(sum(!duplicated(var)) == 1)
>
> HTH
> Ulrik
>
>
> On Sat, 18 Mar 2017 at 15:17 Rui Barradas <ruipbarra...@sapo.pt> wrote:
>>
>> Hello,
>>
>> I believe this does it.
>>
>>
>> sp <- split(DF4, DF4$city)
>> want <- do.call(rbind, lapply(sp, function(x)
>> if(length(unique(x$var)) == 1) x else NULL))
>> rownames(want) <- NULL
>> want
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 18-03-2017 13:51, Ashta escreveu:
>> > Hi all,
>> >
>> > I am trying to find a city that do not have the same "var" value.
>> > Within city the var should be the same otherwise exclude the city from
>> > the final data set.
>> > Here is my sample data and my attempt. City1 and city4 should be
>> > excluded.
>> >
>> > DF4 <- read.table(header=TRUE, text=' city  wk var
>> > city1  1  x
>> > city1  2  -
>> > city1  3  x
>> > city2  1  x
>> > city2  2  x
>> > city2  3  x
>> > city2  4  x
>> > city3  1  x
>> > city3  2  x
>> > city3  3  x
>> > city3  4  x
>> > city4  1  x
>> > city4  2  x
>> > city4  3  y
>> > city4  4  y
>> > city5  3  -
>> > city5  4  -')
>> >
>> > my attempt
>> >   test2  <-   data.table(DF4, key="city,var")
>> >   ID1<-   test2[ !duplicated(test2),]
>> >  dps <-   ID1$city[duplicated(ID1$city)]
>> > Ddup  <-   which(test2$city %in% dps)
>> >
>> >  if(length(Ddup) !=0)  {
>> >test2   <-  test2[- Ddup,]  }
>> >
>> > want <-  data.frame(test2)
>> >
>> >
>> > I want get the following result but I am not getting it.
>> >
>> > city wk var
>> >city2  1   x
>> >city2  2   x
>> >city2  3   x
>> >city2  4   x
>> >city3  1   x
>> >city3  2   x
>> >   city3  3   x
>> >   city3  4   x
>> >   city5  3   -
>> >   city5  4   -
>> >
>> > Can some help me out the problem is?
>> >
>> > Thank you.
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



[R] find and

2017-03-18 Thread Ashta
Hi all,

I am trying to find cities that do not have the same "var" value on every row.
Within a city the var should be the same; otherwise the city should be
excluded from the final data set.
Here is my sample data and my attempt. City1 and city4 should be excluded.

DF4 <- read.table(header=TRUE, text=' city  wk var
city1  1  x
city1  2  -
city1  3  x
city2  1  x
city2  2  x
city2  3  x
city2  4  x
city3  1  x
city3  2  x
city3  3  x
city3  4  x
city4  1  x
city4  2  x
city4  3  y
city4  4  y
city5  3  -
city5  4  -')

My attempt:
test2 <- data.table(DF4, key="city,var")
ID1   <- test2[!duplicated(test2), ]
dps   <- ID1$city[duplicated(ID1$city)]
Ddup  <- which(test2$city %in% dps)

if (length(Ddup) != 0) {
  test2 <- test2[-Ddup, ]
}

want <- data.frame(test2)


I want to get the following result but I am not getting it.

   city wk var
  city2  1   x
  city2  2   x
  city2  3   x
  city2  4   x
  city3  1   x
  city3  2   x
  city3  3   x
  city3  4   x
  city5  3   -
  city5  4   -

Can someone help me find out what the problem is?

Thank you.



Re: [R] Repeat

2017-02-25 Thread Ashta
Thank you so much David!

But it did not work when all elements of a group are '-'. Year 2006 in the
data below is an example.
If all values of flag are '-' within a year, then I want to set them to N.


dat=read.table(text = "Year month flag
2001 1   Z
2001 2   -
2001 4   X
2002 1   Z
2002 2   -
2003 1   -
2003 2   Z
2004 2   Z
2005 3   Z
2005 2   -
2005 3   -

2006 1   -
2006 2   - ",  header = TRUE)

dat$new <- with(dat, ave(flag, Year, FUN=function(s){ s[s=="-"] <- NA;
   zoo::na.locf(s) }) )

Error in `[<-.factor`(`*tmp*`, i, value = integer(0)) :
  replacement has length zero
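
The error most likely arises because zoo::na.locf() drops leading NAs by default (na.rm = TRUE), so a year whose flags are all '-' yields a zero-length result that ave() cannot assign back. A minimal base R sketch that avoids this is shown below. The helper name `fill_flag` is hypothetical, and back-filling a leading '-' from the first observed value in the year is an assumption chosen to match the 2003 rows in the output quoted below.

```r
dat <- read.table(text = "Year month flag
2001 1 Z
2001 2 -
2001 4 X
2003 1 -
2003 2 Z
2006 1 -
2006 2 -", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: fill '-' within one year's vector of flags.
fill_flag <- function(s) {
  s[s == "-"] <- NA
  # A year with no observed flag at all is set to "N".
  if (all(is.na(s))) return(rep("N", length(s)))
  # Index of the most recent non-missing value (0 while none seen yet).
  idx <- cummax(seq_along(s) * !is.na(s))
  # Leading missing values borrow the first observed value (assumption).
  idx[idx == 0] <- which(!is.na(s))[1]
  s[idx]
}

dat$new <- ave(dat$flag, dat$Year, FUN = fill_flag)
dat
```

Reading the data with stringsAsFactors = FALSE keeps flag as character, so assigning "N" does not run into factor-level restrictions.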

On Sat, Feb 25, 2017 at 5:43 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Feb 25, 2017, at 10:45 AM, Ashta <sewa...@gmail.com> wrote:
>>
>> Thank you David.
>> is it not possible to sort it by year and flag so that we can make '-'
>> in the second row ?  like this for that particular year.
>>
>>   2003 2 Z
>>   2003 1  -
>>
>
> I was a bit surprised by the results of htis since I had assumed than an 
> initial NA in a group would remain so, but apparently not:
>
> dat$new <- with(dat, ave(flag, Year, FUN=function(s){ s[s=="-"] <- NA; 
> zoo::na.locf(s) }) )
>
>> dat
>    Year month flag new
> 1  2001     1    Z   Z
> 2  2001     2    -   Z
> 3  2001     4    X   X
> 4  2002     1    Z   Z
> 5  2002     2    -   Z
> 6  2003     1    -   Z
> 7  2003     2    Z   Z
> 8  2004     2    Z   Z
> 9  2005     3    Z   Z
> 10 2005     2    -   Z
> 11 2005     3    -   Z
>
> David.
>
>>
>>
>> On Sat, Feb 25, 2017 at 12:14 PM, David Winsemius
>> <dwinsem...@comcast.net> wrote:
>>>
>>>> On Feb 25, 2017, at 8:09 AM, Ashta <sewa...@gmail.com> wrote:
>>>>
>>>> I have a data set and I want to repeat a column value based on other
>>>> column value,
>>>>
>>>> my data look like
>>>>
>>>> read.table(text = "Year month flag
>>>> 2001 1   Z
>>>> 2001 2   -
>>>> 2001 4   X
>>>> 2002 1   Z
>>>> 2002 2   -
>>>> 2003 1   -
>>>> 2003 2   Z
>>>> 2004 2   Z
>>>> 2005 3   Z
>>>> 2005 2   -
>>>> 2005 3   -",  header = TRUE)
>>>>
>>>> Within year If  flag = '-'  then i want replace  '-'  by the previous
>>>> row value of flag. In this example  for yea  2001 in month 2 flag is
>>>> '-' and I want replace it by the previous value of flag (i.e.,  'Z')
>>>> 2001 1   Z
>>>> 2001 2   Z
>>>> 2001 4   X
>>>>
>>>> If all values of flag  are '-' within year  then  I wan to set as N
>>>>
>>>> The complete out put result will be
>>>>
>>>> year month  flag
>>>> 2001 1   Z
>>>> 2001 2   z
>>>> 2001 4   X
>>>> 2002 1   Z
>>>> 2002 2   Z
>>>> 2003 1   Z
>>>> 2003 2   Z
>>>> 2004 2   Z
>>>> 2005 3   Z
>>>> 2005 2   N
>>>> 2005 3   N
>>>>
>>>> Thank you in advance
>>>>
>>>
>>> Your example doesn't actually match your verbal description of the 
>>> algorithm because you have not specified the rule that establishes values 
>>> for instances where the first value in a year is "-".
>>>
>>> The `na.locf` function in the 'zoo' package would be useful for the task 
>>> describe in your verbal description when used in conjunction with the 
>>> 'stats'-package's `ave` function.
>>>
>>> --
>>> David.
>>>
>>>
>>>> __
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>
> David Winsemius
> Alameda, CA, USA
>



Re: [R] Repeat

2017-02-25 Thread Ashta
Thank you David.
Is it not possible to sort by year and flag so that the '-' ends up
in the second row for that particular year? Like this:

   2003 2 Z
   2003 1  -



On Sat, Feb 25, 2017 at 12:14 PM, David Winsemius
<dwinsem...@comcast.net> wrote:
>
>> On Feb 25, 2017, at 8:09 AM, Ashta <sewa...@gmail.com> wrote:
>>
>> I have a data set and I want to repeat a column value based on other
>> column value,
>>
>> my data look like
>>
>> read.table(text = "Year month flag
>> 2001 1   Z
>> 2001 2   -
>> 2001 4   X
>> 2002 1   Z
>> 2002 2   -
>> 2003 1   -
>> 2003 2   Z
>> 2004 2   Z
>> 2005 3   Z
>> 2005 2   -
>> 2005 3   -",  header = TRUE)
>>
>> Within year If  flag = '-'  then i want replace  '-'  by the previous
>> row value of flag. In this example  for yea  2001 in month 2 flag is
>> '-' and I want replace it by the previous value of flag (i.e.,  'Z')
>> 2001 1   Z
>> 2001 2   Z
>> 2001 4   X
>>
>> If all values of flag  are '-' within year  then  I wan to set as N
>>
>> The complete out put result will be
>>
>> year month  flag
>> 2001 1   Z
>> 2001 2   z
>> 2001 4   X
>> 2002 1   Z
>> 2002 2   Z
>> 2003 1   Z
>> 2003 2   Z
>> 2004 2   Z
>> 2005 3   Z
>> 2005 2   N
>> 2005 3   N
>>
>> Thank you in advance
>>
>
> Your example doesn't actually match your verbal description of the algorithm 
> because you have not specified the rule that establishes values for instances 
> where the first value in a year is "-".
>
> The `na.locf` function in the 'zoo' package would be useful for the task 
> describe in your verbal description when used in conjunction with the 
> 'stats'-package's `ave` function.
>
> --
> David.
>
>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>



[R] Repeat

2017-02-25 Thread Ashta
I have a data set and I want to repeat a column value based on another
column's value.

My data look like this:

read.table(text = "Year month flag
2001 1   Z
2001 2   -
2001 4   X
2002 1   Z
2002 2   -
2003 1   -
2003 2   Z
2004 2   Z
2005 3   Z
2005 2   -
2005 3   -",  header = TRUE)

Within a year, if flag = '-' then I want to replace '-' with the previous
row's value of flag. In this example, for year 2001 in month 2 the flag is
'-' and I want to replace it with the previous value of flag (i.e., 'Z'):
2001 1   Z
2001 2   Z
2001 4   X

If all values of flag are '-' within a year, then I want to set them to N.

The complete output will be

year month  flag
2001 1   Z
2001 2   Z
2001 4   X
2002 1   Z
2002 2   Z
2003 1   Z
2003 2   Z
2004 2   Z
2005 3   Z
2005 2   N
2005 3   N

Thank you in advance



[R] read

2016-11-28 Thread Ashta
Hi all,

I have a script that reads a file (dat.csv) from several folders.
However, in some folders the file is named "dat" without the csv extension,
and in other folders it is dat.csv. The format of the data is the same (only
the file name differs, with or without ".csv").

Is it possible to read these files in one step regardless of the name,
like read.csv("dat.csv")? How can I read both types of file name?

Thank you in advance
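
One way this could be done is to test which of the two names exists in each folder before reading. The sketch below uses file.exists(); the function name `read_dat` and the temporary demo folders are made up for illustration, and it assumes the file without an extension is still comma-separated.

```r
# Hypothetical helper: read "dat.csv" if present, otherwise "dat".
read_dat <- function(folder) {
  candidates <- file.path(folder, c("dat.csv", "dat"))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) stop("no dat or dat.csv in ", folder)
  read.csv(found[1])
}

# Demo with two temporary folders, one per naming variant.
f1 <- file.path(tempdir(), "folder1"); dir.create(f1, showWarnings = FALSE)
f2 <- file.path(tempdir(), "folder2"); dir.create(f2, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(f1, "dat.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:4), file.path(f2, "dat"), row.names = FALSE)

# Read every folder and stack the results.
all_dat <- do.call(rbind, lapply(c(f1, f2), read_dat))
all_dat
```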



Re: [R] difference

2016-10-28 Thread Ashta
Hi all, thank you very much for your help. It worked very well for that
data set. I just found out that one of the data sets has another
grouping level. I want to do the same thing there: calculate the difference
between each row value (num) and the first row value within
city and year.

city, year, num
1, 2001,25
1, 2001,75
1, 2001,150
1, 2002,35
1, 2002,65
1, 2002,120
2, 2001,25
2, 2001,95
2, 2001,150
2, 2002,35
2, 2002,110
2, 2002,120

The result will be

city,year,num,Diff
1, 2001,25, 0
1, 2001,75, 50
1, 2001,150, 125
1, 2002,35, 0
1, 2002,65, 30
1, 2002,120, 85
2, 2001,25, 0
2, 2001,95, 70
2, 2001,150, 125
2, 2002,35, 0
2, 2002,110, 75
2, 2002,120, 85

Thank you again
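
A short sketch of one way to handle the extra level: ave() accepts several grouping variables, so the same "subtract the first value" function can be applied within each city and year combination. The data frame name `d` is illustrative.

```r
d <- read.csv(text = "city,year,num
1,2001,25
1,2001,75
1,2001,150
1,2002,35
1,2002,65
1,2002,120
2,2001,25
2,2001,95
2,2001,150
2,2002,35
2,2002,110
2,2002,120")

# Difference from the first num within each (city, year) block.
d$Diff <- ave(d$num, d$city, d$year, FUN = function(x) x - x[1])
d
```

Unlike unlist(by(...)), ave() returns values in the original row order, so the data do not need to be sorted by group first.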


On Fri, Oct 28, 2016 at 4:08 AM, P Tennant <philipt...@iinet.net.au> wrote:
> Hi,
>
> You could use an anonymous function to operate on each `year-block' of your
> dataset, then assign the result as a new column:
>
> d <- data.frame(year=c(rep(2001, 3), rep(2002, 3)),
> num=c(25,75,150,30,85,95))
>
> d$diff <- unlist(by(d$num, d$year, function(x) x - x[1]))
> d
>
>   year num diff
> 1 2001  25    0
> 2 2001  75   50
> 3 2001 150  125
> 4 2002  30    0
> 5 2002  85   55
> 6 2002  95   65
>
>
> Philip
>
>
> On 28/10/2016 3:20 PM, Ashta wrote:
>>
>> Hi all,
>>
>> I want to calculate the difference  between successive row values to
>> the first row value within year.
>> How do I get that?
>>
>> Here is the sample of data
>> Year   Num
>> 2001    25
>> 2001    75
>> 2001   150
>> 2002    30
>> 2002    85
>> 2002    95
>>
>> Desired output
>> Year   Num  diff
>> 2001    25     0
>> 2001    75    50
>> 2001   150   125
>> 2002    30     0
>> 2002    85    55
>> 2002    95    65
>>
>> Thank you.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



[R] difference

2016-10-27 Thread Ashta
Hi all,

I want to calculate the difference between each row value and
the first row value within year.
How do I get that?

Here is a sample of the data
Year   Num
2001    25
2001    75
2001   150
2002    30
2002    85
2002    95

Desired output
Year   Num  diff
2001    25     0
2001    75    50
2001   150   125
2002    30     0
2002    85    55
2002    95    65

Thank you.



[R] Subset and sumerize

2016-10-14 Thread Ashta
Hi all,

I am trying to summarize a big data set by selecting rows
conditionally, and tried to do it in a loop.

Here is  the sample of my data and my attempt

dat<-read.table(text="ID,x1,x2,y
1,a,b,15
1,x,z,21
1,x,b,16
1,x,k,25
2,d,z,31
2,x,z,28
2,g,t,41
3,h,e,32
3,x,z,38
3,x,g,45
",sep=",",header=TRUE)

For each unique ID, I want to select the rows where x1 = "x" and x2 = "z".
Here is the selected data (newdat)
ID,x1,x2,y
1,x,z,21
2,x,z,28
3,x,z,38

Then I want to summarize the y values and output them as follows
Summarize
summary(newdat[i])
##
ID   Min.  1st Qu.  Median  Mean  3rd Qu.  Max.
1
2
3
.
.
.
28


Here is my attempt, but it did not work:

trt=c(1:28)
for(i  in 1:length (trt))
{
  day[i]= newdat[which(newdat$ID== trt[i] &  newdat$x1 =="x" &
newdat$x2 =="z"),]
NR[i]=dim(day[i])[1]
print(paste("Number of Records  :", NR[i]))
sm[i]=summary(day[i])
}

Thank you in advance
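
A loop-free sketch of the selection and the per-ID summary, using only base R. The names `newdat` and `sm` follow the post; adding the record count as column `n` is an assumption about what "Number of Records" was meant to report.

```r
dat <- read.table(text = "ID,x1,x2,y
1,a,b,15
1,x,z,21
1,x,b,16
1,x,k,25
2,d,z,31
2,x,z,28
2,g,t,41
3,h,e,32
3,x,z,38
3,x,g,45", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# Rows where x1 is "x" and x2 is "z", for all IDs at once.
newdat <- subset(dat, x1 == "x" & x2 == "z")

# One row per ID: record count followed by the six summary() statistics.
sm <- t(sapply(split(newdat$y, newdat$ID),
               function(v) c(n = length(v), summary(v))))
sm
```

In this small sample each ID has a single matching row, so all six statistics equal that row's y value; with the full data the columns would differ.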



Re: [R] create variable

2016-10-12 Thread Ashta
Hi  David and  all,

I want to run the following script in a loop but ran into difficulty.

trt=c(1,2,2,4,5,6,7,8)
for(i  in 1:length (trt))
{
   try[i] <- (select  trt, date1, date2, datediff(date1,date2) as
d12diff [i]  from
 dateTable  where trt=[i]")
}

I would appreciate it if you could point out the problem.

Thank you in  advance
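
A few likely problems with the loop above: the SQL is not inside a quoted string passed to dbGetQuery(), `[i]` inside SQL text is not replaced by R, and `try` is the name of an existing R function. The sketch below only builds the query strings with sprintf(), so it runs without a Hive connection; the query text mirrors the one in this thread and has not been checked against a Hive cluster, and `queries` and `results` are illustrative names.

```r
# Treatment codes to query; unique() drops the repeated 2.
trt <- unique(c(1, 2, 2, 4, 5, 6, 7, 8))

# One SQL string per treatment, with the value substituted for %d.
queries <- sprintf(
  paste("select trt, date1, date2, datediff(date1, date2) as d12diff",
        "from dateTable where trt = %d"),
  as.integer(trt)
)
queries[1]

# With a live connection (hivecon is assumed to exist, as in the thread),
# each query could then be run and the results kept in a list:
# results <- lapply(queries, function(q) dbGetQuery(hivecon, q))
```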


On Sun, Oct 9, 2016 at 11:16 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Oct 9, 2016, at 7:56 AM, Ashta <sewa...@gmail.com> wrote:
>>
>> I am trying to query data from Hive service and create a variable.
>>
>>
>> dbGetQuery(hivecon,"select date1, date2 from  dateTable limit 10")
>> date1,  date2, Diif
>> 4/5/1999,  6/14/2000
>> 7/2/1999, 6/26/2000
>> 8/14/1999, 8/19/2000
>> 11/10/1999, 9/18/2000
>> 8/25/2000, 6/5/2001
>> 3/14/2012, 3/15/2004
>>
>>
>> Here is  what I wanted to do. While I am querying I want create a
>> variable diff= dat1e1-date2.
>> I may use this variable "diff"  to do some statistics (mean, mode,
>> etc) and also in the where clause l like as the following.
>>
>> test_date=dbGetQuery(hivecon,"select date1, date2 from  dateTable
>> where diff gt 1000 limit 10")
>>
>> I would appreciate if you suggest me how to do this.
>
> Sorry for the blank message earlier. My reading of the use of Hive queries is 
> that you would need to use the `datediff` function. I further suspect you 
> need to define a variable name to which then apply your limits. I also read 
> that hive dates are actually strings types represented as POSIX style 
> character values and might need a to_date funciton. This is all guesswork 
> since I don't have a hive cluster to run this against:
>
>  So perhaps something like one of these:
>
> try1 <- dbGetQuery(hivecon,"select date1, date2, 
> datediff(TO_DATE(date1),TO_DATE(date2)) as d12diff from  dateTable where 
> d12diff GT 1000 limit 10")
>
> try2 <- dbGetQuery(hivecon,"select date1, date2, datediff(dat1,date2) as 
> d12diff from  dateTable where d12diff GT 1000 limit 10")
>
> Obviously these are just guesses.
>
> --
> David.
>>
>>
>>
>> Here is the sample of the data and  result
>>
>> date1,  date2, Diif
>> 4/5/1999,  6/14/2000, -436
>> 7/2/1999, 6/26/2000, -360
>> 8/14/1999, 8/19/2000, -371
>> 11/10/1999, 9/18/2000, -313
>> 8/25/2000, 6/5/2001, -284
>> 3/14/2012, 3/15/2004, 2921
>>
>> Thank you in advance
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>



Re: [R] create variable

2016-10-09 Thread Ashta
Thank you so much David!  Your suggestions worked for me.


On Sun, Oct 9, 2016 at 11:16 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Oct 9, 2016, at 7:56 AM, Ashta <sewa...@gmail.com> wrote:
>>
>> I am trying to query data from Hive service and create a variable.
>>
>>
>> dbGetQuery(hivecon,"select date1, date2 from  dateTable limit 10")
>> date1,  date2, Diif
>> 4/5/1999,  6/14/2000
>> 7/2/1999, 6/26/2000
>> 8/14/1999, 8/19/2000
>> 11/10/1999, 9/18/2000
>> 8/25/2000, 6/5/2001
>> 3/14/2012, 3/15/2004
>>
>>
>> Here is  what I wanted to do. While I am querying I want create a
>> variable diff= dat1e1-date2.
>> I may use this variable "diff"  to do some statistics (mean, mode,
>> etc) and also in the where clause l like as the following.
>>
>> test_date=dbGetQuery(hivecon,"select date1, date2 from  dateTable
>> where diff gt 1000 limit 10")
>>
>> I would appreciate if you suggest me how to do this.
>
> Sorry for the blank message earlier. My reading of the use of Hive queries is 
> that you would need to use the `datediff` function. I further suspect you 
> need to define a variable name to which then apply your limits. I also read 
> that hive dates are actually strings types represented as POSIX style 
> character values and might need a to_date funciton. This is all guesswork 
> since I don't have a hive cluster to run this against:
>
>  So perhaps something like one of these:
>
> try1 <- dbGetQuery(hivecon,"select date1, date2, 
> datediff(TO_DATE(date1),TO_DATE(date2)) as d12diff from  dateTable where 
> d12diff GT 1000 limit 10")
>
> try2 <- dbGetQuery(hivecon,"select date1, date2, datediff(dat1,date2) as 
> d12diff from  dateTable where d12diff GT 1000 limit 10")
>
> Obviously these are just guesses.
>
> --
> David.
>>
>>
>>
>> Here is the sample of the data and  result
>>
>> date1,  date2, Diif
>> 4/5/1999,  6/14/2000, -436
>> 7/2/1999, 6/26/2000, -360
>> 8/14/1999, 8/19/2000, -371
>> 11/10/1999, 9/18/2000, -313
>> 8/25/2000, 6/5/2001, -284
>> 3/14/2012, 3/15/2004, 2921
>>
>> Thank you in advance
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>



[R] create variable

2016-10-09 Thread Ashta
I am trying to query data from Hive service and create a variable.


dbGetQuery(hivecon,"select date1, date2 from  dateTable limit 10")
date1,  date2, Diff
4/5/1999,  6/14/2000
7/2/1999, 6/26/2000
8/14/1999, 8/19/2000
11/10/1999, 9/18/2000
8/25/2000, 6/5/2001
3/14/2012, 3/15/2004


Here is what I want to do. While querying, I want to create a
variable diff = date1 - date2.
I may use this variable "diff" to compute some statistics (mean, mode,
etc.) and also in the where clause, as in the following.

test_date=dbGetQuery(hivecon,"select date1, date2 from  dateTable
where diff gt 1000 limit 10")

I would appreciate it if you could suggest how to do this.



Here is a sample of the data and the result

date1,  date2, Diff
4/5/1999,  6/14/2000, -436
7/2/1999, 6/26/2000, -360
8/14/1999, 8/19/2000, -371
11/10/1999, 9/18/2000, -313
8/25/2000, 6/5/2001, -284
3/14/2012, 3/15/2004, 2921

Thank you in advance



Re: [R] Matrix

2016-07-16 Thread Ashta
Hi Denes, Duncan, Michael and all,

Thank you very much for the helpful suggestions. Some of my data sets
were not square matrices; however, Denes's suggestion,
as.data.frame.table(), handled those as well.

Thank you again.
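
If only the upper triangle is wanted, as in the desired output of the original question, a possible sketch is to index the matrix with upper.tri() and which(arr.ind = TRUE). The names `idx` and `long` are illustrative; as.data.frame.table() remains the simpler choice when every cell is needed.

```r
m <- matrix(c(3, 4, 5,
              4, 7, 8,
              5, 8, 9), 3, 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Row/column positions of the upper triangle, diagonal included.
idx <- which(upper.tri(m, diag = TRUE), arr.ind = TRUE)

# Long format: one line per (row label, column label, value).
long <- data.frame(row   = rownames(m)[idx[, "row"]],
                   col   = colnames(m)[idx[, "col"]],
                   value = m[idx])

# Sort by row label, then column label, to match the desired layout.
long <- long[order(long$row, long$col), ]
long
```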


On Sat, Jul 16, 2016 at 7:27 PM, Dénes Tóth <toth.de...@ttk.mta.hu> wrote:
>
>
> On 07/17/2016 01:39 AM, Duncan Murdoch wrote:
>>
>> On 16/07/2016 6:25 PM, Ashta wrote:
>>  > Hi all,
>>  >
>>  > I have a large square matrix (60 x 60)  and found it hard to
>>  > visualize. Is it possible to change it  as shown below?
>>  >
>>  > Sample example (3 x 3)
>>  >
>>  > A   B   C
>>  > A  3   4   5
>>  > B  4   7   8
>>  > C  5   8   9
>>  >
>>  > Desired output
>>  > A A  3
>>  > A B  4
>>  > A C  5
>>  > B B  7
>>  > B C  8
>>  > C C  9
>>
>> Yes, use matrix indexing.  I don't think the 3600 values are going to be
>> very easy to read, but here's how to produce them:
>>
>> m <- matrix(1:3600, 60, 60)
>> indices <- expand.grid(row = 1:60, col = 1:60)
>> cbind(indices$row, indices$col, m[as.matrix(indices)])
>>
>
> Or use as.data.frame.table():
>
> m <- matrix(1:9, 3, 3,
> dimnames = list(dimA = letters[1:3],
> dimB = letters[1:3]))
> m
> as.data.frame.table(m, responseName = "value")
>
> ---
>
> I do not know what you mean by "visualize", but image() or heatmap() are
> good starting points if you need a plot of the values. If you really need to
> inspect the raw values, you can try interactive (scrollable) tables, e.g.:
>
> library(DT)
> m <- provideDimnames(matrix(1:3600, 60, 60))
> datatable(m, options = list(pageLength = 60))
>
>
> Cheers,
>   Denes
>
>
>
>
>> Duncan Murdoch
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


[R] Matrix

2016-07-16 Thread Ashta
Hi all,

I have a large square matrix (60 x 60)  and found it hard to
visualize. Is it possible to change it  as shown below?

Sample example (3 x 3)

A   B   C
A  3   4   5
B  4   7   8
C  5   8   9

Desired output
A A  3
A B  4
A C  5
B B  7
B C  8
C C  9

Thank you in advance



Re: [R] not common records

2016-06-03 Thread Ashta
Thank you Jeff. Solved.
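
For the record, a sketch of what the all-whatever parameters do here: merge(all = TRUE) keeps the rows found in only one file and fills the other file's columns with NA, after which the differences can be computed directly. The values come from File1 and File2 exactly as defined in the original post.

```r
File1 <- data.frame(var = c(561, 752, 800, 900),
                    x1 = c(23, 35, 40, 15),
                    x2 = c(125, 284, 280, 347))
File2 <- data.frame(var = c(561, 752, 800, 1001),
                    z1 = c(43, 45, 40, 65),
                    z2 = c(185, 299, 280, 310))

# Full outer join: unmatched records are kept, with NA in the other columns.
File3 <- merge(File1, File2, by = "var", all = TRUE)

# Differences are NA wherever a record exists in only one file.
File3$d_x1z1 <- File3$z1 - File3$x1
File3$d_x2z2 <- File3$z2 - File3$x2
File3
```

Using all.x = TRUE or all.y = TRUE instead would keep the unmatched records of only the first or only the second file.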

On Fri, Jun 3, 2016 at 12:47 AM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> ?merge
>
> Pay attention to the all-whatever parameters.
> --
> Sent from my phone. Please excuse my brevity.
>
> On June 2, 2016 7:04:47 PM PDT, Ashta <sewa...@gmail.com> wrote:
>>
>> I have 2 data sets.  File1 and File2. Some records are common to both
>> data sets. For those common records I want get the difference between
>> d_x1z1= z1-x1   and d_x2z2= z2-x2.
>>
>> File1<- data.frame(var = c(561,752,800,900),  x1= c(23,35,40,15), x2=
>> c(125,284,280,347))
>> File2<- data.frame(var = c(561,752,800,1001), z1= c(43,45,40,65), z2=
>> c(185,299,280,310))
>>
>> Record  900  15  347   appears only in File1
>> Record  1001  65  310  appears only in File2
>>
>> File3 should look like as follows
>>
>> File3
>> var   x1   x2   z1   z2   d_x1z1  d_x2z2
>> 561   23  125   43  165     20      40
>> 752   35  284   45  299      8      15
>> 800   40  280   40  280      0       0
>> 900   15  347   NA   NA     NA      NA
>> 1001  NA   NA   65  310     NA      NA
>>
>> How do I get those record not common in both data sets ?
>> merge(
>> File1,File2) gave me only for common "var"
>>
>> 
>>
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



[R] not common records

2016-06-02 Thread Ashta
I have 2 data sets, File1 and File2. Some records are common to both
data sets. For those common records I want to get the differences
d_x1z1 = z1-x1 and d_x2z2 = z2-x2.

File1<- data.frame(var = c(561,752,800,900),  x1= c(23,35,40,15), x2=
c(125,284,280,347))
File2<- data.frame(var = c(561,752,800,1001), z1= c(43,45,40,65), z2=
c(185,299,280,310))

Record  900  15  347   appears only in File1
Record  1001  65  310  appears only in File2

File3 should look as follows

File3
var   x1   x2   z1   z2   d_x1z1  d_x2z2
561   23  125   43  165     20      40
752   35  284   45  299      8      15
800   40  280   40  280      0       0
900   15  347   NA   NA     NA      NA
1001  NA   NA   65  310     NA      NA

How do I also get the records that are not common to both data sets?
merge(File1, File2) gave me only the rows with a common "var".



Re: [R] month and output

2016-05-07 Thread Ashta
Thank you David!

On Sat, May 7, 2016 at 12:18 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On May 6, 2016, at 5:15 PM, Ashta <sewa...@gmail.com> wrote:
>>
>> Thank you very much David.
>>
>> So there is no general formal that works year all round.
>>
>> The first one work only Jan to Nov
>> today <- Sys.Date()
>> nextmo<- paste0( month.abb[ as.numeric(format(today, format="%m"))+1] ,
>> format(today,"%Y") )
>> [1] "Jun2016"
>>
>> The second one works only  for the last month of the year.
>> today <- as.Date("2008-12-01")
>> nextmo<- paste0(m <- month.abb[(as.numeric(format(today,
>> format="%m"))+1) %/% 12] ,
>>  as.numeric( format(today,"%Y") ) + (m == "Jan") )
>
> Sorry;
>
> This works as intended:
>
>> today <- seq( from=as.Date("2008-1-01"), length=13, by="1 mo" )
>>
>> nextmo<- paste0( m <- month.abb[ as.numeric(format(today, format="%m")) %% 
>> 12+1] ,
> +as.numeric( format(today,"%Y") ) + (m=="Jan") ); nextmo
>  [1] "Feb2008" "Mar2008" "Apr2008" "May2008" "Jun2008" "Jul2008" "Aug2008" 
> "Sep2008"
>  [9] "Oct2008" "Nov2008" "Dec2008" "Jan2009" "Feb2009"
>
>
>
>> nextmo
>>
>>
>> Many thanks
>>
>>
>>
>>
>>
>> On Fri, May 6, 2016 at 6:40 PM, David Winsemius <dwinsem...@comcast.net> 
>> wrote:
>>>
>>>> On May 6, 2016, at 4:30 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>>>>
>>>>
>>>>> On May 6, 2016, at 4:11 PM, Ashta <sewa...@gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to ge get the next month of the year.
>>>>>
>>>>> today <- Sys.Date()
>>>>> xx<- format(today, format="%B%Y")
>>>>>
>>>>> I got  "May2016",  but I want  Jun2016. How do I do that?
>>>>
>>>> today <- Sys.Date()
>>>> nextmo<- paste0( month.abb[ as.numeric(format(today, format="%m"))+1] ,
>>>>format(today,"%Y") )
>>>> [1] "Jun2016"
>>>
>>> It occurred to me that at the end of the year you would want to increment 
>>> the year as well. This calculates the next month and increments the year 
>>> value if needed:
>>>
>>> today <- as.Date("2008-12-01")
>>> nextmo<- paste0(m <- month.abb[(as.numeric(format(today, format="%m"))+1) 
>>> %/% 12] ,
>>>  as.numeric( format(today,"%Y") ) + (m == "Jan") )
>>> nextmo
>>> #[1] "Jan2009"
>>>>
>>>>>
>>>>> My other question is that, I read a data  and do some analysis  and I
>>>>> want to send all the results of the analysis to a pdf file
>>>>>
>>>>> Example
>>>>> x5 <- runif(15, 5.0, 7.5)
>>>>> x5
>>>>>
>>>>>
>>>>> I tried this one
>>>>>
>>>>> pdf(file=" test.pdf")
>>>>> x5
>>>>> dev.off()
>>>>
>>>> pdf() opens a graphics device, so you need a function that establishes a 
>>>> coordinate system:
>>>>
>>>> x5 <- runif(15, 5.0, 7.5)
>>>> pdf(file=" test.pdf");
>>>> plot(1,1,type="n")
>>>> text(1, 1, paste(round(x5, 2), collapse="\n") )
>>>> dev.off()
>>>>
>>>
>>> If you need to suppress the axes and their labels:
>>>
>>> pdf(file=" test.pdf"); plot(1,1, type="n", axes=FALSE, xlab="", ylab="")
>>> text(1, 1, paste(round(x5, 2), collapse="\n") )
>>> dev.off()
>>>
>>>> I doubt that this is what you really want, and suspect you really need to 
>>>> be studying the capabilities supported by the knitr package. If I'm wrong 
>>>> about that and you want a system that supports drawing and text on a blank 
>>>> page, then first study:
>>>>
>>>>> library(grid)
>>>>> help(pac=grid)
>>>>
>>>> If you choose that route then the text "R Graphics" by Paul Murrell will 
>>>> be indispensable.
>>>>
>>>> --
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>>> __
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>
> David Winsemius
> Alameda, CA, USA
>



Re: [R] month and output

2016-05-06 Thread Ashta
Thank you very much David.

So there is no general formula that works all year round.

The first one works only from Jan to Nov
today <- Sys.Date()
nextmo<- paste0( month.abb[ as.numeric(format(today, format="%m"))+1] ,
 format(today,"%Y") )
[1] "Jun2016"

The second one works only  for the last month of the year.
today <- as.Date("2008-12-01")
 nextmo<- paste0(m <- month.abb[(as.numeric(format(today,
format="%m"))+1) %/% 12] ,
  as.numeric( format(today,"%Y") ) + (m == "Jan") )
 nextmo
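
A single expression can cover all twelve months if the month index is wrapped with modular arithmetic instead of integer division. The following is only a sketch; the function name next_month_label is made up for illustration and is not from the thread:

```r
# Hypothetical helper (name is an assumption): label of the month after `date`.
next_month_label <- function(date) {
  m <- as.integer(format(date, "%m"))   # current month, 1..12
  y <- as.integer(format(date, "%Y"))   # current year
  next_m <- m %% 12 + 1                 # 1..11 -> 2..12, 12 -> 1
  next_y <- y + (m == 12)               # roll the year over only after December
  paste0(month.abb[next_m], next_y)
}

may_label <- next_month_label(as.Date("2016-05-06"))  # "Jun2016"
dec_label <- next_month_label(as.Date("2008-12-01"))  # "Jan2009"
```

The earlier version used (m + 1) %/% 12, which is 0 for every month before November, so month.abb[0] returns an empty result; m %% 12 + 1 avoids that.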


Many thanks





On Fri, May 6, 2016 at 6:40 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On May 6, 2016, at 4:30 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>>
>>
>>> On May 6, 2016, at 4:11 PM, Ashta <sewa...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I am trying to get the next month of the year.
>>>
>>> today <- Sys.Date()
>>> xx<- format(today, format="%B%Y")
>>>
>>> I got  "May2016",  but I want  Jun2016. How do I do that?
>>
>> today <- Sys.Date()
>> nextmo<- paste0( month.abb[ as.numeric(format(today, format="%m"))+1] ,
>> format(today,"%Y") )
>> [1] "Jun2016"
>
> It occurred to me that at the end of the year you would want to increment the 
> year as well. This calculates the next month and increments the year value if 
> needed:
>
>  today <- as.Date("2008-12-01")
>  nextmo<- paste0(m <- month.abb[(as.numeric(format(today, format="%m"))+1) 
> %/% 12] ,
>   as.numeric( format(today,"%Y") ) + (m == "Jan") )
>  nextmo
> #[1] "Jan2009"
>>
>>>
>>> My other question is that, I read a data  and do some analysis  and I
>>> want to send all the results of the analysis to a pdf file
>>>
>>> Example
>>> x5 <- runif(15, 5.0, 7.5)
>>> x5
>>>
>>>
>>> I tried this one
>>>
>>> pdf(file=" test.pdf")
>>> x5
>>> dev.off()
>>
>> pdf() opens a graphics device, so you need a function that establishes a 
>> coordinate system:
>>
>> x5 <- runif(15, 5.0, 7.5)
>> pdf(file=" test.pdf");
>> plot(1,1,type="n")
>> text(1, 1, paste(round(x5, 2), collapse="\n") )
>> dev.off()
>>
>
> If you need to suppress the axes and their labels:
>
>  pdf(file=" test.pdf"); plot(1,1, type="n", axes=FALSE, xlab="", ylab="")
>  text(1, 1, paste(round(x5, 2), collapse="\n") )
>  dev.off()
>
>> I doubt that this is what you really want, and suspect you really need to be 
>> studying the capabilities supported by the knitr package. If I'm wrong about 
>> that and you want a system that supports drawing and text on a blank page, 
>> then first study:
>>
>>> library(grid)
>>> help(pac=grid)
>>
>> If you choose that route then the text "R Graphics" by Paul Murrell will be 
>> indispensable.
>>
>> --
>> David Winsemius
>> Alameda, CA, USA
>>
>
> David Winsemius
> Alameda, CA, USA
>



[R] month and output

2016-05-06 Thread Ashta
Hi all,

I am trying to get the next month of the year.

today <- Sys.Date()
xx<- format(today, format="%B%Y")

I got  "May2016",  but I want  Jun2016. How do I do that?

My other question: I read in some data, do some analysis, and
want to send all the results of the analysis to a PDF file.

Example
x5 <- runif(15, 5.0, 7.5)
x5


I tried this one

 pdf(file=" test.pdf")
 x5
dev.off()

I found that the file is empty. I would appreciate it if you could help me out.
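
The file comes out empty because pdf() only records drawing calls; printing x5 writes to the console, not to the graphics device. One possible workaround, sketched below under the assumption that plain monospaced text on a single page is acceptable, is to capture the printed output as text and draw it on an empty plot. The output path in tempdir() is an arbitrary choice for the example:

```r
x5 <- runif(15, 5.0, 7.5)

# Capture what print() would show on the console as a character vector.
printed <- capture.output(print(round(x5, 2)))

out_file <- file.path(tempdir(), "test.pdf")  # assumed location for this sketch
pdf(file = out_file)
plot.new()                                    # empty page with 0..1 coordinates
text(0, 1, paste(printed, collapse = "\n"),
     adj = c(0, 1), family = "mono")          # anchor the text at the top-left
invisible(dev.off())
```

For anything longer than a few lines of output, a report tool such as knitr is probably a better fit than drawing text by hand.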

Thanks in advance



Re: [R] flag a record

2016-02-28 Thread Ashta
Thank you very much Jim!
It is working fine!!
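
The same flag can be computed without a loop. This is a minimal sketch, assuming (as in Jim's reply) that "the last two years" means 2014 and 2015, and assuming the sample rows are whitespace-separated as in "Alex1 2011 0":

```r
DF <- read.table(text = "Name year tag
Alex1 2011 0
Alex1 2012 1
Alex1 2013 0
Alex1 2014 1
Carla 2013 0
Carla 2014 0
Carla 2015 0
Carla 2012 0
Tom 2014 1
Tom 2015 1
Jon 2010 1
Jon 2011 1", header = TRUE, stringsAsFactors = FALSE)

# 1 for a record with tag > 0 in one of the recent years, else 0.
recent_hit <- as.integer(DF$year %in% c(2014, 2015) & DF$tag > 0)

# ave() spreads the per-name maximum back over every row of that name.
DF$flag <- ifelse(ave(recent_hit, DF$Name, FUN = max) == 1, "y", "n")
```

With real data the pair of years would presumably be derived from the data (for example from max(DF$year)) rather than hard-coded.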

On Sun, Feb 28, 2016 at 1:46 AM, Jim Lemon <drjimle...@gmail.com> wrote:
> Hi Ashta,
> This does not seem too difficult:
>
> DF$flag<-"n"
> for(thisname in unique(DF$Name)) {
>  if(any(DF$year[DF$Name == thisname] %in% c(2014,2015) &
>   DF$tag[DF$Name == thisname]))
>   DF$flag[DF$Name == thisname]<-"y"
> }
>
> Jim
>
> On Sun, Feb 28, 2016 at 1:23 PM, Ashta <sewa...@gmail.com> wrote:
>>  Hi all,
>>
>>  I have a data set represented by the following sample.
>>
>> I want to flag the records of an individual as "n" if the tag column of
>> that individual is equal to zero for the last two years. In the
>> following example, Alex1's records are flagged as "y". On the other
>> hand, Carla's records are flagged as "n" because all values of tag for
>> Carla are zero. Another typical example is Jon: although his tag
>> values are greater than 0, he is flagged as "n" because his
>> records are more than two years old.
>>
>> DF <- read.table(textConnection(" Name  year  tag
>> Alex1 2011 0
>> Alex1 2012 1
>> Alex1 2013 0
>> Alex1 2014 1
>>
>> Carla 2013  0
>> Carla 2014  0
>> Carla 2015  0
>> Carla 2012  0
>>
>> Tom 2014   1
>> Tom 2015   1
>>
>>  Jon  2010  1
>>  Jon 2011   1"),header = TRUE)
>>
>> I want create another variable " Flag  with value Y or  N"  if an
>> individual has a  value greater than 0 in the tag column  for the last
>> two years  then  the flag value will be y otherwise  it n.
>>
>>
>> the outcome will be
>>   name   year  tag  Flag
>> Alex1 2011 0  y
>> Alex1 2012 1  y
>> Alex1 2013 0  y
>> Alex1 2014 1  y
>>
>> Carla 2013  0 n
>> Carla 2014  0 n
>> Carla 2015  0 n
>> Carla 2012  0 n
>>
>> Tom 2014   1  y
>> Tom 2015   1  y
>>
>>  Jon 2010   1  n
>>  Jon 2011   1   n
>>
>> Thank you in advance
>>



[R] flag a record

2016-02-27 Thread Ashta
 Hi all,

 I have a data set represented by the following sample.

I want to flag the records of an individual as "n" if the tag column of
that individual is equal to zero for the last two years. In the
following example, Alex1's records are flagged as "y". On the other
hand, Carla's records are flagged as "n" because all values of tag for
Carla are zero. Another typical example is Jon: although his tag
values are greater than 0, he is flagged as "n" because his
records are more than two years old.

DF <- read.table(textConnection(" Name  year  tag
Alex1 2011 0
Alex1 2012 1
Alex1 2013 0
Alex1 2014 1

Carla 2013  0
Carla 2014  0
Carla 2015  0
Carla 2012  0

Tom 2014   1
Tom 2015   1

 Jon  2010  1
 Jon 2011   1"),header = TRUE)

I want to create another variable, Flag, with value y or n: if an
individual has a value greater than 0 in the tag column within the last
two years, the flag value will be y; otherwise it will be n.


the outcome will be
  name   year  tag  Flag
Alex1 2011 0  y
Alex1 2012 1  y
Alex1 2013 0  y
Alex1 2014 1  y

Carla 2013  0 n
Carla 2014  0 n
Carla 2015  0 n
Carla 2012  0 n

Tom 2014   1  y
Tom 2015   1  y

 Jon 2010   1  n
 Jon 2011   1   n

Thank you in advance



[R] Matrix summary

2016-02-12 Thread Ashta
hi all,

I have a square matrix (1000 by 1000).
1. I want to calculate the mean, min and max values for each column and row.

2. I want to pick the coordinates of the matrix entries that hold the max
and min value for each row and column.
This is an example 4 by 4 square matrix:


                               Mean    Min   Max
        117    12     13    21  40.75    12   117
         21    32     11     1  16.25     1    32
         65    43     23     7  34.5      7    65
         58    61     78    95  73       58    95
Mean     65.25 37     31.25 31
Min      21    12     11     1
Max     117    61     78    95


Thank you
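
Both requests can be handled with rowMeans()/colMeans() and apply(). The sketch below assumes the 4 by 4 example is the matrix with rows (117, 12, 13, 21), (21, 32, 11, 1), (65, 43, 23, 7) and (58, 61, 78, 95); the object names are invented for illustration:

```r
m <- matrix(c(117, 12, 13, 21,
               21, 32, 11,  1,
               65, 43, 23,  7,
               58, 61, 78, 95), nrow = 4, byrow = TRUE)

# Per-row summaries, plus the column position of each row's min and max.
row_stats <- data.frame(
  Mean   = rowMeans(m),
  Min    = apply(m, 1, min),
  Max    = apply(m, 1, max),
  MinCol = apply(m, 1, which.min),
  MaxCol = apply(m, 1, which.max)
)

# Per-column summaries, plus the row position of each column's min and max.
col_stats <- data.frame(
  Mean   = colMeans(m),
  Min    = apply(m, 2, min),
  Max    = apply(m, 2, max),
  MinRow = apply(m, 2, which.min),
  MaxRow = apply(m, 2, which.max)
)
```

which.min() and which.max() return only the first position when there are ties; which(m == max(m), arr.ind = TRUE) gives every (row, column) pair for the overall maximum.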



[R] LDheatmap

2016-02-03 Thread Ashta
Hi all,

I am looking for an R package that calculates pairwise LD
(linkage disequilibrium). I came across library(LDheatmap). Has
anyone used this library? I would appreciate some help with how to use
this library for my data set.


My data set looks like this:

Geno file
Name1 1 1 2 2 2 2
Name2 2 2 2 2 2 2
Name3 2 2 2 2 2 2
Name4  2 2 2 2 2 2
Name5 1 1 2 2 2 2


NameN  1   1 1 2 2 2 2


The other file is map file
Chromosome, SNP, Location (physical)


Thank you in advance



Re: [R] Conditional Random selection

2015-11-21 Thread Ashta
Thank you  David!

I reran your script and it is giving me the first three time periods.
Is it doing random sampling?

  tab.fan
  time X1  X2
2    2  5 230
3    3  1 300
5    5  2  10



On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> Use dput() to send data to the list as it is more compact:
>
>> dput(tab)
> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = c("time",
> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>
> You can just remove the lines with X1 = 0 since you don't want to use them.
>
>> tab.sub <- tab[tab$X1>0, ]
>
> Then the following gives you a sample:
>
>> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>
> Note, that your "solution" of times 6, 7, and 8 will never appear because the 
> sum of the values is 586.
>
>
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ashta
> Sent: Saturday, November 21, 2015 11:53 AM
> To: R help <r-help@r-project.org>
> Subject: [R] Conditional Random selection
>
> Hi all,
>
> I have a data set that contains samples collected over time.   In
> each time period the total number of samples are given (X2)   The goal
> is to  select 500  random samples.The selection should be based on
> time  (select time periods until I reach 500 samples). Also the time
> period should have greater than 0 for  X1 variable. X1 is an indicator
> variable.
>
> Select "time" until reaching the  sum of X2  is > 500 and if   X1 is  >  0
>
> tab  <- read.table(textConnection(" time   X1 X2
> 1  0 251
> 2  5 230
> 3  1 300
> 4  0 25
> 5  2 10
> 6  3 101
> 7  1 300
>  8 4 185   "),header = TRUE)
>
> In the above example,  samples from time 1 and 4  will not be selected
> ( X1 is zero)
> So I could reach my target by selecting time 6,7, and 8 or  time 2 and
> 3 and so on.
>
> Can any one help to do that?
>



[R] Conditional Random selection

2015-11-21 Thread Ashta
Hi all,

I have a data set that contains samples collected over time. In
each time period the total number of samples is given (X2). The goal
is to select 500 random samples. The selection should be based on
time (select time periods until I reach 500 samples). Also, the time
period should have a value greater than 0 for the X1 variable. X1 is an
indicator variable.

Select "time" periods until the sum of X2 is > 500, using only periods
where X1 is > 0.

tab  <- read.table(textConnection(" time   X1 X2
1  0 251
2  5 230
3  1 300
4  0 25
5  2 10
6  3 101
7  1 300
 8 4 185   "),header = TRUE)

In the above example,  samples from time 1 and 4  will not be selected
( X1 is zero)
So I could reach my target by selecting time 6,7, and 8 or  time 2 and
3 and so on.

Can any one help to do that?



Re: [R] Conditional Random selection

2015-11-21 Thread Ashta
Hi  Bert  and all,
I have a related question. In each time period there were different
locations where the samples were collected (S1). I want to count the
number of unique locations (S1) for each unique time period. So in
time 1 the samples were collected from two locations, in time 2 only
from one location, and in time 3 from three locations.

tab  <- read.table(textConnection(" time   S1  rep
1  1   1
1  2   1
1  2   2
2  1   1
2  1   2
2  1   3
2  1   4
3  1   1
3  2   1
3  3   1   "),header = TRUE)

what I want is

time  S1
   1   2
   2   1
   3   3

Thank you again.



On Sat, Nov 21, 2015 at 1:30 PM, Ashta <sewa...@gmail.com> wrote:
>  Thank you Bert!
>
> What I want is at least 500 samples based on random  sampling of time
> period. This allows samples  collected at the same time period are
> included together.
>
> Your script is doing what I wanted to do!!
>
> Many thanks
>
>
>
>
> On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>> David's "solution" is incorrect. It can also fail to give you times
>> with a total of 500 items to sample from in the time periods.
>>
>> It is not entirely clear what you want. The solution below gives you a
>> random sample of time periods in which X1>0 and the total number of
>> samples among them is >= 500. It does not give you the fewest number
>> of periods that can do this. Is this what you want?
>>
>> tab[with(tab,{
>>   rownums<- sample(seq_len(nrow(tab))[X1>0])
>>   sz <- cumsum(X2[rownums])
>>   rownums[c(TRUE,sz<500)]
>> }),]
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>-- Clifford Stoll
>>
>>
>> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewa...@gmail.com> wrote:
>>> Thank you  David!
>>>
>>> I rerun the your script and it is giving me the first three time periods
>>> is it doing random sampling?
>>>
>>>   tab.fan
>>>   time X1  X2
>>> 2    2  5 230
>>> 3    3  1 300
>>> 5    5  2  10
>>>
>>>
>>>
>>> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarl...@tamu.edu> wrote:
>>>> Use dput() to send data to the list as it is more compact:
>>>>
>>>>> dput(tab)
>>>> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
>>>> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = 
>>>> c("time",
>>>> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>>>>
>>>> You can just remove the lines with X1 = 0 since you don't want to use them.
>>>>
>>>>> tab.sub <- tab[tab$X1>0, ]
>>>>
>>>> Then the following gives you a sample:
>>>>
>>>>> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>>>>
>>>> Note, that your "solution" of times 6, 7, and 8 will never appear because 
>>>> the sum of the values is 586.
>>>>
>>>>
>>>> David L. Carlson
>>>> Department of Anthropology
>>>> Texas A&M University
>>>>
>>>> -Original Message-
>>>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ashta
>>>> Sent: Saturday, November 21, 2015 11:53 AM
>>>> To: R help <r-help@r-project.org>
>>>> Subject: [R] Conditional Random selection
>>>>
>>>> Hi all,
>>>>
>>>> I have a data set that contains samples collected over time.   In
>>>> each time period the total number of samples are given (X2)   The goal
>>>> is to  select 500  random samples.The selection should be based on
>>>> time  (select time periods until I reach 500 samples). Also the time
>>>> period should have greater than 0 for  X1 variable. X1 is an indicator
>>>> variable.
>>>>
>>>> Select "time" until reaching the  sum of X2  is > 500 and if   X1 is  >  0
>>>>
>>>> tab  <- read.table(textConnection(" time   X1 X2
>>>> 1  0 251
>>>> 2  5 230
>>>> 3  1 300
>>>> 4  0 25
>>>> 5  2 10
>>>> 6  3 101
>>>> 7  1 300

Re: [R] Conditional Random selection

2015-11-21 Thread Ashta
Hi  Rui ,

I tried that one before I sent out my original message.
It gave me only this:

tapply(tab$S1, tab$time, function(x) length(unique(x)))
1 2 3
2 1 3

I am expecting output like this:

 time  S1
    1   2
    2   1
    3   3






On Sat, Nov 21, 2015 at 2:38 PM,  <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> Try
>
> tapply(tab$S1, tab$time, function(x) length(unique(x)))
>
> Hope this helps,
>
> Rui Barradas
>
>
> Citando Ashta <sewa...@gmail.com>:
>
> Hi  Bert  and all,
> I have related question.  In each  time period there were different
> locations where the samples were collected (S1).   I  want count  the
> number of unique locations (S1)  for each unique time period . So in
> time 1 the samples were collected from two locations and time 2 only
> from one location and time 3  from  three locations..
>
> tab  <- read.table(textConnection(" time   S1  rep
> 1  1   1
> 1  2   1
> 1  2   2
> 2  1   1
> 2  1   2
> 2  1   3
> 2  1   4
> 3  1   1
> 3  2   1
> 3  3   1   "),header = TRUE)
>
> what I want is
>
> time  S1
>    1   2
>    2   1
>    3   3
>
> Thank you again.
>
>
>
> On Sat, Nov 21, 2015 at 1:30 PM, Ashta <sewa...@gmail.com> wrote:
>
> Thank you Bert!
>
> What I want is at least 500 samples based on random  sampling of time
> period. This allows samples  collected at the same time period are
> included together.
>
> Your script is doing what I wanted to do!!
>
> Many thanks
>
>
>
>
> On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> David's "solution" is incorrect. It can also fail to give you times
> with a total of 500 items to sample from in the time periods.
>
> It is not entirely clear what you want. The solution below gives you a
> random sample of time periods in which X1>0 and the total number of
> samples among them is >= 500. It does not give you the fewest number
> of periods that can do this. Is this what you want?
>
> tab[with(tab,{
>   rownums<- sample(seq_len(nrow(tab))[X1>0])
>   sz <- cumsum(X2[rownums])
>   rownums[c(TRUE,sz<500)]
> }),]
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>-- Clifford Stoll
>
>
> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewa...@gmail.com> wrote:
>
> Thank you  David!
>
> I rerun the your script and it is giving me the first three time periods
> is it doing random sampling?
>
>   tab.fan
>   time X1  X2
> 2    2  5 230
> 3    3  1 300
> 5    5  2  10
>
>
>
> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarl...@tamu.edu> wrote:
>
> Use dput() to send data to the list as it is more compact:
>
> dput(tab)
>
> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names =
> c("time",
> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>
> You can just remove the lines with X1 = 0 since you don't want to use them.
>
> tab.sub <- tab[tab$X1>0, ]
>
> Then the following gives you a sample:
>
> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>
> Note, that your "solution" of times 6, 7, and 8 will never appear because
> the sum of the values is 586.
>
>
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ashta
> Sent: Saturday, November 21, 2015 11:53 AM
> To: R help <r-help@r-project.org>
> Subject: [R] Conditional Random selection
>
> Hi all,
>
> I have a data set that contains samples collected over time.   In
> each time period the total number of samples are given (X2)   The goal
> is to  select 500  random samples.The selection should be based on
> time  (select time periods until I reach 500 samples). Also the time
> period should have greater than 0 for  X1 variable. X1 is an indicator
> variable.
>
> Select "time" until reaching the  sum of X2  is > 500 and if   X1 is  >  0
>
> tab  <- read.table(textConnection(" time   X1 X2
> 1  0 251
> 2  5 230
> 3  1 300
> 4  0 25
> 5  2 10
> 6  3 101
> 7  1 300
> 8 4 185   "),header = TRUE)
>
> In the above example,  samples from time 1 and 4  will not be selected

Re: [R] Conditional Random selection

2015-11-21 Thread Ashta
Thank you !

I was also able to do it this way, using ddply() from the plyr package:

hc <- ddply(tab1, .(time), summarize, S1 = length(unique(S1)))
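
For comparison, base R's aggregate() returns the same kind of two-column data frame without an add-on package. This is a sketch only; tab1 is assumed here to be the example data posted earlier in the thread, rebuilt inline so the snippet is self-contained:

```r
# Assumed to match the time / S1 / rep example data from the thread.
tab1 <- data.frame(
  time = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  S1   = c(1, 2, 2, 1, 1, 1, 1, 1, 2, 3),
  rep  = c(1, 1, 2, 1, 2, 3, 4, 1, 1, 1)
)

# Count the distinct S1 values within each time period.
n_locations <- aggregate(S1 ~ time, data = tab1,
                         FUN = function(x) length(unique(x)))
```

The result has one row per time period, with S1 holding the number of distinct locations in that period.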


On Sat, Nov 21, 2015 at 3:40 PM,  <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> Is that a real doubt? Like Bert said, you should spend some time with an R
> tutorial. All you need is to know how to form a data.frame.
>
>
> tmp <- tapply(tab1$S1, tab1$time, function(x) length(unique(x)))
> data.frame(time = names(tmp), S1 = tmp)
>
> Rui Barradas
>
>
> Citando Ashta <sewa...@gmail.com>:
>
> Hi  Rui ,
>
>
>
> I tried that one  before I send out my original message.
> it gave me only this,
>
> tapply(tab$S1, tab$time, function(x) length(unique(x)))
> 1 2 3
> 2 1 3
>
> I am expecting an output of like this
>
> time  S1
>    1   2
>    2   1
>    3   3
>
>
>
>
>
>
> On Sat, Nov 21, 2015 at 2:38 PM,  <ruipbarra...@sapo.pt> wrote:
>
> Hello,
>
> Try
>
> tapply(tab$S1, tab$time, function(x) length(unique(x)))
>
> Hope this helps,
>
> Rui Barradas
>
>
> Citando Ashta <sewa...@gmail.com>:
>
> Hi  Bert  and all,
> I have related question.  In each  time period there were different
> locations where the samples were collected (S1).   I  want count  the
> number of unique locations (S1)  for each unique time period . So in
> time 1 the samples were collected from two locations and time 2 only
> from one location and time 3  from  three locations..
>
> tab  <- read.table(textConnection(" time   S1  rep
> 1  1   1
> 1  2   1
> 1  2   2
> 2  1   1
> 2  1   2
> 2  1   3
> 2  1   4
> 3  1   1
> 3  2   1
> 3  3   1   "),header = TRUE)
>
> what I want is
>
> time  S1
>    1   2
>    2   1
>    3   3
>
> Thank you again.
>
>
>
> On Sat, Nov 21, 2015 at 1:30 PM, Ashta <sewa...@gmail.com> wrote:
>
> Thank you Bert!
>
> What I want is at least 500 samples based on random  sampling of time
> period. This allows samples  collected at the same time period are
> included together.
>
> Your script is doing what I wanted to do!!
>
> Many thanks
>
>
>
>
> On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> David's "solution" is incorrect. It can also fail to give you times
> with a total of 500 items to sample from in the time periods.
>
> It is not entirely clear what you want. The solution below gives you a
> random sample of time periods in which X1>0 and the total number of
> samples among them is >= 500. It does not give you the fewest number
> of periods that can do this. Is this what you want?
>
> tab[with(tab,{
>   rownums<- sample(seq_len(nrow(tab))[X1>0])
>   sz <- cumsum(X2[rownums])
>   rownums[c(TRUE,sz<500)]
> }),]
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>-- Clifford Stoll
>
>
> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewa...@gmail.com> wrote:
>
> Thank you  David!
>
> I rerun the your script and it is giving me the first three time periods
> is it doing random sampling?
>
>   tab.fan
>   time X1  X2
> 2    2  5 230
> 3    3  1 300
> 5    5  2  10
>
>
>
> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarl...@tamu.edu> wrote:
>
> Use dput() to send data to the list as it is more compact:
>
> dput(tab)
>
> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names =
> c("time",
> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>
> You can just remove the lines with X1 = 0 since you don't want to use them.
>
> tab.sub <- tab[tab$X1>0, ]
>
> Then the following gives you a sample:
>
> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>
> Note, that your "solution" of times 6, 7, and 8 will never appear because
> the sum of the values is 586.
>
>
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ashta
> Sent: Saturday, November 21, 2015 11:53 AM
> To: R help <r-help@r-project.org>
> Subject: [R] Conditional Random selection
>
> Hi all,
>
> I have a data set that contains samples collected over time.   In
> each time period the total number of samples are given (X2)   The goal
> is to  select 500  random samp

Re: [R] Conditional Random selection

2015-11-21 Thread Ashta
 Thank you Bert!

What I want is at least 500 samples, based on random sampling of time
periods. This way, samples collected in the same time period are
included together.

Your script is doing what I wanted to do!!

Many thanks
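
For reference, the idea can be spelled out step by step: shuffle the eligible row numbers, accumulate X2 in that shuffled order, and keep periods up to and including the first one that brings the total to the target. This is a sketch under that reading of the requirement; the function name pick_periods and the target argument are made up for illustration. It also shows why indexing the unshuffled rows with a cumsum() of shuffled values always returns the leading rows: the shuffle has to be applied to the row numbers, not only to X2.

```r
tab <- data.frame(time = 1:8,
                  X1 = c(0, 5, 1, 0, 2, 3, 1, 4),
                  X2 = c(251, 230, 300, 25, 10, 101, 300, 185))

# Hypothetical helper: random time periods with X1 > 0 whose X2 total
# reaches `target`, stopping as soon as the target is met.
pick_periods <- function(tab, target = 500) {
  eligible <- which(tab$X1 > 0)
  ord <- eligible[sample.int(length(eligible))]  # shuffled row numbers
  running <- cumsum(tab$X2[ord])                 # totals in shuffled order
  n_needed <- which(running >= target)[1]
  if (is.na(n_needed)) stop("eligible periods do not reach the target")
  tab[ord[seq_len(n_needed)], ]
}

set.seed(1)
picked <- pick_periods(tab)
```

Different seeds give different sets of periods, and the number of periods selected varies from draw to draw.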




On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> David's "solution" is incorrect. It can also fail to give you times
> with a total of 500 items to sample from in the time periods.
>
> It is not entirely clear what you want. The solution below gives you a
> random sample of time periods in which X1>0 and the total number of
> samples among them is >= 500. It does not give you the fewest number
> of periods that can do this. Is this what you want?
>
> tab[with(tab,{
>   rownums<- sample(seq_len(nrow(tab))[X1>0])
>   sz <- cumsum(X2[rownums])
>   rownums[c(TRUE,sz<500)]
> }),]
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>-- Clifford Stoll
>
>
> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewa...@gmail.com> wrote:
>> Thank you  David!
>>
>> I rerun the your script and it is giving me the first three time periods
>> is it doing random sampling?
>>
>>   tab.fan
>>   time X1  X2
>> 2    2  5 230
>> 3    3  1 300
>> 5    5  2  10
>>
>>
>>
>> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarl...@tamu.edu> wrote:
>>> Use dput() to send data to the list as it is more compact:
>>>
>>>> dput(tab)
>>> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
>>> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = 
>>> c("time",
>>> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>>>
>>> You can just remove the lines with X1 = 0 since you don't want to use them.
>>>
>>>> tab.sub <- tab[tab$X1>0, ]
>>>
>>> Then the following gives you a sample:
>>>
>>>> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>>>
>>> Note, that your "solution" of times 6, 7, and 8 will never appear because 
>>> the sum of the values is 586.
>>>
>>>
>>> David L. Carlson
>>> Department of Anthropology
>>> Texas A&M University
>>>
>>> -Original Message-
>>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ashta
>>> Sent: Saturday, November 21, 2015 11:53 AM
>>> To: R help <r-help@r-project.org>
>>> Subject: [R] Conditional Random selection
>>>
>>> Hi all,
>>>
>>> I have a data set that contains samples collected over time.   In
>>> each time period the total number of samples are given (X2)   The goal
>>> is to  select 500  random samples.The selection should be based on
>>> time  (select time periods until I reach 500 samples). Also the time
>>> period should have greater than 0 for  X1 variable. X1 is an indicator
>>> variable.
>>>
>>> Select "time" until reaching the  sum of X2  is > 500 and if   X1 is  >  0
>>>
>>> tab  <- read.table(textConnection(" time   X1 X2
>>> 1  0 251
>>> 2  5 230
>>> 3  1 300
>>> 4  0 25
>>> 5  2 10
>>> 6  3 101
>>> 7  1 300
>>>  8 4 185   "),header = TRUE)
>>>
>>> In the above example,  samples from time 1 and 4  will not be selected
>>> ( X1 is zero)
>>> So I could reach my target by selecting time 6,7, and 8 or  time 2 and
>>> 3 and so on.
>>>
>>> Can any one help to do that?
>>>
>>



[R] Ranking

2015-11-14 Thread Ashta
Hi all,

I have the following raw data; some records don't have the second variable.

test <- read.table(textConnection(" Country  STATUS
USA
USAW
USAW
GER
GERW
GERw
GERW
UNKW
UNK
UNKW
FRA
FRA
FRAW
FRAW
FRAW
SPA
SPAW
SPA  "),header = TRUE,  sep= "\t")
test

It is not reading it correctly.

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 17 did not have 2 elements



After reading, I want to change the status column to numeric so that I
can use the table function:

test$STATUS <- ifelse(is.na(test$STATUS), 0,  1)

at the end I want the following table (Country, Won, Lost , Number of
games played and % of score ) and pick the top 3 countries.

COUNTRY   Won   Lost   NG   %W
 USA       2     1     3   (2/3)*100
 GER       3     1     4   (3/4)*100
 UNK       2     1     3   (2/3)*100
 FRA       3     2     5   (3/5)*100
 SPA       1     2     3   (1/3)*100

Thank you in  advance



Re: [R] Ranking

2015-11-14 Thread Ashta
Thank you David,

My intention was that if I change the status column to numeric
(0 = Lost and 1 = Won), then I can use this numeric variable to calculate
the percentage of games won by each country.
How did you read the data in the first place?
That was my problem. The actual data is in a file that has to be read or loaded.
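
One way the whole task could go, from reading to ranking, is sketched below. It assumes the file is tab-separated with the STATUS field simply absent on some lines (the archive has lost the separators, so the input is rebuilt here with explicit "\t"), and that a blank status means a lost game, as in the original ifelse(). The object names are illustrative:

```r
raw <- c("Country\tSTATUS",
         "USA", "USA\tW", "USA\tW",
         "GER", "GER\tW", "GER\tw", "GER\tW",
         "UNK\tW", "UNK", "UNK\tW",
         "FRA", "FRA", "FRA\tW", "FRA\tW", "FRA\tW",
         "SPA", "SPA\tW", "SPA")

# fill = TRUE pads short lines instead of stopping with
# "line ... did not have 2 elements".
test <- read.table(text = raw, header = TRUE, sep = "\t",
                   fill = TRUE, stringsAsFactors = FALSE)

# A game counts as won when STATUS is "W" or "w"; blank or NA counts as lost.
won_flag <- !is.na(test$STATUS) & toupper(trimws(test$STATUS)) == "W"

won   <- tapply(won_flag, test$Country, sum)
games <- tapply(won_flag, test$Country, length)

res <- data.frame(Country = names(won),
                  Won     = as.vector(won),
                  Lost    = as.vector(games - won),
                  NG      = as.vector(games),
                  stringsAsFactors = FALSE)
res$PctW <- 100 * res$Won / res$NG

res  <- res[order(-res$PctW), ]   # best percentage first
top3 <- head(res, 3)
```

For a real file, the text = raw argument would be replaced by the file name. Countries with equal percentages (USA and UNK here) keep their alphabetical order, so a deliberate tie-breaking rule may be needed when picking the top 3.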

Thank you !






On Sat, Nov 14, 2015 at 6:10 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> It is always good to read the manual page for a function, but especially when 
> it is not working as you expected. In this case if you look at the arguments 
> for read.table(), you will find one called fill=TRUE that is useful in this 
> case.
>
> Based on your ifelse(), you seem to be assuming that a blank is not missing 
> data but a lost game. You may also discover that in your example wins are 
> coded as w and W.  Since character variables get converted to factors by 
> default, you could use something like:
>
>> levels(test$STATUS) <- c("L", "W", "W")
>> addmargins(xtabs(~Country+STATUS, test), 2)
>STATUS
> Country L W Sum
> FRA 2 3   5
> GER 1 3   4
> SPA 2 1   3
> UNK 1 2   3
> USA 1 2   3
>
> I'll let you figure out how to get the last column.
>
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ashta
> Sent: Saturday, November 14, 2015 4:28 PM
> To: R help <r-help@r-project.org>
> Subject: [R] Ranking
>
> Hi all,
>
> I have the following raw data some records  don't have the second variable.
>
> test <- read.table(textConnection(" Country  STATUS
> USA
> USAW
> USAW
> GER
> GERW
> GERw
> GERW
> UNKW
> UNK
> UNKW
> FRA
> FRA
> FRAW
> FRAW
> FRAW
> SPA
> SPAW
> SPA  "),header = TRUE,  sep= "\t")
> test
>
> It is not reading it correctly.
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 17 did not have 2 elements
>
>
>
> After reading, I want to change the status column to numeric so that I
> can use the table function
>
> test$STATUS <- ifelse(is.na(test$STATUS), 0,  1)
>
> at the end I want the following table (Country, Won, Lost , Number of
> games played and % of score ) and pick the top 3 countries.
>
> COUNTRY   Won   Lost   NG   %W
>  USA       2     1     3    (2/3)*100
>  GER       3     1     4    (3/4)*100
>  UNK       2     1     3    (2/3)*100
>  FRA       3     2     5    (3/5)*100
>  SPA       1     2     3    (1/3)*100
>
> Thank you in  advance
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning

2015-11-11 Thread Ashta
Sarah,

Thank you very much. For the other variables
I was trying to do the same job in a different way, because it is easier to
list the values

Example

test < which(dat$var1  !="BAA" | dat$var1 !="FAG" )
 {
dat <- dat[-test,]}   and I did not get the  right result. What am I
missing here?
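
A possible explanation, sketched on made-up data: a value is always different from at least one of two distinct codes, so a condition of the form `x != "BAA" | x != "FAG"` is TRUE for every non-missing row. The data frame below and the assumption that BAA and FAG are the valid codes to keep are hypothetical.

```r
# Hypothetical data; only BAA and FAG are treated as valid codes
dat <- data.frame(var1 = c("BAA", "FAG", "XYZ", "BAA", "QQQ"),
                  n = 1:5, stringsAsFactors = FALSE)

# Every value differs from at least one of the two codes, so this flags all rows
flagged <- which(dat$var1 != "BAA" | dat$var1 != "FAG")

# Rows that are neither BAA nor FAG
bad <- which(!(dat$var1 %in% c("BAA", "FAG")))

# Logical indexing avoids the dat[-integer(0), ] trap, which returns zero rows
cleaned <- dat[dat$var1 %in% c("BAA", "FAG"), ]
```

Keeping rows with a logical condition is usually safer than dropping them with `-which(...)`, because an empty `which()` result silently removes everything.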





On Wed, Nov 11, 2015 at 7:54 PM, Sarah Goslee <sarah.gos...@gmail.com>
wrote:

> On Wed, Nov 11, 2015 at 8:44 PM, Ashta <sewa...@gmail.com> wrote:
> > Hi Sarah,
> >
> > I used the following to clean my data, but the program crashed several times.
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > What is the difference between these two
> >
> > test <- dat[dat$Var1  %in% "YYZ" | dat$Var1 %in% "MSN" ,]
>
> Besides that you're using %in% wrong? I told you how to proceed.
>
> myvalues <- c("YYZ", "MSN")
>
> test <- subset(dat, Var1 %in% myvalues)
>
>
> > subset(dat, Var1 %in% myvalues)
>   X Var1 Freq
> 3 3  MSN 1040
> 4 4  YYZ  300
>
> >
> >
> >
> >
> > On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.gos...@gmail.com>
> > wrote:
> >>
> >> Please keep replies on the list so others may participate in the
> >> conversation.
> >>
> >> If you have a character vector containing the potential values, you
> >> might look at %in% for one approach to subsetting your data.
> >>
> >> Var1 %in% myvalues
> >>
> >> Sarah
> >>
> >> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewa...@gmail.com> wrote:
> >> > Thank you Sarah for your prompt response!
> >> >
> >> > I have the list of values of the variable Var1 it is around 20.
> >> > How can I modify this one to include all the 20 valid values?
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >
> >> > Is there a way (efficient )  of doing it?
> >> >
> >> > Thank you again
> >> >
> >> >
> >> >
> >> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.gos...@gmail.com
> >
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewa...@gmail.com> wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > I have a data frame with  huge rows and columns.
> >> >> >
> >> >> > When I looked at the data,  it has several garbage values need to
> be
> >> >> >
> >> >> > cleaned. For a sample I am showing you the frequency distribution
> >> >> > of one variables
> >> >> >
> >> >> > Var1 Freq
> >> >> > 1:3
> >> >> > 2]6
> >> >> > 3MSN 1040
> >> >> > 4YYZ  300
> >> >> > 5\\4
> >> >> > 6+ 3
> >> >> > 7.   ?>   15
> >> >>
> >> >> Please use dput() to provide your data. I made a guess at what you
> had
> >> >> in R, but could be wrong.
> >> >>
> >> >>
> >> >> > and continues.
> >> >> >
> >> >> > I want to keep those rows that contain only a valid variable value
> >> >> >
> >> >> > In this  case MSN and YYZ. I tried the following
> >> >> >
> >> >> > *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
> >> >> >
> >> >> > but I am not getting the desired result.
> >> >>
> >> >> What are you getting? How does it differ from the desired result?
> >> >>
> >> >> >  I have
> >> >> >
> >> >> > Any help or idea?
> >> >>
> >> >> I get:
> >> >>
> >> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ",
> >> >> > "",
> >> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names =
> >> >> c("X",
> >> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >> >
> >> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >> > test
> >> >>   X Var1 Freq
> >> >> 3 3  MSN 1040
> >> >> 4 4  YYZ  300
> >> >>
> >> >> Which seems reasonable to me.
> >> >>
> >> >>
> >> >> >
> >> >> > [[alternative HTML version deleted]]
> >> >>
> >> >> Please don't post in HTML either: it introduces all sorts of errors
> to
> >> >> your message.
> >> >>
> >> >> Sarah
> >> >>
> >
> >
>



[R] Cleaning

2015-11-11 Thread Ashta
Hi all,

I have a data frame with a very large number of rows and columns.

When I looked at the data, I found several garbage values that need to be
cleaned. As a sample, here is the frequency distribution
of one variable:

  Var1  Freq
1 :        3
2 ]        6
3 MSN   1040
4 YYZ    300
5 \\       4
6 +        3
7 .   ?>  15

and continues.

I want to keep only those rows that contain a valid value of the variable,

in this case MSN and YYZ. I tried the following

test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]

but I am not getting the desired result.

 I have

Any help or idea?



Re: [R] Cleaning

2015-11-11 Thread Ashta
Hi Sarah,

I used the following to clean my data, but the program crashed several times.


test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]


What is the difference between these two?

test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]
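
One difference that can matter in practice is how missing values are handled. The sketch below uses a small made-up data frame (the NA row is an assumption, not something shown in the original data): `==` returns NA for a missing value, while `%in%` returns FALSE.

```r
# Made-up data with one missing Var1
dat <- data.frame(Var1 = c("YYZ", "MSN", NA, "+"),
                  Freq = c(300, 1040, 2, 3),
                  stringsAsFactors = FALSE)

eq  <- dat$Var1 == "YYZ" | dat$Var1 == "MSN"   # NA where Var1 is missing
inn <- dat$Var1 %in% c("YYZ", "MSN")           # FALSE where Var1 is missing

by_eq <- dat[eq, ]    # includes an all-NA row for the missing value
by_in <- dat[inn, ]   # only the matching rows
```

With `%in%` the whole list of valid values also fits in a single vector, so there is no need to chain conditions with `|`.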




On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.gos...@gmail.com>
wrote:

> Please keep replies on the list so others may participate in the
> conversation.
>
> If you have a character vector containing the potential values, you
> might look at %in% for one approach to subsetting your data.
>
> Var1 %in% myvalues
>
> Sarah
>
> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewa...@gmail.com> wrote:
> > Thank you Sarah for your prompt response!
> >
> > I have the list of values of the variable Var1 it is around 20.
> > How can I modify this one to include all the 20 valid values?
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > Is there a way (efficient )  of doing it?
> >
> > Thank you again
> >
> >
> >
> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.gos...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewa...@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I have a data frame with  huge rows and columns.
> >> >
> >> > When I looked at the data,  it has several garbage values need to be
> >> >
> >> > cleaned. For a sample I am showing you the frequency distribution
> >> > of one variables
> >> >
> >> > Var1 Freq
> >> > 1:3
> >> > 2]6
> >> > 3MSN 1040
> >> > 4YYZ  300
> >> > 5\\4
> >> > 6+ 3
> >> > 7.   ?>   15
> >>
> >> Please use dput() to provide your data. I made a guess at what you had
> >> in R, but could be wrong.
> >>
> >>
> >> > and continues.
> >> >
> >> > I want to keep those rows that contain only a valid variable value
> >> >
> >> > In this  case MSN and YYZ. I tried the following
> >> >
> >> > *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
> >> >
> >> > but I am not getting the desired result.
> >>
> >> What are you getting? How does it differ from the desired result?
> >>
> >> >  I have
> >> >
> >> > Any help or idea?
> >>
> >> I get:
> >>
> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ",
> "",
> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names =
> c("X",
> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> > test
> >>   X Var1 Freq
> >> 3 3  MSN 1040
> >> 4 4  YYZ  300
> >>
> >> Which seems reasonable to me.
> >>
> >>
> >> >
> >> > [[alternative HTML version deleted]]
> >>
> >> Please don't post in HTML either: it introduces all sorts of errors to
> >> your message.
> >>
> >> Sarah
> >>
>



Re: [R] curve

2010-12-13 Thread Ashta
Thanks Sarah,

 1. to shade or color (blue) the curve for any values
greater than 1,100

I think I was not clear in the above point. I want to shade not the line but
the area under the curve,

and
Your last line of code,
segments(x0=mean(test1), y0=0, y1=curveheight)

gave me the  following error message

Error in segments(x0 = mean(test1), y0 = 0, y1 = curveheight) :
 element 3 is empty;
  the part of the args list of '.Internal' being evaluated was:
  (x0, y0, x1, y1, col = col, lty = lty, lwd = lwd, ...)

could you check it please
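
A minimal sketch of one way to shade the area and draw the bounded line, using only base graphics. It assumes the cutoff is 1100 (as in the code), and it passes both `x1` and `y1` to `segments()`, since the error message above complains that the third element (`x1`) is empty. The names `keep`, `m` and `h` are made up for the illustration.

```r
set.seed(1)                       # only to make the sketch repeatable
test  <- rnorm(5000, 1000, 100)
test1 <- test[test > 1100]
d     <- density(test)

# xlab = "" drops the "N = ... Bandwidth = ..." label under the plot
plot(d, main = "Density of production", xlab = "")

# Shade the area under the curve to the right of 1100
keep <- d$x > 1100
polygon(c(min(d$x[keep]), d$x[keep], max(d$x[keep])),
        c(0, d$y[keep], 0), col = "blue", border = NA)

# Vertical line that stops at the curve: segments() gets both end points
m <- mean(test1)
h <- approx(d$x, d$y, xout = m)$y   # curve height at m by linear interpolation
segments(x0 = m, y0 = 0, x1 = m, y1 = h)
```

Interpolating with `approx()` avoids searching for the grid point nearest to the mean.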



On Mon, Dec 13, 2010 at 2:01 PM, Sarah Goslee sarah.gos...@gmail.com
wrote:
 Here's one way to do what I think you want:


   test <- rnorm(5000,1000,100)
   test1 <- subset(test, subset=(test > 1100))
   d <- density(test)
   plot(d, main="Density of production", xlab="")


 lines(d$x[d$x > 1100], d$y[d$x > 1100], col="blue", lwd=2)

 curveheight <- d$y[abs((d$x - mean(test1))) == min(abs((d$x -
mean(test1))))]
 segments(x0=mean(test1), y0=0, y1=curveheight)


 Sarah

 On Mon, Dec 13, 2010 at 1:44 PM, Val valkr...@gmail.com wrote:
 Hi All,

  I generated 5000 samples using the following script

test <- rnorm(5000,1000,100)
test1 <- subset(test, subset=(test > 1100))
d <- density(test)
plot(d, main="Density of production")
abline(v=mean(test1))

 I wanted to do the following but faced difficulties
 1. to shade or color (blue) the curve for any values
 greater than 1,100
 2. I drew a vertical line but I wanted the v-line to stay within the curve,
 not to stick outside the curve
 3. to suppress the output produced at the bottom of the curve (N=5000
 and bandwidth =16.22)

 Thanks  in advance
  Val




 --
 Sarah Goslee
 http://www.functionaldiversity.org





[R] Survival

2010-03-20 Thread Ashta
Hi All,

I was trying to find a function that handles the Partially Linear
Single-Index model in survival analysis, but had no luck.

Is there a function in R for this type of analysis?

Thanks
A



[R] likelihood

2010-03-08 Thread Ashta
Hi all,

Does anyone know how to write the likelihood function for the Poisson distribution
in R when  P(x=0)?

 For the normal case, it can be written as follows,


  n  *  log(lambda)  -  lambda  *  n  *  mean(dat)



Any help is highly appreciated
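
For reference, a sketch of the ordinary Poisson log-likelihood, where the data term is n * mean(dat) * log(lambda) and the rate term is -n * lambda. The second function is only a guess at what the question is after: it assumes a zero-truncated Poisson (observations with x = 0 cannot occur). The function names and the data vector are made up.

```r
# Ordinary Poisson log-likelihood for a sample 'dat' at rate 'lambda'
pois_loglik <- function(lambda, dat) {
  n <- length(dat)
  n * mean(dat) * log(lambda) - n * lambda - sum(lgamma(dat + 1))
}

# Zero-truncated version (assumption: zeros are never observed), obtained by
# dividing each probability by P(X > 0) = 1 - exp(-lambda)
ztp_loglik <- function(lambda, dat) {
  pois_loglik(lambda, dat) - length(dat) * log(1 - exp(-lambda))
}

dat <- c(1, 3, 2, 4, 1, 2)   # made-up counts without zeros

# Cross-check against dpois(), then maximise the truncated version
check <- all.equal(pois_loglik(2, dat), sum(dpois(dat, 2, log = TRUE)))
fit   <- optimize(ztp_loglik, interval = c(0.01, 20), dat = dat, maximum = TRUE)
```

`fit$maximum` then holds the estimated rate under the truncation assumption.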

Ashta



[R] Histogram color

2010-03-04 Thread Ashta
In a histogram, is it possible to have different colors?
Example: I generated

x <- rnorm(100)
hist(x)

I want the histogram to have different colors based on the following condition:
values > mean(x)+sd(x) in red and values < mean(x) - sd(x) in red as
well, with the middle part in blue.
Is it possible to do that in R?
Thanks
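
One possible approach, sketched with base graphics: compute the histogram without plotting, colour each bar by where its midpoint falls, then plot. Classifying bars by midpoint is an assumption of this sketch; a bar that straddles a cutoff gets a single colour.

```r
set.seed(42)                 # only to make the sketch repeatable
x <- rnorm(100)

h  <- hist(x, plot = FALSE)  # bar positions and counts, no drawing yet
lo <- mean(x) - sd(x)
hi <- mean(x) + sd(x)

# One colour per bar, decided by the bar's midpoint
cols <- ifelse(h$mids < lo | h$mids > hi, "red", "blue")
plot(h, col = cols, main = "Histogram of x")
```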



[R] Step function

2010-01-29 Thread Ashta
Hi All,

Does the step function work with this model?  I tried to run the
following model but no result was obtained. The computer hangs and I
killed the job several times. Below is the code.

library(survival)
m.fit=clogit(y~x1+x2+x3+x4, data=ftest)
summary(m.fit)
final <- step(m.fit)

Thanks in advance.



[R] output

2010-01-18 Thread Ashta
Hi all,
I am trying to interpret the result of the following output from lm:


fit1 =lm(Feed_Intake ~ weight + season + weight*season)
Season has three classes (x,y,z)

Results are

                        Estimate
(Intercept)             21.51559
weight                   2.13051
factor(season)y         10.59739
factor(season)z          1.30421
weight:factor(season)y  10.1
weight:factor(season)z  21.70288

My question is: what is the estimate of season x?

Could it be possible to change the output in the following way?

factor(season)x
factor(season)y
weight:factor(season)x
weight:factor(season)y

Thanks in advance
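
With R's default treatment contrasts, the first factor level is the reference: its intercept and slope are the `(Intercept)` and `weight` rows, and the other rows are differences from it. The sketch below uses simulated data (the data frame `DF` and its values are made up) to show how changing the reference level with `relevel()` changes which terms are printed.

```r
set.seed(1)
# Simulated stand-in for the real data
DF <- data.frame(weight = runif(60, 50, 100),
                 season = factor(rep(c("x", "y", "z"), each = 20)))
DF$Feed_Intake <- 20 + 2 * DF$weight + rnorm(60)

# Reference level is "x": its estimates are (Intercept) and weight
fit1 <- lm(Feed_Intake ~ weight * season, data = DF)
names(coef(fit1))

# Make "z" the reference so the output shows terms for x and y instead
DF$season2 <- relevel(DF$season, ref = "z")
fit2 <- lm(Feed_Intake ~ weight * season2, data = DF)
names(coef(fit2))
```

Both fits describe the same model; only the parameterisation differs, so the fitted values are identical.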



Re: [R] output

2010-01-18 Thread Ashta
Hi all,

I have a data set in which the response variable size is binary (Short or Long).
Color has two classes (red and green): red=1; green=0
Lm1 <- glm(size ~color, data =test, family = binomial())
   Estimate   Std. Error   z value
(Intercept)  12.0523.11037-12.273
color  0.78500.06624   3.952
How do I get the probability of sizes for the two different colors(red
and green)?
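
A sketch of two equivalent ways to get the fitted probabilities from a logistic regression: apply the inverse logit (`plogis()`) to the linear predictor, or call `predict()` with `type = "response"`. The data are simulated and the 0/1 coding of `size` is an assumption for the illustration.

```r
set.seed(2)
# Simulated stand-in: color 1 = red, 0 = green; size 1 = Long, 0 = Short
test <- data.frame(color = rep(c(0, 1), each = 100))
test$size <- rbinom(200, 1, ifelse(test$color == 1, 0.7, 0.4))

Lm1 <- glm(size ~ color, data = test, family = binomial())
b   <- coef(Lm1)

# Inverse logit of the linear predictor for each colour
p_green <- plogis(b[1])           # color = 0
p_red   <- plogis(b[1] + b[2])    # color = 1

# Same result through predict()
pr <- predict(Lm1, newdata = data.frame(color = c(0, 1)), type = "response")
```

With a single binary predictor these probabilities equal the observed proportions in each colour group.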







On Mon, Jan 18, 2010 at 11:15 AM, Henrique Dallazuanna www...@gmail.com wrote:
 Try this:

 DF$season <- relevel(DF$season, 'y')
 fit1 <- lm(Feed_Intake ~ weight + season + weight*season, data = DF)

 On Mon, Jan 18, 2010 at 2:00 PM, Ashta sewa...@gmail.com wrote:
 Hi all,
 I am trying to interparete  the result of the following output from  lm;


 fit1 =lm(Feed _Intake ~ weight + season + weight*season)
 Season has three classes(x,y,z)

 Reults are

 Estimate (Intercept)               21.51559
 weight                                       2.13051
 factor(season)y                      10.59739
 factor(season)z                        1.30421
 weight:factor(season)y          10.1
 weight:factor(season)z          21.70288

 My question are  what is the estimate of season x?

 Could it be possible to change the output in the following way?

 factor(season)x
 factor(season)y
 weight:factor(season)x
 weight:factor(season)y

 Thanks in adavance





 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O




[R] Hazard ratio

2009-12-10 Thread Ashta
Hi all,

I want to calculate  hazard  ratio within each covariate


Example, one covariate has 3 classes (1,2 and 3) and x2 has 2 classes

I want to compare the relative risk ratio within each class of the covariate.
 How do I get this result ? .


The other question is that how do I interpret  the second column in
the second panel  (i.e., exp(-coef))

I used the model
coxfit1 <- coxph(Surv(sdat$time, sdat$cens)~ y1+x2)

        coef  exp(coef)  se(coef)       z  Pr(>|z|)
y1 -0.024084   0.976204  0.003077  -7.828  5.00e-15 ***
x2  0.036161   1.036822  0.083921   0.431    0.6665

    exp(coef)  exp(-coef)  lower .95  upper .95
y1     0.9762      1.0244     0.9703     0.9821
x2     1.0368      0.9645     0.8796     1.


Thanks in advance
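
A sketch of how the two columns relate, using the `veteran` data that ships with the survival package as a stand-in for `sdat`. `exp(-coef)` is simply 1/exp(coef), i.e. the hazard ratio with the direction of the comparison reversed. For a factor covariate, the ratio between two non-reference classes comes from the difference of their coefficients.

```r
library(survival)   # 'veteran' ships with the survival package

fit <- coxph(Surv(time, status) ~ age + celltype, data = veteran)
ci  <- summary(fit)$conf.int

# exp(-coef) is 1 / exp(coef): the hazard ratio with the comparison reversed
ci[, c("exp(coef)", "exp(-coef)")]

# Hazard ratio between two non-reference levels of the same factor
b <- coef(fit)
hr_adeno_vs_smallcell <- exp(b["celltypeadeno"] - b["celltypesmallcell"])
```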



Re: [R] Hazard ratio

2009-12-10 Thread Ashta
David,

Thank you very much for your response.

I fitted the covariates as factors instead of numeric.

coxfit1 <- coxph(Surv(sdat$time, sdat$cens)~factor(y1)+factor(x2))
                 coef  exp(coef)  se(coef)       z  Pr(>|z|)
factor(y1)2  0.036161   1.036822  0.083921   0.431    0.6665
factor(y1)3 -0.510124   0.600421  0.088901  -5.738  9.57e-09 ***
factor(x2)2 -0.510124   0.600421  0.088901  -5.738  9.57e-09 ***



What are those values?   Is it comparing in reference to the first
class of each covariate?

Thanks again.






On Thu, Dec 10, 2009 at 8:33 AM, Ashta sewa...@gmail.com wrote:
 Hi all,

 I want to calculate  hazard  ratio within each covariate


 Example, one covariate has 3 classes (1,2 and 3) and x2 has 2 classes

 I want to compare the relative risk ratio within each class of the covariate.
  How do I get this result ? .


 The other question is that how do I interpret  the second column in
 the second panel  (i.e., exp(-coef))

 I used the model
 coxfit1 <- coxph(Surv(sdat$time, sdat$cens)~ y1+x2)

         coef  exp(coef)  se(coef)       z  Pr(>|z|)
 y1 -0.024084   0.976204  0.003077  -7.828  5.00e-15 ***
 x2  0.036161   1.036822  0.083921   0.431    0.6665

     exp(coef)  exp(-coef)  lower .95  upper .95
 y1     0.9762      1.0244     0.9703     0.9821
 x2     1.0368      0.9645     0.8796     1.


 Thanks in advance




[R] PH Model assumption

2009-12-10 Thread Ashta
Hi all,

I was trying to test the proportional hazards
assumption, and I used the cox.zph function

cox.zph(coxfit6)

Results are:
            rho   chisq         p
x1      -0.0396   1.397  2.37e-01
x2       0.1107   9.715  1.83e-03
x3      -0.0885   7.743  5.39e-03
x4       0.0366   1.092  2.96e-01
x5       0.0242   0.455  5.00e-01
GLOBAL       NA  30.952  9.57e-06


Do all these covariates fulfil the assumption of proportional hazards?

Thanks again.
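
In the cox.zph output a small p-value is evidence against proportional hazards for that term, so in the table above x2, x3 and the GLOBAL test are the rows to look at. The sketch below shows how the flagged terms could be pulled out programmatically; it uses the `veteran` data from the survival package as a stand-in, and the 0.05 cutoff is just a conventional choice.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + karno + celltype, data = veteran)
zp  <- cox.zph(fit)

# One row per term plus a GLOBAL row; the "p" column holds the p-values
tab <- zp$table
flagged <- rownames(tab)[tab[, "p"] < 0.05]

# plot(zp) draws the scaled Schoenfeld residuals against time for each term
```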



[R] stepAIC function

2009-12-04 Thread Ashta
Hi All,

I am trying to run the following script but have a problem:

coxm <- coxph(Surv(sdat$time, sdat$cens)~hd+nawtg+nwwg+ntpg+cy+nseas,data=sdat)
coxm <- stepAIC(coxm,~.^2)

The error message is
Error: could not find function "stepAIC"

I tried to install the package but I could not find it. Where can I get it?


The other question is that I want to get the Kaplan-Meier estimate
for each covariate in the model,
like
covariate   n  Events  Mean  S.E.(mean)  Median  95% LCL  95% UCL
    0      14    10    2.87     .03       2.2     1.938    infi
    1      11     9    1.06     .67       1.1     0.29     2.48

I used

sdat.fit0 <- survfit(Surv(sdat$time, sdat$cens)~sdat$ntpg, data = sdat,
type = "kaplan-meier", conf.type="plain")
sdat.fit0

Instead I got the following,


Call: survfit(formula = Surv(sdat$time, sdat$cens) ~ sdat$ntpg, data = sdat,
type = "kaplan-meier", conf.type = "plain")

            records  n.max  n.start  events  median  0.95LCL  0.95UCL
sdat$ntpg=0    3576   3576     3576     311      NA       NA       NA
sdat$ntpg=1    4851   4851     4851     466      NA       NA       NA


I would appreciate if some one can help me.

thanks
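
Two notes, with a sketch on the `veteran` data from the survival package standing in for `sdat`. First, `stepAIC()` lives in the MASS package, so it has to be loaded with `library(MASS)`. Second, printing a survfit object with `print.rmean = TRUE` adds the restricted mean and its standard error; the median is NA whenever the survival curve never drops to 0.5, which fits groups with few events relative to their size.

```r
library(MASS)       # provides stepAIC()
library(survival)

fit <- coxph(Surv(time, status) ~ age + karno + celltype, data = veteran)
sel <- stepAIC(fit, trace = FALSE)   # stepwise selection by AIC

# Kaplan-Meier estimate per group, with the restricted mean in the printout
km <- survfit(Surv(time, status) ~ trt, data = veteran, conf.type = "plain")
print(km, print.rmean = TRUE)

# The same summary as a matrix, one row per group
km_tab <- summary(km)$table
```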



[R] look up and Missing

2009-11-08 Thread Ashta
HI  R-Users

Assume that I have a data frame 'temp' with several variables (v1, v2, v3, v4, v5).

  v1 v2 v3 v4 v5
   1  2  3  3  6
   5  2  4  2  0
   2 -9  5  4  3
   6  2  1  3  4

1. I want to look at the entire row of values when v2 = -9,
   like
   2 -9  5  4  3

I wrote
K <- list(if(temp$v2)==-9))

but it gave me the following, which is not correct:
   False false false false false

2. I want to assign those values as missing if v2 = -9 (i.e., I want to
exclude them from the analysis).

How do I do it  in R?

Thanks in advance
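
A minimal sketch of both steps with logical indexing, rebuilding the example data frame from the table above. The names `hits` and `complete` are made up.

```r
temp <- data.frame(v1 = c(1, 5, 2, 6),
                   v2 = c(2, 2, -9, 2),
                   v3 = c(3, 4, 5, 1),
                   v4 = c(3, 2, 4, 3),
                   v5 = c(6, 0, 3, 4))

# 1. Whole rows where v2 equals -9
hits <- temp[temp$v2 == -9, ]

# 2. Recode -9 as missing, then drop incomplete rows if desired
temp$v2[temp$v2 == -9] <- NA
complete <- na.omit(temp)
```

Many modelling functions drop rows with NA by default, so the explicit `na.omit()` step is often optional.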



Re: [R] Frequency

2009-11-02 Thread Ashta
Thank you Jorge and

 res <- table(unlist(x))
 res[order(res, decreasing = TRUE)]
 # 10  4  6  3  5  7  9 18
 #  3  2  2  1  1  1  1  1

This one works fine for me.  Is it possible to transpose it?
I tried  t(res[order(res, decreasing = TRUE)]), but it did not work!

I want the result like this
10  3
 4  2
 6  2
 3  1
 .  .
 .  .
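
One way to get a two-column layout is to convert the sorted table to a data frame, sketched here on the sample data from the thread. The column names `value` and `freq` are made up.

```r
x <- data.frame(V1 = c(5, 3, 4), v2 = c(6, 4, 6),
                v3 = c(9, 7, 10), v4 = c(10, 10, 18))

res <- sort(table(unlist(x)), decreasing = TRUE)

# A one-dimensional table becomes a data frame with one row per value
out <- as.data.frame(res, stringsAsFactors = FALSE)
names(out) <- c("value", "freq")
out
```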




On Mon, Nov 2, 2009 at 1:45 PM, Jorge Ivan Velez
jorgeivanve...@gmail.com wrote:
 Hi Val,

 Here is a suggestion:

 res <- table(unlist(x))
 res[order(res, decreasing = TRUE)]
 # 10  4  6  3  5  7  9 18
 #  3  2  2  1  1  1  1  1

 HTH,
 Jorge


 On Mon, Nov 2, 2009 at 1:35 PM, Val  wrote:

 BAYESIAN INFERENCES FOR MILKING TEMPERAMENT IN CANADIAN HOLSTEINS

 Hi All,

 I have a data  set x  with several variables. Sample of the data is shown
 below

  V1  v2  v3   v4

   5    6    9   10

  3    4    7   10

  4    6   10   18



 I want the frequency  of each  data point sorted by their occurrence.



 Below is the output that I want

 10    =3

 6=2

 4=2

 9=1

 5=1

 7=1

 3=1

 How do I do it in R?



 Thanks in advance



 Val







[R] Wavelets

2009-10-23 Thread Ashta
Hi all,

I am trying to do wavelets and I got an error message saying "The
length of data is not a power of 2".
Is there a way of handling that? Or should the data length be exactly
a power of 2?
I am using R version 2.9.2 (2009-08-24).
The library is wavethresh.

wds <- wd(ds$v,filter.number=1)

Thanks
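
One common workaround, sketched with base R only, is to pad the series up to the next power of 2 before the transform. Zero-padding is an assumption of this sketch; whether it, or some other extension such as reflecting the series, is appropriate depends on the analysis. The vector `v` stands in for `ds$v`.

```r
set.seed(3)
v <- rnorm(100)                      # stand-in for ds$v

# Next power of 2 at or above the current length
n2 <- 2^ceiling(log2(length(v)))

# Pad with zeros at the end (assumption: zero-padding is acceptable here)
v_pad <- c(v, rep(0, n2 - length(v)))
length(v_pad)
```

The padded vector can then be passed to the transform in place of the original series.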



[R] Inserting rows

2009-10-23 Thread Ashta
Hi all,

I have the data set df with three variables,

x1 x2 x3
1  2   5
2  4   1
5  6   0
1  1   2

I want to insert more rows (e.g., 3 rows filled with zeros)
1  2   5
2  4   1
5  6   0
1  1   2
0  0   0
0  0   0
0  0   0

Can any body help me out?
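
A minimal sketch: build a block of zeros with the same column names and append it with `rbind()`. The names `zeros` and `df2` are made up.

```r
df <- data.frame(x1 = c(1, 2, 5, 1), x2 = c(2, 4, 6, 1), x3 = c(5, 1, 0, 2))

# Three rows of zeros with matching column names
zeros <- as.data.frame(matrix(0, nrow = 3, ncol = ncol(df),
                              dimnames = list(NULL, names(df))))

df2 <- rbind(df, zeros)
df2
```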

Thanks



[R] Counting

2009-10-20 Thread Ashta
Hi All,

Assume that I have the following data set with two variables and I
want to count the number of observations with identical values and the number
of times each factor changed from x1 to x2.

x1  x2
 1    1
 1    0
 0    1
 0    1
 0    0
 1    1
 0    1

The output should be
x1  changed
 0   3    # has changed 3 times
 1   1    # has changed 1 time
x1  unchanged
 0   1    # was unchanged only 1 time
 1   2    # was unchanged 2 times

Can someone help me how to do it in R?

Thanks in advance



Re: [R] Counting

2009-10-20 Thread Ashta
Hi Bill and all,


On Tue, Oct 20, 2009 at 12:09 PM, William Dunlap wdun...@tibco.com wrote:
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Ehlers
 Sent: Tuesday, October 20, 2009 8:48 AM
 To: Ashta
 Cc: R help
 Subject: Re: [R] Counting

 How about

   unch <- aggregate(x2==x1, by = list(x1=x1), FUN = sum)
   chgd <- aggregate(x2!=x1, by = list(x1=x1), FUN = sum)

   -Peter Ehlers

 When I hear 'count' I think first of the table() function.
 E.g.,
    d <- data.frame(x1=c(1,1,0,0,0,1,0), x2=c(1,0,1,1,0,1,1))
    with(d, table(x1, x1==x2))

   x1  FALSE TRUE
     0     3    1
     1     1    2
 or
    with(d, table(x1, factor(x1==x2,labels=c("Changed","Unchanged"))))

   x1  Changed Unchanged
     0       3         1
     1       1         2
 or use dimnames<- to change the labels on the table itself.

 This works very well for  numeric.
 How about if the factors are character such  as F and M  (male and female) ?
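
The same table() idea should carry over to character data, as sketched below with the example recoded as M and F (1 became M, 0 became F; that recoding is only for illustration). If the two columns are stored as factors with different level sets, comparing them with `==` fails, so the sketch compares them as character strings.

```r
d <- data.frame(x1 = c("M", "M", "F", "F", "F", "M", "F"),
                x2 = c("M", "F", "M", "M", "F", "M", "M"),
                stringsAsFactors = FALSE)

# as.character() keeps the comparison safe if the columns are factors
same <- as.character(d$x1) == as.character(d$x2)

tab <- table(x1 = d$x1,
             factor(same, levels = c(FALSE, TRUE),
                    labels = c("Changed", "Unchanged")))
tab
```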





 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


 Ashta wrote:
  Hi All,
 
  Assume that I have the following data set  with two variables and I
  want count the number of observation with identical values
 and number
  of time each factor changed from x1 to x2.
 
  x1  x2
   1    1
   1    0
   0    1
   0    1
   0    0
   1    1
   0    1
 
  The output should be
  x1  changed
                        0   3    # has changed 3 times
                        1   1    # has changed 1 time
  x1 unchanged
                        0  1    # has unchanged only 1 time
                        1  2     # has unchanged 2 times
 
  Can someone help me how to do it in R?
 
  Thanks in advance
 
 
 






[R] Spline

2009-10-19 Thread Ashta
Hi All,

I am using R version 2.9.2 (2009-08-24), Windows version,
and I wanted to use the

 library(spline)
Error in library(spline) : there is no package called 'spline'

I tried to install the package as well and it is not there either.

Am I missing something? Where can I get this library?

Thanks
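
The package that ships with R is called `splines` (plural), so it only needs to be loaded, not installed. A short sketch, assuming a B-spline basis is what is wanted; the vector `x` is made up.

```r
library(splines)   # part of the standard R distribution

x <- seq(0, 1, length.out = 50)

# B-spline basis with 5 degrees of freedom: one column per basis function
B <- bs(x, df = 5)
dim(B)
```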



[R] Survival and nonparametric

2009-10-14 Thread Ashta
Hi all,

Does anybody have experience including a nonparametric component in a
survival analysis using an R
package? Can someone recommend me some references?

Thanks a lot
Ashta



[R] Counting

2009-10-13 Thread Ashta
Hi all,

Assume that I have the following data set with two variables and I want to
count the number of observations with identical values.

x1 x2
 1   1
 1   0
 0   1
 0   1
 0   0
 1   1
 0   1

I want the following output

n1=3  # number of identical observations between the x1 and x2 variables
n2=4  # number of different observations

How do I do it in R?

Thanks a lot
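
A minimal sketch: comparing the two columns gives a logical vector, and summing it counts the TRUE values.

```r
d <- data.frame(x1 = c(1, 1, 0, 0, 0, 1, 0),
                x2 = c(1, 0, 1, 1, 0, 1, 1))

n1 <- sum(d$x1 == d$x2)   # rows where x1 and x2 agree
n2 <- sum(d$x1 != d$x2)   # rows where they differ
```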






[R] Random number

2009-10-11 Thread Ashta
Hi All,
I have the matrix called 'X' with 200 rows and 12 variables. I want to
create 2 new variables (V1 and V2) based on a random number generator

p1<-rnorm(200. mean=0, std=1)
p2<-rnorm(200. mean=0, std=1)
x <- cbind(x, v1=ifelse(x[,'p1']  0.4, 1, 0), v2=ifelse(x[,'p2']  0.6, 0,
1))

I found the following error message
Error: unexpected symbol in "p1<-rnorm(200. mean"

Any help?
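
The "unexpected symbol" error most likely comes from the period after 200, where a comma is needed; in addition, the standard deviation argument of `rnorm()` is `sd`, not `std`. The sketch below makes two assumptions that are not in the original post: the random numbers are drawn with `runif()` so that they behave like probabilities, and the comparison is "less than". The matrix is simulated.

```r
set.seed(123)
x <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)   # stand-in for the real matrix

# Uniform draws on [0, 1] (assumption: these play the role of probabilities)
p1 <- runif(200)
p2 <- runif(200)

# Compare the vectors directly; they are not columns of x
x <- cbind(x, v1 = ifelse(p1 < 0.4, 1, 0), v2 = ifelse(p2 < 0.6, 1, 0))
dim(x)
```

For normal draws the corrected call would read `rnorm(200, mean = 0, sd = 1)`.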



[R] Tabulation

2009-10-10 Thread Ashta
Hi all,

I have a data set
x1  x2 x3
 1   2   1
 1   2   3
 2   1   2
 1   2   1
 3   1   1

 I want to tabulate in the following way.
      1   2   3
 x1   3   1   1
 x2   2   3   0
 x3   3   1   1
It is just like frequency distribution


Any help is highly appreciated
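
One possible sketch: tabulate each column against a common set of levels so that values that do not occur still get a zero, then put the columns in rows. The names `lev` and `tab` are made up.

```r
d <- data.frame(x1 = c(1, 1, 2, 1, 3),
                x2 = c(2, 2, 1, 2, 1),
                x3 = c(1, 3, 2, 1, 1))

# All values that occur anywhere in the data
lev <- sort(unique(unlist(d)))

# One frequency table per column, sharing the same levels; t() puts variables in rows
tab <- t(sapply(d, function(col) table(factor(col, levels = lev))))
tab
```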



[R] Creating new variables

2009-10-10 Thread Ashta
Hi all,

I have a data set called x with 200 rows and 12 columns. I want to
create two more columns based on probability, i.e.
 if p 0 .4 then  v1 =1 else v1=0;
 if p 0 .6 then  v2 =1 else v2=0;

Finally x will have 14 variables.

Can any one show me how to do that?

Thanks
Ashta


.



Re: [R] Creating new variables

2009-10-10 Thread Ashta
Thanks. This helps. How do I generate P?
Will this work?

p1 <- pnorm(mean=0, std=1)
p2 <- pnorm(mean=0, std=1)

x - cbind(x, v1=ifelse(x[,'p']  0.4, 1, 0), v2=ifelse(x[,'2']  0.6, 0,
1))






If the 'data set' is a dataframe, the following will work:

x$v1 - ifelse(x$p  0.4, 1, 0)
x$v2 - ifelse(x$p  0.6, 1, 0)

If it is matrix, try

x - cbind(x, v1=ifelse(x[,'p']  0.4, 1, 0), v2=ifelse(x[,'p']  0.6, 1,
2))


On Sat, Oct 10, 2009 at 6:32 PM, jim holtman <jholt...@gmail.com> wrote:

> If the 'data set' is a dataframe, the following will work:
>
> x$v1 <- ifelse(x$p  0.4, 1, 0)
> x$v2 <- ifelse(x$p  0.6, 1, 0)
>
> If it is matrix, try
>
> x <- cbind(x, v1=ifelse(x[,'p']  0.4, 1, 0), v2=ifelse(x[,'p']  0.6, 1,
> 2))
>
> It helps a lot if you follow the posting rules and provide commented,
> minimal, self-contained, reproducible code.
>
> On Sat, Oct 10, 2009 at 6:04 PM, Ashta <sewa...@gmail.com> wrote:
>> Hi all,
>>
>> I have a data set called x with 200 rows and 12 columns. I want
>> create two more columns based on probability. ie
>>  if p 0 .4 then v1 =1 else v1=0;
>>  if p 0 .6 then v2 =1 else v2=0;
>>
>> Finally x will have 14 variables.
>>
>> Can any one show me how to do that?
>>
>> Thanks
>> Ashta
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?




Re: [R] row selection

2009-10-09 Thread Ashta
Hi all,

Thank you for your help. Now I am able to select every 5th row of the data
from the main data set (x)
using

sub1 <- x[seq(1, nrow(x), by=5), ]


So sub1 contains one fifth of the data set x. I also want to create another
data set that will contain the remaining rows of x (i.e., four fifths
of the data set).

Any help is highly appreciated.
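
One way to get the remainder is negative indexing with the same row numbers. A small sketch on a stand-in matrix (x here is made-up data; idx and sub2 are names chosen for the example):

```r
# Stand-in matrix: 20 rows, 3 columns (hypothetical data)
x <- matrix(1:60, nrow = 20, ncol = 3)

idx  <- seq(1, nrow(x), by = 5)   # rows 1, 6, 11, 16
sub1 <- x[idx, , drop = FALSE]    # every 5th row
sub2 <- x[-idx, , drop = FALSE]   # all the other rows

nrow(sub1)
nrow(sub2)
```

Storing the index once and reusing it with a minus sign guarantees the two pieces do not overlap and together cover all of x.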








> I have a matrix named x with N by C
> I want to select every 5th row from matrix x. I used the following
> code
> n <- nrow(x)
> for(i in 1:n){
>   b <- a[i+5,]
>   b
> }

sc <- x[seq(1, nrow(x), by=5), ]


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of David Winsemius
Sent: Thursday, October 08, 2009 4:19 PM
To: Ashta
Cc: R help
Subject: Re: [R] row selection


On Oct 8, 2009, at 4:14 PM, Ashta wrote:

> Hi all,
> I have a matrix named x with N by C
> I want to select every 5th row from matrix x. I used the following
> code
> n <- nrow(x)
> for(i in 1:n){
>   b <- a[i+5,]
>   b
> }
> Error: subscript out of bounds

What did you expect when i in your loop counter became one greater
than the number of rows?

 


 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT



[R] row selection

2009-10-08 Thread Ashta
Hi all,
I have a matrix named x, N by C.
I want to select every 5th row from matrix x.
I used the following code

n <- nrow(x)
for(i in 1:n){
  b <- a[i+5,]
  b
}
Error: subscript out of bounds

Can anybody point out the problem?
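
A short illustration of where the loop goes wrong, on a made-up 12-row matrix: the index i + 5 runs past the last row for the final five values of i. The names ok and every5 are invented for the sketch.

```r
# Stand-in matrix with 12 rows (hypothetical data)
x <- matrix(seq_len(36), nrow = 12)
n <- nrow(x)

# For which loop values does row i + 5 still exist?
ok <- (1:n) + 5 <= n
which(!ok)   # these values of i ask for a row beyond nrow(x)

# Picking every 5th row needs no loop: index with a sequence instead
every5 <- x[seq(1, n, by = 5), , drop = FALSE]
every5
```

Note also that the loop overwrites b on each pass, so even with valid indices it would keep only the last row it read.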



[R] Plot

2009-10-06 Thread Ashta
Hi All,


Days <- matrix(c("Monday", "Tuesday", "Wed", "Thu", "Fri", "Sat",
"Sun"),7,1)

Hum <- matrix(c(56,57,60,75,62,67,70),

Temp <- matrix(c(76,77,81,95,82,77,83),



Using the above information I want to plot humidity and temperature on the Y-axis
and days on the X-axis.

Any help is appreciated!
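
A sketch of one way to draw this with base graphics, assuming plain vectors instead of matrices and consistently abbreviated day names (both are choices made for the example, not part of the post):

```r
# Day labels, abbreviated consistently (an assumption; the post mixes styles)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
hum  <- c(56, 57, 60, 75, 62, 67, 70)   # humidity values from the post
temp <- c(76, 77, 81, 95, 82, 77, 83)   # temperature values from the post

pos <- seq_along(days)                   # numeric x positions 1..7

# Draw temperature first, suppressing the default x axis
plot(pos, temp, type = "b", col = "red", xaxt = "n",
     ylim = range(hum, temp), xlab = "Day", ylab = "Value")
lines(pos, hum, type = "b", col = "blue")   # add humidity to the same plot
axis(1, at = pos, labels = days)            # put the day names on the x axis
legend("topleft", legend = c("Temperature", "Humidity"),
       col = c("red", "blue"), lty = 1, pch = 1)
```

Setting ylim from both series up front keeps the humidity line from being clipped by the range of the first series drawn.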



Re: [R] Plot

2009-10-06 Thread Ashta
Thanks Sara,

Yes, I did try. I could not get the days on the X-axis.

Below is the error message:

plot(Temp,Days)
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
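
The warnings suggest what happened: Days holds text, plot() tries to turn it into numbers, every value becomes NA, and no axis limits can be computed. The call also puts Days in the y position. A small sketch of the coercion, with made-up names y and days_f:

```r
days <- c("Monday", "Tuesday", "Wed")

# Day names cannot be read as numbers, so every value becomes NA
y <- suppressWarnings(as.numeric(days))
y

# A factor with levels in the intended order has usable integer codes
days_f <- factor(days, levels = days)
as.integer(days_f)
```

This is why converting the day names to a factor, or plotting against positions 1 to 7 and labelling the axis afterwards, avoids the error.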




On Tue, Oct 6, 2009 at 10:19 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:

> Did you try it? With, perhaps, plot() ? And lines() ?
>
> You might do better with Days as a factor with the day names in order. (And
> why are two full and five abbreviated?)
>
> I don't understand why Hum and Temp are matrices rather than vectors,
> and why then you didn't specify dimensions, and for that matter why you
> are missing a closing paren but do have a comma in its place.
>
> Generally this list is happy to help, but we like some evidence that the
> querent has *tried* before inquiring.
>
> Sarah
>
> On Tue, Oct 6, 2009 at 10:05 AM, Ashta <sewa...@gmail.com> wrote:
>> Hi All,
>>
>> Days <- matrix(c("Monday", "Tuesday", "Wed", "Thu", "Fri", "Sat",
>> "Sun"),7,1)
>>
>> Hum <- matrix(c(56,57,60,75,62,67,70),
>>
>> Temp <- matrix(c(76,77,81,95,82,77,83),
>>
>> Using the above information I want plot humidity and temperature on
>> Y-axis and days on X-axis
>>
>> Any help is appreciated!
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org




[R] Legend

2009-10-02 Thread Ashta
I have more than three lines in one plot and I want to add a legend for each
line

abline( m1, col = 'red' )
abline( m2, col = 'blue' )
abline( m3, col = 'purple' )

How can I add a legend? Is it also possible to increase the thickness of
the lines?

Thanks
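
A sketch using legend() for the key and the lwd argument for line thickness. The data and the three fits below are simulated stand-ins, since the original models m1 to m3 are not shown.

```r
set.seed(42)  # reproducibility only

# Simulated data standing in for the real series (hypothetical)
x  <- 1:30
y1 <- 1 + 0.5 * x + rnorm(30)
y2 <- 4 + 0.3 * x + rnorm(30)
y3 <- 8 + 0.1 * x + rnorm(30)

m1 <- lm(y1 ~ x)
m2 <- lm(y2 ~ x)
m3 <- lm(y3 ~ x)

cols <- c("red", "blue", "purple")

# Empty frame first, then one thick regression line per model
plot(x, y1, type = "n", ylim = range(y1, y2, y3), xlab = "x", ylab = "y")
abline(m1, col = cols[1], lwd = 3)
abline(m2, col = cols[2], lwd = 3)
abline(m3, col = cols[3], lwd = 3)

# Legend entries must be listed in the same order as the colours
legend("topleft", legend = c("m1", "m2", "m3"), col = cols, lwd = 3)
```

lwd = 1 is the default width; passing the same lwd to legend() makes the key match the drawn lines.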



[R] Color of graph

2009-10-01 Thread Ashta
I am trying to plot a line graph for 3 or more regression lines

abline(m1)
abline(m2)
abline(m3)

Can I change the color of each line? If so, how?

Thanks in advance
Ashta



[R] Summary

2009-09-29 Thread Ashta
My data is called xc and has more than 15 variables.

When I used summary(xc) it gave me a detailed description of each
variable.

summary(xc)

       Y1               x1               x2               x3          ..
 Min.   :0.0000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000
 1st Qu.:0.0000   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 2.000
 Median :1.0000   Median : 1.000   Median : 1.000   Median : 3.000
 Mean   :0.6505   Mean   : 2.816   Mean   : 3.542   Mean   : 3.433
 3rd Qu.:1.0000   3rd Qu.: 4.000   3rd Qu.: 6.000   3rd Qu.: 5.000
 Max.   :1.0000   Max.   :10.000   Max.   :10.000   Max.   :10.000

But I want the output in the following way.

             Y1        x1        x2        x3   ..
 Min.   :0.0000     1.000     1.000     1.000
 1st Qu.:0.0000     1.000     1.000     2.000
 Median :1.0000     1.000     1.000     3.000
 Mean   :0.6505     2.816     3.542     3.433
 3rd Qu.:1.0000     4.000     6.000     5.000
 Max.   :1.0000    10.000    10.000    10.000


Is it possible to do it in R?


Thanks in advance
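
One possible approach, assuming all columns are numeric and have no missing values: apply summary() column by column, which gives a plain matrix with one row per statistic. The small data frame below is a made-up stand-in for xc.

```r
# Stand-in for xc (hypothetical values; the real data has 15+ columns)
xc <- data.frame(Y1 = c(0, 1, 1, 0, 1, 1),
                 x1 = c(1, 2, 3, 1, 4, 2),
                 x2 = c(2, 1, 3, 2, 1, 2))

# One summary per column, bound together: statistics in rows, variables in columns
tab <- sapply(xc, summary)
round(tab, 3)
```

Columns containing NA produce an extra count, so the pieces no longer have equal length and sapply() returns a list instead of a matrix.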



[R] Binomial

2009-09-25 Thread Ashta
Dear R-users,

Suppose I have the following sample of data,

 0   1   2  4  3
 1   2   1  3  1
 1   3   3  4  1
 0   1   2  1  2
 1   4   1  4  2
 1   2   2  1  1

The first variable is the response variable, where 0 is defective and 1 is
normal. The other four are factors (x1, x2, x3, x4) that influence the outcome. I
want to fit a binomial model. How do I do that? I am guessing the response
variable should be transformed, but I am not sure which family of transformation
to use.
It is easy to do in SAS, but I want to learn how to do it in R.

Any help is highly appreciated
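
A hedged sketch of a logistic regression with glm(). The 0/1 response is used as it is; the logit link of the binomial family takes the place of a manual transformation. The data are simulated because six rows are too few to fit four predictors, and the predictors are treated as numeric, which is an assumption.

```r
set.seed(123)  # reproducibility only
n <- 200

# Simulated stand-in data with the same layout as the post (hypothetical)
dat <- data.frame(x1 = sample(1:4, n, replace = TRUE),
                  x2 = sample(1:3, n, replace = TRUE),
                  x3 = sample(1:4, n, replace = TRUE),
                  x4 = sample(1:3, n, replace = TRUE))
dat$y <- rbinom(n, 1, plogis(-1 + 0.6 * dat$x1 - 0.4 * dat$x3))

# Binomial GLM; the default link for this family is the logit
fit <- glm(y ~ x1 + x2 + x3 + x4, data = dat, family = binomial)

summary(fit)$coefficients   # estimates, standard errors, z values, p values
exp(coef(fit))              # odds ratios
```

If x1 to x4 are categorical rather than numeric, wrapping them in factor() in the formula gives one coefficient per level.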

Ashta



[R] SEa nd CI

2009-09-25 Thread Ashta
How can I get the standard error and confidence interval for the
prediction in a multiple regression model using an R command?

For a simple regression I used

predict(xc, newdata=data.frame(var1=10.), se=T)

where xc is the glm model using binomial and var1 is the variable.

I can get the upper and lower intervals of the prediction

Any help is welcome
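
A sketch of one common approach for a binomial glm with several predictors: predict on the link scale with se.fit = TRUE, build a normal-approximation interval there, then transform back to probabilities. The data, the second predictor var2 and the new values are invented for illustration.

```r
set.seed(1)  # reproducibility only
n <- 150

# Simulated data with two predictors (hypothetical)
dat <- data.frame(var1 = runif(n, 0, 20), var2 = runif(n, 0, 5))
dat$y <- rbinom(n, 1, plogis(-2 + 0.2 * dat$var1 + 0.3 * dat$var2))

xc <- glm(y ~ var1 + var2, data = dat, family = binomial)

# newdata must supply a value for every predictor in the model
new <- data.frame(var1 = 10, var2 = 2)
pr  <- predict(xc, newdata = new, type = "link", se.fit = TRUE)

eta <- as.numeric(pr$fit)      # prediction on the logit scale
se  <- as.numeric(pr$se.fit)   # its standard error
z   <- qnorm(0.975)            # multiplier for an approximate 95% interval

# Back-transform the interval end points to the probability scale
ci <- plogis(c(fit = eta, lower = eta - z * se, upper = eta + z * se))
ci
```

Building the interval on the link scale keeps both limits inside (0, 1), which is not guaranteed when adding and subtracting standard errors from type = "response" predictions.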



[R] Modelling

2009-09-24 Thread Ashta
Dear R-users,

Suppose I have the following sample of data,

 0   1   2  4  3
 1   2   1  3  1
 1   3   3  4  1
 0   1   2  1  2
 1   4   1  4  2
 1   2   2  1  1

The first variable is the response variable, where 0 is defective and 1 is
normal. The other four are factors (x1, x2, x3, x4) that influence the outcome.

I want to fit a binomial model in R. I also want to order the factors based
on their degree of influence on the outcome. How do I do this in R?

thanks in advance
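
There is no single agreed measure of "degree of influence". One rough option, shown here only as a sketch, is to rank the terms by the likelihood-ratio statistic that drop1() reports when each term is removed from the fitted model. The data are simulated and the predictors are treated as numeric, both assumptions.

```r
set.seed(7)  # reproducibility only
n <- 300

# Simulated stand-in data (hypothetical); x1 is given the strongest effect
dat <- data.frame(x1 = sample(1:4, n, replace = TRUE),
                  x2 = sample(1:3, n, replace = TRUE),
                  x3 = sample(1:4, n, replace = TRUE),
                  x4 = sample(1:3, n, replace = TRUE))
dat$y <- rbinom(n, 1, plogis(-1.5 + 0.8 * dat$x1 + 0.3 * dat$x2))

fit <- glm(y ~ x1 + x2 + x3 + x4, data = dat, family = binomial)

# Likelihood-ratio test for dropping each term in turn
lrt <- drop1(fit, test = "Chisq")

# Largest statistic first; na.last = NA removes the '<none>' reference row
ranked <- lrt[order(lrt$LRT, decreasing = TRUE, na.last = NA), ]
ranked
```

The ranking depends on the scale and coding of each predictor and on which other terms are in the model, so it is a guide rather than a definitive ordering.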

Ashta



[R] Reading data

2009-09-23 Thread Ashta
Dear R-users,

I am a new user of R. I am eager to learn about it.

I wanted to read and summarize a simple data file.

I used the following,

rel <- read.table("C:/Documents and Settings/ashta/My Documents/R_data/rel.dat",
                  quote="", header=FALSE, sep="",
                  col.names=c("id","orel","nrel"))

summary(rel)

Below is the error message,

> rel <- read.table("C:/Documents and Settings/ashta/My
Documents/R_data/rel.dat", quote="",header=FALSE,sep="",col.names=
+ c("id","orel","nrel"))
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :
  cannot open file 'file=C:/Documents and Settings/sewalem/My
Documents/R_data/rel.dat': Invalid argument
> summary(rel)
Error in summary(rel) : object 'rel' not found

Does it need a library? Where can I get the library?

Any help is highly appreciated



Ashta
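
No extra library should be needed, since read.table() is part of the utils package that R loads by default. The warning shows a file name beginning with file=, which suggests that file= ended up inside the quoted path. A sketch of the reading step, using a temporary file with made-up contents in place of rel.dat:

```r
# Write a small stand-in file (hypothetical contents) to a temporary location
path <- file.path(tempdir(), "rel.dat")
writeLines(c("1 0.50 0.70",
             "2 0.60 0.80",
             "3 0.55 0.75"), path)

# Check the path before reading; a FALSE here explains 'cannot open file'
file.exists(path)

# The path is the first argument; 'file=' must stay outside the quotes
rel <- read.table(file = path, header = FALSE,
                  col.names = c("id", "orel", "nrel"))
summary(rel)
```

file.choose() can be used instead of typing the path, which avoids mistakes in long Windows directory names.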
