[R] csv file with two header rows

2013-04-26 Thread analys...@hotmail.com
Is there a way to use read.csv() on such a file without deleting one
of the header rows?

Thanks.
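
One possible approach, sketched under assumptions: the first header row holds variable names and the second holds something like units. The file contents and column names below are invented for illustration. The idea is to read the two header rows on their own, then read the data with skip = 2.

```r
# Hypothetical example file with two header rows (names, then units).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height,weight",
             "n,cm,kg",
             "1,170,65",
             "2,182,80"), csv_path)

# Read both header rows as character data, treating neither as a header.
hdr <- read.csv(csv_path, header = FALSE, nrows = 2, stringsAsFactors = FALSE)

# Read the data itself, skipping the two header rows.
dat <- read.csv(csv_path, header = FALSE, skip = 2)

# Combine the two header rows into one set of column names, e.g. "height_cm".
names(dat) <- paste(unlist(hdr[1, ]), unlist(hdr[2, ]), sep = "_")
dat
```

Neither header row is deleted from the file; the second row is simply folded into the column names. If the second row should be kept separately instead, hdr[2, ] still holds it.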

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] C50 package in R

2013-04-26 Thread Indrajit Sen Gupta
Hi All,

I am trying to use the C50 package to build classification trees in R.
Unfortunately, there is not enough documentation on its use. Can anyone
explain to me how to prune the decision trees?



Regards,

Indrajit




[R] Help with dataEllipse function

2013-04-26 Thread Jana Makedonska
Hi Everyone,

I am working with the R function dataEllipse. I plot the 95% confidence
ellipses for several different samples in the same plot and I color-code
the ellipse of each sample, but I do not know how to specify a different
line pattern for each ellipse. I can only modify the pattern for all
ellipses with the lty argument. Any help will be highly appreciated.

Thanks in advance!
Jana

-- 


Jana Makedonska,
B.Sc. Biology, Universite Paul Sabatier Toulouse III
M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier II
Ph.D. candidate in Physical Anthropology and Part-time lecturer
Department of Anthropology
College of Arts & Sciences
State University of New York at Albany
1400 Washington Avenue
1 Albany, NY
Office phone: 518-442-4699
http://electricsongs.academia.edu/JanaMakedonska



[R] nls: example code throws error

2013-04-26 Thread Steven LeBlanc
Greets,

I'm trying to learn to use nls and was running the example code for an 
exponential model:

 x <- -(1:100)/10
 y <- 100 + 10 * exp(x / 2) + rnorm(x)/10
 nlmod <- nls(y ~ Const + A * exp(B * x))
Error in B * x : non-numeric argument to binary operator
In addition: Warning message:
In nls(y ~ Const + A * exp(B * x)) :
  No starting values specified for some parameters.
Initializing ‘Const’ to '1.'.
Consider specifying 'start' or using a selfStart model

Presumably, the code should work if it is part of an example on the help page. 
In perusing various help forums for similar problems, it also appears that 
others believe this syntax should work in the model formula.

Any ideas?

Perhaps also, a pointer to a comprehensive and correct document that details 
model formulae syntax if someone has one?

Thanks & Best Regards,
Steven





Re: [R] nls: example code throws error

2013-04-26 Thread Gabor Grothendieck
On Thu, Apr 25, 2013 at 7:16 PM, Steven LeBlanc ores...@gmail.com wrote:
> Greets,
>
> I'm trying to learn to use nls and was running the example code for an
> exponential model:
>
> x <- -(1:100)/10
> y <- 100 + 10 * exp(x / 2) + rnorm(x)/10
> nlmod <- nls(y ~ Const + A * exp(B * x))
> Error in B * x : non-numeric argument to binary operator
> In addition: Warning message:
> In nls(y ~ Const + A * exp(B * x)) :
>   No starting values specified for some parameters.
> Initializing ‘Const’ to '1.'.
> Consider specifying 'start' or using a selfStart model
>
> Presumably, the code should work if it is part of an example on the help
> page. In perusing various help forums for similar problems, it also appears
> that others believe this syntax should work in the model formula.
>
> Any ideas?


Try running in a clean session.  Having B <- "X" in your workspace
would cause such an error: nls then takes B from the workspace instead
of treating it as a parameter, which is why only Const was initialized.
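
A minimal sketch of a way to sidestep the problem, assuming the same simulated data as in the help-page example. The seed and the starting values are my own choices, not part of the original example. Naming every parameter in 'start' should make nls() estimate exactly those symbols, whatever else happens to be in the workspace.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- -(1:100) / 10
y <- 100 + 10 * exp(x / 2) + rnorm(x) / 10

# A non-numeric object named like a parameter reproduces the reported error
# when no 'start' is given; uncomment to try it.
# B <- "X"
# nls(y ~ Const + A * exp(B * x))

# Naming every parameter in 'start' tells nls() which symbols to estimate.
nlmod <- nls(y ~ Const + A * exp(B * x),
             start = list(Const = 100, A = 10, B = 0.5))
coef(nlmod)
```

The starting values here are simply the values used to simulate the data; with real data they would have to be guessed from a plot or from a selfStart model.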

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] split number into array

2013-04-26 Thread arun
Hi,
Not sure about the criteria for deciding the number of zeros.
vec1 <- c(23,244,1343,45,153555,5468999999,75)
lst1 <- strsplit(as.character(vec1),"")
m1 <- max(sapply(lst1,length))
res <- t(sapply(lst1,function(x) as.numeric(c(rep(0,m1-length(x)),x))))
res
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    0    0    0    0    0    0    0    0    2     3
#[2,]    0    0    0    0    0    0    0    2    4     4
#[3,]    0    0    0    0    0    0    1    3    4     3
#[4,]    0    0    0    0    0    0    0    0    4     5
#[5,]    0    0    0    0    1    5    3    5    5     5
#[6,]    5    4    6    8    9    9    9    9    9     9
#[7,]    0    0    0    0    0    0    0    0    7     5
A.K.
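
If the array length is known in advance (6 in the question), an arithmetic version is another option. This is only a sketch: the function name digits_fixed and its width argument are made up for illustration, and it assumes non-negative whole numbers.

```r
# Split a non-negative integer into a fixed number of digits, padding with
# leading zeros. 'width' is an assumed parameter: the desired array length.
digits_fixed <- function(n, width) {
  stopifnot(n >= 0, n < 10^width)
  (n %/% 10^((width - 1):0)) %% 10
}

digits_fixed(23, 6)

# Applied to every number from 0 to N, giving one row per number.
N <- 25
digit_mat <- t(sapply(0:N, digits_fixed, width = 6))
```

Row i + 1 of digit_mat holds the digits of the number i, so no explicit for loop is needed.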



Hi, I'm a new R user.
I have a for loop which generates numbers from 0 to N, and I want to put each
number into an array:

i.e. the number 23 into an array of ints: a(0,0,0,0,2,3)

Can you help me?
Federico



Re: [R] Distance matrices Combinations

2013-04-26 Thread arun
Hi,
Do you want this?

el <- matrix(1:100,ncol=20)
set.seed(25)
el1 <- matrix(sample(1:100,20,replace=TRUE),ncol=1)
indx <- sort(el1,index.return=TRUE)$ix[1:3]

list(el[,indx],sort(el1)[1:3])
#[[1]]
#     [,1] [,2] [,3]
#[1,]   41   21   11
#[2,]   42   22   12
#[3,]   43   23   13
#[4,]   44   24   14
#[5,]   45   25   15
#
#[[2]]
#[1]  7 13 15
A.K.
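
The same idea can be sketched with order(), which returns the positions directly, so that the columns, the values, and the positions are all available at once. The list element names (columns, values, index) are my own labels, and the exact numbers drawn by sample() may differ between R versions.

```r
el  <- matrix(1:100, ncol = 20)            # 5 x 20 matrix, one column per item
set.seed(25)                               # seed taken from the example above
el1 <- matrix(sample(1:100, 20, replace = TRUE), ncol = 1)

indx <- order(el1)[1:3]                    # positions of the 3 smallest values
res  <- list(columns = el[, indx],         # matching columns of el
             values  = el1[indx],          # the values themselves
             index   = indx)               # and where they were found
res
```

Here el1[indx] gives the values and indx gives the indexes, so either can be shown in the second list element.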




From: eliza botto eliza_bo...@hotmail.com
To: smartpink...@yahoo.com smartpink...@yahoo.com 
Sent: Thursday, April 25, 2013 4:45 PM
Subject: RE: [R] Distance matrices Combinations




dear arun,
I will go through it thoroughly if you give me some 10 mins. Meanwhile, can you
please tell me how we can change the following code of yours so that in
el1 we see the values, not the indexes?

thanks,
Elisa

el1 <- matrix(o,ncol=1)
indx <- sort(el1,index.return=F)$ix[1:3]
list(el[,indx],indx)



Re: [R] Reading data from a text file conditionally skipping lines

2013-04-26 Thread arun
Hi,
It would be better to give an example.
If your dataset is like the one attached:
con <- file("Trial1.txt")
Lines1 <- readLines(con)
close(con)
#If the data you wanted to extract is numeric and the header and footer are
#characters,
dat1 <- read.table(text=Lines1[-grep("[A-Za-z]",Lines1)],sep="\t",header=FALSE)
dat1
#   V1 V2 V3 V4 V5
#1  38 43 39 44 45
#2  39 44 36 49 46
#3  42 45 47 49 37
#4  34 43 39 45 45
#5  38 42 39 44 47
#6  43 44 46 42 37
#7  32 49 38 42 45
#8  34 45 35 49 46
#9  44 45 46 49 37
#10 34 43 39 48 49
#11 38 42 39 47 47
#12 43 44 46 42 37
#13 37 43 39 44 45
#14 39 42 36 49 46
#15 42 45 47 49 37

#or
#You mentioned that the data is repeated every so many lines.  Here also,
#there is a repeating pattern.

head(Lines1,10)
# [1] "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam
#      nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat."
# [2] "Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper
#      suscipit lobortis"
# [3] "38\t43\t39\t44\t45"
# [4] "39\t44\t36\t49\t46"
# [5] "42\t45\t47\t49\t37"
# [6] "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse
#      molestie consequat."
# [7] "Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et
#      iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis
#      dolore te feugait nulla facilisi."
# [8] "34\t43\t39\t45\t45"
# [9] "38\t42\t39\t44\t47"
#[10] "43\t44\t46\t42\t37"

dat2 <- read.table(text=Lines1[rep(rep(c(FALSE,TRUE),times=c(2,3)),5)],sep="\t",header=FALSE)
identical(dat1,dat2)
#[1] TRUE

A.K.
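
A variation on the first approach, sketched with a small made-up stand-in for the file: instead of dropping lines that contain letters, keep only lines that look like data rows. The sample lines and the object names (lines, is_data) are assumptions for illustration only.

```r
# Hypothetical stand-in for the file: free-text header/footer lines
# interleaved with tab-separated numeric rows.
lines <- c("Header text, block one",
           "More header text",
           "38\t43\t39\t44\t45",
           "39\t44\t36\t49\t46",
           "Footer text, block one",
           "34\t43\t39\t45\t45",
           "38\t42\t39\t44\t47",
           "Closing footer")

# Keep lines made up only of digits, dots, minus signs and tabs.
is_data <- grepl("^[-0-9.\t]+$", lines)
dat <- read.table(text = lines[is_data], sep = "\t", header = FALSE)
dat
```

Using grepl() with a logical index also avoids a pitfall of the -grep() form: when no line matches, x[-grep(...)] returns an empty vector rather than all lines.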





I have a text file that is nicely formatted (tab separated). However, it has
some header and footer information after every so many lines.  I do not want
to read this information into my data frame.  What is the best
way to read this data into R?  Thanks for all the help!

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh
euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.
Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit 
lobortis
38  43  39  44  45
39  44  36  49  46
42  45  47  49  37
Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie 
consequat.
Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto 
odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te 
feugait nulla facilisi.
34  43  39  45  45
38  42  39  44  47
43  44  46  42  37
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh 
euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. 
Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit 
lobortis
32  49  38  42  45
34  45  35  49  46
44  45  46  49  37
Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie 
consequat.
Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto 
odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te 
feugait nulla facilisi.
34  43  39  48  49
38  42  39  47  47
43  44  46  42  37
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh 
euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. 
Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit 
lobortis
37  43  39  44  45
39  42  36  49  46
42  45  47  49  37
Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie 
consequat.
Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto 
odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te 
feugait nulla facilisi.   

Re: [R] Scheirer-Ray-Hare

2013-04-26 Thread nguyenkinh
You can take a look at this, in Vietnamese but you can Gtranslate it
http://www.ytecongcong.com/2013/04/scheirer-ray-hare-test-kiem-dinh-phi-tham-so-two-way-anova/



--
View this message in context: 
http://r.789695.n4.nabble.com/Scheirer-Ray-Hare-tp3818476p4665439.html
Sent from the R help mailing list archive at Nabble.com.



[R] Looping through names of both dataframes and column-names

2013-04-26 Thread Daniel Egan
Hello all,

This seems like a pretty standard question - suppose I want to loop through
a set of similar data-frames, with similar variables, and create new
variables within them:

nl <- seq(1,5)
for (i in nl) {
  assign(paste0("df_",nl[i]),data.frame(x=seq(1:10),y=rnorm(10)))
}
ls()[grep("df_",ls())]
nls <- ls()[grep("df_",ls())]
for (df in nls) {
  print(df)
  for (var in names(get(df))) {
    print(var)
    assign(paste0(df,"$",paste0(var,"_cs")),cumsum(get(df)[[var]]))
  }
}
ls()[grep("df_",ls())]

The code above *almost* works, except that it creates a whole bunch of
objects with names of the form df_1$x_cs, df_1$y_cs. What I want is 5 data frames,
with the new variables stored inside them as ordinary $ elements.

Any help or guidance would be appreciated.

Much thanks,
Dan
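
One possible direction, sketched under the assumption that the data frames can live in a named list rather than as separate workspace objects. The list name dfs and the seed are made up for illustration. The point is that assign() creates an object whose name literally contains "$", whereas modifying a list element changes the data frame itself.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Keep the data frames in a named list instead of separate workspace objects.
dfs <- lapply(1:5, function(i) data.frame(x = 1:10, y = rnorm(10)))
names(dfs) <- paste0("df_", 1:5)

# Add a cumulative-sum column for every existing column of each data frame.
dfs <- lapply(dfs, function(d) {
  for (v in names(d)) {
    d[[paste0(v, "_cs")]] <- cumsum(d[[v]])
  }
  d
})

names(dfs$df_1)
```

If separate objects df_1 to df_5 are really needed afterwards, list2env(dfs, envir = globalenv()) should copy the list elements into the workspace.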



[R] Error installing boss package

2013-04-26 Thread Pramod Anugu
I am trying to install the package boss but I am getting the error below.
Please advise.

install.packages("boss")

--- Please select a CRAN mirror for use in this session ---
CRAN mirror

 1: 0-Cloud                        2: Argentina (La Plata)
 3: Argentina (Mendoza)            4: Australia (Canberra)
 5: Australia (Melbourne)          6: Austria
 7: Belgium                        8: Brazil (PR)
 9: Brazil (RJ)                   10: Brazil (SP 1)
11: Brazil (SP 2)                 12: Canada (BC)
13: Canada (NS)                   14: Canada (ON)
15: Canada (QC 1)                 16: Canada (QC 2)
17: Chile                         18: China (Beijing 1)
19: China (Beijing 2)             20: China (Guangzhou)
21: China (Hefei)                 22: China (Xiamen)
23: Colombia (Bogota)             24: Colombia (Cali)
25: Denmark                       26: Ecuador
27: France (Lyon 1)               28: France (Lyon 2)
29: France (Montpellier)          30: France (Paris 1)
31: France (Paris 2)              32: Germany (Berlin)
33: Germany (Bonn)                34: Germany (Falkenstein)
35: Germany (Goettingen)          36: Greece
37: Hungary                       38: India
39: Indonesia                     40: Iran
41: Ireland                       42: Italy (Milano)
43: Italy (Padua)                 44: Italy (Palermo)
45: Japan (Hyogo)                 46: Japan (Tsukuba)
47: Japan (Tokyo)                 48: Korea (Seoul 1)
49: Korea (Seoul 2)               50: Latvia
51: Mexico (Mexico City)          52: Mexico (Texcoco)
53: Netherlands (Amsterdam)       54: Netherlands (Utrecht)
55: New Zealand                   56: Norway
57: Philippines                   58: Poland
59: Portugal                      60: Russia
61: Singapore                     62: Slovakia
63: South Africa (Cape Town)      64: South Africa (Johannesburg)
65: Spain (Madrid)                66: Sweden
67: Switzerland                   68: Taiwan (Taichung)
69: Taiwan (Taipei)               70: Thailand
71: Turkey                        72: UK (Bristol)
73: UK (London)                   74: UK (St Andrews)
75: USA (CA 1)                    76: USA (CA 2)
77: USA (IA)                      78: USA (IN)
79: USA (KS)                      80: USA (MD)
81: USA (MI)                      82: USA (MO)
83: USA (OH)                      84: USA (OR)
85: USA (PA 1)                    86: USA (PA 2)
87: USA (TN)                      88: USA (TX 1)
89: USA (WA 1)                    90: USA (WA 2)
91: Venezuela                     92: Vietnam

Selection: 86

also installing the dependency 'ncdf'

trying URL 'http://cran.mirrors.hoobly.com/src/contrib/ncdf_1.6.6.tar.gz'
Content type 'application/x-gzip' length 79403 bytes (77 Kb)
opened URL
==
downloaded 77 Kb

trying URL 'http://cran.mirrors.hoobly.com/src/contrib/boss_1.2.tar.gz'
Content type 'application/x-gzip' length 9702 bytes
opened URL
==
downloaded 9702 bytes

* installing *source* package 'ncdf' ...
** package 'ncdf' successfully unpacked and MD5 sums checked
checking for nc-config... no
checking for gcc... gcc -std=gnu99
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking netcdf.h usability... no
checking netcdf.h presence... no
checking for netcdf.h... no
configure: error: netcdf header netcdf.h not found
ERROR: configuration failed for package 'ncdf'
* removing '/share/apps/R-2.15.3/lib64/R/library/ncdf'
ERROR: dependency 'ncdf' is not available for package 'boss'
* removing '/share/apps/R-2.15.3/lib64/R/library/boss'

The downloaded source packages are in
'/tmp/RtmppOWF74/downloaded_packages'
Updating HTML index of packages in '.Library'
Making packages.html  ... done
Warning messages:
1: In install.packages("boss") :
  installation of package 'ncdf' had non-zero exit status
2: In install.packages("boss") :
  installation of package 'boss' had non-zero exit status

 

 



Re: [R] Selecting and then joining data blocks

2013-04-26 Thread arun
In addition,
If your matrix names do not follow any particular pattern:
tiger <- matrix(1:20,ncol=5)
cat <- matrix(21:40,ncol=5)
dog <- matrix(41:60,ncol=5)
wolf <- matrix(61:80,ncol=5)
vec <- c(1,2,4,3,2,3,1)
vec2 <- c("tiger","cat","dog","wolf")
#Suppose, you wanted the order to be tiger, cat, dog, wolf
vec2 <- factor(vec2,levels=vec2)
vec2
#[1] tiger cat   dog   wolf
#Levels: tiger cat dog wolf
res3 <- do.call(rbind,lapply(vec,function(i) get(as.character(vec2[as.numeric(vec2)==i]))))
res3
 #     [,1] [,2] [,3] [,4] [,5]
 #[1,]    1    5    9   13   17
 #[2,]    2    6   10   14   18
 #[3,]    3    7   11   15   19
 #[4,]    4    8   12   16   20
 #[5,]   21   25   29   33   37
 #[6,]   22   26   30   34   38
 #[7,]   23   27   31   35   39
 #[8,]   24   28   32   36   40
 #[9,]   61   65   69   73   77
#[10,]   62   66   70   74   78
#[11,]   63   67   71   75   79
#[12,]   64   68   72   76   80
#[13,]   41   45   49   53   57
#[14,]   42   46   50   54   58
#[15,]   43   47   51   55   59
#[16,]   44   48   52   56   60
#[17,]   21   25   29   33   37
#[18,]   22   26   30   34   38
#[19,]   23   27   31   35   39
#[20,]   24   28   32   36   40
#[21,]   41   45   49   53   57
#[22,]   42   46   50   54   58
#[23,]   43   47   51   55   59
#[24,]   44   48   52   56   60
#[25,]    1    5    9   13   17
#[26,]    2    6   10   14   18
#[27,]    3    7   11   15   19
#[28,]    4    8   12   16   20

A.K.
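
For the bootstrapping use case, a list-based variant avoids get() altogether. This is only a sketch: the matrix names are hypothetical (lion is used instead of cat so that the base function cat() is not masked), and the seed is arbitrary.

```r
# Hypothetical matrices with arbitrary names, as in the example above.
tiger <- matrix(1:20,  ncol = 5)
lion  <- matrix(21:40, ncol = 5)
dog   <- matrix(41:60, ncol = 5)
wolf  <- matrix(61:80, ncol = 5)

# Collect them once in a list, in the order the indices refer to.
blocks <- list(tiger, lion, dog, wolf)

# Resample block indices and stack the chosen blocks.
set.seed(1)  # arbitrary seed
index <- sample(4, 7, replace = TRUE)
boot  <- do.call(rbind, blocks[index])
dim(boot)
```

Indexing the list with blocks[index] repeats blocks as often as they were drawn, so the stacking step stays the same for every new index vector.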





- Original Message -
From: arun smartpink...@yahoo.com
To: Preetam Pal lordpree...@gmail.com
Cc: 
Sent: Thursday, April 25, 2013 9:03 AM
Subject: Re: [R] Selecting and then joining data blocks

HI Preetam,

I created the matrices in a list because it was easier to create.  If you look 
at the second solution:


B1 <- lst1[[1]]
B2 <- lst1[[2]]
B3 <- lst1[[3]]
B4 <- lst1[[4]]

Consider that B1, B2, B3, B4 are your actual matrices and apply the solution
below:

paste0("B",vec) #gives the names of the matrices
#[1] "B1" "B2" "B4" "B3" "B2" "B3" "B1"
Using get() will retrieve the matrices stored under those names.

res2 <- do.call(rbind,lapply(vec,function(i) get(paste0("B",i))))


If the names of the matrices are different, you need to change it accordingly.  
I programmed it based on the information you gave.

I hope this helps.
Arun



From: Preetam Pal lordpree...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Thursday, April 25, 2013 8:53 AM
Subject: Re: [R] Selecting and then joining data blocks



Hi Arun,



Thanks for your solution. But there is only one thing which I could not
understand:

In my case, the 4 matrices (B1, B2, B3, B4) were already specified and I have to
work with these only. How do I accommodate that (instead of letting R produce
the big matrix by random sampling)? This might be very trivial, but I am a
beginner with R. I shall really appreciate it if you could advise me on this.

Again thanks,
Preetam



On Thu, Apr 25, 2013 at 5:44 PM, arun smartpink...@yahoo.com wrote:

HI,
set.seed(24)
#creating the four matrices in a list

lst1 <- lapply(1:4,function(x) matrix(sample(1:40,20,replace=TRUE),ncol=5))
names(lst1) <- paste0("B",1:4)
vec <- c(1,2,4,3,2,3,1)
res <- do.call(rbind,lapply(vec,function(i) lst1[[i]]))
dim(res)
#[1] 28  5


#or
B1 <- lst1[[1]]
B2 <- lst1[[2]]
B3 <- lst1[[3]]
B4 <- lst1[[4]]

res2 <- do.call(rbind,lapply(vec,function(i) get(paste0("B",i))))
identical(res,res2)
#[1] TRUE
A.K.





- Original Message -
From: Preetam Pal lordpree...@gmail.com
To: r-help@r-project.org
Cc:
Sent: Thursday, April 25, 2013 7:51 AM
Subject: [R] Selecting and then joining data blocks

Hi all,

I have 4 matrices, each having 5 columns and 4 rows, denoted by
B1, B2, B3, B4.
I have generated a vector of 7 indices, say (1,2,4,3,2,3,1), which refers to
the index of the matrices to be chosen and then appended one on top of
the next: in this case, I wish to have the following mega matrix:
B1 over B2 over B4 over B3 over B2 over B3 over B1.

1. How can I achieve this?
2. I don't want to manually identify and arrange the matrices for each
vector of index values generated (for which the code I used is:
index=sample(4,7,replace=T)). How can I automate the process?

Basically, I am doing bootstrapping, but the observations are actually 4x5
matrices.

Appreciate your help.


Thanks,
Preetam


---

Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.


Re: [R] connecting matrices

2013-04-26 Thread arun
Dear Elisa,
Try this:
el <- matrix(1:100,ncol=20)
set.seed(25)
el1 <- matrix(sample(1:100,20,replace=TRUE),ncol=1)

In the example you showed, there were no column names.

list(el[,sort(el1)[1:3]],sort(el1,index.return=TRUE)$ix[1:3])
#[[1]]
 #    [,1] [,2] [,3]
#[1,]   31   61   71
#[2,]   32   62   72
#[3,]   33   63   73
#[4,]   34   64   74
#[5,]   35   65   75
#
#[[2]]
#[1] 9 5 3
A.K.






From: eliza botto eliza_bo...@hotmail.com
To: smartpink...@yahoo.com smartpink...@yahoo.com 
Sent: Thursday, April 25, 2013 9:54 AM
Subject: connecting matrices




Dear Arun,

[text file contains the exact format]
Although the last code was absolutely correct and worked the way I wanted it
to, I have an additional follow-up question.
Suppose I have a matrix el... here I show you only part of that
matrix so that the code can run faster.

el
     [,595586] [,595587] [,595588] [,595589] [,595590] [,595591] [,595592] [,595593]
[1,]        55        55        55        55        55        55        55        55
[2,]        59        59        59        59        59        59        60        60
[3,]        60        60        60        61        61        62        61        61
[4,]        61        62        63        62        63        63        62        63
     [,595594] [,595595] [,595596] [,595597] [,595598] [,595599] [,595600] [,595601]
[1,]        55        55        56        56        56        56        56        56
[2,]        60        61        57        57        57        57        57        57
[3,]        62        62        58        58        58        58        58        59
[4,]        63        63        59        60        61        62        63        60
     [,595602] [,595603] [,595604] [,595605] [,595606] [,595607] [,595608] [,595609]
[1,]        56        56        56        56        56        56        56        56
[2,]        57        57        57        57        57        57        57        57
[3,]        59        59        59        60        60        60        61        61
[4,]        61        62        63        61        62        63        62        63
     [,595610] [,595611] [,595612] [,595613] [,595614] [,595615] [,595616] [,595617]
[1,]        56        56        56        56        56        56        56        56
[2,]        57        58        58        58        58        58        58        58
[3,]        62        59        59        59        59        60        60        60
[4,]        63        60        61        62        63        61        62        63


In connection to this matrix, there is another matrix which contains 
coordination values for each of the column of matrix el

el1

[595586,]   5.67   
[595587,]   55.90   
[595588,]   515   
[595589,]   755   
[595590,]   955   
[595591,]   5.95   
[595592,]   575   
[595593,]   505   
[595594,]   505   
[595595,]   515   
[595596,]   5612   
[595597,]   506   
[595598,]   576   
[595599,]   5126   
[595600,]   5216   
[595601,]   5666   
[595602,]   526   
[595603,]   5.6   
[595604,]   156   
[595605,]   4556   
[595606,]   5556   
[595607,]   1256   
[595608,]   1256   
[595609,]   8756   
[595610,]   5906   
[595611,]   789   
[595612,]   5006   
[595613,]   1256   
[595614,]   3356   
[595615,]   7756   
[595616,]   4456   
[595617,]   3356   

What I want in the end is a list of two elements containing the 10 columns of
el which have the lowest values in matrix el1.

More precisely
[[1]]
[,595603] [,595586] [,595591]
       56        55        55
       57        59        59
       59        60        62
       62        61        63

[[2]]
5.6 5.67 5.95

Is it possible to carry out such an operation?

thanks for your help

Elisa     



Re: [R] connecting matrices

2013-04-26 Thread arun
HI Elisa,
I guess there is a mistake.
Check whether this is what you wanted.

indx <- sort(el1,index.return=TRUE)$ix[1:3]
list(el[,indx],indx)
#[[1]]
 #    [,1] [,2] [,3]
#[1,]   41   21   11
#[2,]   42   22   12
#[3,]   43   23   13
#[4,]   44   24   14
#[5,]   45   25   15
#
#[[2]]
#[1] 9 5 3
A.K.
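
If the column labels should be carried along, as in the desired output, a sketch along these lines may help. It uses a small made-up 4 x 6 matrix, with the first six score values from the question, and assumes the labels are stored as column names of el and as names of el1.

```r
# Small hypothetical stand-in: 4 x 6 matrix whose columns carry ID labels,
# plus one score per column.
ids <- as.character(595586:595591)
el  <- matrix(1:24, nrow = 4, dimnames = list(NULL, ids))
el1 <- c(5.67, 55.90, 515, 755, 955, 5.95)
names(el1) <- ids

k    <- 3                       # number of lowest-scoring columns to keep
indx <- order(el1)[seq_len(k)]  # positions of the k smallest scores
res  <- list(el[, indx], el1[indx])
res
```

Both list elements keep the labels, so each selected column can be matched to its score by name.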





[R] Vectorized code for generating the Kac (Clement) matrix

2013-04-26 Thread Ravi Varadhan
Hi,
I am generating large Kac matrices (also known as Clement matrices).  This is a
tridiagonal matrix.  I was wondering whether there is a vectorized solution
that avoids the `for' loops in the following code:

n <- 1000

Kacmat <- matrix(0, n+1, n+1)

for (i in 1:n) Kacmat[i, i+1] <- n - i + 1

for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1

The above code is fast, but I am curious about vectorized ways to do this.

Thanks in advance.
Best,
Ravi
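
One vectorized possibility, offered as a sketch rather than a benchmarked answer: index the matrix with a two-column matrix of (row, column) pairs, so that each off-diagonal is filled in a single assignment.

```r
n <- 1000
Kacmat <- matrix(0, n + 1, n + 1)

# Superdiagonal: entry (i, i+1) is n - i + 1 for i = 1..n
Kacmat[cbind(1:n, 2:(n + 1))] <- n:1
# Subdiagonal: entry (i, i-1) is i - 1 for i = 2..(n+1)
Kacmat[cbind(2:(n + 1), 1:n)] <- 1:n
```

Each row of the cbind() result names one cell, so the two assignments touch exactly the same cells as the two loops.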

Ravi Varadhan, Ph.D.
Assistant Professor
The Center on Aging and Health
Division of Geriatric Medicine & Gerontology
Johns Hopkins University
rvarad...@jhmi.edu
410-502-2619




Re: [R] Make R 3.0 open .RData files

2013-04-26 Thread Indrajit Sen Gupta
Another thing that you can try is changing the Path. Make sure the PATH 
environment variable has the path to R 3.0 before R 2.15.3 in the string.



Regards,

Indrajit





On Thu, 25 Apr 2013 22:10:52 +0530  wrote
> a) See FAQ 2.17
>
> b) Methods for configuring operating systems are off topic here. I will say
> there is a REGEDIT program in Windows, but there are potential permissions
> complications (you may not have them) and possible collateral damage (don't
> touch it if you don't understand it) that mean you should study up on this
> topic with an appropriate resource (book, forum, expert, system administrator,
> etc.) before attempting it.
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> Dimitri Liakhovitski  wrote:
>> Brian, how do I remove the relevant old Registry entries?
>> Thank you!
>> Dimitri
>>
>> On Thu, Apr 25, 2013 at 10:29 AM, Prof Brian Ripley wrote:
>>> On 25/04/2013 14:00, Duncan Murdoch wrote:
>>>> On 13-04-25 8:33 AM, Dimitri Liakhovitski wrote:
>>>>> Hello!
>>>>>
>>>>> I have Windows 7 Enterprise and two versions of R installed: 2.15.3
>>>>> and 3.0.0.
>>>>> Before I had R 3.0 I made it a setting that all .RData files - when I
>>>>> double-click on them - were opened by R 2.15.3.
>>>>> Now I want them to be opened by R 3.0 instead of R 2.15.3 (but I
>>>>> don't want to remove R 2.15.3. yet).
>>>>>
>>>>> I right-click on some .RData file, select Open with - Choose default
>>>>> program and then click on Browse.
>>>>>
>>>>> I browse to the folder where my R 3.0 is installed, then to the
>>>>> folder bin, then to the folder x64 and select Rgui.exe.
>>>>> However, when R opens - or after I shut R down and then double-click
>>>>> on some .RData file and R opens, it is again R 2.15.3, not R3.0.
>>>>>
>>>>> What am I doing wrong?
>>>>>
>>>>> Of course, when I open R 3.0 directly, then it opens no problem.
>>>>
>>>> This is really a question about Windows 7, not about R, but I would
>>>> guess you aren't telling it to make your choice permanent, or perhaps
>>>> you are not allowed by your administrator to make permanent changes
>>>> to file associations. You should ask for local help.
>>>
>>> We've encountered this for our student accounts, and think it is a bug
>>> in Windows 7. If you remove the relevant old Registry entries first it
>>> should work.
>>>
>>> --
>>> Brian D. Ripley, rip...@stats.ox.ac.uk
>>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford, Tel: +44 1865 272861 (self)
>>> 1 South Parks Road, +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UK Fax: +44 1865 272595

__



R-help@r-project.org mailing list



https://stat.ethz.ch/mailman/listinfo/r-help



PLEASE do read the posting guide http://www.R-project.org/posting-guide.html



and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] time series plot: x-axis problem

2013-04-26 Thread Jerry
Hi,

I'm trying to plot a simple time series. I'm running into an issue with the
x-axis.

The code below produces a plot with a correct x-axis running from Jan to
Dec:

 rr=c(3,2,4,5,4,5,3,3,6,2,4,2)

 (rr=ts(rr,start=c(2012,1),frequency=12))

 win.graph(width=6.5, height=2.5,pointsize=8)

 plot(rr, xlab="2012", ylab="event freq", xaxt = "n", col="blue")

 axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)),
 cex.axis = .9, tcl = -.5, las = 2)


However, if I change the start point from Jan 2012 to May 2012, which is

 (rr=ts(rr,start=c(2012,5),frequency=12))


and then run the code below

 plot(rr, xlab="2012 - 2013", ylab="event freq", xaxt = "n", col="blue")

 axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)),
 cex.axis = .9, tcl = -.5, las = 2)


In the new plot, the x-axis still runs from Jan to Dec, not
from May to April as I want.

How can I fix the x-axis? Is it possible to fix it WITHOUT modifying the object
rr? Also, ideally, I would like each time point on the x-axis to show
month/year, not just month. How can I do that?

Any help and input will be much appreciated!

Thanks
Jerry



[R] Can a column of a list be called?

2013-04-26 Thread Jana Makedonska
Hello Everyone,

I would like to know if I can call one of the columns of a list, to use it
as a variable in a function.

Thanks in advance for any advice!

Jana

-- 

Jana Makedonska,
B.Sc. Biology, Universite Paul Sabatier Toulouse III
M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier II
Ph.D. candidate in Physical Anthropology and Part-time lecturer
Department of Anthropology
College of Arts & Sciences
State University of New York at Albany
1400 Washington Avenue
1 Albany, NY
Office phone: 518-442-4699
http://electricsongs.academia.edu/JanaMakedonska
http://www.youtube.com/watch?v=OHbT9VvtonM
http://www.youtube.com/watch?v=jRoMoLjzpf4&list=PL5BF6ACDCC2E4AAA0&index=7



Re: [R] nls: example code throws error

2013-04-26 Thread Duncan Mackay

Hi

Try

x <- -(1:100)/10
set.seed(1)
y <- 100 + 10 * exp(x / 2) + rnorm(x)/10

short cut to starting values
lm(log(y) ~-log(x+10))
Call:
lm(formula = log(y) ~ -log(x + 10))

Coefficients:
(Intercept)
  4.624

nlmod <- nls(y ~  A + B * exp(C * x), start=list(A=90, B=5,C=0.1))

Formula: y ~ A + B * exp(C * x)

Parameters:
    Estimate Std. Error t value Pr(>|t|)
A 100.009079   0.017797  5619.4   <2e-16
B   9.93       0.042718   234.1   <2e-16
C   0.499529   0.004495   111.1   <2e-16

Residual standard error: 0.09073 on 97 degrees of freedom

Number of iterations to convergence: 5
Achieved convergence tolerance: 0.0002475

I will leave you to plot the results as a check
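
One way to carry out that check is sketched below. It refits the same model with the same seed and starting values as above, then compares the residual spread with the simulated noise level and overlays the fitted curve on the data. The name res_sd is only illustrative.

```r
# Simulated data, as in the example above (same seed and model).
set.seed(1)
x <- -(1:100) / 10
y <- 100 + 10 * exp(x / 2) + rnorm(x) / 10

# Fit with explicit starting values so nls() does not have to guess them.
nlmod <- nls(y ~ A + B * exp(C * x), start = list(A = 90, B = 5, C = 0.1))

# Numerical check: the residual spread should be on the scale of the
# simulated noise (standard deviation 0.1).
res_sd <- sd(residuals(nlmod))

# Visual check: data points with the fitted curve overlaid.
plot(x, y, pch = 16, cex = 0.6, xlab = "x", ylab = "y")
lines(x, fitted(nlmod), col = "red", lwd = 2)
```

If the curve tracks the points and res_sd is close to 0.1, the starting values were good enough for this data set.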

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

At 09:16 26/04/2013, you wrote:


Greets,

I'm trying to learn to use nls and was running the example code for 
an exponential model:


 x <- -(1:100)/10
 y <- 100 + 10 * exp(x / 2) + rnorm(x)/10
 nlmod <- nls(y ~  Const + A * exp(B * x))
Error in B * x : non-numeric argument to binary operator
In addition: Warning message:
In nls(y ~ Const + A * exp(B * x)) :
  No starting values specified for some parameters.
Initializing 'Const' to '1.'.
Consider specifying 'start' or using a selfStart model

Presumably, the code should work if it is part of an example on the 
help page. In perusing various help forums for similar problems, it 
also appears that others believe this syntax should work in the model formula.


Any ideas?

Perhaps also, a pointer to a comprehensive and correct document that 
details model formulae syntax if someone has one?


Thanks & Best Regards,
Steven





Re: [R] Looping through names of both dataframes and column-names

2013-04-26 Thread Blaser Nello
Here are two possible ways to do it:

This would simplify your code a bit. But it changes the names of x_cs to
cs.x.
for (df in nls) {
  assign(df, cbind(get(df), cs=apply(get(df), 2, cumsum)))
}

This is closer to what you have done.
for (df in nls) {
  print(df)
  for (var in names(get(df))) {
    print(var)
    assign(df, within(get(df), assign(paste0(var,"_cs"),
                                      cumsum(get(df)[[var]]))))
  }
}
ls()[grep("df_",ls())]
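
A third option, sketched here as an alternative rather than as part of either approach above, is to keep the data frames in a named list and avoid assign()/get() altogether. The names dfs and add_cumsums are made up for illustration.

```r
# Build five similar data frames and keep them in a named list.
set.seed(42)
dfs <- lapply(1:5, function(i) data.frame(x = 1:10, y = rnorm(10)))
names(dfs) <- paste0("df_", 1:5)

# Append one cumulative-sum column per existing column, named <column>_cs.
add_cumsums <- function(d) {
  cs <- lapply(d, cumsum)
  names(cs) <- paste0(names(d), "_cs")
  cbind(d, as.data.frame(cs))
}

dfs <- lapply(dfs, add_cumsums)
names(dfs$df_1)
```

Each element of dfs is then an ordinary data frame, so dfs$df_1$x_cs works as usual.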

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Daniel Egan
Sent: Donnerstag, 25. April 2013 22:19
To: r-help@r-project.org
Subject: [R] Looping through names of both dataframes and column-names

Hello all,

This seems like a pretty standard question - suppose I want to loop
through a set of similar data-frames, with similar variables, and create
new variables within them:

nl <- seq(1,5)
for (i in nl) {
  assign(paste0("df_",nl[i]),data.frame(x=seq(1:10),y=rnorm(10)))
}
ls()[grep("df_",ls())]
nls <- ls()[grep("df_",ls())]
for (df in nls) {
  print(df)
  for (var in names(get(df))) {
    print(var)
    assign(paste0(df,"$",paste0(var,"_cs")),cumsum(get(df)[[var]]))
  }
}
ls()[grep("df_",ls())]

The code above *almost* works, except that it creates a whole bunch of
objects of the form df_1$x_cs,df_1$yx_cs . What I want is 5
dataframes, with the $ elements enclosed, as usual.

Any help or guidance would be appreciated.

Much thanks,
Dan



Re: [R] Transferring R to another computer, R_HOME_DIR

2013-04-26 Thread Prof Brian Ripley

This is really an R-devel topic: it is not about using R.

R is usually (but not always) built so that everything except Rscript is 
relocatable by editing the 'R' script (and R_HOME and R_HOME_DIR are 
ignored in the environment, intentionally).


So you could edit the script, but not having Rscript working is a 
limitation.


Having said that, not all packages play by the same rules and e.g. some 
use -rpath to hardcode paths in package DSOs.



On 26/04/2013 06:13, lcn wrote:

Well, to my understanding, you planned to rsync the original compiled
folder from one machine to somewhere on another machine, and work with it.
Then how about create a file link on the second machine for /usr/lib64/R?
Or maybe I misunderstand your purpose?


If you have write permission there, you could install the R RPM.




On Thu, Apr 25, 2013 at 5:57 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote:


Hello,

I was looking at the R (installed on RHEL6) shell script and saw
R_HOME_DIR=/usr/lib64/R. Nowhere (and I could have got it wrong) does
it read in the environment value R_HOME_DIR. I have the need to rsync
the entire folder below /usr/lib64/R to another computer into another
directory location. Without changing the R shell script, how can i
force it read in R_HOME_DIR?

Or maybe i misunderstood the bash source?

(Note, i cannot recompile on target machine)

Cheers
Saptarshi

1. I also realize Rscript will not work (i think path is hard coded in the
source)


No, it is compiled in when R is compiled.



Beginning of /usr/lib64/R/bin/R

R_HOME_DIR=/usr/lib64/R
if test ${R_HOME_DIR} = /usr/lib64/R; then
case linux-gnu in
linux*)
  run_arch=`uname -m`
  case $run_arch in
 x86_64|mips64|ppc64|powerpc64|sparc64|s390x)
   libnn=lib64
   libnn_fallback=lib
 ;;
 *)
   libnn=lib
   libnn_fallback=lib64
 ;;
  esac
  if [ -x /usr/${libnn}/R/bin/exec/R ]; then
 R_HOME_DIR=/usr/lib64/R
  elif [ -x /usr/${libnn_fallback}/R/bin/exec/R ]; then
 R_HOME_DIR=/usr/lib64/R
  ## else -- leave alone (might be a sub-arch)
  fi
  ;;
   esac
fi





--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Sum up column values according to row id

2013-04-26 Thread Matteo Mura
Thank you very much, Dr. Carlson! The function you suggested works
perfectly!

Thanks a lot again,

Best wishes,

Mt M


2013/4/24 David Carlson dcarl...@tamu.edu

 Something like this?

 mean6 <- function(x) {
   if (length(x) < 6) {
     mn <- mean(x)
   } else {
     mn <- mean(x[1:6])
   }
   return(mn)
 }

 aggregate(g~id, ipso, mean6)
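
 A variant that does not depend on the rows being pre-sorted is sketched below. It uses the ten rows shown in the question as test data; the helper name top6_mean is invented for this example.

```r
# The ten rows printed in the question (id and g only).
ipso <- data.frame(
  id = c(rep("FPE0164", 2), rep("FPE1127", 8)),
  g  = c(0.10178760, 0.07547676, 0.25517586, 0.11945906, 0.09079203,
         0.08042477, 0.06157522, 0.05725553, 0.04908739, 0.02835287),
  stringsAsFactors = FALSE
)

# Mean of the six largest values; head() returns everything when
# fewer than six values are available.
top6_mean <- function(g) mean(head(sort(g, decreasing = TRUE), 6))

res <- tapply(ipso$g, ipso$id, top6_mean)
res
```

 Sorting inside the helper means the result stays the same even if the data frame is later reordered.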

 -
 David L Carlson
 Associate Professor of Anthropology
 Texas A&M University
 College Station, TX 77840-4352





 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Matteo Mura
 Sent: Wednesday, April 24, 2013 7:57 AM
 To: r-help@r-project.org
 Subject: [R] Sum up column values according to row id

 Dear All,

 here a problem I think many of you can solve in few minutes.

 I have a data frame which contains values of plot id, diameters, heights and
 basal area of trees, thus the column names are: id | dbh | h | g

 head(ipso, n=10)
         id dbh     h          g
 1  FPE0164  36 13.62 0.10178760
 2  FPE0164  31 12.70 0.07547676
 21 FPE1127  57 18.85 0.25517586
 13 FPE1127  39 15.54 0.11945906
 12 FPE1127  34 14.78 0.09079203
 6  FPE1127  32 15.12 0.08042477
 5  FPE1127  28 14.13 0.06157522
 15 FPE1127  27 13.50 0.05725553
 19 FPE1127  25 13.28 0.04908739
 11 FPE1127  19 11.54 0.02835287

 From here I need to calculate the mean of the six greatest g values for each
 id. The conditions are:

 if length(id) >= 6
 take the mean of the six greatest g values

 else
 take the mean of all the g values for that id (in the head() output above, e.g.
 for id == FPE0164, take the mean of just these two values of g).

 The g values are already ordered by id ascending and g descending using:

 ipso <- ipso[with(ipso, order(ipso$id, -ipso$g)), ] # Order by id ascending
 # and g descending

 I tried a lot of for loops and tapply() without results.

 Can anyone help me to solve this?

 Thanks for your attention

 Best whishes

 Matteo



[R] Remove reciprocal data from a grouped animal social contact dataset

2013-04-26 Thread Cat Cowie
Hi r-help forum,

I have been collecting contact data (with proximity logger collars)
between a few different species of animal. All animals wear the
collars, and any contact between the animals should be detected and
recorded by both collars. However, this isn't always the case and more
contacts may be recorded on one collar of the two. This is fine, it
depends on battery life and other things that I will have to discuss!
I now have each contact recorded as a 'group in 4 columns':

 head(data)
  record            start duration     pair
1      1 27/05/2012 04:40     4948 CO1 CO12
2      2 31/05/2012 04:48      278 CO1 CO12
3      3 31/05/2012 05:30        3 CO1 CO12
4      4 31/05/2012 05:51      159 CO1 CO12
5      5 31/05/2012 05:56       47 CO1 CO12
6      6 31/05/2012 06:02      107 CO1 CO12


The first column shows the record number, the second shows the start
date and time of the contact, the third shows the contact duration and
the fourth shows the pair of animals involved in the contact. In this
case the top 6 contacts are all between animals CO1 and CO12. There
are nearly 100,000 records. There were many animals that could have
contacted each other:

 animals
   animals
1  CO1
2  CO2
3  CO3
4  CO4
5  CO5
6  CO6
7  CO7
8  CO8
9  CO9
10 CO10
11 CO11
12 CO12
13 CO13
14 CO14
15 CO15
16 CO16
17 CO17
18 PO1
19 PO2
20 PO3
21 PO4
22 PO5
23 PO6
24 PO7
25 PO8
26 PO9
27 PO10
28 PO11
29 PO12
30 PO13
31 PI1
32 PI2
33 PI3
34 PI4
35 PI5
36 PI6
37 PI7
38 PI8
39 RD1
40 RD2
41 WB1
42 WB2

Because both collars may have recorded the single contact, I need to
remove the reciprocal contacts from this dataset. For example, you may
have records for CO1 CO2 that are mirrored by records for CO2 CO1.
If there are the same number of records it doesn't matter which of
these you select, as long as only one set is used for further
analysis. Where there is an unequal number of contacts recorded on the
two collars between a pair, I would like to select the records which
have the most contacts. So, if there were 10 records recorded for CO1
CO2 and 15 for CO2 CO1 I would like to reject the first 10 contacts
and retain the 15. There are some cases where only one version of the
group is recorded, e.g. just CO1 CO3, with no reciprocal CO3 CO1.
In this case I would like to retain the data that I have.
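
The selection rule described above could be sketched roughly as follows. This is only an illustration on a small made-up data frame (contacts), and it assumes that pair holds the two animal IDs separated by a single space; the names key, best and kept are invented here.

```r
# Toy data: CO1/CO2 was logged 2 times on one collar and 3 times on the
# other; CO1 CO3 and CO3 CO4 were logged on one collar only.
contacts <- data.frame(
  record = 1:7,
  pair = c("CO1 CO2", "CO1 CO2", "CO2 CO1", "CO2 CO1", "CO2 CO1",
           "CO1 CO3", "CO3 CO4"),
  stringsAsFactors = FALSE
)

# Direction-free key: "CO2 CO1" and "CO1 CO2" both become "CO1 CO2".
key <- vapply(strsplit(contacts$pair, " "),
              function(p) paste(sort(p), collapse = " "),
              character(1))

# For each key, pick the direction with the most records.
# which.max() takes the first maximum, so ties go to one direction only.
best <- tapply(seq_len(nrow(contacts)), key, function(idx) {
  counts <- table(contacts$pair[idx])
  names(counts)[which.max(counts)]
})

# Keep only the rows recorded in the chosen direction.
kept <- contacts[contacts$pair == as.vector(best[key]), ]
kept
```

In this toy case the two CO1 CO2 rows are dropped, the three CO2 CO1 rows are kept, and the one-sided pairs are retained unchanged.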

I would normally like to present you with my attempts so far but as a
relatively new (but enthusiastic!) R user I am struggling to know
where to start.

I present more data here... sadly dput(head(data, 200)) prints all
the dates (all nearly 100,000 of them, regardless of using head()!) so
I hope this is ok for now:

 head(data,300)
   record            start duration     pair
1       1 27/05/2012 04:40     4948 CO1 CO12
2       2 31/05/2012 04:48      278 CO1 CO12
3       3 31/05/2012 05:30        3 CO1 CO12
4       4 31/05/2012 05:51      159 CO1 CO12
5       5 31/05/2012 05:56       47 CO1 CO12
6       6 31/05/2012 06:02      107 CO1 CO12
7       7 31/05/2012 06:08       86 CO1 CO12
8       8 31/05/2012 06:11      194 CO1 CO12
9       9 31/05/2012 06:20       87 CO1 CO12
10     10 31/05/2012 06:24       12 CO1 CO12
11     11 31/05/2012 06:32       11 CO1 CO12
12     12 31/05/2012 06:40      227 CO1 CO12
13     13 31/05/2012 06:47      115 CO1 CO12
14     14 12/04/2011 13:39      109 CO1 CO15
15     15 12/04/2011 22:29        3 CO1 CO15
16     16 12/04/2011 22:45       44 CO1 CO15
17     17 12/04/2011 23:20       55 CO1 CO15
18     18 13/04/2011 02:50       58 CO1 CO15
19     19 13/04/2011 03:15       11 CO1 CO15
20     20 13/04/2011 05:38       65 CO1 CO15
21     21 13/04/2011 08:55      122 CO1 CO15
22     22 13/04/2011 11:06        4 CO1 CO15
23     23 13/04/2011 13:47       53 CO1 CO15
24     24 13/04/2011 13:57       32 CO1 CO15
25     25 13/04/2011 14:32       16 CO1 CO15
26     26 13/04/2011 14:41        4 CO1 CO15
27     27 13/04/2011 21:53       33 CO1 CO15
28     28 14/04/2011 01:00       41 CO1 CO15
29     29 14/04/2011 01:07        5 CO1 CO15
30     30 14/04/2011 01:46        2 CO1 CO15
31     31 14/04/2011 06:43        3 CO1 CO15
32     32 14/04/2011 08:44        3 CO1 CO15
33     33 14/04/2011 08:51       64 CO1 CO15
34     34 14/04/2011 13:59        6 CO1 CO15
35     35 14/04/2011 14:11       11 CO1 CO15
36     36 14/04/2011 14:36      169 CO1 CO15
37     37 14/04/2011 14:42       19 CO1 CO15
38     38 14/04/2011 15:04       48 CO1 CO15
39     39 14/04/2011 15:10        2 CO1 CO15
40     40 14/04/2011 17:41       58 CO1 CO15
41     41 14/04/2011 18:33        3 CO1 CO15
42     42 15/04/2011 16:26       50 CO1 CO15
43     43 15/04/2011 20:12        3 CO1 CO15
44     44 16/04/2011 23:04        2 CO1 CO15
45     45 17/04/2011 02:57        7 CO1 CO15
46     46 17/04/2011 03:08       32 CO1 CO15
47     47 17/04/2011 

Re: [R] time series plot: x-axis problem

2013-04-26 Thread Rui Barradas

Hello,

Try the following.

(rr=ts(rr,start=c(2012,5),frequency=12))

plot(rr, xlab="2012 - 2013", ylab="event freq", xaxt = "n", col="blue")

labs <- format(as.Date(time(rr)), "%b-%Y")

axis(1, time(rr), labs, cex.axis = .9, tcl = -.5, las = 2)
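
As far as I know, as.Date() has no method for a ts time index in base R (the zoo package supplies one), so the line above may need zoo loaded. A base-R sketch that builds the month labels directly from start(rr) is given below, under the assumption that the series is monthly; the names st, dates and labs_num are illustrative only.

```r
rr <- ts(c(3, 2, 4, 5, 4, 5, 3, 3, 6, 2, 4, 2),
         start = c(2012, 5), frequency = 12)

# First observation as a calendar date, then one date per month.
st <- start(rr)                                   # c(year, month)
dates <- seq(as.Date(sprintf("%d-%02d-01", st[1], st[2])),
             by = "month", length.out = length(rr))

labs <- format(dates, "%b-%Y")      # e.g. abbreviated month name and year
labs_num <- format(dates, "%m/%Y")  # numeric month/year, locale-independent

plot(rr, xlab = "2012 - 2013", ylab = "event freq", xaxt = "n", col = "blue")
axis(1, at = as.numeric(time(rr)), labels = labs,
     cex.axis = 0.9, tcl = -0.5, las = 2)
```

Because the labels are derived from start(rr), the object rr itself is not modified, and changing the start month only requires rebuilding the series.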


Hope this helps,

Rui Barradas

Em 25-04-2013 19:11, Jerry escreveu:

Hi,

I'm trying to plot a simple time series. I'm running into an issue with the
x-axis.

The code below produces a plot with a correct x-axis running from Jan to
Dec:

rr=c(3,2,4,5,4,5,3,3,6,2,4,2)

(rr=ts(rr,start=c(2012,1),frequency=12))

win.graph(width=6.5, height=2.5,pointsize=8)

plot(rr, xlab="2012", ylab="event freq", xaxt = "n", col="blue")

axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)),
cex.axis = .9, tcl = -.5, las = 2)


However, if I change the start point from Jan 2012 to May 2012, which is

(rr=ts(rr,start=c(2012,5),frequency=12))

and then run the code below

plot(rr, xlab="2012 - 2013", ylab="event freq", xaxt = "n", col="blue")

axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)),
cex.axis = .9, tcl = -.5, las = 2)


In the new plot, the x-axis still runs from Jan to Dec, not
from May to April as I want.

How can I fix the x-axis? Is it possible to fix it WITHOUT modifying the object
rr? Also, ideally, I would like each time point on the x-axis to show
month/year, not just month. How can I do that?

Any help and input will be much appreciated!

Thanks
Jerry



[R] sample size in box plot labels

2013-04-26 Thread Shane Carey
Hi,

I would like to put the sample number beside each label in a boxplot.
How do I do this? Essentially, I need to count the sample size for each
factor, see below:
Thanks

boxplot(DATA$K_Merge~factor(DATA$UnitName_1),axes=FALSE,col=colours)
title(main=list("Tukey Boxplot by Geology:\n K(%)",cex=cexlb))
axis(1, 1:21, labels=FALSE, las=2)
text(seq(1, 21, by=1), par("usr")[3], labels =
levels(factor(DATA$UnitName_1)), srt = 45,  adj = c(1.03,1.03), xpd = TRUE,
cex=1.8)
axis(2, seq(-1,5, 1), seq(-1, 5, 1))


-- 
Shane



Re: [R] Error installing boss package

2013-04-26 Thread Jim Lemon

On 04/25/2013 11:42 PM, Pramod Anugu wrote:

I am trying to install the package boss but i am getting error below:
Please advice

...
checking netcdf.h usability... no

checking netcdf.h presence... no

checking for netcdf.h... no

configure: error: netcdf header netcdf.h not found

ERROR: configuration failed for package 'ncdf'
...


Hi Pramod,
I would suggest installing the netcdf packages:

yum install netcdf

or use whatever package management system you prefer.

Jim



Re: [R] sample size in box plot labels

2013-04-26 Thread PIKAL Petr
Hi

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Shane Carey
 Sent: Friday, April 26, 2013 11:49 AM
 To: r-help@r-project.org
 Subject: [R] sample size in box plot labels
 
 Hi,
 
 I would like to put the sample number beside each label in a boxplot.
 How do I do this? Essentially, I need to count the sample size for each
 factor, see below:
 Thanks
 
 boxplot(DATA$K_Merge~factor(DATA$UnitName_1),axes=FALSE,col=colours)
 title(main=list("Tukey Boxplot by Geology:\n K(%)",cex=cexlb))
 axis(1, 1:21, labels=FALSE, las=2)
 text(seq(1, 21, by=1), par("usr")[3], labels =
 levels(factor(DATA$UnitName_1)), srt = 45,  adj = c(1.03,1.03), xpd = TRUE,
 cex=1.8)
 axis(2, seq(-1,5, 1), seq(-1, 5, 1))

Does not work without data.

Do you want something like this?

boxplot(Sepal.Length~Species, data=iris)
mtext(as.character(table(iris$Species)), 1, at=1:3)
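
If the counts should sit inside the group labels themselves, one possible sketch (again with iris standing in for the real data, and labs being a made-up name) is to paste the counts onto the level names and pass them through the names argument:

```r
# Sample size per group, in the same order as the boxes.
counts <- table(iris$Species)

# Two-line labels: group name on the first line, N on the second.
labs <- paste0(names(counts), "\nN=", counts)

boxplot(Sepal.Length ~ Species, data = iris, names = labs)
```

Each line of an axis label is centred on its tick, so the N= line ends up centred under the group name. With long labels the axis margins may need adjusting.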

Regards
Petr



 
 
 --
 Shane
 


Re: [R] sample size in box plot labels

2013-04-26 Thread Rui Barradas

Hello,

To count the sample sizes for each factor try

tapply(DATA$K_Merge, DATA$UnitName_1, FUN = length)


Hope this helps,

Rui Barradas

Em 26-04-2013 10:48, Shane Carey escreveu:

Hi,

I would like to put the sample number beside each label in a boxplot.
How do I do this? Essentially, I need to count the sample size for each
factor, see below:
Thanks

boxplot(DATA$K_Merge~factor(DATA$UnitName_1),axes=FALSE,col=colours)
title(main=list("Tukey Boxplot by Geology:\n K(%)",cex=cexlb))
axis(1, 1:21, labels=FALSE, las=2)
text(seq(1, 21, by=1), par("usr")[3], labels =
levels(factor(DATA$UnitName_1)), srt = 45,  adj = c(1.03,1.03), xpd = TRUE,
cex=1.8)
axis(2, seq(-1,5, 1), seq(-1, 5, 1))






Re: [R] How to make a raster image in R from my own data set

2013-04-26 Thread Jon Olav Skoien

Hi Kristi,

it takes a few extra steps to create a raster layer from your example
data set, as it is not a gridded map in lat/lon (though it is probably in
some projection). Exactly how to do it depends on your data, but here
are some hints:


1. If you actually need to read the data set from a link, then
read.table.url is deprecated; just use read.table. You might need to
call setInternet2(TRUE) first, as the example data is on an https URL.
2. Raster can read a range of inputs, but I am not sure if .csv is one
of them, and definitely not if the data is not gridded. You can then
first do interpolation with a Spatial*-object. Set the coordinates of
your object, which creates a Spatial*-object, and add the projection:

coordinates(pts) = ~lon+lat
proj4string(pts) = CRS("+proj=longlat +datum=WGS84")
3. You will have to create your reference grid (spsample, from raster,
or another existing grid you have available), and interpolate to this
grid, using one of the many interpolation packages, such as geoR,
automap, gstat, intamap.
4. The resulting object can easily be converted to raster through
raster(interpolationResult[,resultname])

I hope this can help you getting started.
Generally you get quicker response to spatial questions from the 
r-sig-geo list.


Jon



On 24-Apr-13 16:56, Kristi Glover wrote:

Hi R-user,
I was trying to make a raster map with WGS84 projection in R, but I could not
manage it. I found a data set via Google whose format is almost the same as
mine. I want to make a raster map of temperature with 1 degree spatial
resolution at the global scale.
I could make it in GIS software, but I have many variables (so many raster
images) and ultimately I am importing them into R for further analysis.
Therefore, I would like to make them in R, if possible.

It would be great if you could give some hints on what a script for creating a
raster map from my own data set would look like (I have provided a link for
your reference; this is an example data set).

I would really appreciate your help.

#--
#create a raster map from scratch

install.packages("raster", dependencies=TRUE)
library(raster)  # raster data
install.packages("rgdal", dependencies=TRUE)
library(rgdal)  # input/output, projections
install.packages("rgeos", dependencies=TRUE)
library(rgeos)  # geometry ops
install.packages("spdep", dependencies=TRUE)
library(spdep)  # spatial dependence
install.packages("pastecs", dependencies=TRUE)
library(pastecs)
pts <- read.table.url("https://www.betydb.org//miscanthusyield.csv", header=T,
sep=",")
proj4string(pts) <- CRS("+proj=longlat +datum=WGS84")
#---
#---

Cheers,
Kristi




--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Land Resource Management Unit

Via Fermi 2749, TP 440,  I-21027 Ispra (VA), ITALY

jon.sko...@jrc.ec.europa.eu
Tel:  +39 0332 789206

Disclaimer: Views expressed in this email are those of the individual and do 
not necessarily represent official views of the European Commission.



Re: [R] sample size in box plot labels

2013-04-26 Thread Shane Carey
This works, great. Cheers


On Fri, Apr 26, 2013 at 12:02 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

 Hello,

 To count the sample sizes for each factor try

 tapply(DATA$K_Merge, DATA$UnitName_1, FUN = length)


 Hope this helps,

 Rui Barradas

 Em 26-04-2013 10:48, Shane Carey escreveu:

  Hi,

 I would like to put the sample number beside each label in a boxplot.
 How do I do this? Essentially, I need to count the sample size for each
 factor, see below:
 Thanks

 boxplot(DATA$K_Merge~factor(DATA$UnitName_1),axes=FALSE,col=colours)
 title(main=list("Tukey Boxplot by Geology:\n K(%)",cex=cexlb))
 axis(1, 1:21, labels=FALSE, las=2)
 text(seq(1, 21, by=1), par("usr")[3], labels =
 levels(factor(DATA$UnitName_1)), srt = 45,  adj = c(1.03,1.03), xpd =
 TRUE,
 cex=1.8)
 axis(2, seq(-1,5, 1), seq(-1, 5, 1))





-- 
Shane



Re: [R] Vectorized code for generating the Kac (Clement) matrix

2013-04-26 Thread Berend Hasselman

On 25-04-2013, at 17:18, Ravi Varadhan ravi.varad...@jhu.edu wrote:

 Hi,
 I am generating large Kac matrices (also known as Clement matrices).  This is a
 tridiagonal matrix.  I was wondering whether there is a vectorized solution
 that avoids the `for' loops in the following code:
 
 n <- 1000
 
 Kacmat <- matrix(0, n+1, n+1)
 
 for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
 
 for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
 
 The above code is fast, but I am curious about vectorized ways to do this.

You could vectorize like this

Kacmat <- matrix(0, n+1, n+1)
Kacmat[row(Kacmat)==col(Kacmat)-1] <- n -(1:n) + 1
Kacmat[row(Kacmat)==col(Kacmat)+1] <- 1:n

But this shows that your version is pretty quick

f1 <- function(n) {
    Kacmat <- matrix(0, n+1, n+1)
    for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
    for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
    Kacmat
}

f2 <- function(n) {
    Kacmat <- matrix(0, n+1, n+1)
    Kacmat[row(Kacmat)==col(Kacmat)-1] <- n -(1:n) + 1
    Kacmat[row(Kacmat)==col(Kacmat)+1] <- 1:n
    Kacmat
}

library(compiler)

f1.c <- cmpfun(f1)
f2.c <- cmpfun(f2)

n <- 5000

system.time(K1 <- f1(n))
system.time(K2 <- f2(n))
system.time(K3 <- f1.c(n))
system.time(K4 <- f2.c(n))
identical(K2,K1)
identical(K3,K1)
identical(K4,K1)

# > system.time(K1 <- f1(n))
#    user  system elapsed
#   0.386   0.120   0.512
# > system.time(K2 <- f2(n))
#    user  system elapsed
#   3.779   1.141   4.940
# > system.time(K3 <- f1.c(n))
#    user  system elapsed
#   0.323   0.119   0.444
# > system.time(K4 <- f2.c(n))
#    user  system elapsed
#   3.607   0.852   4.472
# > identical(K2,K1)
# [1] TRUE
# > identical(K3,K1)
# [1] TRUE
# > identical(K4,K1)
# [1] TRUE
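
Another vectorized form, not timed here and offered only as a sketch, indexes the two off-diagonals with a two-column (row, column) matrix instead of comparing row() and col() over the whole matrix. The function name kac_matrix is made up for this example.

```r
kac_matrix <- function(n) {
  K <- matrix(0, n + 1, n + 1)
  i <- seq_len(n)
  # Superdiagonal entries (i, i + 1) take the values n, n - 1, ..., 1.
  K[cbind(i, i + 1)] <- n - i + 1
  # Subdiagonal entries (i + 1, i) take the values 1, 2, ..., n.
  K[cbind(i + 1, i)] <- i
  K
}

kac_matrix(3)
```

Matrix indexing of this kind only touches the 2n cells being assigned, so it avoids building two full (n+1) x (n+1) logical matrices.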


Berend



Re: [R] nls: example code throws error

2013-04-26 Thread Keith Jewell

On 26/04/2013 00:16, Steven LeBlanc wrote:
 Greets,

 I'm trying to learn to use nls and was running the example code for 
an exponential model:


  snip

 Perhaps also, a pointer to a comprehensive and correct document that 
details model formulae syntax if someone has one?


 Thanks & Best Regards,
 Steven

Others have pointed out that the error is probably from an unclean 
environment.


For model formula syntax, see ?nls
Under Arguments formula, follow the link to ?formula
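
As a rough illustration of an exponential fit that does not depend on what happens to be lying around in the workspace, here is a minimal sketch. The data are simulated and the names dat, lin and fit are made up for this example; they are not from the original post. Everything the formula needs is passed through the data and start arguments.

```r
set.seed(1)
x <- 1:20
y <- 2 * exp(0.15 * x) + rnorm(20, sd = 0.1)   # simulated data, true a = 2, b = 0.15
dat <- data.frame(x = x, y = y)

# starting values from a log-linear fit: log(y) = log(a) + b * x
lin <- lm(log(y) ~ x, data = dat)
start <- list(a = exp(coef(lin)[[1]]), b = coef(lin)[[2]])

# in an nls formula, names not found in 'data' are treated as parameters
fit <- nls(y ~ a * exp(b * x), data = dat, start = start)
coef(fit)
```

Running this in a fresh session (or after clearing stray objects named a or b) is the easiest way to rule out the unclean-environment problem mentioned above.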



[R] labeling

2013-04-26 Thread Shane Carey
Hi,

I have a dataset as follows:

Name                                             N
Visean limestone & calcareous shale              2
Visean sandstone, mudstone & evaporite           2
Westphalian shale, sandstone, siltstone & coal

How do I combine them so that I can label a plot with

Visean limestone & calcareous shale
                N=2

for example on two lines with  N=2 centered on the length of the Name
label?

Thanks
-- 
Shane



Re: [R] Weighted Principle Components analysis

2013-04-26 Thread Dimitri Liakhovitski
The reason I am asking is that I have to replicate the same analysis
done in SPSS and SAS.

Again, to make it clear - it's a respondent-weighted Factor Analysis with a
desired number of factors. Method of extraction: Principal Components.
Rotation: Varimax.

The only solution I can think of is to multiply my respondent weight by 10
(or by 100) and round it so that the new weight has no decimals, then
repeat every row as many times as the new weight says and run the regular,
unweighted principal() on the new data. I've done it - but again, this does
not match the Factor Scores from SPSS and SAS exactly.

Any other ideas?
Thank you!
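
One possible direction, sketched with base R only and not claimed to reproduce the SPSS or SAS numbers: compute the case-weighted correlation matrix with cov.wt(), take its eigenvectors, and apply them to the weighted-standardized raw data to get component scores. The data X and the weights w below are simulated, and the score definition (unrotated, eigenvector-based) is an assumption of this sketch.

```r
set.seed(42)
X <- matrix(rnorm(200 * 4), ncol = 4,
            dimnames = list(NULL, paste0("v", 1:4)))
X[, 2] <- X[, 1] + 0.5 * X[, 2]       # induce some correlation
w <- runif(200, 0.5, 2)               # made-up respondent weights
w <- w / sum(w)

cw <- cov.wt(X, wt = w, cor = TRUE)   # weighted means, covariance, correlation
e  <- eigen(cw$cor, symmetric = TRUE)

k <- 2                                # number of components to keep
Z <- scale(X, center = cw$center, scale = sqrt(diag(cw$cov)))
scores <- Z %*% e$vectors[, 1:k]      # unrotated case-weighted component scores

# the weighted variances of the scores reproduce the leading eigenvalues
diag(cov.wt(scores, wt = w)$cov)
e$values[1:k]
```

A varimax rotation could then be applied to the loadings with stats::varimax(); differences in score standardization and rotation conventions between packages may still prevent an exact match.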


On Thu, Apr 25, 2013 at 9:21 AM, Dimitri Liakhovitski 
dimitri.liakhovit...@gmail.com wrote:

 Hello!

 I am doing Principal Components Analysis using the psych package:

 mypc <- principal(mydata, 5, scores=TRUE)

 However, I was asked to run a case-weighted PCA - using an individual
 weight for each case.

 I could use corr from the boot package to calculate the case-weighted
 intercorrelation matrix. But if I use the intercorrelation matrix as input
 (instead of the raw data), I am not going to get factor scores, which I do
 need to get.

 Any advice?
 Thank you very much!

 --
 Dimitri Liakhovitski




-- 
Dimitri Liakhovitski



Re: [R] Trouble Computing Type III SS in a Cox Regression

2013-04-26 Thread Paul Miller
Sigh.

Message: 50
Date: Fri, 26 Apr 2013 10:13:52 +1200
From: Rolf Turner rolf.tur...@xtra.co.nz
To: Terry Therneau thern...@mayo.edu
Cc: r-help@r-project.org, Achim Zeileis achim.zeil...@uibk.ac.at
Subject: Re: [R] Trouble Computing Type III SS in a Cox Regression
Message-ID: 5179aaa0.8060...@xtra.co.nz
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 26/04/13 03:40, Terry Therneau wrote:

(In response to a question about computing type III sums of squares in a
Cox regression):

 SNIP

 If you have customers who think that the earth is flat, global warming 
 is a conspiracy, or that type III has special meaning this is a 
 re-education issue, and I can't much help with that.

Fortune nomination!

 cheers,

 Rolf



--- On Thu, 4/25/13, Terry Therneau thern...@mayo.edu wrote:

 From: Terry Therneau thern...@mayo.edu
 Subject: Re: Trouble Computing Type III SS in a Cox Regression
 To: Paul Miller pjmiller...@yahoo.com, r-help@R-project.org
 Received: Thursday, April 25, 2013, 10:40 AM
 You've missed the point of my earlier
 post, which is that type III is not an answerable
 question.
 
    1. There are lots of ways to compare Cox
 models, LRT is normally considered the most reliable by
 serious authors.  There is usually not much difference
 between score, Wald, and LRT tests though, and the other two
 are more convenient in many situations.
 
    2. Type III is a question that can't be
 addressed. SAS prints something out with that label, but
 since they don't document what it is, and people with
 in-depth knowledge of Cox models (like me) cannot figure out
 what a sensible definition could actually be, there is
 nowhere to go.  How to do this in R can't be
 answered.  (It has nothing to do with interactions.)
 
   3. If you have customers who think that the earth is
 flat, global warming is a conspiracy, or that type III has
 special meaning this is a re-education issue, and I can't
 much help with that.
 
 Terry T.
 
 On 04/25/2013 07:59 AM, Paul Miller wrote
  Hi Dr. Therneau,
  
  Thanks for your reply to my question. I'm aware that
 many on the list do not like type III SS. I'm not
 particularly attached to the idea of using them but often
 produce output for others who see value in type III SS.
  
  You mention the problems with type III SS when testing
 interactions. I don't think we'll be doing that here though.
 So my type III SS could just as easily be called type II SS
 I think. If the SS I'm calculating are essentially type II
 SS, is that still problematic for a Cox model?
  
  People using type III SS generally want a measure of
 whether or not a variable is contributing something to their
 model or if it could just as easily be discarded. Is there a
 better way of addressing this question than by using type
 III (or perhaps type II) SS?
  
  A series of model comparisons using a LRT might be the
 answer. If it is, is there an efficient way of implementing
 this approach when there are many predictors? Another
 approach might be to run models through step or stepAIC in
 order to determine which predictors are useful and to
 discard the rest. Is that likely to be any good?
  
  Thanks,
  
  Paul
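
The likelihood ratio comparison discussed above can be sketched as follows. This assumes the survival package is installed and uses its lung example data purely for illustration; the choice of covariates is arbitrary and not from the thread.

```r
library(survival)

# complete cases only, so that both nested models are fitted to the same rows
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

full    <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = d)
reduced <- coxph(Surv(time, status) ~ age + sex, data = d)

# likelihood ratio test for dropping ph.ecog;
# loglik[2] is the partial log-likelihood at the fitted coefficients
lrt  <- 2 * (full$loglik[2] - reduced$loglik[2])
df   <- length(coef(full)) - length(coef(reduced))
pval <- pchisq(lrt, df = df, lower.tail = FALSE)
c(LRT = lrt, df = df, p = pval)
```

The same comparison is available through anova(reduced, full); doing it by hand just makes explicit what is being tested.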




Re: [R] Help with dataEllipse function

2013-04-26 Thread John Fox
Dear Jana,

The lty argument to dataEllipse() (in the car package) isn't vectorized. It 
could be, and I'll add that as a feature request. Actually, lty isn't an 
explicit argument to dataEllipse(); it's simply passed through to the lines() 
function, which draws the ellipses.

You should be able to do what you want by adding the ellipses one at a time to 
your plot (see the argument add in ?dataEllipse) or by using the coordinates of 
the ellipses, returned by dataEllipse(), to draw a customized graph.

I hope that this helps,
 John


John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
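
To illustrate the "customized graph" route without depending on car, here is a base-R sketch that computes the ellipse coordinates per group and draws each with its own lty through lines(). The helper ellipse_points is hypothetical, the data are simulated, and the F-based radius is an assumption of this sketch rather than a statement about what dataEllipse() does internally.

```r
# Hypothetical helper: points on the data ellipse of (x, y) at a given level.
ellipse_points <- function(x, y, level = 0.95, segments = 100) {
  center <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  n <- length(x)
  radius <- sqrt(2 * qf(level, 2, n - 1))   # assumption: F-based radius
  theta <- seq(0, 2 * pi, length.out = segments)
  circle <- cbind(cos(theta), sin(theta))
  # map the unit circle through the Cholesky factor, then shift to the center
  sweep(radius * circle %*% chol(S), 2, center, "+")
}

set.seed(1)
g <- rep(c("a", "b", "c"), each = 30)
x <- rnorm(90) + as.integer(factor(g))
y <- 0.5 * x + rnorm(90)

ell <- lapply(split(data.frame(x, y), g),
              function(d) ellipse_points(d$x, d$y))

if (interactive()) {
  plot(x, y, col = as.integer(factor(g)), pch = 16)
  # one colour and one line type per group
  for (k in seq_along(ell)) lines(ell[[k]], col = k, lty = k)
}
```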

On Thu, 25 Apr 2013 20:00:20 -0400
 Jana Makedonska jmakedon...@gmail.com wrote:
 Hi Everyone,
 
 I am working with the R function dataEllipse. I plot the 95% confidence
 ellipses for several different samples in the same plot and I color-code
 the ellipse of each sample, but I do not know how to specify a different
 line pattern for each ellipse. I can only modify the pattern for all
 ellipses with the lty argument. Any help will be highly appreciated.
 
 Thanks in advance!
 Jana
 
 -- 
 
 
 Jana Makedonska,
 B.Sc. Biology, Universite Paul Sabatier Toulouse III
 M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier II
 Ph.D. candidate in Physical Anthropology and Part-time lecturer
 Department of Anthropology
 College of Arts & Sciences
 State University of New York at Albany
 1400 Washington Avenue
 1 Albany, NY
 Office phone: 518-442-4699
 http://electricsongs.academia.edu/JanaMakedonska
 
   [[alternative HTML version deleted]]
 


Re: [R] labeling

2013-04-26 Thread Jim Lemon

On 04/26/2013 10:15 PM, Shane Carey wrote:

Hi,

I have a dataset as follows:

Name                                             N
Visean limestone & calcareous shale              2
Visean sandstone, mudstone & evaporite           2
Westphalian shale, sandstone, siltstone & coal


How do I combine them so that I can label a plot with

Visean limestone & calcareous shale
                N=2

for example on two lines with  N=2 centered on the length of the Name
label?



Hi Shane,
Look at the title function (graphics) and the main and sub arguments.

Jim
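
A small sketch of building such two-line labels, assuming the names and counts sit in a data frame (the name rocks and its contents are made up from the example in the question):

```r
rocks <- data.frame(
  Name = c("Visean limestone & calcareous shale",
           "Visean sandstone, mudstone & evaporite"),
  N = c(2, 2),
  stringsAsFactors = FALSE
)

# "\n" starts a new line inside a plot label
lab <- paste0(rocks$Name, "\nN=", rocks$N)
cat(lab[1], "\n")
```

Passing one of these strings as main, e.g. plot(1:10, main = lab[1]), draws both lines centered, so N=2 ends up centered under the name.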



Re: [R] csv file with two header rows

2013-04-26 Thread John Kane
I don't think so. read.csv is a stripped-down version of read.table.  You should 
be able to do this with the skip option there.

John Kane
Kingston ON Canada
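
A sketch of how the skip idea might look in practice: read the two header rows on their own, read the data with skip = 2, and then combine the header rows into column names. The file contents below are invented and supplied through the text argument so the example is self-contained; with a real file one would use the file name instead.

```r
csv_text <- "site,site,depth\nname,code,m\nA,1,2.5\nB,2,3.0"

# the two header rows, read as plain data
hdr <- read.csv(text = csv_text, header = FALSE, nrows = 2,
                stringsAsFactors = FALSE)

# the data proper, skipping both header rows
dat <- read.csv(text = csv_text, header = FALSE, skip = 2,
                stringsAsFactors = FALSE)

# combine the two header rows into one set of column names
names(dat) <- paste(unlist(hdr[1, ]), unlist(hdr[2, ]), sep = "_")
dat
```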


 -Original Message-
 From: analys...@hotmail.com
 Sent: Thu, 25 Apr 2013 18:35:42 -0700 (PDT)
 To: r-help@r-project.org
 Subject: [R] csv file with two header rows
 
 Is there a way to use read.csv() on such a file without deleting one
 of the header rows?
 
 Thanks.
 


Re: [R] Can a column of a list be called?

2013-04-26 Thread Charles Determan Jr
If you are using the list as simply a collection of data frames a simple
example to accomplish what you are describing is this:

data(iris)
data(mtcars)
y=list(iris, mtcars)
#return Sepal.Length column from first data frame in list
#list[[number of list component]][number of column]
y[[1]][1]

Cheers,
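
One detail that may matter when the column is to be used as a variable in a function: single brackets return a one-column data frame, while double brackets or $ return the column itself as a vector. A short sketch along the lines of the example above:

```r
y <- list(iris, mtcars)

col_df  <- y[[1]][1]                  # one-column data frame
col_vec <- y[[1]][["Sepal.Length"]]   # plain numeric vector
same    <- y[[1]]$Sepal.Length        # equivalent to the line above

is.data.frame(col_df)
mean(col_vec)                         # functions such as mean() want the vector
```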



On Thu, Apr 25, 2013 at 7:24 PM, Jana Makedonska jmakedon...@gmail.com wrote:

 Hello Everyone,

 I would like to know if I can call one of the columns of a list, to use it
 as a variable in a function.

 Thanks in advance for any advice!

 Jana

 --

 Jana Makedonska,
 B.Sc. Biology, Universite Paul Sabatier Toulouse III
 M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier
 II
 Ph.D. candidate in Physical Anthropology and Part-time lecturer
 Department of Anthropology
 College of Arts & Sciences
 State University of New York at Albany
 1400 Washington Avenue
 1 Albany, NY
 Office phone: 518-442-4699
 http://electricsongs.academia.edu/JanaMakedonska
 http://www.youtube.com/watch?v=OHbT9VvtonM
 http://www.youtube.com/watch?v=jRoMoLjzpf4list=PL5BF6ACDCC2E4AAA0index=7
 





-- 
Charles Determan
Integrated Biosciences PhD Student
University of Minnesota



[R] How to export graph value in R

2013-04-26 Thread Anup khanal
Dear experts,

I have created a hypsometric curve (area-elevation curve) for my watershed 
by using the simple command

hypsometric(X, main="Hypsometric Curve",
            xlab="Relative Area above Elevation, (a/A)",
            ylab="Relative Elevation, (h/H)", col="blue")

It plots the hypsometric curve in the R Graphics window. My question is: how 
can I export the values which are used to create this plot? I mean, I want to 
know the value on the y axis for a certain x value.

Thanks in advance!

Anup Khanal
Norwegian Institute of science and Technology (NTNU)
Trondheim, Norway
Mob: (+47) 45174313
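
If the x and y coordinates of the curve can be obtained as two vectors (whether hypsometric() returns them is not stated in this thread, so check its help page or inspect the returned object with str()), reading off a y value and exporting the values could look like the sketch below. The coordinates here are made up.

```r
# made-up curve coordinates standing in for the plotted values
rel_area <- c(0, 0.25, 0.5, 0.75, 1)
rel_elev <- c(1, 0.8, 0.5, 0.3, 0)

# y value at x = 0.6 by linear interpolation between the neighbouring points
y_at <- approx(rel_area, rel_elev, xout = 0.6)$y
y_at

# write all coordinates to a csv file (here in a temporary directory)
out <- file.path(tempdir(), "hypsometric_values.csv")
write.csv(data.frame(rel_area, rel_elev), out, row.names = FALSE)
```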
  


[R] Splitting data.frame and saving to csv files

2013-04-26 Thread Katherine Gobin
Dear R Forum,

I have a data.frame as

df = data.frame(date = c("2013-04-15", "2013-04-14", "2013-04-13", 
"2013-04-12", "2013-04-11"),
ABC_f = c(62.80739769,81.04525895,84.65712455,12.78237251,57.61345256),
LMN_d = c(21.16794336,54.6580401,63.8923307,87.59880367,87.07693716),
XYZ_p = c(55.8885464,94.1358684,84.0089114,98.99746696,64.71083712),
LMN_a = c(56.6768395,25.81530198,40.12268441,35.74175237,47.95892209),
ABC_e = c(11.36783959,62.29651784,47.63481552,32.27820673,52.12561419),
LMN_c = c(45.4484695,17.72362438,36.7690054,68.58912931,35.80767235), 
XYZ_zz = c(85.74755089,63.48582415,81.61107212,58.1572924,27.44132817),
PQR = c(71.22867519,95.09994812,83.62437819,30.18524735,25.81804865),
ABC_d = c(38.71089816,93.48216193,93.14432203,78.2738731,31.87170019),
ABC_m = c(40.28473769,43.97076327,47.38761559,97.33573412,22.06884976))


> df
        date    ABC_f    LMN_d    XYZ_p    LMN_a    ABC_e
1 2013-04-15 62.80740 21.16794 55.88855 56.67684 11.36784
2 2013-04-14 81.04526 54.65804 94.13587 25.81530 62.29652
3 2013-04-13 84.65712 63.89233 84.00891 40.12268 47.63482
4 2013-04-12 12.78237 87.59880 98.99747 35.74175 32.27821
5 2013-04-11 57.61345 87.07694 64.71084 47.95892 52.12561
     LMN_c   XYZ_zz      PQR    ABC_d    ABC_m
1 45.44847 85.74755 71.22868 38.71090 40.28474
2 17.72362 63.48582 95.09995 93.48216 43.97076
3 36.76901 81.61107 83.62438 93.14432 47.38762
4 68.58913 58.15729 30.18525 78.27387 97.33573
5 35.80767 27.44133 25.81805 31.87170 22.06885

I need to identify columns with the same label prefix and, along with the dates 
in the first column, save them in separate csv files.

E.g. in the above data frame, I have 4 columns beginning with ABC so I need to 
save these four columns with the date in the first column as ABC.csv, then 
LMN_d, LMN_a, LMN_c in the LMN.csv file as date, LMN_a, LMN_c, LMN_d and so on. 
In my actual data.frame, I won't know in advance how many such rate combinations 
there are. If a column has no matching columns, as with PQR, the PQR.csv file 
should have only the date and PQR columns. 

Kindly guide me on how to split the data.frame and save the respective csv files.

Regards

Katherine













Re: [R] Help with dataEllipse function

2013-04-26 Thread Michael Friendly

On 4/25/2013 8:00 PM, Jana Makedonska wrote:

Hi Everyone,

I am working with the R function dataEllipse. I plot the 95% confidence
ellipses for several different samples in the same plot and I color-code
the ellipse of each sample, but I do not know how to specify a different
line pattern for each ellipse. I can only modify the pattern for all
ellipses with the lty argument. Any help will be highly appreciated.


lty is not an argument of car::dataEllipse, but is passed via ...
So, when you use the groups= argument, only a single value gets used.
You would have to modify the function to allow what you want.



--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.  Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [R] sample size in box plot labels

2013-04-26 Thread PIKAL Petr
Hi

actually it should give the same result as

table(DATA$UnitName_1)

Neither approach gives the right count if there are NAs in your data.

tapply(DATA$K_Merge, DATA$UnitName_1, FUN = function(x) sum(!is.na(x)))

also takes NA values into account, counting only the non-missing observations.

Regards
Petr
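
A tiny made-up example showing the difference between the two counts, and one way to turn the counts into box plot labels. The column names follow the thread; the data values are invented.

```r
DATA <- data.frame(
  UnitName_1 = c("A", "A", "A", "B", "B"),
  K_Merge    = c(1.2, NA, 0.8, 2.1, 1.9)
)

n_rows  <- table(DATA$UnitName_1)                 # counts rows, NA included
n_valid <- tapply(DATA$K_Merge, DATA$UnitName_1,
                  function(x) sum(!is.na(x)))     # counts non-missing values

# labels carrying the sample size actually used by the box plot
lab <- paste0(names(n_valid), " (n=", n_valid, ")")
lab
```

These labels could then be supplied through the names argument, as in boxplot(K_Merge ~ UnitName_1, data = DATA, names = lab).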


---Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Shane Carey
 Sent: Friday, April 26, 2013 1:09 PM
 To: Rui Barradas
 Cc: r-help@r-project.org
 Subject: Re: [R] sample size in box plot labels
 
 This works, great. Cheers
 
 
 On Fri, Apr 26, 2013 at 12:02 PM, Rui Barradas ruipbarra...@sapo.pt
 wrote:
 
  Hello,
 
  To count the sample sizes for each factor try
 
  tapply(DATA$K_Merge, DATA$UnitName_1, FUN = length)
 
 
  Hope this helps,
 
  Rui Barradas
 
  Em 26-04-2013 10:48, Shane Carey escreveu:
 
   Hi,
 
  I would like to put the sample number beside each label in a
 boxplot.
  How do I do this? Essentially, I need to count the sample size for
  each factor, see below:
  Thanks
 
 
  boxplot(DATA$K_Merge~factor(DATA$UnitName_1),axes=FALSE,col=colours)
  title(main=list("Tukey Boxplot by Geology:\n K(%)",cex=cexlb))
  axis(1, 1:21, labels=FALSE, las=2)
  text(seq(1, 21, by=1), par("usr")[3],
       labels = levels(factor(DATA$UnitName_1)), srt = 45,
       adj = c(1.03,1.03), xpd = TRUE, cex=1.8)
  axis(2, seq(-1,5, 1), seq(-1, 5, 1))
 
 
 
 
 
 --
 Shane
 
   [[alternative HTML version deleted]]
 


Re: [R] Vectorized code for generating the Kac (Clement) matrix

2013-04-26 Thread Enrico Schumann
On Thu, 25 Apr 2013, Ravi Varadhan ravi.varad...@jhu.edu writes:

 Hi, I am generating large Kac matrices (also known as Clement matrices).
 This is a tridiagonal matrix.  I was wondering whether there is a
 vectorized solution that avoids the `for' loops in the following code:

 n <- 1000

 Kacmat <- matrix(0, n+1, n+1)

 for (i in 1:n) Kacmat[i, i+1] <- n - i + 1

 for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1

 The above code is fast, but I am curious about vectorized ways to do this.

 Thanks in advance.
 Best,
 Ravi



This may be a bit faster; but as Berend and you said, the original
function seems already fast.

n <- 5000

f1 <- function(n) {
    Kacmat <- matrix(0, n+1, n+1)
    for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
    for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
    Kacmat
}
f3 <- function(n) {
    n1 <- n + 1L
    res <- numeric(n1 * n1)
    dim(res) <- c(n1, n1)
    bw <- n:1L ## bw = backward, fw = forward
    fw <- seq_len(n)
    res[cbind(fw, fw + 1L)] <- bw
    res[cbind(fw + 1L, fw)] <- fw
    res
}

system.time(K1 <- f1(n))
system.time(K3 <- f3(n))
identical(K3, K1)

##user  system elapsed 
##   0.132   0.028   0.161 
##  
##user  system elapsed 
##   0.024   0.048   0.071 
## 


-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net



Re: [R] Trouble Computing Type III SS in a Cox Regression

2013-04-26 Thread John Kane
Seconded

John Kane
Kingston ON Canada


 -Original Message-
 From: rolf.tur...@xtra.co.nz
 Sent: Fri, 26 Apr 2013 10:13:52 +1200
 To: thern...@mayo.edu
 Subject: Re: [R] Trouble Computing Type III SS in a Cox Regression
 
 On 26/04/13 03:40, Terry Therneau wrote:
 
 (In response to a question about computing type III sums of squares in a
 Cox regression):
 
  SNIP
 
 If you have customers who think that the earth is flat, global warming
 is a conspiracy, or that type III has special meaning this is a
 re-education issue, and I can't much help with that.
 
 Fortune nomination!
 
  cheers,
 
  Rolf
 


Re: [R] Can a column of a list be called?

2013-04-26 Thread Bert Gunter
Please read An Introduction to R or other basic R tutorial to learn
basic R operations before posting.

Please read the posting guide (link at bottom) or other similar online
guides for how to post a coherent question that will elicit an
accurate and helpful answer.

-- Bert

On Thu, Apr 25, 2013 at 5:24 PM, Jana Makedonska jmakedon...@gmail.com wrote:
 Hello Everyone,

 I would like to know if I can call one of the columns of a list, to use it
 as a variable in a function.

 Thanks in advance for any advice!

 Jana

 --

 Jana Makedonska,
 B.Sc. Biology, Universite Paul Sabatier Toulouse III
 M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier II
 Ph.D. candidate in Physical Anthropology and Part-time lecturer
 Department of Anthropology
 College of Arts & Sciences
 State University of New York at Albany
 1400 Washington Avenue
 1 Albany, NY
 Office phone: 518-442-4699
 http://electricsongs.academia.edu/JanaMakedonska
 http://www.youtube.com/watch?v=OHbT9VvtonM
 http://www.youtube.com/watch?v=jRoMoLjzpf4list=PL5BF6ACDCC2E4AAA0index=7




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] Splitting data.frame and saving to csv files

2013-04-26 Thread Bert Gunter
Hint:

nm <- substring(names(df), 1, 3)

gives the first 3 letters of the names, assuming this is the info
needed for classifying the names -- you were not explicit about this.
If some sort of pattern is used, ?grep may be what you need.

You can then pick columns from df by e.g. looping through unique(nm)...

etc.

-- Bert
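
The hint above might be fleshed out roughly as follows. This sketch assumes the group label is everything before the first underscore (rather than the first three letters), uses a cut-down version of the data frame, and writes to a temporary directory; all three are choices made for this example only.

```r
df <- data.frame(
  date  = c("2013-04-15", "2013-04-14"),
  ABC_f = c(62.8, 81.0), LMN_d = c(21.2, 54.7), XYZ_p = c(55.9, 94.1),
  LMN_a = c(56.7, 25.8), ABC_e = c(11.4, 62.3), PQR = c(71.2, 95.1)
)

# group label = everything before the first underscore (date column excluded)
nm <- sub("_.*$", "", names(df)[-1])
out_dir <- tempdir()   # assumption: write to a temporary directory

for (g in unique(nm)) {
  # columns of this group, sorted by name, preceded by the date column
  cols <- sort(names(df)[-1][nm == g])
  write.csv(df[c("date", cols)],
            file.path(out_dir, paste0(g, ".csv")), row.names = FALSE)
}
list.files(out_dir, pattern = "\\.csv$")
```

A column without partners, such as PQR, simply forms a group of one and ends up in a file with only date and PQR.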

On Fri, Apr 26, 2013 at 6:21 AM, Katherine Gobin
katherine_go...@yahoo.com wrote:
 Dear R Forum,

 I have a data.frame as

 df = data.frame(date = c("2013-04-15", "2013-04-14", "2013-04-13", 
 "2013-04-12", "2013-04-11"),
 ABC_f = c(62.80739769,81.04525895,84.65712455,12.78237251,57.61345256),
 LMN_d = c(21.16794336,54.6580401,63.8923307,87.59880367,87.07693716),
 XYZ_p = c(55.8885464,94.1358684,84.0089114,98.99746696,64.71083712),
 LMN_a = c(56.6768395,25.81530198,40.12268441,35.74175237,47.95892209),
 ABC_e = c(11.36783959,62.29651784,47.63481552,32.27820673,52.12561419),
 LMN_c = c(45.4484695,17.72362438,36.7690054,68.58912931,35.80767235),
 XYZ_zz = c(85.74755089,63.48582415,81.61107212,58.1572924,27.44132817),
 PQR = c(71.22867519,95.09994812,83.62437819,30.18524735,25.81804865),
 ABC_d = c(38.71089816,93.48216193,93.14432203,78.2738731,31.87170019),
 ABC_m = c(40.28473769,43.97076327,47.38761559,97.33573412,22.06884976))


 > df
         date    ABC_f    LMN_d    XYZ_p    LMN_a    ABC_e
 1 2013-04-15 62.80740 21.16794 55.88855 56.67684 11.36784
 2 2013-04-14 81.04526 54.65804 94.13587 25.81530 62.29652
 3 2013-04-13 84.65712 63.89233 84.00891 40.12268 47.63482
 4 2013-04-12 12.78237 87.59880 98.99747 35.74175 32.27821
 5 2013-04-11 57.61345 87.07694 64.71084 47.95892 52.12561
      LMN_c   XYZ_zz      PQR    ABC_d    ABC_m
 1 45.44847 85.74755 71.22868 38.71090 40.28474
 2 17.72362 63.48582 95.09995 93.48216 43.97076
 3 36.76901 81.61107 83.62438 93.14432 47.38762
 4 68.58913 58.15729 30.18525 78.27387 97.33573
 5 35.80767 27.44133 25.81805 31.87170 22.06885

 I need to identify columns with same labels and along-with the dates in the 
 first column, save the columns in different csv files.

 E.g. in the above data frame, I have 4 columns beginning with ABC so I need 
 to save these four columns with the date in the first column as ABC.csv, then 
 LMN_d, LMN_a, LMN_c in the LMN.csv file as date, LMN_a, LMN_c, LMN_d and so 
 on. In my actual data.frame, I won't be aware how many such rates 
 combinations are available. If there is no matching column as PQR, the 
 PQR.csv file should have only date and PQR column.

 Kindly guide how do I split the data.frame and save the respective csv files.

 Regards

 Katherine















-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] Remove reciprocal data from a grouped animal social contact dataset

2013-04-26 Thread Adams, Jean
Cat,

It seems risky to me to assume that one collar is always outperforming
another one.  I would think there would be some cases where one collar
picked up on a contact that the other one missed AND that the other picked
up on a contact that the one missed.  If so, it may be best to keep all of
the records, and use the time information to update the start and duration
information (taking into account overlapping observations).

Defining a unique pair is pretty easy.

data <- data.frame(
  record = 1:5,
  start = c("31/05/2012 04:48", "31/05/2012 05:30", "31/05/2012 05:51",
            "31/05/2012 05:56", "31/05/2012 06:02"),
  duration = c(278, 3, 159, 47, 107),
  pair = c("CO1 CO2", "CO1 CO2", "CO2 CO1", "CO2 CO1", "CO2 CO1"))
# as.character() guards against pair having been stored as a factor
data$uniqpair <- sapply(strsplit(as.character(data$pair), " "), function(x)
  paste(sort(x), collapse=" "))

Detecting overlapping observations will be more challenging.

As for your trouble with dput(head(data, 200)), perhaps the variable start
is defined as a factor in your data frame, so dput() was listing all of the
levels of start?

Jean
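
If one does want the rule from the original question (for each pair of animals, keep only the direction with more records), a sketch building on the unique-pair idea above could look like this. The data frame contacts is invented, and ties are broken by whichever direction comes first alphabetically, which is an arbitrary choice of this sketch.

```r
contacts <- data.frame(
  record = 1:6,
  pair = c("CO1 CO2", "CO1 CO2", "CO2 CO1", "CO2 CO1", "CO2 CO1", "CO1 CO3"),
  stringsAsFactors = FALSE
)

# direction-free label for each pair
contacts$uniqpair <- sapply(strsplit(contacts$pair, " "),
                            function(x) paste(sort(x), collapse = " "))

# for each unordered pair, the direction that has the most records
best <- vapply(split(contacts$pair, contacts$uniqpair),
               function(p) names(which.max(table(p))),
               character(1))

# keep only the records logged in that direction
kept <- contacts[contacts$pair == best[contacts$uniqpair], ]
kept
```

Pairs recorded in one direction only, such as CO1 CO3 here, are kept as they are.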




On Fri, Apr 26, 2013 at 4:08 AM, Cat Cowie cat.e.co...@gmail.com wrote:

 Hi r-help forum,

 I have been collecting contact data (with proximity logger collars)
 between a few different species of animal. All animals wear the
 collars, and any contact between the animals should be detected and
 recorded by both collars. However, this isn't always the case and more
 contacts may be recorded on one collar of the two. This is fine, it
 depends on battery life and other things that I will have to discuss!
 I now have each contact recorded as a 'group in 4 columns':

  > head(data)
   record            start duration     pair
 1      1 27/05/2012 04:40     4948 CO1 CO12
 2      2 31/05/2012 04:48      278 CO1 CO12
 3      3 31/05/2012 05:30        3 CO1 CO12
 4      4 31/05/2012 05:51      159 CO1 CO12
 5      5 31/05/2012 05:56       47 CO1 CO12
 6      6 31/05/2012 06:02      107 CO1 CO12


 The first column shows the record number, the second shows the start
 date and time of the contact, the third shows the contact duration and
 the fourth shows the pair of animals involved in the contact. In this
 case the top 6 contacts are all between animals CO1 and CO12. There
 are nearly 100,000 records. There were many animals that could have
 contacted each other:

  > animals
    animals
 1      CO1
 2      CO2
 3      CO3
 4      CO4
 5      CO5
 6      CO6
 7      CO7
 8      CO8
 9      CO9
 10    CO10
 11    CO11
 12    CO12
 13    CO13
 14    CO14
 15    CO15
 16    CO16
 17    CO17
 18     PO1
 19     PO2
 20     PO3
 21     PO4
 22     PO5
 23     PO6
 24     PO7
 25     PO8
 26     PO9
 27    PO10
 28    PO11
 29    PO12
 30    PO13
 31     PI1
 32     PI2
 33     PI3
 34     PI4
 35     PI5
 36     PI6
 37     PI7
 38     PI8
 39     RD1
 40     RD2
 41     WB1
 42     WB2

 Because both collars may have recorded the single contact, I need to
 remove the reciprocal contacts from this dataset. For example, you may
 have records for CO1 CO2 that are mirrored by records for CO2 CO1.
 If there are the same number of records it doesn't matter which of
 these you select, as long as only one set is used for further
 analysis. Where there is an unequal number of contacts recorded on the
 two collars between a pair, I would like to select the records which
 have the most contacts. So, if there were 10 records recorded for CO1
 CO2 and 15 for CO2 CO1 I would like to reject the first 10 contacts
 and retain the 15. There are some cases where only one version of the
 group is recorded, e.g. just CO1 CO3, with no reciprocal CO3 CO1.
 In this case I would like to retain the data that I have.

 I would normally like to present you with my attempts so far but as a
 relatively new (but enthusiastic!) R user I am struggling to know
 where to start.

 I present more data here... sadly dput(head(data, 200) is printing all
 the dates (all nearly 100,000 of them, regardless of using head()!) so
 I hope this is ok for now:

  > head(data,300)
    record            start duration     pair
 1       1 27/05/2012 04:40     4948 CO1 CO12
 2       2 31/05/2012 04:48      278 CO1 CO12
 3       3 31/05/2012 05:30        3 CO1 CO12
 4       4 31/05/2012 05:51      159 CO1 CO12
 5       5 31/05/2012 05:56       47 CO1 CO12
 6       6 31/05/2012 06:02      107 CO1 CO12
 7       7 31/05/2012 06:08       86 CO1 CO12
 8       8 31/05/2012 06:11      194 CO1 CO12
 9       9 31/05/2012 06:20       87 CO1 CO12
 10     10 31/05/2012 06:24       12 CO1 CO12
 11     11 31/05/2012 06:32       11 CO1 CO12
 12     12 31/05/2012 06:40      227 CO1 CO12
 13     13 31/05/2012 06:47      115 CO1 CO12
 14     14 12/04/2011 13:39      109 CO1 CO15
 15     15 12/04/2011 22:29        3 CO1 CO15
 16     16 12/04/2011 22:45       44 CO1 CO15
 17     17 12/04/2011 23:20       55 CO1 CO15
 18     18 13/04/2011 02:50       58 CO1 CO15
 19     19 13/04/2011 03:15       11 CO1 CO15
 20     20 13/04/2011 05:38       65 CO1 CO15
 21 

[R] Stepwise regression for multivariate case in R?

2013-04-26 Thread Jonathan Jansson
Hi! I am trying to make a stepwise regression in the multivariate case, using 
Wilks' Lambda test.
I've tried this: 
  greedy.wilks(cbind(Y1,Y2) ~ . , data=my.data )
But it only returns:
Error in model.frame.default(formula = X[, j] ~ grouping, drop.unused.levels = 
TRUE) : 
  variable lengths differ (found for 'grouping') 
What can be wrong here? I have checked, and all variables in my.data are of the 
same length.
//Jonathan
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Remove reciprocal data from a grouped animal social contact dataset

2013-04-26 Thread skywalker atl
Hi

See

https://github.com/hongqin/RCompBio/blob/master/48states/48states-permutation-igraph.r

and

http://www.youtube.com/watch?v=GE2l3LYDQG0

Hope they are useful,

Hong Qin




On Fri, Apr 26, 2013 at 5:08 AM, Cat Cowie cat.e.co...@gmail.com wrote:

 Hi r-help forum,

 I have been collecting contact data (with proximity logger collars)
 between a few different species of animal. All animals wear the
 collars, and any contact between the animals should be detected and
 recorded by both collars. However, this isn't always the case and more
 contacts may be recorded on one collar of the two. This is fine, it
 depends on battery life and other things that I will have to discuss!
 I now have each contact recorded as a 'group in 4 columns':

  head(data)
    record            start duration pair
  1      1 27/05/2012 04:40     4948 CO1 CO12
  2      2 31/05/2012 04:48      278 CO1 CO12
  3      3 31/05/2012 05:30        3 CO1 CO12
  4      4 31/05/2012 05:51      159 CO1 CO12
  5      5 31/05/2012 05:56       47 CO1 CO12
  6      6 31/05/2012 06:02      107 CO1 CO12


 The first column shows the record number, the second shows the start
 date and time of the contact, the third shows the contact duration and
 the fourth shows the pair of animals involved in the contact. In this
 case the top 6 contacts are all between animals CO1 and CO12. There
 are nearly 100,000 records. There were many animals that could have
 contacted each other:

  animals
animals
 1  CO1
 2  CO2
 3  CO3
 4  CO4
 5  CO5
 6  CO6
 7  CO7
 8  CO8
 9  CO9
 10 CO10
 11 CO11
 12 CO12
 13 CO13
 14 CO14
 15 CO15
 16 CO16
 17 CO17
 18 PO1
 19 PO2
 20 PO3
 21 PO4
 22 PO5
 23 PO6
 24 PO7
 25 PO8
 26 PO9
 27 PO10
 28 PO11
 29 PO12
 30 PO13
 31 PI1
 32 PI2
 33 PI3
 34 PI4
 35 PI5
 36 PI6
 37 PI7
 38 PI8
 39 RD1
 40 RD2
 41 WB1
 42 WB2

 Because both collars may have recorded the single contact, I need to
 remove the reciprocal contacts from this dataset. For example, you may
 have records for CO1 CO2 that are mirrored by records for CO2 CO1.
 If there are the same number of records it doesn't matter which of
 these you select, as long as only one set is used for further
 analysis. Where there is an unequal number of contacts recorded on the
 two collars between a pair, I would like to select the records which
 have the most contacts. So, if there were 10 records recorded for CO1
 CO2 and 15 for CO2 CO1 I would like to reject the first 10 contacts
 and retain the 15. There are some cases where only one version of the
 group is recorded, e.g. just CO1 CO3, with no reciprocal CO3 CO1.
 In this case I would like to retain the data that I have.
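
The selection rule described above can be sketched in base R. This is only a minimal sketch on made-up data: the data frame `contacts`, its seven toy rows, and the helper columns `key` and `counts` are all hypothetical, and it assumes the `pair` column always holds exactly two collar IDs separated by a single space.

```r
# Toy data (hypothetical): 2 records for "CO1 CO2", 3 for the reciprocal
# "CO2 CO1", and two pairs that have no reciprocal at all.
contacts <- data.frame(
  record = 1:7,
  pair = c("CO1 CO2", "CO1 CO2", "CO2 CO1", "CO2 CO1", "CO2 CO1",
           "CO1 CO3", "CO3 CO4"),
  stringsAsFactors = FALSE
)

# Direction-independent key: sort the two IDs so that "CO2 CO1" and
# "CO1 CO2" map to the same string.
make_key <- function(pair) {
  vapply(strsplit(pair, " ", fixed = TRUE),
         function(ids) paste(sort(ids), collapse = " "),
         character(1))
}

# Count the records logged in each direction.
counts <- as.data.frame(table(pair = contacts$pair), stringsAsFactors = FALSE)
counts$key <- make_key(counts$pair)

# Within each key, put the direction with the most records first and keep it.
# On a tie the first direction in sort order wins, which is fine because
# either set may be used.
counts <- counts[order(counts$key, -counts$Freq), ]
keep <- counts$pair[!duplicated(counts$key)]

deduped <- contacts[contacts$pair %in% keep, ]
deduped
```

Pairs recorded in one direction only pass through untouched, because their key occurs once in `counts`.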

 I would normally like to present you with my attempts so far but as a
 relatively new (but enthusiastic!) R user I am struggling to know
 where to start.

 I present more data here... sadly dput(head(data, 200)) is printing all
 the dates (all nearly 100,000 of them, regardless of using head()!) so
 I hope this is ok for now:

  head(data,300)
    record            start duration pair
  1      1 27/05/2012 04:40     4948 CO1 CO12
  2      2 31/05/2012 04:48      278 CO1 CO12
  3      3 31/05/2012 05:30        3 CO1 CO12
  4      4 31/05/2012 05:51      159 CO1 CO12
  5      5 31/05/2012 05:56       47 CO1 CO12
  6      6 31/05/2012 06:02      107 CO1 CO12
  7      7 31/05/2012 06:08       86 CO1 CO12
  8      8 31/05/2012 06:11      194 CO1 CO12
  9      9 31/05/2012 06:20       87 CO1 CO12
 10     10 31/05/2012 06:24       12 CO1 CO12
 11     11 31/05/2012 06:32       11 CO1 CO12
 12     12 31/05/2012 06:40      227 CO1 CO12
 13     13 31/05/2012 06:47      115 CO1 CO12
 14     14 12/04/2011 13:39      109 CO1 CO15
 15     15 12/04/2011 22:29        3 CO1 CO15
 16     16 12/04/2011 22:45       44 CO1 CO15
 17     17 12/04/2011 23:20       55 CO1 CO15
 18     18 13/04/2011 02:50       58 CO1 CO15
 19     19 13/04/2011 03:15       11 CO1 CO15
 20     20 13/04/2011 05:38       65 CO1 CO15
 21     21 13/04/2011 08:55      122 CO1 CO15
 22     22 13/04/2011 11:06        4 CO1 CO15
 23     23 13/04/2011 13:47       53 CO1 CO15
 24     24 13/04/2011 13:57       32 CO1 CO15
 25     25 13/04/2011 14:32       16 CO1 CO15
 26     26 13/04/2011 14:41        4 CO1 CO15
 27     27 13/04/2011 21:53       33 CO1 CO15
 28     28 14/04/2011 01:00       41 CO1 CO15
 29     29 14/04/2011 01:07        5 CO1 CO15
 30     30 14/04/2011 01:46        2 CO1 CO15
 31     31 14/04/2011 06:43        3 CO1 CO15
 32     32 14/04/2011 08:44        3 CO1 CO15
 33     33 14/04/2011 08:51       64 CO1 CO15
 34     34 14/04/2011 13:59        6 CO1 CO15
 35     35 14/04/2011 14:11       11 CO1 CO15
 36     36 14/04/2011 14:36      169 CO1 CO15
 37     37 14/04/2011 14:42       19 CO1 CO15
 38     38 14/04/2011 15:04       48 CO1 CO15
 39  39 

Re: [R] Looping through names of both dataframes and column-names

2013-04-26 Thread Daniel Egan
Much thanks Blaser. That worked perfectly. This will improve my code
considerably. Greatly appreciated.

Regards,
Dan


On Fri, Apr 26, 2013 at 3:48 AM, Blaser Nello nbla...@ispm.unibe.ch wrote:

 Here are two possible ways to do it:

 This would simplify your code a bit. But it changes the names of x_cs to
 cs.x.
 for (df in nls) {
   assign(df, cbind(get(df), cs=apply(get(df), 2, cumsum)))
 }

 This is closer to what you have done.
 for (df in nls) {
   print(df)
   for (var in names(get(df))) {
     print(var)
     assign(df, within(get(df), assign(paste0(var,"_cs"),
                                       cumsum(get(df)[[var]]))))
   }}
 ls()[grep("df_",ls())]
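
A list-based variant avoids assign() and get() altogether. This is a minimal sketch rather than anything proposed in the thread: it assumes the data frames can be kept together in one named list (called `dfs` here, a hypothetical name) instead of as separate objects in the workspace, and that every column is numeric.

```r
# Build five similar data frames and keep them in a named list
# instead of creating df_1 ... df_5 as separate workspace objects.
dfs <- lapply(1:5, function(i) data.frame(x = 1:10, y = rnorm(10)))
names(dfs) <- paste0("df_", 1:5)

# Add a cumulative-sum column "<name>_cs" for every existing column.
dfs <- lapply(dfs, function(d) {
  cs <- lapply(d, cumsum)                  # one cumsum per column
  names(cs) <- paste0(names(d), "_cs")     # x -> x_cs, y -> y_cs
  cbind(d, as.data.frame(cs))
})

names(dfs$df_1)
```

Individual frames are then reached as dfs$df_1 or dfs[["df_1"]], and the same lapply() pattern applies to any later per-frame step.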

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Daniel Egan
 Sent: Thursday, 25 April 2013 22:19
 To: r-help@r-project.org
 Subject: [R] Looping through names of both dataframes and column-names

 Hello all,

 This seems like a pretty standard question - suppose I want to loop
 through a set of similar data-frames, with similar variables, and create
 new variables within them:

 nl <- seq(1,5)
 for (i in nl) {
   assign(paste0("df_",nl[i]),data.frame(x=seq(1:10),y=rnorm(10)))}
 ls()[grep("df_",ls())]
 nls <- ls()[grep("df_",ls())]
 for (df in nls) {
   print(df)
   for (var in names(get(df))) {
     print(var)
     assign(paste0(df,"$",paste0(var,"_cs")),cumsum(get(df)[[var]]))
   }}
 ls()[grep("df_",ls())]

 The code above *almost* works, except that it creates a whole bunch of
 objects of the form df_1$x_cs,df_1$yx_cs . What I want is 5
 dataframes, with the $ elements enclosed, as usual.

 Any help or guidance would be appreciated.

 Much thanks,
 Dan






-- 


Daniel Egan | Director of Behavioral Finance and Investing at Betterment.com
http://betterment.com/ | Follow us on Twitter http://twitter.com/betterment
and Facebook http://www.facebook.com/betterment
contact | d...@betterment.com - Office: 212.228.1328 - Mobile: 347-931-4897



Re: [R] time series plot: x-axis problem

2013-04-26 Thread arun


Hi,

labs <- format(as.Date(time(rr)), "%b-%Y")
#Error in as.Date.default(time(rr)) :
#  do not know how to convert 'time(rr)' to class “Date”

#I guess this needs library(zoo)

library(zoo)
labs <- format(as.Date(time(rr)), "%b-%Y")

sessionInfo()
# R version 3.0.0 (2013-04-03)
# Platform: x86_64-unknown-linux-gnu (64-bit)

#or
z <- zoo(rr)
lab1 <- as.yearmon(index(z))
plot(rr, xlab="2012 - 2013", ylab="event freq", xaxt = "n", col="blue")
axis(1, time(rr), lab1, cex.axis = .9, tcl = -.5, las = 2)
A.K.
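
If loading zoo is not an option, the month-year labels can probably also be built in base R from time() and cycle(). The following is a minimal sketch under that assumption; the names `yr`, `mon` and `labs` are illustrative only, and the small 1e-8 offset is an assumed guard against a January time value landing a hair below the whole year in floating point.

```r
# Same series as in the thread, starting in May 2012.
rr <- ts(c(3, 2, 4, 5, 4, 5, 3, 3, 6, 2, 4, 2),
         start = c(2012, 5), frequency = 12)

# cycle() gives the month number (1-12) of each observation;
# the integer part of time() gives the year.
mon <- as.integer(cycle(rr))
yr  <- floor(as.numeric(time(rr)) + 1e-8)

# month.abb is a built-in constant, so the labels do not depend on the locale.
labs <- paste(month.abb[mon], yr, sep = "-")

plot(rr, xlab = "2012 - 2013", ylab = "event freq", xaxt = "n", col = "blue")
axis(1, at = time(rr), labels = labs, cex.axis = .9, tcl = -.5, las = 2)
```

Because the labels are derived from the series itself, the object rr is not modified, and a different start month only changes the call to ts().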



- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: Jerry i89...@gmail.com
Cc: r-help@r-project.org
Sent: Friday, April 26, 2013 5:25 AM
Subject: Re: [R] time series plot: x-axis problem

Hello,

Try the following.

(rr=ts(rr,start=c(2012,5),frequency=12))

plot(rr, xlab="2012 - 2013", ylab="event freq", xaxt = "n", col="blue")

labs <- format(as.Date(time(rr)), "%b-%Y")

axis(1, time(rr), labs, cex.axis = .9, tcl = -.5, las = 2)


Hope this helps,

Rui Barradas

On 25-04-2013 19:11, Jerry wrote:
 Hi,

 I'm trying to plot a simple time series and am running into an issue with
 the x-axis.

 The code below produces a plot with a correct x-axis running from Jan to
 Dec:

 rr=c(3,2,4,5,4,5,3,3,6,2,4,2)

 (rr=ts(rr,start=c(2012,1),frequency=12))

 win.graph(width=6.5, height=2.5,pointsize=8)

 plot(rr, xlab="2012", ylab="event freq", xaxt = "n", col="blue")

 axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)),
 cex.axis = .9, tcl = -.5, las = 2)


 However, if I change the start point from Jan 2012 to May 2012, that is

 (rr=ts(rr,start=c(2012,5),frequency=12))


 and then run the code below

 plot(rr, xlab="2012 - 2013", ylab="event freq", xaxt = "n", col="blue")

 axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)),
 cex.axis = .9, tcl = -.5, las = 2)


 the x-axis in the new plot still runs from Jan to Dec, not
 from May to April as I wanted.

 How can I fix the x-axis? Is it possible to fix it WITHOUT modifying the object
 rr? Also, ideally, I would like each time point on the x-axis to show
 month/year, not just month. How can I do that?

 Any help and input will be much appreciated!

 Thanks
 Jerry

     [[alternative HTML version deleted]]



Re: [R] Vectorized code for generating the Kac (Clement) matrix

2013-04-26 Thread Berend Hasselman

On 26-04-2013, at 14:42, Enrico Schumann e...@enricoschumann.net wrote:

 On Thu, 25 Apr 2013, Ravi Varadhan ravi.varad...@jhu.edu writes:
 
 Hi, I am generating large Kac matrices (also known as Clement matrices).
 This is a tridiagonal matrix.  I was wondering whether there is a
 vectorized solution that avoids the `for' loops in the following code:
 
 n <- 1000
 
 Kacmat <- matrix(0, n+1, n+1)
 
 for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
 
 for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
 
 The above code is fast, but I am curious about vectorized ways to do this.
 
 Thanks in advance.
 Best,
 Ravi
 
 
 
 This may be a bit faster; but as Berend and you said, the original
 function seems already fast.
 
 n <- 5000
 
 f1 <- function(n) {
     Kacmat <- matrix(0, n+1, n+1)
     for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
     for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
     Kacmat
 }
 f3 <- function(n) {
     n1 <- n + 1L
     res <- numeric(n1 * n1)
     dim(res) <- c(n1, n1)
     bw <- n:1L ## bw = backward, fw = forward
     fw <- seq_len(n)
     res[cbind(fw, fw + 1L)] <- bw
     res[cbind(fw + 1L, fw)] <- fw
     res
 }
 
 system.time(K1 <- f1(n))
 system.time(K3 <- f3(n))
 identical(K3, K1)
 
 ##user  system elapsed 
 ##   0.132   0.028   0.161 
 ##  
 ##user  system elapsed 
 ##   0.024   0.048   0.071 
 ## 

Using some of your code in my function I was able to speed up my function f2.
Complete code:

f1 <- function(n) { # Ravi
    Kacmat <- matrix(0, n+1, n+1)
    for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
    for (i in 1:n) Kacmat[i+1, i] <- i
    Kacmat
}

f2 <- function(n) { # Berend 1 modified to use 1L
    Kacmat <- matrix(0, n+1, n+1)
    Kacmat[row(Kacmat)==col(Kacmat)-1L] <- n:1L
    Kacmat[row(Kacmat)==col(Kacmat)+1L] <- 1L:n
    Kacmat
}

f3 <- function(n) { # Enrico
    n1 <- n + 1L
    res <- numeric(n1 * n1)
    dim(res) <- c(n1, n1)
    bw <- n:1L ## bw = backward, fw = forward
    fw <- seq_len(n)
    res[cbind(fw, fw + 1L)] <- bw
    res[cbind(fw + 1L, fw)] <- fw
    res
}

f4 <- function(n) { # Berend 2 using which with arr.ind=TRUE
    Kacmat <- matrix(0, n+1, n+1)
    k1 <- which(row(Kacmat)==col(Kacmat)-1L, arr.ind=TRUE)
    k2 <- which(row(Kacmat)==col(Kacmat)+1L, arr.ind=TRUE)

    Kacmat[k1] <- n:1L
    Kacmat[k2] <- 1L:n
    Kacmat
}

library(compiler)

f1.c <- cmpfun(f1)
f2.c <- cmpfun(f2)
f3.c <- cmpfun(f3)
f4.c <- cmpfun(f4)

f1(n)
f2(n)
n <- 5000

system.time(K1 <- f1(n))
system.time(K2 <- f2(n))
system.time(K3 <- f3(n))
system.time(K4 <- f4(n))

system.time(K1c <- f1.c(n))
system.time(K2c <- f2.c(n))
system.time(K3c <- f3.c(n))
system.time(K4c <- f4.c(n))
identical(K2,K1)
identical(K3,K1)
identical(K4,K1)
identical(K1c,K1)
identical(K2c,K2)
identical(K3c,K3)
identical(K4c,K4)

Result:

#  system.time(K1 <- f1(n))
#    user  system elapsed
#   0.387   0.120   0.511
#  system.time(K2 <- f2(n))
#    user  system elapsed
#   3.541   0.702   4.250
#  system.time(K3 <- f3(n))
#    user  system elapsed
#   0.108   0.089   0.199
#  system.time(K4 <- f4(n))
#    user  system elapsed
#   1.975   0.355   2.336
#
#  system.time(K1c <- f1.c(n))
#    user  system elapsed
#   0.323   0.120   0.445
#  system.time(K2c <- f2.c(n))
#    user  system elapsed
#   3.374   0.422   3.807
#  system.time(K3c <- f3.c(n))
#    user  system elapsed
#   0.107   0.098   0.205
#  system.time(K4c <- f4.c(n))
#    user  system elapsed
#   1.816   0.384   2.203
#  identical(K2,K1)
# [1] TRUE
#  identical(K3,K1)
# [1] TRUE
#  identical(K4,K1)
# [1] TRUE
#  identical(K1c,K1)
# [1] TRUE
#  identical(K2c,K2)
# [1] TRUE
#  identical(K3c,K3)
# [1] TRUE
#  identical(K4c,K4)
# [1] TRUE

So Ravi's original and Enrico's versions are the quickest.
Using which with arr.ind made  my version run a lot quicker.

All in all an interesting exercise.

Berend
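
One more vectorized variant, not timed in this thread, fills the two off-diagonals through linear (column-major) indices. It is a minimal sketch: the function name `kac_linear` is made up for illustration, and no claim is made about how its speed compares with f1 to f4. In an (n+1) x (n+1) matrix, element [i, i+1] sits at linear position i*(n+2) and element [i+1, i] at position i*(n+2) - n.

```r
kac_linear <- function(n) {
  K <- matrix(0, n + 1, n + 1)
  # Superdiagonal K[i, i+1] = n - i + 1: positions n+2, 2(n+2), ..., n(n+2)
  K[seq(n + 2, by = n + 2, length.out = n)] <- n:1
  # Subdiagonal K[i+1, i] = i: positions 2, 2+(n+2), ..., 2+(n-1)(n+2)
  K[seq(2, by = n + 2, length.out = n)] <- 1:n
  K
}

# Check against the loop-based definition on a small case.
n <- 4
ref <- matrix(0, n + 1, n + 1)
for (i in 1:n) {
  ref[i, i + 1] <- n - i + 1
  ref[i + 1, i] <- i
}
identical(kac_linear(n), ref)
```

The idea is the same as the cbind() matrix indexing in f3, just without building the two-column index matrices.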



Re: [R] nls: example code throws error

2013-04-26 Thread Ben Bolker
Keith Jewell k.jewell at campden.co.uk writes:

 Others have pointed out that the error is probably from an unclean 
 environment.
 

  Completely OT, but an unclean environment sounds sort of scary to me.
Like it contains zombies or something.
I don't know a better, short way to express the idea though.



Re: [R] Weighted Principle Components analysis

2013-04-26 Thread David Carlson
When you run an unweighted analysis on all three systems, do the scores
agree? I would have expected that replicating the observations would give
you similar results.

You might be able to run the weighted analysis using princomp() instead of
principal since you can supply data and a covariance matrix (but the manual
page does not specifically mention supplying a correlation matrix - you
might have to run the analysis on standardized variables).

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
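
One way to get case-weighted component scores without replicating rows is to take the weighted correlation matrix from cov.wt() and compute the scores by hand. This is a minimal sketch on made-up data, not a drop-in replacement for psych::principal: it gives unrotated principal component scores only (no varimax step), `X`, `w` and `k` are hypothetical, and nothing here has been checked against SPSS or SAS output.

```r
set.seed(42)
X <- matrix(rnorm(50 * 4), nrow = 50)   # made-up data: 50 cases, 4 variables
w <- runif(50, 0.5, 2)                  # made-up respondent weights

# Weighted means, covariance and correlation matrix.
cw <- cov.wt(X, wt = w / sum(w), cor = TRUE)

# Principal components of the weighted correlation matrix.
e <- eigen(cw$cor, symmetric = TRUE)

# Standardize each case with the weighted means and weighted SDs,
# then project onto the first k eigenvectors.
Z <- scale(X, center = cw$center, scale = sqrt(diag(cw$cov)))
k <- 2
scores <- Z %*% e$vectors[, seq_len(k)]

head(scores)
```

The weighted variance of each score column equals the corresponding eigenvalue; any rotation, or rescaling of the scores to unit variance, would have to be applied on top of this.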

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Dimitri Liakhovitski
Sent: Friday, April 26, 2013 6:32 AM
To: r-help
Subject: Re: [R] Weighted Principle Components analysis

The reason for  my asking is because I have to replicate the same analysis
done in SPSS and SAS.

Again, to make it clear - it's respondent-weighted Factor Analysis with a
desired number of factors. Method of extraction: Principal Components.
Rotation: Varimax.

The only solution I can think of is to multiply my respondent weight by 10
(or by 100) and round it so that the new weight has no decimals, then
repeat every row as many times as the new weight says and run regular,
unweighted principal on the new data. I've done it - but again, this does
not match the Factor Scores from SPSS and SAS exactly.

Any other ideas?
Thank you!


On Thu, Apr 25, 2013 at 9:21 AM, Dimitri Liakhovitski 
dimitri.liakhovit...@gmail.com wrote:

 Hello!

 I am doing Principal Components Analysis using the psych package:

 mypc <- principal(mydata, 5, scores=TRUE)

 However, I was asked to run a case-weighted PCA - using an individual 
 weight for each case.

 I could use corr from boot package to calculate the case-weighed 
 intercorrelation matrix. But if I use the intercorrelation matrix as 
 input (instead of the raw data), I am not going to get factor scores, 
 which I do need to get.

 Any advice?
 Thank you very much!

 --
 Dimitri Liakhovitski




--
Dimitri Liakhovitski



Re: [R] nls: example code throws error

2013-04-26 Thread Duncan Murdoch

On 13-04-26 10:14 AM, Ben Bolker wrote:

Keith Jewell k.jewell at campden.co.uk writes:


Others have pointed out that the error is probably from an unclean
environment.



   Completely OT, but an unclean environment sounds sort of scary to me.
Like it contains zombies or something.


Isn't that accurate?  Undead objects causing your code to be full of bugs?

Duncan Murdoch


I don't know a better, short way to express the idea though.



Re: [R] Print occurrence / positions of words

2013-04-26 Thread S Ellison
 I have tried some different packages in order to build a R program
 which will take as input a text file, produce a list of the 
 words inside  that file. Each word should have a vector with 
 all the places that this  word exist in the file. 

How about

txt <- paste(rep("this is a nice text with nice characters", 3), "But this is not", collapse=" ")

library(stringr)
txt.vec <- str_split(txt, "[^[:alnum:]_]+")[[1]]
#vector of all the words in their original sequence

tapply(1:length(txt.vec), txt.vec, c)
#Returns a list of vectors of locations of each word, sorted alphabetically




S Ellison
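
The same idea also works without stringr, using base strsplit() and split(). This is a minimal sketch on a single made-up sentence; the names `words` and `positions` are illustrative, and it assumes the text does not start with a separator (otherwise strsplit() returns an empty first element).

```r
txt <- "this is a nice text with nice characters But this is not"

# Split on runs of anything that is not a letter, digit or underscore.
words <- strsplit(txt, "[^[:alnum:]_]+")[[1]]

# Group the word positions by the word itself.
positions <- split(seq_along(words), words)

positions[["nice"]]
positions[["this"]]
```

The lookup is case-sensitive, so "But" and "but" would be separate entries; applying tolower() to `words` first would merge them.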



Re: [R] time series plot: x-axis problem

2013-04-26 Thread Rui Barradas

Hello,


On 26-04-2013 14:30, arun wrote:



Hi,

labs <- format(as.Date(time(rr)), "%b-%Y")
#Error in as.Date.default(time(rr)) :
#  do not know how to convert 'time(rr)' to class “Date”

#I guess this needs library(zoo)


You're right, I forgot because it was already loaded prior to running 
the code. Apologies to the OP.


Rui Barradas







Re: [R] the joy of spreadsheets (off-topic)

2013-04-26 Thread S Ellison
 

 One might wonder if the Excel error was indeed THAT or 
 perhaps a way to get the desired results, give the other 
 issues in their analysis?

The prior for the incompetence/malice question is usually best set pretty 
heavily in favour of incompetence ...

S




Re: [R] the joy of spreadsheets (off-topic)

2013-04-26 Thread William Dunlap
 The prior for the incompetence/malice question is usually best set pretty 
 heavily in
 favour of incompetence ...

The following comment on economic research is from a 2010 article in the 
Atlantic
reviewing John Ioannidis' work.
http://www.theatlantic.com/magazine/print/2010/11/lies-damned-lies-and-medical-science/308269/
  
  Medical research is not especially plagued with wrongness.
   Other meta-research experts have confirmed that similar issues
   distort research in all fields of science, from physics to economics
   (where the highly regarded economists J. Bradford DeLong and
   Kevin Lang once showed how a remarkably consistent paucity of
   strong evidence in published economics studies made it unlikely
   that any of them were right).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of S Ellison
 Sent: Friday, April 26, 2013 9:08 AM
 To: Thomas Adams; peter dalgaard
 Cc: r-help
 Subject: Re: [R] the joy of spreadsheets (off-topic)
 
 
 
  One might wonder if the Excel error was indeed THAT or
  perhaps a way to get the desired results, give the other
  issues in their analysis?
 
 The prior for the incompetence/malice question is usually best set pretty 
 heavily in
 favour of incompetence ...
 
 S
 
 


Re: [R] the joy of spreadsheets (off-topic)

2013-04-26 Thread John Kane
From a quick read,  the Excel error prior  for incompetence looks high but 
some of the other issues hint that the prior for the overall findings was 
remarkably in favor of malice.

John Kane
Kingston ON Canada


 -Original Message-
 From: s.elli...@lgcgroup.com
 Sent: Fri, 26 Apr 2013 17:07:55 +0100
 To: tea...@gmail.com, pda...@gmail.com
 Subject: Re: [R] the joy of spreadsheets (off-topic)
 
 
 
 One might wonder if the Excel error was indeed THAT or
 perhaps a way to get the desired results, give the other
 issues in their analysis?
 
 The prior for the incompetence/malice question is usually best set pretty
 heavily in favour of incompetence ...
 
 S
 
 


Re: [R] the joy of spreadsheets (off-topic)

2013-04-26 Thread S Ellison
 

 From a quick read,  the Excel error prior  for incompetence 
 looks high but some of the other issues hint that the prior 
 for the overall findings was remarkably in favor of malice.

That's p(malice|evidence), not p(malice); surely that must be the posterior? ;-)

'tain't a great advert for economics either way, though, however much fun it 
may be to apply Bayes theorem (badly, in my case) to analyse it. 

Steve E



Re: [R] Decomposing a List

2013-04-26 Thread William Dunlap
You might add vapply() to your repertoire, as it is quicker than sapply but
also does some error checking on your input data.  E.g., your f2 returns
a matrix whose columns are the elements of the list l and you assume that
each element of l contains 2 character strings.
   f2 <- function(l) matrix(unlist(l), nr=2)
Here is a function based on vapply() that returns the same thing but also
verifies that each element of l is really a 2-long character vector.
   f2v <- function (l) vapply(l, function(x) x, FUN.VALUE = character(2))
and a function to generate datasets of various sizes
   makeL <- function(n) strsplit(paste(sample(LETTERS,n,rep=TRUE), sample(1:10,n,rep=TRUE), sep="+"), "+", fix=TRUE)

Timing the functions on a million-long list I get
   l <- makeL(n=10^6)
   system.time( r2 <- f2(l) )
     user  system elapsed
    0.088   0.000   0.090
   system.time( r2v <- f2v(l) )
     user  system elapsed
     0.92    0.00    0.92
   identical(r2, r2v)
   [1] TRUE
vapply() is ten times slower than unlist() but three times faster than
sapply(x,function(x)x).  However, when you give it data that doesn't meet your
expectations, which is common when using strsplit(), f2v tells you about the
problem and f2 gives you an incorrect result:
   l[[10]] <- c("a","b","c","d")
   system.time( r2v <- f2v(l) )
   Error in vapply(l, function(x) x, FUN.VALUE = character(2)) :
     values must be length 2,
    but FUN(X[[10]]) result is length 4
   Timing stopped at: 0.004 0 0.002
   system.time( rv <- f2(l) )
     user  system elapsed
    0.088   0.008   0.095
   dim(rv) # you will have alignment problems later
   [1]       2 1000001

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
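
The same contrast can be seen on a tiny input. This is a minimal sketch with made-up strings; `pairs`, `bad` and the fallback message are illustrative names only.

```r
# Three well-formed "letter+number" strings split into pairs.
pairs <- strsplit(c("A+1", "B+2", "C+3"), "+", fixed = TRUE)

# vapply() with `[` extracts one component and checks that each result
# is a single character string.
first  <- vapply(pairs, `[`, character(1), 1)
second <- vapply(pairs, `[`, character(1), 2)

# Add one malformed element of length 4.
bad <- c(pairs, list(c("a", "b", "c", "d")))

# unlist() + matrix() silently produces a 2 x 5 matrix with misaligned columns ...
dim(matrix(unlist(bad), nrow = 2))

# ... while vapply() stops with an error, caught here for illustration.
res <- tryCatch(
  vapply(bad, function(x) x, FUN.VALUE = character(2)),
  error = function(e) "length check failed"
)
res
```

With well-formed input both approaches give the same matrix, so the check only costs time; the payoff is that malformed input fails loudly instead of shifting every later column.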


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Bert Gunter
 Sent: Thursday, April 25, 2013 7:54 AM
 To: ted.hard...@wlandres.net
 Cc: R mailing list
 Subject: Re: [R] Decomposing a List
 
 Well, what you really want to do is convert the list to a matrix, and
 it can be done directly and considerably faster than with the
 (implicit) looping of sapply:
 
 f1 <- function(l) sapply(l, "[", 1)
 f2 <- function(l) matrix(unlist(l), nr=2)
 l <- strsplit(paste(sample(LETTERS,1e6,rep=TRUE), sample(1:10,1e6,rep=TRUE), sep="+"), "+", fix=TRUE)
 
 ## Then you get these results:
 
  system.time(x1 <- f1(l))
    user  system elapsed
    1.92    0.01    1.95
  system.time(x2 <- f2(l))
    user  system elapsed
    0.06    0.02    0.08
  system.time(x2 <- f2(l)[1,])
    user  system elapsed
     0.1     0.0     0.1
  identical(x1,x2)
 [1] TRUE
 
 
 Cheers,
 Bert
 
 
 
 
 
 
 On Thu, Apr 25, 2013 at 3:32 AM, Ted Harding ted.hard...@wlandres.net wrote:
  Thanks, Jorge, that seems to work beautifully!
  (Now to try to understand why ... but that's for later).
  Ted.
 
  On 25-Apr-2013 10:21:29 Jorge I Velez wrote:
  Dear Dr. Harding,
 
  Try
 
  sapply(L, "[", 1)
  sapply(L, "[", 2)
 
  HTH,
  Jorge.-
 
 
 
  On Thu, Apr 25, 2013 at 8:16 PM, Ted Harding 
  ted.hard...@wlandres.net wrote:
 
  Greetings!
  For some reason I am not managing to work out how to do this
  (in principle) simple task!
 
  As a result of applying strsplit() to a vector of character strings,
  I have a long list L (N elements), where each element is a vector
  of two character strings, like:
 
L[1] = c("A1","B1")
L[2] = c("A2","B2")
L[3] = c("A3","B3")
[etc.]
 
  From L, I wish to obtain (as directly as possible, e.g. avoiding
  a loop) two vectors each of length N where one contains the strings
  that are first in the pair, and the other contains the strings
  which are second, i.e. from L (as above) I would want to extract:
 
V1 = c("A1","A2","A3",...)
V2 = c("B1","B2","B3",...)
 
  Suggestions?
 
  With thanks,
  Ted.
 
  -
  E-Mail: (Ted Harding) ted.hard...@wlandres.net
  Date: 25-Apr-2013  Time: 11:16:46
  This message was sent by XFMail
 
 
 
  -
  E-Mail: (Ted Harding) ted.hard...@wlandres.net
  Date: 25-Apr-2013  Time: 11:31:57
  This message was sent by XFMail
 
 
 
 
 --
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
 biostatistics/pdb-ncb-home.htm
 
 

Re: [R] Stepwise regression for multivariate case in R?

2013-04-26 Thread Frank Harrell
Since stepwise methods do not work as advertised in the univariate case I'm
wondering why they should work in the multivariate case.
Frank

Jonathan Jansson wrote
 Hi! I am trying to make a stepwise regression in the multivariate case,
 using Wilks' Lambda test.
 I've tried this: 
  greedy.wilks(cbind(Y1,Y2) ~ . , data=my.data )
 But it only returns:
 Error in model.frame.default(formula = X[, j] ~ grouping,
 drop.unused.levels = TRUE) : 
   variable lengths differ (found for 'grouping') 
 What can be wrong here? I have checked and all variables in my.data are of
 the same length.
 //Jonathan
 





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Stepwise-regression-for-multivariate-case-in-R-tp4665505p4665526.html
Sent from the R help mailing list archive at Nabble.com.



[R] NMDS in Vegan: problems in stressplot, best solution

2013-04-26 Thread Kumar Mainali
Hello,

I can draw a basic stress plot for NMDS with the following code in the vegan
package.
 stressplot(parth.mds, parth.dis)

When I try to specify the line and point types, I get an error message.
 stressplot(parth.mds, parth.dis, pch=1, p.col="gray", lwd=2, l.col="red")
Error in plot.xy(xy, type, ...) : invalid plot type

If I remove the line arguments from the above code, I do get a plot, showing
only the points in my chosen style.
 stressplot(parth.mds, parth.dis, pch=1, p.col="gray")

Why can't I define both line and point styles at the same time?

If I run 100 iterations of metaMDS and then plot the result, does the plot
show the best solution? How can I tell? Is it possible to plot the stress
against the iteration number?
parth.mds <- metaMDS(WorldPRSenv, distance = "bray", k = 2, trymax = 100,
engine = c("monoMDS", "isoMDS"),
 autotransform = TRUE, wascores = TRUE, expand = TRUE, trace = 2)
plot(parth.mds, type = "p")

Thanks in advance,
Kumar

-- 
Section of Integrative Biology
University of Texas at Austin
Austin, Texas 78712, USA



[R] Read big data (3G ) methods ?

2013-04-26 Thread Kevin Hao
Hi all scientists,

Recently I have been working with big data (3 GB, txt or csv format) on my
desktop (Windows 7, 64-bit), but I cannot read the files in quickly, even
after searching the internet. [I have tried defining colClasses for read.table
and the colbycol and limma packages, but they are not fast enough.]

Could you share your methods for reading big data into R faster?

This may be an odd question, but we really need an answer.

Any suggestions are appreciated.

Thank you very much.


kevin
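
A minimal base-R sketch of one common approach to the question above: declare the column types up front and read the file in chunks through an open connection. The file, its three columns, and the helper name read_in_chunks are all hypothetical; a real 3 GB file would use a much larger chunk size and would usually process each chunk rather than bind them all together.

```r
# Hypothetical stand-in for the large file: a small CSV in a temp directory.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) / 2, label = letters[1:10]),
          path, row.names = FALSE)

# Declaring column types up front spares read.csv from guessing them.
col_types <- c("integer", "numeric", "character")

# Read the file in fixed-size chunks through an open connection, so only one
# chunk has to be parsed at a time.
read_in_chunks <- function(path, col_types, chunk_rows) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
  header <- gsub('"', "", header, fixed = TRUE)
  chunks <- list()
  repeat {
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_rows,
               col.names = header, colClasses = col_types),
      error = function(e) NULL)  # read.csv signals an error at end of input
    if (is.null(chunk) || nrow(chunk) == 0) break
    chunks[[length(chunks) + 1]] <- chunk
    if (nrow(chunk) < chunk_rows) break
  }
  do.call(rbind, chunks)
}

big <- read_in_chunks(path, col_types, chunk_rows = 4)
```

Whether this is fast enough depends on the data; it mainly keeps memory use predictable and avoids type guessing.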



Re: [R] Splitting data.frame and saving to csv files

2013-04-26 Thread arun
Hi,
You can do this:
 
lst1 <- lapply(split(colnames(df)[-1], gsub("_.*", "", colnames(df)[-1])), function(x)
 {x1 <- cbind(date=df[,1], df[,x]); colnames(x1)[-1] <- x; x1})

 lst1
#$ABC
 #   date    ABC_f    ABC_e    ABC_d    ABC_m
#1 2013-04-15 62.80740 11.36784 38.71090 40.28474
#2 2013-04-14 81.04526 62.29652 93.48216 43.97076
#3 2013-04-13 84.65712 47.63482 93.14432 47.38762
#4 2013-04-12 12.78237 32.27821 78.27387 97.33573
#5 2013-04-11 57.61345 52.12561 31.87170 22.06885
#
#$LMN
 #   date    LMN_d    LMN_a    LMN_c
#1 2013-04-15 21.16794 56.67684 45.44847
#2 2013-04-14 54.65804 25.81530 17.72362
#3 2013-04-13 63.89233 40.12268 36.76901
#4 2013-04-12 87.59880 35.74175 68.58913
#5 2013-04-11 87.07694 47.95892 35.80767
#
#$PQR
 #    date  PQR
#[1,]    5 71.22868
#[2,]    4 95.09995
#[3,]    3 83.62438
#[4,]    2 30.18525
#[5,]    1 25.81805
#
#$XYZ
 #   date    XYZ_p   XYZ_zz
#1 2013-04-15 55.88855 85.74755
#2 2013-04-14 94.13587 63.48582
#3 2013-04-13 84.00891 81.61107
#4 2013-04-12 98.99747 58.15729
#5 2013-04-11 64.71084 27.44133

 lapply(seq_along(lst1), function(i)
write.csv(lst1[[i]], file=paste0(names(lst1[i]), ".csv"), row.names=FALSE))

A.K.

- Original Message -
From: Katherine Gobin katherine_go...@yahoo.com
To: r-help@r-project.org
Cc: 
Sent: Friday, April 26, 2013 9:21 AM
Subject: [R] Splitting data.frame and saving to csv files

Dear R Forum,

I have a data.frame as

df = data.frame(date = c("2013-04-15", "2013-04-14", "2013-04-13",
"2013-04-12", "2013-04-11"),
ABC_f = c(62.80739769,81.04525895,84.65712455,12.78237251,57.61345256),
LMN_d = c(21.16794336,54.6580401,63.8923307,87.59880367,87.07693716),
XYZ_p = c(55.8885464,94.1358684,84.0089114,98.99746696,64.71083712),
LMN_a = c(56.6768395,25.81530198,40.12268441,35.74175237,47.95892209),
ABC_e = c(11.36783959,62.29651784,47.63481552,32.27820673,52.12561419),
LMN_c = c(45.4484695,17.72362438,36.7690054,68.58912931,35.80767235),
XYZ_zz = c(85.74755089,63.48582415,81.61107212,58.1572924,27.44132817),
PQR = c(71.22867519,95.09994812,83.62437819,30.18524735,25.81804865),
ABC_d = c(38.71089816,93.48216193,93.14432203,78.2738731,31.87170019),
ABC_m = c(40.28473769,43.97076327,47.38761559,97.33573412,22.06884976))


 df
    date    ABC_f    LMN_d    XYZ_p    LMN_a    ABC_e
1 2013-04-15 62.80740 21.16794 55.88855 56.67684 11.36784
2 2013-04-14 81.04526 54.65804 94.13587 25.81530 62.29652
3 2013-04-13 84.65712 63.89233 84.00891 40.12268 47.63482
4 2013-04-12 12.78237 87.59880 98.99747 35.74175 32.27821
5 2013-04-11 57.61345 87.07694 64.71084 47.95892 52.12561
 LMN_c   XYZ_zz  PQR    ABC_d    ABC_m
1 45.44847 85.74755 71.22868 38.71090 40.28474
2 17.72362 63.48582 95.09995 93.48216 43.97076
3 36.76901 81.61107 83.62438 93.14432 47.38762
4 68.58913 58.15729 30.18525 78.27387 97.33573
5 35.80767 27.44133 25.81805 31.87170 22.06885

I need to identify columns with same labels and along-with the dates in the 
first column, save the columns in different csv files.

E.g. in the above data frame, I have 4 columns beginning with ABC so I need to 
save these four columns with the date in the first column as ABC.csv, then 
LMN_d, LMN_a, LMN_c in the LMN.csv file as date, LMN_a, LMN_c, LMN_d and so on. 
In my actual data.frame, I won't be aware how many such rates combinations are 
available. If there is no matching column as PQR, the PQR.csv file should 
have only date and PQR column. 

Kindly guide how do I split the data.frame and save the respective csv files.

Regards

Katherine


Re: [R] Splitting data.frame and saving to csv files

2013-04-26 Thread arun


Hi,
Just noticed a mistake:
lst1 should be:
lst1 <- lapply(split(colnames(df)[-1], gsub("_.*", "", colnames(df)[-1])), function(x)
 cbind(date=df[,1], df[x]))
 lst1
#$ABC
 #   date    ABC_f    ABC_e    ABC_d    ABC_m
#1 2013-04-15 62.80740 11.36784 38.71090 40.28474
#2 2013-04-14 81.04526 62.29652 93.48216 43.97076
#3 2013-04-13 84.65712 47.63482 93.14432 47.38762
#4 2013-04-12 12.78237 32.27821 78.27387 97.33573
#5 2013-04-11 57.61345 52.12561 31.87170 22.06885
#
#$LMN
 #   date    LMN_d    LMN_a    LMN_c
#1 2013-04-15 21.16794 56.67684 45.44847
#2 2013-04-14 54.65804 25.81530 17.72362
#3 2013-04-13 63.89233 40.12268 36.76901
#4 2013-04-12 87.59880 35.74175 68.58913
#5 2013-04-11 87.07694 47.95892 35.80767
#
#$PQR
 #   date  PQR
#1 2013-04-15 71.22868
#2 2013-04-14 95.09995
#3 2013-04-13 83.62438
#4 2013-04-12 30.18525
#5 2013-04-11 25.81805
#
#$XYZ
 #   date    XYZ_p   XYZ_zz
#1 2013-04-15 55.88855 85.74755
#2 2013-04-14 94.13587 63.48582
#3 2013-04-13 84.00891 81.61107
#4 2013-04-12 98.99747 58.15729
#5 2013-04-11 64.71084 27.44133
A.K.
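
For reference, a self-contained sketch of the same split-and-write idea on a cut-down data frame. The values, the scratch output directory, and the variable names are assumptions made for illustration; the columns are sorted first because the question asks for LMN_a before LMN_d.

```r
# Small hypothetical data frame in the same shape as the one in the question.
df <- data.frame(date  = c("2013-04-15", "2013-04-14"),
                 ABC_f = c(62.8, 81.0),
                 LMN_d = c(21.2, 54.7),
                 LMN_a = c(56.7, 25.8),
                 PQR   = c(71.2, 95.1),
                 ABC_e = c(11.4, 62.3),
                 stringsAsFactors = FALSE)

value_cols <- sort(setdiff(names(df), "date"))   # sorted so LMN_a precedes LMN_d
prefix     <- sub("_.*", "", value_cols)         # text before the first underscore

# One data frame per prefix; df[cols] keeps a data frame even for one column.
pieces <- lapply(split(value_cols, prefix), function(cols) df[c("date", cols)])

# Write each piece to <prefix>.csv in a scratch directory (an assumption here;
# the code in the thread writes to the working directory).
out_dir <- tempfile("split_csv_")
dir.create(out_dir)
for (nm in names(pieces)) {
  write.csv(pieces[[nm]], file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}
```

Indexing with df[cols] rather than df[, cols] is what keeps the single-column PQR case a data frame with a proper date column.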



[R] speed of a vector operation question

2013-04-26 Thread Mikhail Umorin
Hello, 

I am dealing with numeric vectors 10^5 to 10^6 elements long. The values are 
sorted (with duplicates) in the vector (v). I am obtaining the length of 
vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some scalar 
variables. What is the most efficient way to do this?

I am using sum(v < c) since TRUE's are 1's and FALSE's are 0's. This seems to 
me more efficient than length(which(v < c)), but, please, correct me if I'm 
wrong. So, is there anything faster than what I already use?

I'm running R 2.14.2 on Linux kernel 3.4.34.

I appreciate your time, 

Mikhail
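
Because the vector is already sorted, a binary search can replace the full comparison. The sketch below uses findInterval() to count elements below a threshold and inside an open interval. The data and thresholds are made up, the strict inequalities are an assumption about what is being counted, and the left.open argument requires a more recent R than the 2.14.2 mentioned above (it was added in R 3.3.0).

```r
v <- sort(c(1, 2, 2, 3, 5, 5, 8))   # hypothetical sorted data with duplicates
c0 <- 5; c1 <- 2; c2 <- 8           # hypothetical thresholds

# Number of elements strictly below c0 (binary search, no logical vector built).
n_below <- findInterval(c0, v, left.open = TRUE)

# Number of elements with c1 < v < c2: those below c2 minus those <= c1.
n_between <- findInterval(c2, v, left.open = TRUE) - findInterval(c1, v)
```

For a single threshold on 10^6 elements the gain over sum() may be small in absolute terms; it matters more when many thresholds are queried against the same vector.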


[R] converting character matrix to POSIXct matrix

2013-04-26 Thread hh wt
I thought this is a common question but rseek/google searches don't yield
any relevant hit.

I have a matrix of character strings, which are time stamps,

 time.m[1:5,1:5]
     [,1]           [,2]           [,3]           [,4]           [,5]
[1,] "08:00:20.799" "08:00:20.799" "08:00:20.799" "08:00:20.799" "08:00:20.799"
[2,] "08:00:21.996" "08:00:22.071" "08:00:23.821" "08:00:24.370" "08:00:25.573"
[3,] "08:00:29.200" "08:00:29.200" "08:00:29.591" "08:00:30.368" "08:00:30.536"
[4,] "08:00:31.073" "08:00:31.372" "08:00:31.384" "08:00:31.403" "08:00:31.867"
[5,] "08:00:31.867" "08:00:31.867" "08:00:31.971" "08:00:34.571" "08:00:34.571"

And I would like to convert it to a POSIXct matrix. I tried this,

time1 = lapply(time.m, function(tt) strptime(tt, "%H:%M:%OS"))

but it yields a list.

Any tip is appreciated.


Horace



[R] Questions about out-of-sample forecast using random walk

2013-04-26 Thread Wandi Zhou
Hi there,



I'm a bit confused about which command I should use when performing an 
out-of-sample forecast using a random walk. I have some time series data from 
1957Q1 to 2011Q4. I want to use a fraction of the data, from 1960Q1 to 1984Q4, 
to forecast data from 1985Q1 onwards using a random walk model and evaluate the 
forecasting performance against the true data I have. I used the rwf command 
from the 'forecast' package to do this. However, the forecasts I obtained are 
all around the value of 1984Q4, which is quite different from the true data, 
which show an increasing trend with time. Could you give me some suggestions on 
which command I should choose to perform the random walk forecast and get the 
Mean Squared Forecast Error and Mean Absolute Error of the forecast? It would 
be really helpful if you could reply as soon as possible, since I urgently need 
this. Thanks a lot.



Kind Regards,



Lavender
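
A flat forecast at the last observed value is what a random walk without drift produces, so the behaviour described above is expected rather than a bug. The base-R sketch below contrasts that with a random walk with drift and computes the two error measures asked about. The series is synthetic and the helper accuracy_summary is hypothetical; with the forecast package, rwf() has a drift argument that plays the same role.

```r
# Hypothetical quarterly series with a steady upward trend.
y <- ts(2 * (1:14), start = c(1960, 1), frequency = 4)
train <- window(y, end = c(1962, 2))      # first 10 quarters
test  <- window(y, start = c(1962, 3))    # remaining 4 quarters
h <- length(test)
n <- length(train)

# Plain random walk: every forecast equals the last observed value.
fc_naive <- rep(train[n], h)

# Random walk with drift: add the average historical change per period.
drift <- (train[n] - train[1]) / (n - 1)
fc_drift <- train[n] + drift * seq_len(h)

# Mean squared forecast error and mean absolute error against the held-out data.
accuracy_summary <- function(actual, forecast) {
  err <- as.numeric(actual) - forecast
  c(MSFE = mean(err^2), MAE = mean(abs(err)))
}
acc_naive <- accuracy_summary(test, fc_naive)
acc_drift <- accuracy_summary(test, fc_drift)
```

On trending data the drift version will generally score better on both measures; whether drift is appropriate for the real series is a modelling decision this sketch does not settle.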



[R] Help with merge function

2013-04-26 Thread Catarina Ferreira
Dear all,

I'm trying to merge 2 dataframes, but I'm not being entirely successful and
I can't understand why.

Dataframe x1

State_prov   Shape_name  bob2009  bob2010  bob2011
Nova Scotia  Annapolis   0        0        1
Nova Scotia  Antigonish  0        0        0
Nova Scotia  Gly         NA       NA       NA

Dataframe x2 - has 2 rows and 193 variables, contains one important
field which is FID that is a link to a shapefile (this is not in x1) and
shares common columns with x1, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009
0    Nova Scotia  Annapolis   0        0        10
1    Nova Scotia  Antigonish  0        0        1
2    Nova Scotia  Gly         0        0        1

So when I do

x3 <- merge(x1, x2, by=intersect(names(x1), names(x2)), all=TRUE)

it should do the trick. The thing is that it works for the columns (it adds
all the new columns not common to both dataframes), but it also adds the
rows. This is what I get (x3):

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis   0        0        10       NA
NA   Nova Scotia  Annapolis   NA       NA       NA       1
1    Nova Scotia  Antigonish  0        0        1        NA
NA   Nova Scotia  Antigonish  NA       NA       NA       0
2    Nova Scotia  Gly         0        0        1        NA
NA   Nova Scotia  Gly         NA       NA       NA       NA

What I want to get is a true merge, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis   0        0        10       1
1    Nova Scotia  Antigonish  0        0        1        0
2    Nova Scotia  Gly         0        0        1        NA

Can anybody please help me to understand what I'm doing wrong.
Any help will be much appreciated!!


-- 
Catarina C. Ferreira, PhD



Re: [R] Error Installing packages

2013-04-26 Thread Pramod Anugu
I am trying to install the package boss, but I am getting the error below.
Please advise.

install.packages("boss")

--- Please select a CRAN mirror for use in this session ---
CRAN mirror

 1: 0-Cloud                       2: Argentina (La Plata)
 3: Argentina (Mendoza)           4: Australia (Canberra)
 5: Australia (Melbourne)         6: Austria
 7: Belgium                       8: Brazil (PR)
 9: Brazil (RJ)                  10: Brazil (SP 1)
11: Brazil (SP 2)                12: Canada (BC)
13: Canada (NS)                  14: Canada (ON)
15: Canada (QC 1)                16: Canada (QC 2)
17: Chile                        18: China (Beijing 1)
19: China (Beijing 2)            20: China (Guangzhou)
21: China (Hefei)                22: China (Xiamen)
23: Colombia (Bogota)            24: Colombia (Cali)
25: Denmark                      26: Ecuador
27: France (Lyon 1)              28: France (Lyon 2)
29: France (Montpellier)         30: France (Paris 1)
31: France (Paris 2)             32: Germany (Berlin)
33: Germany (Bonn)               34: Germany (Falkenstein)
35: Germany (Goettingen)         36: Greece
37: Hungary                      38: India
39: Indonesia                    40: Iran
41: Ireland                      42: Italy (Milano)
43: Italy (Padua)                44: Italy (Palermo)
45: Japan (Hyogo)                46: Japan (Tsukuba)
47: Japan (Tokyo)                48: Korea (Seoul 1)
49: Korea (Seoul 2)              50: Latvia
51: Mexico (Mexico City)         52: Mexico (Texcoco)
53: Netherlands (Amsterdam)      54: Netherlands (Utrecht)
55: New Zealand                  56: Norway
57: Philippines                  58: Poland
59: Portugal                     60: Russia
61: Singapore                    62: Slovakia
63: South Africa (Cape Town)     64: South Africa (Johannesburg)
65: Spain (Madrid)               66: Sweden
67: Switzerland                  68: Taiwan (Taichung)
69: Taiwan (Taipei)              70: Thailand
71: Turkey                       72: UK (Bristol)
73: UK (London)                  74: UK (St Andrews)
75: USA (CA 1)                   76: USA (CA 2)
77: USA (IA)                     78: USA (IN)
79: USA (KS)                     80: USA (MD)
81: USA (MI)                     82: USA (MO)
83: USA (OH)                     84: USA (OR)
85: USA (PA 1)                   86: USA (PA 2)
87: USA (TN)                     88: USA (TX 1)
89: USA (WA 1)                   90: USA (WA 2)
91: Venezuela                    92: Vietnam

Selection: 86

also installing the dependency 'ncdf'

trying URL 'http://cran.mirrors.hoobly.com/src/contrib/ncdf_1.6.6.tar.gz'
Content type 'application/x-gzip' length 79403 bytes (77 Kb)
opened URL
==
downloaded 77 Kb

trying URL 'http://cran.mirrors.hoobly.com/src/contrib/boss_1.2.tar.gz'
Content type 'application/x-gzip' length 9702 bytes
opened URL
==
downloaded 9702 bytes

* installing *source* package 'ncdf' ...
** package 'ncdf' successfully unpacked and MD5 sums checked
checking for nc-config... no
checking for gcc... gcc -std=gnu99
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking netcdf.h usability... no
checking netcdf.h presence... no
checking for netcdf.h... no
configure: error: netcdf header netcdf.h not found
ERROR: configuration failed for package 'ncdf'
* removing '/share/apps/R-2.15.3/lib64/R/library/ncdf'
ERROR: dependency 'ncdf' is not available for package 'boss'
* removing '/share/apps/R-2.15.3/lib64/R/library/boss'

The downloaded source packages are in
'/tmp/RtmppOWF74/downloaded_packages'
Updating HTML index of packages in '.Library'
Making packages.html  ... done
Warning messages:
1: In install.packages("boss") :
  installation of package 'ncdf' had non-zero exit status
2: In install.packages("boss") :
  installation of package 'boss' had non-zero exit status
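
The configure step stops at "netcdf header netcdf.h not found", which suggests the netCDF development files are missing or not on the compiler's search path. A small diagnostic sketch from within R follows; the helper find_netcdf_header and the directories it checks are assumptions, and a site-wide install may keep the header somewhere else entirely.

```r
# Hypothetical helper: look for netcdf.h in a few conventional include
# directories (assumed locations; a site install may live elsewhere).
find_netcdf_header <- function(dirs = c("/usr/include", "/usr/local/include",
                                        "/opt/local/include")) {
  candidates <- file.path(dirs, "netcdf.h")
  candidates[file.exists(candidates)]
}

header_paths <- find_netcdf_header()
nc_config    <- unname(Sys.which("nc-config"))   # "" when nc-config is not on PATH

if (length(header_paths) == 0 && !nzchar(nc_config)) {
  message("netcdf.h and nc-config not found; the netCDF development files ",
          "probably need to be installed before ncdf can be built.")
}
```

If the header turns up in a non-standard directory, the package's configure script has to be told where to look; the exact option names should be taken from the ncdf package's own installation notes.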

 

 



Re: [R] Read big data (3G ) methods ?

2013-04-26 Thread Ye Lin
Have you thought of building a database and then letting R read the data
through that database instead of from files on your desktop?






Re: [R] C50 package in R

2013-04-26 Thread Max Kuhn
There isn't much out there. Quinlan didn't open source the code until about
a year ago.

I've been through the code line by line and we have a fairly descriptive
summary of the model in our book (that's almost out):

  http://appliedpredictivemodeling.com/

I will say that the pruning is mostly the same as described in Quinlan's
C4.5 book. The big differences in C4.5 and C5.0 are boosting and winnowing.
The former is very different mechanically than gradient boosting machines
and is more similar to the re-weighting approach of the original adaboost
algorithm (but is still pretty different).
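
For the pruning question specifically, the settings the C50 package exposes are arguments of C5.0Control(). The sketch below fits two trees to the built-in iris data, used here purely as a stand-in data set, with the default settings and with settings chosen to prune harder. The particular values 0.05 and 10 are arbitrary illustrations, not recommendations.

```r
library(C50)

# CF is the confidence factor used for pruning (smaller values prune more),
# minCases is the minimum number of cases that must fall in at least two
# branches of a split, and noGlobalPruning = TRUE would switch off the final
# global pruning step.
ctrl_default <- C5.0Control()                          # CF = 0.25, minCases = 2
ctrl_pruned  <- C5.0Control(CF = 0.05, minCases = 10)  # prune more aggressively

fit_default <- C5.0(Species ~ ., data = iris, control = ctrl_default)
fit_pruned  <- C5.0(Species ~ ., data = iris, control = ctrl_pruned)

# summary(fit_pruned) prints the tree; predict() returns class labels.
pred <- predict(fit_pruned, newdata = iris)
```

Comparing summary() output for the two fits shows how the settings change the tree on a given data set.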

I've submitted a talk on C5.0 for this year's UseR! conference. If there is
enough time I will be able to go through some of the technical details.

Two other related notes:

- the J48 implementation in Weka lacks one or two of C4.5's features, which
makes the results substantially different from what C4.5 would have
produced. The differences are significant enough that Quinlan asked us to
call the results of that function J48 and not C4.5. Using C5.0 with
a single tree is much more similar to C4.5 than J48 is.

- the differences between model trees and Cubist are also substantial and
largely undocumented.

HTH,

Max




On Thu, Apr 25, 2013 at 9:40 AM, Indrajit Sen Gupta 
indrajit...@rediffmail.com wrote:

 Hi All,



 I am trying to use the C50 package to build classification trees in R.
 Unfortunately there is not enought documentation around its use. Can anyone
 explain to me - how to prune the decision trees?



 Regards,

 Indrajit






-- 

Max



Re: [R] Help with merge function

2013-04-26 Thread Rui Barradas

Hello,

The following seems to do the trick.



x1 <-
structure(list(State_prov = c("Nova Scotia", "Nova Scotia", "Nova Scotia"
), Shape_name = c("Annapolis", "Antigonish", "Gly"), bob2009 = c(0L,
0L, NA), bob2010 = c(0L, 0L, NA), bob2011 = c(1L, 0L, NA)), .Names =
c("State_prov", "Shape_name", "bob2009", "bob2010", "bob2011"),
class = "data.frame", row.names = c(NA, -3L))

x2 <-
structure(list(FID = 0:2, State_prov = c("Nova Scotia", "Nova Scotia",
"Nova Scotia"), Shape_name = c("Annapolis", "Antigonish", "Gly"
), bob2009 = c(0L, 0L, 0L), bob2010 = c(0L, 0L, 0L), coy2009 = c(10L,
1L, 1L)), .Names = c("FID", "State_prov", "Shape_name", "bob2009",
"bob2010", "coy2009"), class = "data.frame", row.names = c(NA,
-3L))

x3 <- merge(x1, x2, all.y = TRUE)



Note also that you really don't need by = intersect(names(x1), names(x2));
it's the default behavior.


Hope this helps,

Rui Barradas
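
A possible alternative, sketched under the assumption that State_prov and Shape_name together identify a row. Merging on every shared column treats two rows as different whenever any shared measurement column disagrees (for example NA in one table and 0 in the other), which may be one reason extra rows appear. Joining on the key columns only avoids that. The data below is a cut-down version of the example.

```r
x1 <- data.frame(State_prov = "Nova Scotia",
                 Shape_name = c("Annapolis", "Antigonish", "Gly"),
                 bob2009 = c(0L, 0L, NA), bob2011 = c(1L, 0L, NA),
                 stringsAsFactors = FALSE)
x2 <- data.frame(FID = 0:2, State_prov = "Nova Scotia",
                 Shape_name = c("Annapolis", "Antigonish", "Gly"),
                 bob2009 = c(0L, 0L, 0L), coy2009 = c(10L, 1L, 1L),
                 stringsAsFactors = FALSE)

keys <- c("State_prov", "Shape_name")          # columns that identify a row
new_cols <- setdiff(names(x1), names(x2))      # columns only x1 has (bob2011)

# Left join: keep every row of x2 and attach x1's extra columns by key.
x3 <- merge(x2, x1[c(keys, new_cols)], by = keys, all.x = TRUE)
```

This keeps x2's values for the shared measurement columns; if x1's values should win instead, the roles of the two data frames would need to be swapped.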







Re: [R] converting character matrix to POSIXct matrix

2013-04-26 Thread Rui Barradas

Hello,

Use sapply instead.


Hope this helps,

Rui Barradas
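
A sketch of another route that keeps the matrix shape: strptime() is vectorised, so the whole matrix can be parsed at once and the result stored as numeric seconds. Storing seconds rather than POSIXct objects in the matrix is an assumption about what is acceptable here, and the 2 x 2 input is a made-up excerpt.

```r
# Hypothetical 2 x 2 excerpt of the character matrix from the question.
time.m <- matrix(c("08:00:20.799", "08:00:21.996",
                   "08:00:20.799", "08:00:22.071"), nrow = 2)

# strptime() is vectorised, so the whole matrix can be parsed in one call.
# With no date in the format, the current date is assumed.
parsed <- as.POSIXct(strptime(time.m, "%H:%M:%OS", tz = "UTC"))

# Store the values as seconds since the epoch, restoring the matrix shape.
secs <- matrix(as.numeric(parsed), nrow = nrow(time.m), ncol = ncol(time.m))

# Differences between cells are in seconds.
gap <- secs[2, 2] - secs[1, 1]

# A single cell can be turned back into a POSIXct value when needed.
first_time <- as.POSIXct(secs[1, 1], origin = "1970-01-01", tz = "UTC")
```

A numeric matrix supports the usual arithmetic and indexing, and individual cells or whole columns can be converted back with as.POSIXct() as shown.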






[R] Stratified Random Sampling Proportional to Size

2013-04-26 Thread Lopez, Dan
Hello R Experts,

I kindly request your assistance in figuring out how to get a stratified random 
sample, proportional to stratum size, of about 100 records.

Below is my R code showing what I did and the error I'm getting with 
sampling::strata

# FIRST I summarized the count of records by the two variables I want to use as
# strata

library(RODBC)
library(sqldf)
library(sampling)
#After establishing the connection I query the data, sort it by the strata
#APPT_TYP_CD_LL and EMPL_TYPE, and store it in a dataframe
CURRPOP <- sqlQuery(ch, "SELECT APPT_TYP_CD_LL, 
EMPL_TYPE,ASOFDATE,EMPLID,NAME,DEPTID,JOBCODE,JOBTITLE,SAL_ADMIN_PLAN,RET_TYP_CD_LL
 FROM PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN('R','T') ORDER BY 
APPT_TYP_CD_LL, EMPL_TYPE")
#ROWID is a dummy ID I added and repositioned after the strata columns for later
#use
CURRPOP$ROWID <- seq(nrow(CURRPOP))
CURRPOP <- CURRPOP[,c(1:2,11,3:10)]

# My strata. stratp is how many I want to sample from each stratum. NOTE THERE
# ARE SOME 0's, which just means I won't sample from that group.
stratum_cp <- sqldf("SELECT APPT_TYP_CD_LL,EMPL_TYPE, count(*) HC FROM CURRPOP 
GROUP BY APPT_TYP_CD_LL,EMPL_TYPE")
stratum_cp$stratp <- round(stratum_cp$HC/nrow(CURRPOP)*100)

 stratum_cp
   APPT_TYP_CD_LL EMPL_TYPE   HC stratp
1              FA         S    1      0
2              FC         S    5      0
3              FP         S  173      3
4              FR         H  170      3
5              FX         H   49      1
6              FX         S   57      1
7              IN         H 1589     25
8              IN         S 3987     63
9              IP         H    7      0
10             IP         S   53      1
11             SA         H    8      0
12             SE         S   43      1
13             SF         H   14      0
14             SF         S    1      0
15             SG         S   10      0
16             ST         H  107      2
17             ST         S    6      0

#THEN I attempted to use sampling::strata using the instructions in that 
#package and got an error

#I use stratum_cp$stratp for my sizes.

 s <- strata(CURRPOP, c("APPT_TYP_CD_LL","EMPL_TYPE"), size=stratum_cp$stratp, method="srswor")
Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 0, 1
 traceback()
5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
       collapse = ", "))
4: data.frame(..., check.names = FALSE)
3: cbind(deparse.level, ...)
2: cbind(r, i)
1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp,
       method = "srswor")



#In lieu of a reproducible sample here is some info regarding most of my data
dim(CURRPOP)
[1] 6280   11
#Cols w/ personal info have been removed in this output
 str(CURRPOP[,c(1:3,7:11)])
'data.frame':  6280 obs. of  8 variables:
 $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
 $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
 $ ROWID         : int  1 2 3 4 5 6 7 8 9 10 ...
 $ DEPTID        : int  9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
 $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
 $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
 $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
 $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...

Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics
Strategic Human Resources Management
wf-analytics-metr...@lists.llnl.gov
(925) 422-0814


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to export graph value in R

2013-04-26 Thread Thomas Adams
Anup,

You should have provided some additional information, such as that the
function 'hypsometric' is found in the hydroTSM contributed package.
Nevertheless, here's what I did (maybe not elegant, but it works) :

(1) at the R command prompt simply type hypsometric -- the source code for
the function 'hypsometric' will be written out
(2) copy this source code into a text file and save it as hypsometric2.R
(3) edit it as this or just copy this:

hypsometric2 <- function (x, band = 1, main = "Hypsometric Curve",
    xlab = "Relative Area above Elevation, (a/A)",
    ylab = "Relative Elevation, (h/H)", col = "blue", ...)
{
    if (class(x) != "SpatialGridDataFrame")
        stop("Invalid argument: 'class(x)' must be 'SpatialGridDataFrame'")
    band.error <- FALSE
    if (is.numeric(band) | is.integer(band)) {
        if ((band < 1) | (band > length(colnames(x@data))))
            band.error <- TRUE
    }
    else if (is.character(band))
        if (!(band %in% colnames(x@data)))
            band.error <- TRUE
    if (band.error)
        stop("Invalid argument: 'band' does not exist in 'x' !")
    mydem <- x@data[band]
    z.min <- min(mydem, na.rm = TRUE)
    z.max <- max(mydem, na.rm = TRUE)
    x.dim <- x@grid@cellsize[1]
    y.dim <- x@grid@cellsize[2]
    max.area <- length(which(!is.na(mydem))) * x.dim * y.dim
    res <- plot.stepfun(ecdf(as.matrix(mydem)), lwd = 0, cex.points = 0)
    z.mean.index <- which(round(res$y, 3) == 0.5)[1]
    z.mean <- res$t[z.mean.index]
    relative.area <- (1 - res$y[-1])
    relative.elev <- (res$t[-c(1, length(res$t))] - z.min)/(z.max -
        z.min)
    plot(relative.area, relative.elev, xaxt = "n", yaxt = "n",
        main = main, xlim = c(0, 1), ylim = c(0, 1), type = "l",
        ylab = ylab, xlab = xlab, col = col, ...)
    Axis(side = 1, at = seq(0, 1, by = 0.05), labels = TRUE)
    Axis(side = 2, at = seq(0, 1, by = 0.05), labels = TRUE)
    f <- splinefun(relative.area, relative.elev, method = "monoH.FC")
    hi <- integrate(f = f, lower = 0, upper = 1)
    legend("topright", c(paste("Min Elev. :", round(z.min, 2),
        "[m.a.s.l.]", sep = " "), paste("Mean Elev.:", round(z.mean,
        1), "[m.a.s.l.]", sep = " "), paste("Max Elev. :", round(z.max,
        1), "[m.a.s.l.]", sep = " "), paste("Max Area  :",
        round(max.area/1e+06, 1), "[km2]", sep = " "), "",
        paste("Integral value :", round(hi$value, 3), sep = " "),
        paste("Integral error :", round(hi$abs.error, 3), sep = " ")),
        bty = "n", cex = 0.9, col = c("black", "black", "black"),
        lty = c(NULL, NULL, NULL, NULL))

    # added: collect the plotted x/y pairs and return them
    curve_data <- data.frame(relative.area, relative.elev)
    return(curve_data)
}

(4) rather than calling hypsometric(dem), for example, first do this:

source("hypsometric2.R")

(5) then call:

data <- hypsometric2(dem)

(6) you can see the x,y pairs by typing:

data

 at the R prompt.

(7) verify that the data are what you expect, by typing this at the R
prompt:

plot(data)

which should give the same plot as hypsometric2(dem) and hypsometric(dem)
without the embellishments and labeling...
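
For the original question (the y value at a certain x), a minimal sketch is to interpolate linearly between the returned pairs with approx(). The small `curve_data` below is made-up toy data standing in for what hypsometric2() returns, and `x_query` is an illustrative name:

```r
# Toy stand-in for the data frame returned by hypsometric2() (hypothetical values)
curve_data <- data.frame(
  relative.area = c(1, 0.75, 0.5, 0.25, 0),
  relative.elev = c(0, 0.2, 0.4, 0.7, 1)
)

# Relative elevation at a chosen relative area, by linear interpolation
x_query <- 0.6
y_at <- approx(curve_data$relative.area, curve_data$relative.elev,
               xout = x_query)$y
y_at
```

approx() sorts the x values itself, so the decreasing order of relative.area is not a problem; splinefun(), which the function already uses for the integral, would be a smoother alternative.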

Tom

On Fri, Apr 26, 2013 at 8:52 AM, Anup khanal za...@hotmail.com wrote:

 Dear experts,

 I have created a hypsometric curve (area-elevation curve) for
 my watershed by using the simple command

 hypsometric(X, main="Hypsometric Curve", xlab="Relative Area above Elevation, (a/A)",
   ylab="Relative Elevation, (h/H)", col="blue")

 It plots the hypsometric curve in the R Graphics window. My question is: how
 can I export the values which are used to create this plot? I mean I want to
 know the value on the y axis for a certain x value.

 Thanks in advance!

 Anup Khanal
 Norwegian Institute of science and Technology (NTNU)
 Trondheim, Norway
 Mob: (+47) 45174313



Re: [R] Read big data (3G ) methods ?

2013-04-26 Thread lcn
Do you really need to load all of the data into memory?

For large data sets, people usually read just a chunk of the data while
developing the analysis pipeline, and once that is done, the finished script
iterates through the entire data set. For example, the read.table
function has 'nrows' and 'skip' parameters to control the reading of data
chunks.

read.table(file, nrows = -1, skip = 0, ...)

Another tip: you can split the large file into smaller ones.
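
A minimal sketch of that chunked reading, assuming a comma-separated file with a header row. The temporary file, the chunk size, and the running totals are illustrative only; an open connection is used so that each read.table() call continues where the previous one stopped instead of re-skipping rows:

```r
# Hypothetical demo file standing in for the large CSV
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:25, value = (1:25) * 2), path, row.names = FALSE)

con <- file(path, open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]
header <- gsub('"', "", header)        # write.csv quotes the column names

chunk_size <- 10
total_rows <- 0
value_sum  <- 0
repeat {
  chunk <- tryCatch(
    read.table(con, nrows = chunk_size, sep = ",", header = FALSE,
               col.names = header),
    error = function(e) NULL           # guard against an error at end of input
  )
  # stop on an error or on an empty result once the file is exhausted
  if (is.null(chunk) || nrow(chunk) == 0) break
  total_rows <- total_rows + nrow(chunk)
  value_sum  <- value_sum + sum(chunk$value)
}
close(con)

c(rows = total_rows, sum = value_sum)
```

Only one chunk is held in memory at a time, so this suits analyses that can be accumulated piece by piece (sums, counts, filtered subsets).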



On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com wrote:

 Hi all scientists,

 Recently I have been dealing with big data (3G txt or csv format) on my
 desktop (Windows 7, 64-bit version), but I cannot read the files quickly,
 even though I have searched the internet. [I have defined colClasses for
 read.table and tried the colbycol and limma packages, but it is still not fast.]

 Could you share your methods for reading big data into R faster?

 This may be an odd question, but we really need it.

 Any suggestions are appreciated.

 Thank you very much.


 kevin



Re: [R] [newbie] how to find and combine geographic maps with particular features?

2013-04-26 Thread MacQueen, Don
If someone else hasn't suggested it already, you will probably get
more/better help on the R-sig-geo mailing list.

(if you decide to repost there, just mention up front that it's a repost
and why)

-Don
-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 4/25/13 6:38 PM, Tom Roche tom_ro...@pobox.com wrote:


SUMMARY:

Specific problem: I'm regridding biomass-burning emissions from a
global/unprojected inventory to a regional projection (LCC over North
America). I need to have boundaries for Canada, Mexico, and US
(including US states), but also Caribbean and Atlantic nations
(notably the Bahamas). I would also like to add Canadian provinces and
Mexican states. How to put these together?

General problem: are there references regarding

* sources for different geographical and political features?

* combining maps for the different R graphics packages?

DETAILS:

(Apologies if this is a FAQ, but googling has not helped me with this.)

I'd appreciate help with a specific problem, as well as guidance
(e.g., pointers to docs) regarding the larger topic of combining
geographical maps (especially projected ones, i.e., not just lon-lat)
on plots of regional data (i.e., data that is multinational but not
global).

My specific problem is

https://bitbucket.org/tlroche/gfed-3.1_global_to_aqmeii-na/downloads/GFED-3.1_2008_N2O_monthly_emissions_regrid_20130404_1344.pdf

which plots N2O concentrations from a global inventory of fire
emissions (GFED) regridded to a North American projection. (See

https://bitbucket.org/tlroche/gfed-3.1_global_to_aqmeii-na

for details.) The plot currently includes boundaries for Canada,
Mexico, and US (including US states, since this is being done for a US
agency), which are being gotten calling code from package=M3

http://cran.r-project.org/web/packages/M3/

like

https://bitbucket.org/tlroche/gfed-3.1_global_to_aqmeii-na/src/95484c5d63502ab146402cedc3612dcdaf629bd7/vis_regrid_vis.r?at=master
 ## get projected North American map
 NorAm.shp <- project.NorAm.boundaries.for.CMAQ(
   units='m',
   extents.fp=template_input_fp,
   extents=template.extents,
   LCC.parallels=c(33,45),
   CRS=out.crs)

https://bitbucket.org/tlroche/gfed-3.1_global_to_aqmeii-na/src/95484c5d63502ab146402cedc3612dcdaf629bd7/visualization.r?at=master
 # database: Geographical database to use.  Choices include "state"
 #   (default), "world", "worldHires", "canusamex", etc.  Use
 #   "canusamex" to get the national boundaries of the Canada, the
 #   USA, and Mexico, along with the boundaries of the states.
 #   The other choices ("state", "world", etc.) are the names of
 #   databases included with the 'maps' and 'mapdata' packages.

 project.M3.boundaries.for.CMAQ <- function(
   database='state', # see `?M3::get.map.lines.M3.proj`
   units='m',        # or 'km': see `?M3::get.map.lines.M3.proj`
   extents.fp,       # path to extents file
   extents,          # raster::extent object
   LCC.parallels=c(33,45), # LCC standard parallels: see https://github.com/TomRoche/cornbeltN2O/wiki/AQMEII-North-American-domain#wiki-EPA
   CRS               # see `sp::CRS`
 ) {

   library(M3)
   ## Will replace raw LCC map's coordinates with:
   metadata.coords.IOAPI.list <- M3::get.grid.info.M3(extents.fp)
   metadata.coords.IOAPI.x.orig <- metadata.coords.IOAPI.list$x.orig
   metadata.coords.IOAPI.y.orig <- metadata.coords.IOAPI.list$y.orig
   metadata.coords.IOAPI.x.cell.width <-
     metadata.coords.IOAPI.list$x.cell.width
   metadata.coords.IOAPI.y.cell.width <-
     metadata.coords.IOAPI.list$y.cell.width

   library(maps)
   map.lines <- M3::get.map.lines.M3.proj(
     file=extents.fp, database=database, units="m")
   # dimensions are in meters, not cells. TODO: take argument
   map.lines.coords.IOAPI.x <-
     (map.lines$coords[,1] - metadata.coords.IOAPI.x.orig)
   map.lines.coords.IOAPI.y <-
     (map.lines$coords[,2] - metadata.coords.IOAPI.y.orig)
   map.lines.coords.IOAPI <-
     cbind(map.lines.coords.IOAPI.x, map.lines.coords.IOAPI.y)

   # # start debugging
   # class(map.lines.coords.IOAPI)
   # # [1] "matrix"
   # summary(map.lines.coords.IOAPI)
   # #  map.lines.coords.IOAPI.x map.lines.coords.IOAPI.y
   # #  Min.   : 283762          Min.   : 160844
   # #  1st Qu.:2650244          1st Qu.:1054047
   # #  Median :3469204          Median :1701052
   # #  Mean   :3245997          Mean   :1643356
   # #  3rd Qu.:4300969          3rd Qu.:2252531
   # #  Max.   :4878260          Max.   :2993778
   # #  NA's   :168              NA's   :168
   # #   end debugging

   # Note above is not zero-centered, like our extents:
   # extent : -2556000, 2952000, -1728000, 186  (xmin, xmax, ymin, ymax)
   # So gotta add (xmin, ymin) below.

   ## Get LCC state map
   # see http://stackoverflow.com/questions/14865507/how-to-display-a-projected-map-on-an-rlatticelayerplot
   map.IOAPI <- maps::map(
 

Re: [R] speed of a vector operation question

2013-04-26 Thread lcn
I think the sum way is the best.


On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin mike...@gmail.com wrote:

 Hello,

 I am dealing with numeric vectors 10^5 to 10^6 elements long. The values are
 sorted (with duplicates) in the vector (v). I am obtaining the length of
 vectors such as (v > c) or (v > c1 & v < c2), where c, c1, c2 are some scalar
 variables. What is the most efficient way to do this?

 I am using sum(v > c) since TRUE's are 1's and FALSE's are 0's. This seems to
 me more efficient than length(which(v > c)), but, please, correct me if I'm
 wrong. So, is there anything faster than what I already use?

 I'm running R 2.14.2 on Linux kernel 3.4.34.

 I appreciate your time,

 Mikhail


[R] Regression coefficients

2013-04-26 Thread Preetam Pal
Hi all,

I have run a ridge regression as follows:

reg=lm.ridge(final$l~final$lag1+final$lag2+final$g+final$g+final$u,
lambda=seq(0,10,0.01))

Then I enter :

select(reg)

and it returns:

modified HKB estimator is 19.3409
modified L-W estimator is 36.18617
smallest value of GCV  at 10

I think it means that it is advisable to use the results of regression
corresponding to lambda= 10;
so the next thing I do is:

reg=lm.ridge(final$l~final$lag1+final$lag2+final$g+final$u, lambda=10)

which yields:

                 final$lag1    final$lag2       final$g       final$u
 3.147255e-04  1.802505e-01 -4.461005e-02 -1.728046e-09 -5.154932e-04


If I am to use these coefficient values later in my analysis, how do I refer to
them? Clearly reg$final$lag1 does not work.

1. Is there any way I can access these values?

2. The main issue is that I want to access these coefficient values
automatically, i.e. R should run the regression and automatically give
me the coefficients for the lambda that minimizes
the GCV. Kindly advise me how I can proceed.
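
A minimal sketch of one way to do both, assuming MASS::lm.ridge as in the post but using the built-in longley data instead of the `final` data frame (the names `fit`, `best`, `best_lambda`, and `best_coef` are illustrative). coef() on the fitted object returns one row of coefficients per lambda, and the GCV component holds the criterion for each lambda, so which.min() can pick the row:

```r
library(MASS)

# Fit over a grid of lambda values (longley ships with R)
fit <- lm.ridge(Employed ~ ., data = longley, lambda = seq(0, 0.1, by = 0.001))

# Index and value of the lambda with the smallest GCV
best        <- which.min(fit$GCV)
best_lambda <- fit$lambda[best]

# Coefficients (intercept first, on the original scale) for that lambda
best_coef <- coef(fit)[best, ]

best_lambda
best_coef["GNP"]     # a single coefficient, accessed by name
```

If the minimum sits at the edge of the lambda grid, as with "smallest value of GCV at 10" above, it may be worth widening the grid before trusting the selected value.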


Thanks and regards,
Preetam

-- 
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division,   C.V.Raman
Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.



Re: [R] NMDS in Vegan: problems in stressplot, best solution

2013-04-26 Thread Gavin Simpson
On Fri, 2013-04-26 at 12:42 -0500, Kumar Mainali wrote:
 Hello,
 
 I can draw a basic stress plot for NMDS with the following code in package
 Vegan.
  stressplot(parth.mds, parth.dis)
 
 When I try to specify the line and point types, it gives me error message.
  stressplot(parth.mds, parth.dis, pch=1, p.col="gray", lwd=2, l.col="red")
 Error in plot.xy(xy, type, ...) : invalid plot type
 
 In the above code, if I removed line type, it does give me the plot only of
 points with my choice of type.
  stressplot(parth.mds, parth.dis, pch=1, p.col="gray")
 
 Why cannot I define both line and point at the same time?

You can. What you can't do is use argument `dis` with a metaMDS object.
If you use:

stressplot(parth.mds, pch=1, p.col="gray", lwd=2, l.col="red")

you'll see it works just fine.

We'll see about providing a better error message if you do what the
documentation asks you not to.

 If I have 100 iterations for metaMDS, then when I plot the result, does it
 give me result from best solution?

The best solution it encountered in the 100 random starts, yes.

 How do I know that.

It is implied in point 4. of the Details section of ?metaMDS.

  Can you plot the
 Stress by Iteration number?

Not in a graphical plot. The stresses for each iteration are printed to
the console at each iteration. Note these iterations are random
starts, each of which has iterations of the algorithm.

HTH

G

 parth.mds <- metaMDS(WorldPRSenv, distance = "bray", k = 2, trymax = 100,
  engine = c("monoMDS", "isoMDS"),
  autotransform = TRUE, wascores = TRUE, expand = TRUE, trace = 2)
 plot(parth.mds, type = "p")
 
 Thanks in advance,
 Kumar
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Read big data (3G ) methods ?

2013-04-26 Thread Martin Morgan

On 04/26/2013 08:09 AM, Kevin Hao wrote:

Hi all scientists,

Recently I have been dealing with big data (3G txt or csv format) on my
desktop (Windows 7, 64-bit version), but I cannot read the files quickly,
even though I have searched the internet. [I have defined colClasses for
read.table and tried the colbycol and limma packages, but it is still not fast.]


you mention limma; if this is sequence or microarray data then asking on the 
Bioconductor mailing list


  http://bioconductor.org/help/mailing-list/

(no subscription necessary) may be more appropriate, but you need to provide 
more information about what you want to do, e.g., a code chunk illustrating the 
problem.


Martin



Could you share your methods for reading big data into R faster?

This may be an odd question, but we really need it.

Any suggestions are appreciated.

Thank you very much.


kevin





--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



Re: [R] speed of a vector operation question

2013-04-26 Thread William Dunlap

 I think the sum way is the best.

On my Linux machine running R-3.0.0 the sum way is slightly faster:
   > x <- rexp(1e6, 2)
   > system.time(for(i in 1:100)sum(x>.3 & x<.5))
      user  system elapsed
     4.664   0.340   5.018
   > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
      user  system elapsed
     5.017   0.160   5.186

If you are doing many of these counts on the same dataset you
can save time by using functions like cut(), table(), ecdf(), and
findInterval().  E.g.,
   > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1], function(i)sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
      user  system elapsed
     5.332   0.568   5.909
   > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
      user  system elapsed
     0.500   0.008   0.511
   > all.equal(as.vector(r1), as.vector(r2))
   [1] TRUE

You should do the timings yourself, as the relative speeds will depend
on the version or dialect of  the R interpreter and how it was compiled.
E.g., with the current development version of 'TIBCO Enterprise Runtime for R'
(aka 'TERR')
on this same 8-core Linux box the sum way is considerably faster than
the length(which) way:
   > x <- rexp(1e6, 2)
   > system.time(for(i in 1:100)sum(x>.3 & x<.5))
      user  system elapsed
      1.87    0.03    0.48
   > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
      user  system elapsed
      3.21    0.04    0.83
   > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1], function(i)sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
      user  system elapsed
      2.19    0.04    0.56
   > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
      user  system elapsed
      0.27    0.01    0.13
   > all.equal(as.vector(r1), as.vector(r2))
   [1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




Re: [R] speed of a vector operation question

2013-04-26 Thread Martin Morgan
A very similar question was asked on StackOverflow (by Mikhail? and then I guess 
the answers there were somehow not satisfactory...)



http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match

where it turns out that a binary search (implemented in R) on the sorted vector 
is much faster than sum, etc. I guess because it's log N without copying. The 
more complicated condition x > .3 & x < .5 could be satisfied with multiple 
calls to the search.
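
A minimal sketch of that idea, not the code from the StackOverflow answer: a hand-written bisection that counts the elements <= a value in a sorted vector, called twice to count a half-open range (c1, c2]. The function name `count_le` and the test values are illustrative, and the strict upper bound (x < .5) is replaced by <= for simplicity:

```r
# Number of elements <= value in a sorted (non-decreasing) vector, by bisection
count_le <- function(v, value) {
  lo <- 0L             # invariant: every element of v[seq_len(lo)] is <= value
  hi <- length(v)      # invariant: every element after position hi is > value
  while (lo < hi) {
    mid <- (lo + hi + 1L) %/% 2L
    if (v[mid] <= value) lo <- mid else hi <- mid - 1L
  }
  lo
}

set.seed(42)
v  <- sort(rexp(1e5, 2))
c1 <- 0.3
c2 <- 0.5

# Count of c1 < v <= c2 from two searches, each O(log n)
n_between <- count_le(v, c2) - count_le(v, c1)

# Cross-checks: the vectorised count and base R's findInterval()
n_sum <- sum(v > c1 & v <= c2)
n_fi  <- findInterval(c2, v) - findInterval(c1, v)
c(n_between, n_sum, n_fi)
```

Each call touches about log2(length(v)) elements and allocates no logical vector, which is where the saving over sum() would come from; as noted elsewhere in the thread, timings are worth measuring on your own setup.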


Martin





--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



Re: [R] Read big data (3G ) methods ?

2013-04-26 Thread Ye Lin
I cannot think of anything better. Maybe try reading only the part of the data
that you want to analyze, i.e. break the large data set into pieces.


On Fri, Apr 26, 2013 at 10:58 AM, Ye Lin ye...@lbl.gov wrote:

 Have you thought of building a database and then letting R read the data
 through that db instead of from your desktop?




Re: [R] speed of a vector operation question

2013-04-26 Thread William Dunlap
R's findInterval can also take advantage of a sorted x vector.  E.g.,
in R-3.0.0 on the same 8-core Linux box:

> x <- rexp(1e6, 2)
> system.time(for(i in 1:100)tabulate(findInterval(x, c(-Inf, .3, .5, Inf)))[2])
   user  system elapsed
  2.444   0.000   2.446
> xs <- sort(x)
> system.time(for(i in 1:100)tabulate(findInterval(xs, c(-Inf, .3, .5, Inf)))[2])
   user  system elapsed
  1.472   0.000   1.475

> tabulate(findInterval(xs, c(-Inf, .3, .5, Inf)))[2]
[1] 180636
> sum( xs > .3 & xs <= .5 )
[1] 180636


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: [R] Help with merge function

2013-04-26 Thread Catarina Ferreira
Hello, Thank you for your help. However the dataframes I gave you were only
examples, the actual dataframes are very big. Does this mean I have to
write every range of data for each variable??


On Fri, Apr 26, 2013 at 2:25 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

 Hello,

 The following seems to do the trick.



 x1 <-
 structure(list(State_prov = c("Nova Scotia", "Nova Scotia", "Nova Scotia"
 ), Shape_name = c("Annapolis", "Antigonish", "Gly"), bob2009 = c(0L,
 0L, NA), bob2010 = c(0L, 0L, NA), bob2011 = c(1L, 0L, NA)), .Names =
 c("State_prov",
 "Shape_name", "bob2009", "bob2010", "bob2011"), class = "data.frame",
 row.names = c(NA,
 -3L))

 x2 <-
 structure(list(FID = 0:2, State_prov = c("Nova Scotia", "Nova Scotia",
 "Nova Scotia"), Shape_name = c("Annapolis", "Antigonish", "Gly"
 ), bob2009 = c(0L, 0L, 0L), bob2010 = c(0L, 0L, 0L), coy2009 = c(10L,
 1L, 1L)), .Names = c("FID", "State_prov", "Shape_name", "bob2009",
 "bob2010", "coy2009"), class = "data.frame", row.names = c(NA,
 -3L))

 x3 <- merge(x1, x2, all.y = TRUE)



 Note also that you really don't need by = intersect(names(x1), names(x2)),
 since that is the default behavior of merge().

 Hope this helps,

 Rui Barradas

 Em 26-04-2013 18:10, Catarina Ferreira escreveu:

  Dear all,

 I'm trying to merge 2 dataframes, but I'm not being entirely successful
 and I can't understand why.

 Dataframe x1

 State_prov   Shape_name  bob2009  bob2010  bob2011
 Nova Scotia  Annapolis         0        0        1
 Nova Scotia  Antigonish        0        0        0
 Nova Scotia  Gly              NA       NA       NA

 Dataframe x2 - has 2 rows and 193 variables, contains one important
 field which is FID that is a link to a shapefile (this is not in x1) and
 shares common columns with x1, like this:

 FID  State_prov   Shape_name  bob2009  bob2010  coy2009
   0  Nova Scotia  Annapolis         0        0       10
   1  Nova Scotia  Antigonish        0        0        1
   2  Nova Scotia  Gly               0        0        1

 So when I do

 x3 <- merge(x1, x2, by=intersect(names(x1), names(x2)), all=TRUE)

 it should do the trick. The thing is that it works for the columns (it
 adds all the new columns not common to both dataframes), but it also adds
 the rows. This is what I get (x3):

 FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
   0  Nova Scotia  Annapolis         0        0       10       NA
  NA  Nova Scotia  Annapolis        NA       NA       NA        1
   1  Nova Scotia  Antigonish        0        0        1       NA
  NA  Nova Scotia  Antigonish       NA       NA       NA        0
   2  Nova Scotia  Gly               0        0        1       NA
  NA  Nova Scotia  Gly              NA       NA       NA       NA

 What I want to get is a true merge, like this:

 FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
   0  Nova Scotia  Annapolis         0        0       10        1
   1  Nova Scotia  Antigonish        0        0        1        0
   2  Nova Scotia  Gly               0        0        1       NA

 Can anybody please help me to understand what I'm doing wrong.
 Any help will be much appreciated!!





-- 
Catarina C. Ferreira, PhD
Post-doctoral Research Fellow
Department of Biology
Trent University
Peterborough, ON Canada
URL: http://www.researcherid.com/rid/A-3898-2011



Re: [R] Help with merge function

2013-04-26 Thread arun
Hi,

The formatting is a bit messed up,
so I'm not sure this is what you wanted.

x1 <- read.table(text="State_prov,Shape_name,bob2009,bob2010,bob2011
Nova Scotia,Annapolis,0,0,1
Nova Scotia,Antigonish,0,0,0
Nova Scotia,Gly,NA,NA,NA
",sep=",",header=TRUE,stringsAsFactors=FALSE)

x2 <- read.table(text="
FID,State_prov,Shape_name,bob2009,bob2010,coy2009
0,Nova Scotia,Annapolis,0,0,10
1,Nova Scotia,Antigonish,0,0,1
2,Nova Scotia,Gly,0,0,1
",sep=",",header=TRUE,stringsAsFactors=FALSE)
merge(x1,x2,all=TRUE)
#   State_prov Shape_name bob2009 bob2010 bob2011 FID coy2009
#1 Nova Scotia  Annapolis       0       0       1   0      10
#2 Nova Scotia Antigonish       0       0       0   1       1
#3 Nova Scotia        Gly       0       0      NA   2       1
#4 Nova Scotia        Gly      NA      NA      NA  NA      NA
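
The extra Gly row appears because merge() matches on every shared column
by default, and the bob2009/bob2010 values for Gly differ between the two
data frames (NA versus 0). A possible alternative, sketched below, is to
match only on identifying columns and to bring over just the columns that
the larger data frame lacks. The choice of State_prov and Shape_name as
keys, and of letting x2's values win for shared columns, are assumptions
based on the desired output in the original post.

```r
x1 <- data.frame(
  State_prov = rep("Nova Scotia", 3),
  Shape_name = c("Annapolis", "Antigonish", "Gly"),
  bob2009 = c(0L, 0L, NA), bob2010 = c(0L, 0L, NA), bob2011 = c(1L, 0L, NA),
  stringsAsFactors = FALSE)

x2 <- data.frame(
  FID = 0:2,
  State_prov = rep("Nova Scotia", 3),
  Shape_name = c("Annapolis", "Antigonish", "Gly"),
  bob2009 = c(0L, 0L, 0L), bob2010 = c(0L, 0L, 0L), coy2009 = c(10L, 1L, 1L),
  stringsAsFactors = FALSE)

# Assumed key columns that identify a row in both data frames
keys <- c("State_prov", "Shape_name")

# Columns present in x1 but missing from x2 (here: bob2011)
new_cols <- setdiff(names(x1), names(x2))

# Keep every row of x2 and attach only the new columns from x1
x3 <- merge(x2, x1[, c(keys, new_cols)], by = keys, all.x = TRUE)
```

With this approach no row of x2 is duplicated as long as the key columns
identify each row of x1 uniquely.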







[R] example

2013-04-26 Thread Iut Tri Utami
Dear Sir,
My name is Iut Tri Utami. I am a beginning user.  I have a problem
generating data in R. The data consist of one disk generated by a Gaussian
N(0, 0.167) and one ring generated by a Gaussian N(R, 0.1). The mean R was
generated from its polar coordinates. The angle was drawn from a uniform
distribution on the interval (0, 2π), and the radius from a Gaussian
N(1.5, 0.1). The class sizes are 500 and 2000.


Thank you very much for your attention; I hope that you will be able to
help me.

Best wishes ,

Iut Tri Utami
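
One way to read the description above is sketched here. Several points are
assumptions, not facts from the post: 0.167 and 0.1 are treated as standard
deviations, the noise is taken to be independent in x and y, and the disk
is given 500 points and the ring 2000.

```r
set.seed(42)
n_disk <- 500    # assumed size of the disk class
n_ring <- 2000   # assumed size of the ring class

# Disk: bivariate Gaussian centred at the origin (sd assumed to be 0.167)
disk <- cbind(x = rnorm(n_disk, 0, 0.167),
              y = rnorm(n_disk, 0, 0.167))

# Ring: the mean R of each point is drawn in polar coordinates
theta  <- runif(n_ring, 0, 2 * pi)        # angle, uniform on (0, 2*pi)
radius <- rnorm(n_ring, 1.5, 0.1)         # radius, Gaussian N(1.5, 0.1)
centre <- cbind(radius * cos(theta), radius * sin(theta))

# Add Gaussian noise N(R, 0.1) around each mean
ring <- centre + matrix(rnorm(2 * n_ring, 0, 0.1), ncol = 2)
colnames(ring) <- c("x", "y")

dat <- data.frame(rbind(disk, ring),
                  class = factor(rep(c("disk", "ring"), c(n_disk, n_ring))))
```

A quick visual check is plot(dat$x, dat$y, col = dat$class, asp = 1).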



Re: [R] Read big data (3G ) methods ?

2013-04-26 Thread Kevin Hao
Thanks lcn,

I will try reading the data in chunks.

Best,

Kevin


On Fri, Apr 26, 2013 at 3:05 PM, lcn lcn...@gmail.com wrote:

 Do you really need to load all the data into memory?

 For large data sets, people usually read just a chunk of the data while
 developing the analysis pipeline; once that is done, the finished script
 iterates through the entire data set. For example, the read.table
 function has 'nrows' and 'skip' parameters to control the reading of data
 chunks.

 read.table(file, nrows = -1, skip = 0, ...)

 Another tip: you can split the large file into smaller ones.
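
The nrows/skip idea can be sketched as a loop that processes one chunk at a
time. The demo file, the chunk size and the running sum are made up for
illustration. Note that every call with skip rescans the file from the top,
so for a really large file an open connection or pre-split files may be
preferable.

```r
# Small demo file so the sketch is self-contained
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, x = (1:10) / 2), tmp, row.names = FALSE)

chunk_size <- 4                              # rows per chunk (illustrative)
header <- names(read.csv(tmp, nrows = 1))    # column names from the first line
skip   <- 1                                  # skip the header line
total  <- 0
n_rows <- 0

repeat {
  chunk <- tryCatch(
    read.csv(tmp, header = FALSE, skip = skip, nrows = chunk_size,
             col.names = header),
    error = function(e) NULL)                # nothing left to read
  if (is.null(chunk) || nrow(chunk) == 0) break

  total  <- total + sum(chunk$x)             # per-chunk work goes here
  n_rows <- n_rows + nrow(chunk)
  skip   <- skip + nrow(chunk)

  if (nrow(chunk) < chunk_size) break        # last, partial chunk
}
unlink(tmp)
```

Only one chunk is held in memory at a time; supplying colClasses as well
usually speeds up each read.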



 On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com wrote:

 Hi all scientists,

 Recently I have been dealing with big data (3G, txt or csv format) on my
 desktop (Windows 7, 64-bit version), but I cannot read it any faster,
 even though I have searched the internet. [I have tried defining colClasses
 for read.table and using the colbycol and limma packages, but it is still
 not fast.]

 Could you share your methods for reading big data into R faster?

 This may be an odd question, but we really need it.

 Any suggestions are appreciated.

 Thank you very much.


 kevin



Re: [R] Help with merge function

2013-04-26 Thread Rui Barradas

Hello,

I don't understand the question, what range? I've just changed the 'all' 
argument to 'all.y', without doing anything special to the variables.

Can you explain what you mean?

Rui Barradas


Em 26-04-2013 19:30, Catarina Ferreira escreveu:

Hello, Thank you for your help. However the dataframes I gave you were only
examples, the actual dataframes are very big. Does this mean I have to
write every range of data for each variable??




Re: [R] converting character matrix to POSIXct matrix

2013-04-26 Thread arun


time.m <- as.matrix(read.table(text='
08:00:20.799 08:00:20.799 08:00:20.799 08:00:20.799 08:00:20.799
08:00:21.996 08:00:22.071 08:00:23.821 08:00:24.370 08:00:25.573
08:00:29.200 08:00:29.200 08:00:29.591 08:00:30.368 08:00:30.536
08:00:31.073 08:00:31.372 08:00:31.384 08:00:31.403 08:00:31.867
08:00:31.867 08:00:31.867 08:00:31.971 08:00:34.571 08:00:34.571
',sep="",header=FALSE,stringsAsFactors=FALSE))
colnames(time.m) <- NULL
op <- options(digits.secs=3)
res <- data.frame(lapply(seq_len(ncol(time.m)),function(i)
strptime(time.m[,i],"%H:%M:%OS")))
colnames(res) <- paste0("X",1:5)
str(res)
#'data.frame':    5 obs. of  5 variables:
# $ X1: POSIXct, format: "2013-04-26 08:00:20.799" "2013-04-26 08:00:21.996" ...
# $ X2: POSIXct, format: "2013-04-26 08:00:20.799" "2013-04-26 08:00:22.071" ...
# $ X3: POSIXct, format: "2013-04-26 08:00:20.799" "2013-04-26 08:00:23.821" ...
# $ X4: POSIXct, format: "2013-04-26 08:00:20.799" "2013-04-26 08:00:24.369" ...
# $ X5: POSIXct, format: "2013-04-26 08:00:20.799" "2013-04-26 08:00:25.572" ...
options(op)

A.K.
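
If a matrix shape is needed instead of a data frame, one possible sketch is
to convert the whole character matrix in a single call and then restore the
dimensions. The two-by-two input and the tz = "UTC" setting are
illustrative assumptions; printing and some other methods may ignore the
dim attribute of a POSIXct object.

```r
time.m <- matrix(c("08:00:20.799", "08:00:21.996",
                   "08:00:20.799", "08:00:22.071"), nrow = 2)

# strptime() drops the dimensions and returns a POSIXlt vector;
# as.POSIXct() turns it into seconds since the epoch
res <- as.POSIXct(strptime(time.m, "%H:%M:%OS", tz = "UTC"))

# Restore the matrix shape on the underlying numeric vector
dim(res) <- dim(time.m)

# Elapsed seconds between two cells of the first column
gap <- unclass(res)[2, 1] - unclass(res)[1, 1]
```

Since no date is given in the strings, strptime() fills in the current
date, as in the data frame output above.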

- Original Message -
From: hh wt horace...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, April 26, 2013 1:51 PM
Subject: [R] converting character matrix to POSIXct matrix

I thought this is a common question but rseek/google searches don't yield
any relevant hit.

I have a matrix of character strings, which are time stamps,

> time.m[1:5,1:5]
     [,1]           [,2]           [,3]           [,4]           [,5]
[1,] "08:00:20.799" "08:00:20.799" "08:00:20.799" "08:00:20.799" "08:00:20.799"
[2,] "08:00:21.996" "08:00:22.071" "08:00:23.821" "08:00:24.370" "08:00:25.573"
[3,] "08:00:29.200" "08:00:29.200" "08:00:29.591" "08:00:30.368" "08:00:30.536"
[4,] "08:00:31.073" "08:00:31.372" "08:00:31.384" "08:00:31.403" "08:00:31.867"
[5,] "08:00:31.867" "08:00:31.867" "08:00:31.971" "08:00:34.571" "08:00:34.571"

And I would like to convert it to a POSIXct matrix. I tried this,

time1 = lapply(time.m, function(tt)strptime(tt, "%H:%M:%OS"))

but it yields a list.

Any tip is appreciated.


Horace



Re: [R] Read big data (3G ) methods ?

2013-04-26 Thread Kevin Hao
Thanks.

I will try breaking the data into pieces for analysis.

Kevin


On Fri, Apr 26, 2013 at 4:38 PM, Ye Lin ye...@lbl.gov wrote:

 I cannot think of anything better. Maybe try reading only the part of the
 data that you want to analyze; basically, break the large data set into
 pieces.


 On Fri, Apr 26, 2013 at 10:58 AM, Ye Lin ye...@lbl.gov wrote:

 Have you thought of building a database and then letting R read the data
 through that database instead of from your desktop?




Re: [R] Vectorized code for generating the Kac (Clement) matrix

2013-04-26 Thread Ravi Varadhan
Thank you, Berend and Enrico, for looking into this.  I did not think of 
Enrico's clever use of cbind() to form the subsetting indices. 

Best,
Ravi

From: Berend Hasselman [b...@xs4all.nl]
Sent: Friday, April 26, 2013 10:08 AM
To: Enrico Schumann
Cc: Ravi Varadhan; 'r-help@r-project.org'
Subject: Re: [R] Vectorized code for generating the Kac (Clement) matrix

On 26-04-2013, at 14:42, Enrico Schumann e...@enricoschumann.net wrote:

 On Thu, 25 Apr 2013, Ravi Varadhan ravi.varad...@jhu.edu writes:

 Hi, I am generating large Kac matrices (also known as Clement matrices).
 This is a tridiagonal matrix.  I was wondering whether there is a
 vectorized solution that avoids the `for' loops in the following code:

 n <- 1000

 Kacmat <- matrix(0, n+1, n+1)

 for (i in 1:n) Kacmat[i, i+1] <- n - i + 1

 for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1

 The above code is fast, but I am curious about vectorized ways to do this.

 Thanks in advance.
 Best,
 Ravi



 This may be a bit faster; but as Berend and you said, the original
 function seems already fast.

 n <- 5000

 f1 <- function(n) {
     Kacmat <- matrix(0, n+1, n+1)
     for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
     for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
     Kacmat
 }
 f3 <- function(n) {
     n1 <- n + 1L
     res <- numeric(n1 * n1)
     dim(res) <- c(n1, n1)
     bw <- n:1L ## bw = backward, fw = forward
     fw <- seq_len(n)
     res[cbind(fw, fw + 1L)] <- bw
     res[cbind(fw + 1L, fw)] <- fw
     res
 }

 system.time(K1 <- f1(n))
 system.time(K3 <- f3(n))
 identical(K3, K1)

 ##    user  system elapsed
 ##   0.132   0.028   0.161
 ##
 ##    user  system elapsed
 ##   0.024   0.048   0.071
 ##

Using some of your code in my function I was able to speed up my function f2.
Complete code:

f1 <- function(n) { #Ravi
    Kacmat <- matrix(0, n+1, n+1)
    for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
    for (i in 1:n) Kacmat[i+1, i] <- i
    Kacmat
}

f2 <- function(n) { # Berend 1 modified to use 1L
    Kacmat <- matrix(0, n+1, n+1)
    Kacmat[row(Kacmat)==col(Kacmat)-1L] <- n:1L
    Kacmat[row(Kacmat)==col(Kacmat)+1L] <- 1L:n
    Kacmat
}

f3 <- function(n) { # Enrico
    n1 <- n + 1L
    res <- numeric(n1 * n1)
    dim(res) <- c(n1, n1)
    bw <- n:1L ## bw = backward, fw = forward
    fw <- seq_len(n)
    res[cbind(fw, fw + 1L)] <- bw
    res[cbind(fw + 1L, fw)] <- fw
    res
}

f4 <- function(n) { # Berend 2 using which with arr.ind=TRUE
    Kacmat <- matrix(0, n+1, n+1)
    k1 <- which(row(Kacmat)==col(Kacmat)-1L, arr.ind=TRUE)
    k2 <- which(row(Kacmat)==col(Kacmat)+1L, arr.ind=TRUE)

    Kacmat[k1] <- n:1L
    Kacmat[k2] <- 1L:n
    Kacmat
}

library(compiler)

f1.c <- cmpfun(f1)
f2.c <- cmpfun(f2)
f3.c <- cmpfun(f3)
f4.c <- cmpfun(f4)

f1(n)
f2(n)
n <- 5000

system.time(K1 <- f1(n))
system.time(K2 <- f2(n))
system.time(K3 <- f3(n))
system.time(K4 <- f4(n))

system.time(K1c <- f1.c(n))
system.time(K2c <- f2.c(n))
system.time(K3c <- f3.c(n))
system.time(K4c <- f4.c(n))
identical(K2,K1)
identical(K3,K1)
identical(K4,K1)
identical(K1c,K1)
identical(K2c,K2)
identical(K3c,K3)
identical(K4c,K4)

Result:

# > system.time(K1 <- f1(n))
#    user  system elapsed
#   0.387   0.120   0.511
# > system.time(K2 <- f2(n))
#    user  system elapsed
#   3.541   0.702   4.250
# > system.time(K3 <- f3(n))
#    user  system elapsed
#   0.108   0.089   0.199
# > system.time(K4 <- f4(n))
#    user  system elapsed
#   1.975   0.355   2.336
#
# > system.time(K1c <- f1.c(n))
#    user  system elapsed
#   0.323   0.120   0.445
# > system.time(K2c <- f2.c(n))
#    user  system elapsed
#   3.374   0.422   3.807
# > system.time(K3c <- f3.c(n))
#    user  system elapsed
#   0.107   0.098   0.205
# > system.time(K4c <- f4.c(n))
#    user  system elapsed
#   1.816   0.384   2.203
# > identical(K2,K1)
# [1] TRUE
# > identical(K3,K1)
# [1] TRUE
# > identical(K4,K1)
# [1] TRUE
# > identical(K1c,K1)
# [1] TRUE
# > identical(K2c,K2)
# [1] TRUE
# > identical(K3c,K3)
# [1] TRUE
# > identical(K4c,K4)
# [1] TRUE

So Ravi's original and Enrico's versions are the quickest.
Using which with arr.ind made  my version run a lot quicker.

All in all an interesting exercise.

Berend
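
As a small sanity check on the construction discussed in this thread, the
sketch below builds the matrix with the cbind() indexing idea and compares
its eigenvalues with the known spectrum of the Kac (Clement) matrix of
order n + 1, namely -n, -n + 2, ..., n. The function name kac and the
choice n = 4 are only for illustration.

```r
kac <- function(n) {
  K <- matrix(0, n + 1L, n + 1L)
  i <- seq_len(n)
  K[cbind(i, i + 1L)] <- n:1   # superdiagonal: n, n-1, ..., 1
  K[cbind(i + 1L, i)] <- i     # subdiagonal:   1, 2, ..., n
  K
}

K4 <- kac(4)

# The matrix is not symmetric, so eigen() may return complex values;
# keep the real parts and sort them for comparison
ev <- sort(Re(eigen(K4, only.values = TRUE)$values))
expected <- seq(-4, 4, by = 2)
```

For large n the eigenvalue computation dominates the run time, so a check
like this is best done on a small case.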



Re: [R] Help with merge function

2013-04-26 Thread arun
Hi,
From the output you wanted, it looks like you need:
library(plyr)
join(x1,x2,type="right")
#Joining by: State_prov, Shape_name, bob2009, bob2010
#   State_prov Shape_name bob2009 bob2010 bob2011 FID coy2009
#1 Nova Scotia  Annapolis       0       0       1   0      10
#2 Nova Scotia Antigonish       0       0       0   1       1
#3 Nova Scotia        Gly       0       0      NA   2       1
merge(x1,x2,all.y=TRUE)
#   State_prov Shape_name bob2009 bob2010 bob2011 FID coy2009
#1 Nova Scotia  Annapolis       0       0       1   0      10
#2 Nova Scotia Antigonish       0       0       0   1       1
#3 Nova Scotia        Gly       0       0      NA   2       1

A.K.







From: Catarina Ferreira catferre...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Friday, April 26, 2013 2:23 PM
Subject: Re: [R] Help with merge function



Hello,

I didn't realize that the formatting had changed after I sent the email. I'm
attaching the original mail as a Word document with the correct formatting,
since I don't think your answer is the one I'm looking for, likely because
of the garbled formatting.

Thank you again for your help.





Re: [R] Read big data (3G ) methods ?

2013-04-26 Thread Kevin Hao
Hi Ye,

Thanks.

That is a good method. Are there any other methods besides using a database?

kevin


On Fri, Apr 26, 2013 at 1:58 PM, Ye Lin ye...@lbl.gov wrote:

 Have you thought of building a database and then letting R read the data
 through that database instead of from your desktop?




Re: [R] Help with merge function

2013-04-26 Thread arun
Hi,

Check whether this works.

Lines1 <- readLines("NS_update.txt")
x1 <- read.table(text=gsub('\"',"",Lines1),sep=",",header=TRUE,stringsAsFactors=FALSE)
x2 <- read.table("data.txt",sep="",header=TRUE,stringsAsFactors=FALSE,fill=TRUE)
dim(x2)
#[1] 34577   189
library(plyr)
res <- join(x1,x2,type="right")
#Joining by: State_Prov, Shape_name, bob2009, bob2010, red2009, red2010, coy2009, coy2010, lyn2009, lyn2010
dim(res)
#[1] 34577   193

res2 <- merge(x1,x2,all.y=TRUE)
dim(res2)
#[1] 34577   193
A.K.






From: Catarina Ferreira catferre...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Friday, April 26, 2013 4:20 PM
Subject: Re: [R] Help with merge function



Here they are. As you can see, NS_update holds data for only one province,
and I want to add this data to the bigger file (data), merging the common
columns and adding the new columns. But what it does is duplicate the rows
in the bigger file that correspond to NS_update, as well as create the new
columns (which is OK).




On Fri, Apr 26, 2013 at 4:16 PM, arun smartpink...@yahoo.com wrote:

You can send the files. 








From: Catarina Ferreira catferre...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Friday, April 26, 2013 4:15 PM

Subject: Re: [R] Help with merge function



Is it OK if I send you the files? It would probably help you understand
what I mean. It didn't work on my files.




On Fri, Apr 26, 2013 at 4:12 PM, arun smartpink...@yahoo.com wrote:

Hi,

I am not sure what the problem is.  I used the datasets you provided, x1
and x2, and got the result shown in your desired output.
Are you saying that this didn't work on your original dataset or on the one
you provided?  In that case, could you post dput(head(dataset,20))?









From: Catarina Ferreira catferre...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Friday, April 26, 2013 4:01 PM

Subject: Re: [R] Help with merge function



Thank you. It still isn't working. Thank you in any case.



