Re: [R] pls package - validation

2017-02-07 Thread Bert Gunter
I think this wants a statistical discussion, which is OT here.
stats.stackexchange.com would be a better place to post for that.

However, if I understand correctly, using pls or anything else to try
to fit (some combination of) 501 variables to 16 data points -- and
then crossvalidate with 6 data points -- is utter nonsense. You just
have a fancy random number generator!
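
The point can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses ordinary least squares on made-up random predictors rather than PLS on the poster's spectra, 15 noise columns stand in for the 501 variables, and all object names (x, y, x_new, y_new) are hypothetical.

```r
set.seed(1)

n <- 16                      # training points, as in the post
p <- 15                      # noise predictors; with the intercept, 16 coefficients
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)                # response is pure noise, unrelated to x

fit <- lm(y ~ x)

# Training R^2 computed by hand as 1 - SSE/SST
train_r2 <- 1 - sum(resid(fit)^2) / sum((y - mean(y))^2)

# Six fresh points drawn the same way, as a stand-in for a test set
x_new <- matrix(rnorm(6 * p), 6, p)
y_new <- rnorm(6)
pred  <- drop(cbind(1, x_new) %*% coef(fit))
test_r2 <- 1 - sum((y_new - pred)^2) / sum((y_new - mean(y_new))^2)

train_r2   # essentially 1: the noise is reproduced exactly
test_r2    # whatever this is, it says nothing about real predictive skill
```

With as many coefficients as data points, the training fit is exact even though y has nothing to do with x, which is the sense in which such a model acts like a random number generator on new data.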

As I said, I think it better to follow up or complain about me on
stackexchange rather than here.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Feb 7, 2017 at 4:49 PM, Ladislav Rozkošný
 wrote:
>
> Hi,
>
> I'm trying to fit a PLSR model in R with the 'pls' package, using 22 samples
> (16 train, 6 test). I know that the basis for choosing the number of
> components is cross-validation (in my case 'LOO'), and that I should choose
> the number of components with the minimum RMSEP (or the first minimum). The
> problem is that RMSEP increases with the number of components instead of
> decreasing. Should I choose only 1 component?
>
> I then tried to compute R2 on my test dataset (6 samples) and received
> nonsensical values (below 0, bigger than 1).
>
> Do you have any idea what may be causing this? Is it a problem with my
> fitting or a problem with the datasets?
>
> Below, you can see my results:
>
> > pH.spec<-plsr(pH ~ spec, data=soil.train, validation="LOO")
>
> > summary(pH.spec)
>
> Data:    X dimension: 16 501
>          Y dimension: 16 1
> Fit method: kernelpls
> Number of components considered: 14
>
> VALIDATION: RMSEP
> Cross-validated using 16 leave-one-out segments.
>        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
> CV          0.5343   0.5435   0.5506    1.629    1.617    1.742    1.921
> adjCV       0.5343   0.5419   0.5486    1.587    1.570    1.688    1.860
>        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
> CV       1.979    1.977    1.971     1.972     1.972     1.972     1.972
> adjCV    1.916    1.914    1.908     1.910     1.909     1.909     1.909
>        14 comps
> CV        1.972
> adjCV     1.909
>
> TRAINING: % variance explained
>     1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
> X    96.410   99.655    99.87    99.90    99.93    99.94    99.95
> pH    3.649    8.342    19.41    67.48    88.96    97.19    99.69
>     8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
> X     99.96    99.96     99.97     99.98     99.99     99.99       100
> pH    99.94    99.99    100.00    100.00    100.00    100.00       100
>
> > R2(pH.spec, newdata = soil.test)
> (Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
>    -1.65763     -0.60849     -0.05253     -0.72870     -2.84718     -2.34102
>     6 comps      7 comps      8 comps      9 comps     10 comps     11 comps
>    -3.28201     -3.68611     -3.69817     -3.77271     -3.74585     -3.76342
>    12 comps     13 comps     14 comps
>    -3.76074     -3.76110     -3.76115
>
> Thank you in advance for your help
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[R] pls package - validation

2017-02-07 Thread Ladislav Rozkošný



Hi,

I'm trying to fit a PLSR model in R with the 'pls' package, using 22 samples
(16 train, 6 test). I know that the basis for choosing the number of
components is cross-validation (in my case 'LOO'), and that I should choose
the number of components with the minimum RMSEP (or the first minimum). The
problem is that RMSEP increases with the number of components instead of
decreasing. Should I choose only 1 component?

I then tried to compute R2 on my test dataset (6 samples) and received
nonsensical values (below 0, bigger than 1).

Do you have any idea what may be causing this? Is it a problem with my
fitting or a problem with the datasets?

Below, you can see my results:

> pH.spec<-plsr(pH ~ spec, data=soil.train, validation="LOO")

> summary(pH.spec)

Data:    X dimension: 16 501
         Y dimension: 16 1
Fit method: kernelpls
Number of components considered: 14

VALIDATION: RMSEP
Cross-validated using 16 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
CV          0.5343   0.5435   0.5506    1.629    1.617    1.742    1.921
adjCV       0.5343   0.5419   0.5486    1.587    1.570    1.688    1.860
       7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
CV       1.979    1.977    1.971     1.972     1.972     1.972     1.972
adjCV    1.916    1.914    1.908     1.910     1.909     1.909     1.909
       14 comps
CV        1.972
adjCV     1.909

TRAINING: % variance explained
    1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
X    96.410   99.655    99.87    99.90    99.93    99.94    99.95
pH    3.649    8.342    19.41    67.48    88.96    97.19    99.69
    8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
X     99.96    99.96     99.97     99.98     99.99     99.99       100
pH    99.94    99.99    100.00    100.00    100.00    100.00       100

> R2(pH.spec, newdata = soil.test)
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
   -1.65763     -0.60849     -0.05253     -0.72870     -2.84718     -2.34102
    6 comps      7 comps      8 comps      9 comps     10 comps     11 comps
   -3.28201     -3.68611     -3.69817     -3.77271     -3.74585     -3.76342
   12 comps     13 comps     14 comps
   -3.76074     -3.76110     -3.76115
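
For readers puzzled by the negative values: when R2 is defined as 1 - SSE/SST, it can never exceed 1, but it falls below 0 whenever the predictions are worse than simply predicting the mean of the evaluated responses. The following is a minimal base-R sketch of that definition. It assumes SST is taken about the mean of the held-out responses, the function name r2_test is hypothetical, and the numbers are made up rather than taken from the soil data.

```r
# R^2 on held-out data as 1 - SSE/SST, with SST about the held-out mean
# (an assumption of this sketch, not a statement about any package's internals).
r2_test <- function(y, pred) {
  sse <- sum((y - pred)^2)
  sst <- sum((y - mean(y))^2)
  1 - sse / sst
}

y_test <- c(5.0, 5.5, 6.0, 6.5, 7.0, 7.5)   # made-up pH-like values

r2_perfect  <- r2_test(y_test, y_test)                 # perfect predictions
r2_mean     <- r2_test(y_test, rep(mean(y_test), 6))   # always predict the mean
r2_constant <- r2_test(y_test, rep(4.0, 6))            # a constant far from the data

c(perfect = r2_perfect, mean = r2_mean, constant = r2_constant)
```

The first case gives 1, the second gives 0, and the third is negative, so values below 0 are not a computational error in themselves; they indicate predictions that do worse than a constant.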
  





Thank you in advance for your help







Re: [R] Gaussian Filter

2017-02-07 Thread Kwesi Quagraine
Hello Catalin, you could have a look at this link first for ideas on
constructing your own script.

http://stackoverflow.com/questions/7105962/how-do-i-run-a-high-pass-or-low-pass-filter-on-data-points-in-r
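
As a starting point for such a script, here is a minimal base-R sketch of a Gaussian low-pass filter built on stats::filter(), with the high-pass and band-pass components obtained by subtraction. Several things here are assumptions rather than anything stated in the thread: the helper name gauss_lowpass, the choice sigma = window / 6 for turning a "32-year window" into a kernel width, the 8-year window used for the band-pass, and the simulated temperature series.

```r
# Gaussian-weighted moving average; returns NA where the window runs off the ends.
gauss_lowpass <- function(x, window = 32, sigma = window / 6) {
  half <- floor(window / 2)
  k <- -half:half                       # symmetric kernel of length 2 * half + 1
  w <- exp(-k^2 / (2 * sigma^2))
  w <- w / sum(w)                       # weights sum to 1
  as.numeric(stats::filter(x, w, sides = 2))
}

set.seed(42)
years <- 1901:2014
temp  <- 8 + 0.01 * (years - 1901) + rnorm(length(years), sd = 0.5)  # fake series

low  <- gauss_lowpass(temp, window = 32)        # slow variations
high <- temp - low                              # what the low-pass removed
band <- gauss_lowpass(temp, window = 8) - low   # between the two window lengths
```

The NA values at both ends of low are a consequence of using a centred window; how to treat the ends (padding, shorter kernels) is a separate decision this sketch does not make.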

Cheers!
Kwesi

On Tue, Feb 7, 2017 at 9:30 PM, Bert Gunter  wrote:

> Please do your "homework" before posting!
>
> Either:
>
> https://cran.r-project.org/web/views/TimeSeries.html
>
> or search: e.g. "bandpass filter" on rseek.org
>
>
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Feb 7, 2017 at 9:51 AM, catalin roibu 
> wrote:
> > Dear all!
> >
> > Please help me with a script or package to compute a Gaussian filter. I
> > have a time series (like average mean temperature from 1901-2014) and I
> > want to extract low, high and band pass frequencies using a Gaussian filter
> > with 32 years window.
> >
> > Thank you very much!
> >
> > Best regards!
> >
> > Catalin
> >



-- 
Try not to become a man of success but rather a man of value-Albert Einstein

University of Cape Coast|College of Agriculture and Natural Sciences|Department
of Physics|
Team Leader|Recycle Up! Ghana|Technology Without Borders|
Other emails: kwesi.quagra...@ucc.edu.gh|kwesi.quagra...@teog.de|
Mobile: +233266173582
Skype: quagraine_cwasi
Twitter: @Pkdilly



Re: [R] Gaussian Filter

2017-02-07 Thread Bert Gunter
Please do your "homework" before posting!

Either:

https://cran.r-project.org/web/views/TimeSeries.html

or search: e.g. "bandpass filter" on rseek.org



Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Feb 7, 2017 at 9:51 AM, catalin roibu  wrote:
> Dear all!
>
> Please help me with a script or package to compute a Gaussian filter. I
> have a time series (like average mean temperature from 1901-2014) and I
> want to extract low, high and band pass frequencies using a Gaussian filter
> with 32 years window.
>
> Thank you very much!
>
> Best regards!
>
> Catalin
>


[R] rms::latex.anova broken?

2017-02-07 Thread Kevin E. Thorpe
I am re-running some logistic regression analyses using lrm from the rms 
package but latex(anova(...)) appears to be broken on my system.


Here is some anova() output followed by the latex() error for two models 
since the error changes. My sessionInfo() follows the other output. I 
have updated all my packages and re-installed Hmisc and rms plus 
dependencies. The only thing I haven't done yet is update R completely. 
Has anyone else encountered this and know how to solve it?


> anova(full)
                Wald Statistics          Response: id14

 Factor              Chi-Square d.f. P
 birthweight_kilo     0.87       1   0.3517
 ageinmonth           4.12       1   0.0423
 zbmi                 6.49       1   0.0108
 maxtbf              15.16       1   0.0001
 cowsmilk             6.54       1   0.0106
 Male                 1.96       1   0.1611
 multivitamin         0.76       1   0.3819
 bottleuse            0.13       1   0.7194
 preterm              0.50       1   0.4811
 AGEINTRO_cowsmilk    0.06       1   0.8032
 AGEINTRO_complefood  0.61       1   0.4356
 TOTAL               30.49      11   0.0013
> latex(anova(full),file="",table.env=FALSE,booktabs=TRUE)
Error in ifelse(sn %nin% c("d.f.", "MS", "Partial SS"), math(sn), sn) :
  could not find function "math"
> anova(full.nl)
                Wald Statistics          Response: id14

 Factor              Chi-Square d.f. P
 birthweight_kilo     3.68       2   0.1588
  Nonlinear           2.65       1   0.1037
 ageinmonth          16.25       2   0.0003
  Nonlinear          13.45       1   0.0002
 zbmi                 4.07       2   0.1310
  Nonlinear           0.23       1   0.6323
 maxtbf              15.81       2   0.0004
  Nonlinear           2.57       1   0.1092
 cowsmilk             3.34       2   0.1880
  Nonlinear           1.16       1   0.2821
 Male                 1.21       1   0.2711
 multivitamin         0.57       1   0.4494
 bottleuse            0.06       1   0.8100
 preterm              0.01       1   0.9418
 AGEINTRO_cowsmilk    3.65       2   0.1612
  Nonlinear           3.28       1   0.0700
 AGEINTRO_complefood  5.40       2   0.0671
  Nonlinear           4.00       1   0.0455
 TOTAL NONLINEAR     25.41       7   0.0006
 TOTAL               52.13      18   <.0001
> latex(anova(full.nl),file="",table.env=FALSE,booktabs=TRUE)
Error in paste0(specs$lspace, specs$italics(substring(rowl, 2)), sep = "") :
  attempt to apply non-function

> sessionInfo()
R version 3.2.3 Patched (2016-01-31 r70055)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Slackware 14.2

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rms_5.1-0       SparseM_1.74    Hmisc_4.0-2     ggplot2_2.2.1
[5] Formula_1.2-1   survival_2.40-1 lattice_0.20-34 knitr_1.15.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9         RColorBrewer_1.1-2  plyr_1.8.4
 [4] base64enc_0.1-3     tools_3.2.3         rpart_4.1-10
 [7] digest_0.6.12       polspline_1.1.12    tibble_1.2
[10] gtable_0.2.0        htmlTable_1.9       checkmate_1.8.2
[13] nlme_3.1-131        Matrix_1.2-8        mvtnorm_1.0-5
[16] gridExtra_2.2.1     stringr_1.1.0       cluster_2.0.5
[19] htmlwidgets_0.8     MatrixModels_0.4-1  grid_3.2.3
[22] nnet_7.3-12         data.table_1.10.4   foreign_0.8-67
[25] multcomp_1.4-6      TH.data_1.0-8       latticeExtra_0.6-28
[28] magrittr_1.5        codetools_0.2-15    MASS_7.3-45
[31] scales_0.4.1        backports_1.0.5     htmltools_0.3.5
[34] splines_3.2.3       assertthat_0.1      colorspace_1.3-2
[37] quantreg_5.29       sandwich_2.3-4      stringi_1.1.2
[40] acepack_1.4.1       lazyeval_0.2.0      munsell_0.4.3
[43] zoo_1.7-14


--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016



[R] Gaussian Filter

2017-02-07 Thread catalin roibu
Dear all!

Please help me with a script or package to compute a Gaussian filter. I
have a time series (such as average mean temperature from 1901-2014) and I
want to extract the low, high and band pass frequencies using a Gaussian
filter with a 32-year window.

Thank you very much!

Best regards!

Catalin



Re: [R] Beginner needs help with R

2017-02-07 Thread Bert Gunter
Yes, I was replying to the OP's query **as stated.** I try to avoid
guessing what the OP really *meant*, although I grant that sometimes
this may be necessary.

But do note that the leading 0's in seq() *are* unnecessary:

> sprintf("%02d",1:3)
[1] "01" "02" "03"


Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Feb 7, 2017 at 8:48 AM, Ted Harding  wrote:
> Bert, your solution seems to presuppose that the programmer
> knows beforehand that the leading digit in the number is "0"
> (which in fact is clearly the case in Nabila Arbi's original
> query). However, the sequence might arise from some process
> outside of the programmer's control, and may then either have
> a leading 0 or not. In that case, I think Jim's solution is safer!
> Best wishes,
> Ted.
>
>
> On 07-Feb-2017 16:02:18 Bert Gunter wrote:
>> No need for sprintf(). Simply:
>>
>>> paste0("DQ0",seq.int(60054,60060))
>>
>> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
>> [7] "DQ060060"
>>
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Mon, Feb 6, 2017 at 5:45 AM, jim holtman  wrote:
>>> You need the leading zeros, and 'numerics' just give the number without
>>> leading zeros.  You can use 'sprintf' to create a character string with
>>> the leading zeros:
>>>
>>>> # this is using 'numeric' and drops leading zeros
>>>>
>>>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>>> seq1
>>> [1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060"
>>>>
>>>> # use 'sprintf' to create leading zeros
>>>> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
>>>> seq2
>>> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
>>> "DQ060060"
>>>>
>>>
>>>
>>> Jim Holtman
>>> Data Munger Guru
>>>
>>> What is the problem that you are trying to solve?
>>> Tell me what you want to do, not how you want to do it.
>>>
>>> On Sun, Feb 5, 2017 at 8:50 PM, Nabila Arbi 
>>> wrote:
>>>
>>>> Dear R-Help Team!
>>>>
>>>> I have some trouble with R. It's probably nothing big, but I can't find a
>>>> solution.
>>>> My problem is the following:
>>>> I am trying to download some sequences from ncbi using the ape package.
>>>>
>>>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>>>
>>>> sequences <- read.GenBank(seq1,
>>>> seq.names = seq1,
>>>> species.names = TRUE,
>>>> gene.names = FALSE,
>>>> as.character = TRUE)
>>>>
>>>> write.dna(sequences, "mysequences.fas", format = "fasta")
>>>>
>>>> My problem is, that R doesn't take the whole sequence number as "060054"
>>>> but it puts it as DQ60054 (missing the zero in the beginning, which is
>>>> essential).
>>>>
>>>> Could please tell me, how I can get R to accepting the zero in the
>>>> beginning of the accession number?
>>>>
>>>> Thank you very much in advance and all the best!
>>>>
>>>> Nabila
>>>>
>
> -
> E-Mail: (Ted Harding) 
> Date: 07-Feb-2017  Time: 16:48:41
> This message was sent by XFMail
> -



Re: [R] Beginner needs help with R

2017-02-07 Thread Ted Harding
Bert, your solution seems to presuppose that the programmer
knows beforehand that the leading digit in the number is "0"
(which in fact is clearly the case in Nabila Arbi's original
query). However, the sequence might arise from some process
outside of the programmer's control, and may then either have
a leading 0 or not. In that case, I think Jim's solution is safer!
Best wishes,
Ted.
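
To make the distinction concrete, here is a small sketch of the width-based approach, in which the padding is computed from the number itself rather than hard-coded into the prefix. The helper name make_accessions and its width argument are hypothetical; only sprintf() and paste0() are taken from the thread.

```r
# Pad each numeric ID with leading zeros to a fixed width, then add the prefix.
make_accessions <- function(prefix, ids, width = 6) {
  fmt <- paste0("%s%0", width, "d")     # e.g. "%s%06d"
  sprintf(fmt, prefix, as.integer(ids))
}

padded <- make_accessions("DQ", 60054:60056)
mixed  <- make_accessions("DQ", c(54, 123456))

padded   # "DQ060054" "DQ060055" "DQ060056"
mixed    # "DQ000054" "DQ123456"
```

A fixed "DQ0" prefix would turn 123456 into "DQ0123456", whereas the width-based format leaves six-digit numbers untouched and pads shorter ones as needed.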


On 07-Feb-2017 16:02:18 Bert Gunter wrote:
> No need for sprintf(). Simply:
> 
>> paste0("DQ0",seq.int(60054,60060))
> 
> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
> [7] "DQ060060"
> 
> 
> Cheers,
> Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Mon, Feb 6, 2017 at 5:45 AM, jim holtman  wrote:
>> You need the leading zeros, and 'numerics' just give the number without
>> leading zeros.  You can use 'sprintf' to create a character string with
>> the leading zeros:
>>
>>> # this is using 'numeric' and drops leading zeros
>>>
>>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>> seq1
>> [1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060"
>>>
>>> # use 'sprintf' to create leading zeros
>>> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
>>> seq2
>> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
>> "DQ060060"
>>>
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sun, Feb 5, 2017 at 8:50 PM, Nabila Arbi 
>> wrote:
>>
>>> Dear R-Help Team!
>>>
>>> I have some trouble with R. It's probably nothing big, but I can't find a
>>> solution.
>>> My problem is the following:
>>> I am trying to download some sequences from ncbi using the ape package.
>>>
>>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>>
>>> sequences <- read.GenBank(seq1,
>>> seq.names = seq1,
>>> species.names = TRUE,
>>> gene.names = FALSE,
>>> as.character = TRUE)
>>>
>>> write.dna(sequences, "mysequences.fas", format = "fasta")
>>>
>>> My problem is, that R doesn't take the whole sequence number as "060054"
>>> but it puts it as DQ60054 (missing the zero in the beginning, which is
>>> essential).
>>>
>>> Could please tell me, how I can get R to accepting the zero in the
>>> beginning of the accession number?
>>>
>>> Thank you very much in advance and all the best!
>>>
>>> Nabila
>>>

-
E-Mail: (Ted Harding) 
Date: 07-Feb-2017  Time: 16:48:41
This message was sent by XFMail



Re: [R] Beginner needs help with R

2017-02-07 Thread Bert Gunter
No need for sprintf(). Simply:

> paste0("DQ0",seq.int(60054,60060))

[1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
[7] "DQ060060"


Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Feb 6, 2017 at 5:45 AM, jim holtman  wrote:
> You need the leading zeros, and 'numerics' just give the number without
> leading zeros.  You can use 'sprintf' to create a character string with
> the leading zeros:
>
>> # this is using 'numeric' and drops leading zeros
>>
>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>> seq1
> [1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060"
>>
>> # use 'sprintf' to create leading zeros
>> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
>> seq2
> [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
> "DQ060060"
>>
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Sun, Feb 5, 2017 at 8:50 PM, Nabila Arbi 
> wrote:
>
>> Dear R-Help Team!
>>
>> I have some trouble with R. It's probably nothing big, but I can't find a
>> solution.
>> My problem is the following:
>> I am trying to download some sequences from ncbi using the ape package.
>>
>> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>>
>> sequences <- read.GenBank(seq1,
>> seq.names = seq1,
>> species.names = TRUE,
>> gene.names = FALSE,
>> as.character = TRUE)
>>
>> write.dna(sequences, "mysequences.fas", format = "fasta")
>>
>> My problem is, that R doesn't take the whole sequence number as "060054"
>> but it puts it as DQ60054 (missing the zero in the beginning, which is
>> essential).
>>
>> Could please tell me, how I can get R to accepting the zero in the
>> beginning of the accession number?
>>
>> Thank you very much in advance and all the best!
>>
>> Nabila
>>


[R] ggplot: restricting legend with multiple geoms and nested groups

2017-02-07 Thread Szumiloski, John
Dear useRs:

I am having difficulty understanding how to make a legend in ggplot when I only 
want certain geoms to be indicated, in the presence of nested groups.  An 
example:

  require(tidyverse)

  # fake data
  dat <- tibble(X  = rep(seq(4), 3),
                Y  = c(-1.11, -0.46, 0.02, 0.81,
                       -0.51,  0.43, 0.73, 1.39,
                       -0.12,  0.62, 1.19, 1.99),
                G1 = rep(seq(3), each=4) %>% factor)

  dat <- dat %>% mutate(lin  = predict(lm(Y~X*G1, dat)),
                        quad = predict(lm(Y~poly(X,2)*G1, dat)))


Now dat contains one grouping variable, G1,  the X and Y data, and two columns 
of fitted values.  I want to consolidate the fitted value columns for plotting:


  # stack model fits: make wide -> long
  dat <- dat %>% gather(lin:quad, key="Model", value="fit", factor_key=TRUE)

Thus the model variable acts as another grouping variable.

I want to plot the two fits over the raw data, with the raw data grouped and
annotated by G1, and the fits grouped by model*G1 but annotated by model only.
Thus each level of G1 will have its own model fits plotted, but the model
annotations will be the same for all levels of G1.  Here is the code that
I thought would do this.


  # init plot
  pl <- ggplot(data=dat, mapping=aes(x=X, y=Y, group=interaction(Model,G1))) +
    theme_bw()

  # add raw data in background
  # (the filter is probably not necessary but prevents redundant overplotting)
  pl <- pl +
    geom_path(data=dat %>% filter(Model=='lin'),
              mapping=aes(x=X, y=Y, group=G1, color=G1), linetype=2,
              show.legend=FALSE) +
    geom_point(data=dat %>% filter(Model=='lin'),
               mapping=aes(x=X, y=Y, group=G1, color=G1, shape=G1),
               show.legend=FALSE)

  # add fits
  pl <- pl + geom_path(aes(x=X, y=fit, group=interaction(Model,G1),
                           color=Model))
  pl


The plot looks as I want it.  But the legend is titled G1, and has the levels 
of G1 in the legend (as well as the desired Model levels).  But I thought 
turning off the show.legend argument in the raw data geoms would prevent this.  
What I desire in the legend is only the two levels of Model (and titled as 
such).
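
One possible explanation, offered tentatively: the raw-data layers and the fit layer all map the colour aesthetic, so they share a single colour scale whose levels are the union of G1 and Model, and show.legend=FALSE on a layer does not remove that layer's levels from the scale. Under that assumption, restricting the breaks of the colour scale and naming it explicitly is one way to get a legend with only the Model levels. The sketch below rebuilds the example with base R plus ggplot2 only; the object name long and the use of scale_colour_discrete() are this sketch's choices, not the poster's code.

```r
library(ggplot2)

# Fake data mirroring the post: X, Y, a grouping factor G1, and two fitted columns.
dat <- data.frame(
  X  = rep(1:4, 3),
  Y  = c(-1.11, -0.46, 0.02, 0.81,
         -0.51,  0.43, 0.73, 1.39,
         -0.12,  0.62, 1.19, 1.99),
  G1 = factor(rep(1:3, each = 4))
)
dat$lin  <- predict(lm(Y ~ X * G1, dat))
dat$quad <- predict(lm(Y ~ poly(X, 2) * G1, dat))

# Wide -> long with base R: one row per (observation, model).
long <- rbind(
  data.frame(dat[c("X", "Y", "G1")], Model = "lin",  fit = dat$lin),
  data.frame(dat[c("X", "Y", "G1")], Model = "quad", fit = dat$quad)
)
long$Model <- factor(long$Model, levels = c("lin", "quad"))

pl <- ggplot(long, aes(x = X, y = Y)) +
  theme_bw() +
  # raw data, coloured by G1, kept out of the legend keys
  geom_path(data = dat, aes(group = G1, colour = G1),
            linetype = 2, show.legend = FALSE) +
  geom_point(data = dat, aes(colour = G1, shape = G1), show.legend = FALSE) +
  # fits, coloured by Model
  geom_path(aes(y = fit, group = interaction(Model, G1), colour = Model)) +
  # show only the Model levels in the colour legend and title it accordingly
  scale_colour_discrete(name = "Model", breaks = levels(long$Model))
```

Printing pl draws the plot. An alternative design, if the raw data do not need to share the colour scale, is to distinguish G1 by shape or linetype only, so that colour belongs to Model alone.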

Any assistance greatly appreciated.
John
John Szumiloski, Ph.D.
Principal Scientist, Statistician
Pharmaceutical Development / Analytical and Bioanalytical 
Operations
NBR105-1-1411

Bristol-Myers Squibb
P.O. Box 191
1 Squibb Drive
New Brunswick, NJ
08903-0191

(732) 227-7167


