[R] Summer student internship placement at University of York / YCCSA / SEI (paid)

2011-04-29 Thread Corrado Topi
Dear R-lings,

I did not know which list to post to: it is a studentship, so not really 
a job, and it did not fit the r-sig-jobs list, and it is about developing 
an extension package interfaced with R. I hope I did not upset anyone. 
If so, apologies.

The Centre for Complex Systems Analysis at the University of York (YCCSA), 
UK, in collaboration with the Stockholm Environment Institute, is looking 
for a highly motivated student in Computer Science, Applied Mathematics, 
Applied Statistics or related fields for a 10-week paid student internship 
over summer 2011, starting in July, to collaborate in the development of an 
R package. 
The student will participate in research projects to develop prototypes for 
toolkits for statistical predictions of diversity and dissimilarity and the 
generation of spatial landscapes, with applications in the biological and 
environmental sciences. We require excellent development skills and experience 
in CUDA/OpenCL, and a strong foundation in Computing, Statistics / Applied 
Mathematics and Computer Graphics. We need an excellent problem solver, able 
to innovate, find solutions and work independently.

For further information on the project please contact ct...@york.ac.uk or go 
to http://www.york.ac.u...2011/201107.pdf

For further information on the studentship programme please look at 
http://www.york.ac.u...olarships.html.

Please send your application no later than 13 May to 
scholarsh...@yccsa.org as a single PDF document including:

1. Your CV (max 2 pages)
2. A brief personal statement (max 1 page) including:
* Which project(s) you are interested in (as many as you like but in 
preference order)
* Your reasons for applying
* Your academic interest
* Your future aspirations
3. A full written academic reference (not just contact details). Your 
application will not be accepted without this reference (max 1 page). 

Best,
-- 
Corrado Topi

Stockholm Environment Institute

Mob: +44 (0) 7769 601784
Tel: +44 (0) 1904 322893
Skype: corrado-eeos
Website:  sei-international.org

University of York
York YO10 5DD
UK

Fax: +44 (0) 1904 322898

EMAIL DISCLAIMER: http://www.york.ac.uk/docs/disclaimer/email.htm


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error singular gradient matrix at initial parameter estimates in nls

2010-03-31 Thread Corrado

Dear JN, Bert,

1) It is not a perfect fit. I do not think I have ever said that. I said 
that an external algorithm fits the model without any problems: with ~ 
500,000 data points and 19 parameters (ki in the original equation), it 
fits the model in less than 1 second. The data are not artificial data. 
The variables are independent (pi in the original model). The solution 
is unique, and the speed of convergence is practically independent 
of the choice of start conditions (for a reasonable choice of 
start conditions at least). The resulting residuals are approximately 
normally distributed with mean 0 and sd ~ 4.23.


2) I agree with Bert's comment on over-parametrization, but again 
the model is not overparametrised, and it is identifiable (in part 
answered already in (1)).


Regards


Prof. John C Nash wrote:
If you have a perfect fit, you have zero residuals. But in the nls 
manual page we have:



Warning:

 *Do not use ‘nls’ on artificial zero-residual data.*


So this is a case of complaining that your diesel car is broken 
because you ignored the "Diesel fuel only" sign on the filler cap and 
put in gasoline.


However I've not been happy with this choice in the code of nls -- 
it's been there a long time -- and my own codes from 1974 onwards have 
always handled zero residual cases. I do believe that the code could 
at least give a better diagnostic message. Zero residuals -- perfect 
fits -- arise when one is interested more or less in an interpolating 
function rather than doing statistics, and I can understand the 
reluctance of statisticians to countenance such a use of nls.


And Bert's comment on overparametrization is almost certainly valid also.

JN




--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] Error singular gradient matrix at initial parameter estimates in nls

2010-03-30 Thread Corrado

I am using nls to fit a non linear function to some data.

The non linear function is:

y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))

I have chosen the port algorithm, with a lower bound of 0 for all of the 
ki parameters, and I have tried many start values for the parameters ki 
(including generating them at random).


If I fit the non linear function to the same data using an external 
algorithm, it fits perfectly and finds the parameters.


As soon as I come to my R installation (2.10.1 on Kubuntu Linux 9.10, 64 
bit), I keep getting the error:


Error in nlsModel(formula, mf, start, wts, upper) :   singular gradient 
matrix at initial parameter estimates


I have read all the previous postings and the documentation, but to no 
avail: the error is there to stay. I am sure the problem is with nls, 
because the external fitting algorithm perfectly fits it in less than a 
second. Also, if my n is 4, then nls works perfectly (but that 
excludes all the k5 ... kn).


Can anyone help me with suggestions? Thanks in advance.

Alternatively, what do you suggest I should do? Shall I abandon nls in 
favour of optim?
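
A minimal sketch of the setup described in this post, on simulated data rather than the poster's. The sample size, the seed, the three predictors p1 to p3 and the "true" coefficients are all assumptions made for illustration. It fits the model with nls and the port algorithm, fits the same least-squares problem with optim, and prints the condition number of the predictor matrix, since near-collinear predictors are one common (though not the only) way to end up with a singular gradient matrix.

```r
# Simulated data; sizes, seed and "true" coefficients are illustrative
# assumptions, not values from the original post.
set.seed(1)
n_obs <- 2000
P <- matrix(runif(n_obs * 3), ncol = 3,
            dimnames = list(NULL, c("p1", "p2", "p3")))
k_true <- c(k0 = 0.2, k1 = 0.5, k2 = 1.0, k3 = 0.3)
eta <- drop(cbind(1, P) %*% k_true)
dat <- data.frame(y = 1 - exp(-eta) + rnorm(n_obs, sd = 0.02), P)

# nls with the "port" algorithm and a lower bound of 0 on every k_i
fit_nls <- nls(y ~ 1 - exp(-(k0 + k1 * p1 + k2 * p2 + k3 * p3)),
               data = dat,
               start = c(k0 = 0.1, k1 = 0.1, k2 = 0.1, k3 = 0.1),
               algorithm = "port", lower = rep(0, 4))

# The same least-squares problem through optim() with box constraints
X <- cbind(1, as.matrix(dat[, c("p1", "p2", "p3")]))
rss <- function(k) sum((dat$y - (1 - exp(-drop(X %*% k))))^2)
fit_optim <- optim(rep(0.1, 4), rss, method = "L-BFGS-B", lower = 0)

# Condition number of the predictor matrix: very large values point to
# near-collinear p_i
kappa(X)

round(rbind(nls = coef(fit_nls), optim = fit_optim$par), 3)
```

If kappa(X) is very large on the real data, rescaling or dropping near-duplicate predictors may be worth trying before switching optimiser.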


Regards



[R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Corrado

Dear friends,

I am testing glm as at page 514/515 of THE R BOOK by M.Crawley, that is 
on proportion data.


I use glm(y~x1+,family=binomial)

y is a proportion in (0,1), and x is a real number.

I get the error:

In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

But that is exactly what was suggested in the book, where there is no 
mention of a similar warning. Where am I going wrong?


Here is the output:

 glm(response.prepared~x,data=,family=binomial)

Call:  glm(formula = response.prepared ~ x, family = binomial, data = )

Coefficients:
(Intercept)            x  
    -0.3603       0.4480  


Degrees of Freedom: 510554 Total (i.e. Null);  510553 Residual
Null Deviance:      24420 
Residual Deviance: 23240        AIC: 700700
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
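
A small sketch of where this warning comes from, using simulated data (the number of trials, the seed and the coefficients are assumptions). The binomial family treats a response in [0, 1] as a proportion of successes out of a number of trials given by the weights; with no weights, the implied counts are not integers and glm warns. The sketch shows three ways that avoid the warning.

```r
set.seed(42)
n_trials <- 20                       # assumed number of trials per observation
x <- rnorm(200)
successes <- rbinom(200, size = n_trials, prob = plogis(-0.4 + 0.5 * x))
prop <- successes / n_trials         # proportion in [0, 1]

# (a) Bare proportions: fits, but warns "non-integer #successes"
fit_warn <- glm(prop ~ x, family = binomial)

# (b) Proportions plus the number of trials as weights: no warning
fit_wts <- glm(prop ~ x, family = binomial, weights = rep(n_trials, 200))

# (c) Two-column response of successes and failures: equivalent to (b)
fit_cbind <- glm(cbind(successes, n_trials - successes) ~ x,
                 family = binomial)

# (d) No underlying counts at all: quasibinomial keeps the logit mean
# model and does not check for integer counts
fit_quasi <- glm(prop ~ x, family = quasibinomial)

rbind(weights = coef(fit_wts), cbind = coef(fit_cbind),
      quasi = coef(fit_quasi))
```

Which of (b), (c) or (d) is appropriate depends on whether the proportions come from known counts; that is not stated in the post.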




Regards


Re: [R] Error singular gradient matrix at initial parameter

2010-03-30 Thread Corrado

Hi Gabor,

same problem even using nls2 with method="brute-force" to calculate the 
initial parameters.


Best,

Gabor Grothendieck wrote:

You could try method="brute-force" in the nls2 package to find starting values.



Re: [R] Error singular gradient matrix at initial parameter

2010-03-30 Thread Corrado

Yes, of course. The problem still stays.

Gabor Grothendieck wrote:

Sorry, it's algorithm="brute-force"



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Corrado

Dear David,

David Winsemius wrote:

A) It is not an error, only a warning. Wouldn't it seem reasonable to 
issue such a warning if you have data that violates the distributional 
assumptions?

I am not questioning the approach. I am only trying to understand why a 
(rather expensive) source of documentation and the behaviour of a 
function are not aligned.

B) You did not include any of the data

Data attached as R object.

C) Wouldn't this be more appropriate to the author of the book if this 
is exactly what was suggested there?

I think it will definitely be appropriate, but only when I am certain 
I am not doing anything wrong.


Regards



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Corrado

Dear Ruben

I am afraid not ... the paragraph's title is a bit of a giveaway:

Proportion Data and Binomial Errors

The sentence reads:

... are dealt with by using a generalised linear model with a 
binomial error structure.


with the example:

glm(y~x,family=binomial)

You can check at page 514/515.

Rubén Roa wrote:

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
behalf of Corrado
Sent: Tuesday, 30 March 2010 16:52
To: r-help@r-project.org
Subject: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : 
non-integer #successes in a binomial glm!



[R] Changing content of column in data.frame + efficient join extraction between 2 data.frames

2010-03-23 Thread Corrado

Dear R users,

I have 2 SpatialPointsDataFrame's, pcs and East.

The column str_1 in the first (pcs) is:

 pcs[0:4,]
       coordinates cat str_1  int_1  int_2    dbl_1     dbl_2
1 (101000, 263000)   1 SM06B 101000 263000 4.978915 -4.293668
2 (101000, 265000)   2 SM06C 101000 265000 4.960478 -4.266742
3 (101000, 267000)   3 SM06D 101000 267000 4.912984 -4.246849
4 (101000, 269000)   4 SM06E 101000 269000 4.613309 -4.185405


The column str_1 in the second (East) is:

 East[0:4,]
       coordinates str_1
1 (489000, 215000) sp81x
2 (489000, 217000) sp81y
3 (493000, 209000) sp90j
4 (495000, 209000) sp90p


I would like to do 2 things:

1) I would like to change the format of the column str_1 in the first 
to match the second: that is, I need to remove the inverted commas and 
make it lower case.

2) I would like to extract the rows from the first one (pcs) where 
pcs$str_1 is the same as East$str_1.

I have even tried regexp, but cannot modify the content of pcs$str_1 to 
remove the inverted commas and change the case to lowercase.

How do I do that?
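
A sketch of both steps on toy data frames standing in for the two objects. The quoted values in pcs are an assumption based on the description (the listing in the post shows no quote characters), and the rows are made up so that two of them match. The same `$` and `[i, ]` idioms should, as far as I know, carry over to a SpatialPointsDataFrame.

```r
# Toy stand-ins for the two tables; the values are illustrative only.
pcs <- data.frame(cat = 1:4,
                  str_1 = c("'SM06B'", "'SP81X'", "'SM06D'", "'SP90J'"),
                  stringsAsFactors = FALSE)
East <- data.frame(str_1 = c("sp81x", "sp81y", "sp90j", "sp90p"),
                   stringsAsFactors = FALSE)

# 1) strip single and double quotes, then convert to lower case
pcs$str_1 <- tolower(gsub("[\"']", "", as.character(pcs$str_1)))

# 2) keep the rows of pcs whose str_1 also occurs in East
pcs_in_east <- pcs[pcs$str_1 %in% East$str_1, ]
pcs_in_east
```

as.character() is there in case str_1 is stored as a factor, which was the default for character columns in R at the time.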

Regards


Re: [R] S4: Multiple inheritance

2010-03-23 Thread Corrado

Dear Christophe,

Could you please post some example code of what you are trying to achieve?

Christophe Genolini wrote:

Hi all,

Working with S4 objects, I define two classes, foo1 and foo2. I define 
'[' (resp. '[<-') for the two classes.

Then I define a third class foo3 that inherits from both foo1 and foo2.
Is there a way to make '[' (resp. '[<-') for foo3 inherit from '[' 
(resp. '[<-') for foo1 and foo2?


Thanks
Christophe
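
One possible reading of the question, sketched with hypothetical class definitions (the slots a and b and what '[' returns are assumptions, since no code was posted). Here the foo3 method reuses both parent methods explicitly by coercing the object to each parent class with as(), rather than relying on callNextMethod(), which would select only one of the two parent methods.

```r
library(methods)

# Hypothetical classes: slot names and contents are illustrative only.
setClass("foo1", representation(a = "numeric"))
setClass("foo2", representation(b = "character"))
setClass("foo3", contains = c("foo1", "foo2"))

# Each parent class gets its own '[' method
setMethod("[", "foo1", function(x, i, j, ..., drop = TRUE) x@a[i])
setMethod("[", "foo2", function(x, i, j, ..., drop = TRUE) x@b[i])

# foo3 calls both parent methods on the corresponding parent view
setMethod("[", "foo3", function(x, i, j, ..., drop = TRUE) {
  list(a = as(x, "foo1")[i], b = as(x, "foo2")[i])
})

obj <- new("foo3", a = c(10, 20, 30), b = c("x", "y", "z"))
obj[2]
```

A replacement method for '[<-' could follow the same pattern, updating each parent part in turn; whether that matches the intended design is for the original poster to say.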



Re: [R] Factor variables with GAM models

2010-03-20 Thread Corrado
You can sometimes manually substitute a categorical variable with a set 
of continuous variables.


For example, say you have a variable like landcover.class with 3 values: 
class A, class B, class C. You can transform it into 3 continuous 
variables landcover.class.A, landcover.class.B, landcover.class.C and 
assign a value of 1 (or 100%) to elements belonging to that class and 0 
to elements not belonging to it.


That sometimes helps.
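
The manual substitution can be sketched with model.matrix from base R; the five-row data frame is made up for illustration.

```r
dat <- data.frame(landcover.class = factor(c("A", "B", "C", "A", "B")))

# One 0/1 column per level; "- 1" drops the intercept so that no level
# is absorbed into a baseline
ind <- model.matrix(~ landcover.class - 1, data = dat)
colnames(ind) <- paste0("landcover.class.", levels(dat$landcover.class))

dat <- cbind(dat, ind)
dat
```

If the model already has an intercept, one of the three indicator columns should be left out, otherwise the columns sum to the intercept and the fit is not identifiable.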

Regards

Noah Silverman wrote:

Steve,

I get that.  What you wrote makes sense.

My challenge is the data I'm attempting to model.  Some of the 
variables are continuous, some are factors.  Both linear and Poisson 
models work (Poisson doing a much more accurate job).  However, some 
of the numerical variables are clearly non-linear.  Hence my interest 
in GAM.  I suppose one alternative would be to try some polynomial 
transformation on the variable as part of a Poisson model.


Any other suggestions would be welcome.

Thanks!

-N

On 3/19/10 8:37 PM, Steven McKinney wrote:

Hi Noah

GAM models were developed to assess the functional form
of the relationship of continuous predictor variables to the
response, so weren't really meant to handle factor variables
as predictor variables.  GAMs are of the form
E(Y | X1, X2, ...) = S0 + S(X1) + S(X2) + ...
where S(X) is a smooth function of X.

Hence you might want to rethink why you'd want a
factor variable as a predictor variable in a GAM.
This is why the gam machinery doesn't just do the
factor conversion to indicator variables as is done in
lm.

HTH

Steven McKinney


From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On 
Behalf Of Noah Silverman [n...@smartmediacorp.com]

Sent: March 19, 2010 12:54 PM
To: r-help@r-project.org
Subject: [R] Factor variables with GAM models

I'm just starting to learn about GAM models.

When using the lm function in R, any factors I have in my data set are
automatically converted into a series of binomial variables.

For example, if I have a data.frame with a column named color and values
red, green, blue.   The lm function automatically replaces it with
3 variables colorred, colorgreen, colorblue which are binomial {0,1}

When I use the gam function, R doesn't do this so I get an error.

1) Is there a way to ask the gam function to do this conversion for me?
2) If not, is there some other tool or utility to make this data
transformation easy?
3) Last option - can I use lm to transform the data and then extract it
into a new data.frame to then pass to gam?

Thanks!!!



Re: [R] Constrained non linear regression using ML

2010-03-18 Thread Corrado

Dear Gabor, Arne, Ravi, R users,

I am firstly trying the maximum likelihood approach, then will try the 
Bayesian approach.


The likelihood function, and the log likelihood function, will depend on 
the pdf of the error e in the formula:


y=f(theta*x)+e

Now let's say that e is Gaussian distributed: then I can use LS, which 
is the same as ML in this case, and the residuals would be Gaussian 
distributed. Is that right?


If e is distributed differently (for example: beta, in the continuous 
case,  or binomial, in the discrete case), then I am better off by using 
Maximum Likelihood. How would the residual be distributed? Should they 
not be distributed the same as e?
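
On the first question: with additive Gaussian errors, maximising the likelihood over the mean parameters is the same as minimising the residual sum of squares. A small check on simulated data follows; the data, seed, starting values and the bounds (used only to keep the search in a sensible region) are assumptions for illustration.

```r
set.seed(7)
x <- runif(500)
y <- 1 - exp(-(0.3 + 1.2 * x)) + rnorm(500, sd = 0.05)

# Least squares via nls
fit_ls <- nls(y ~ 1 - exp(-(k0 + k1 * x)), start = c(k0 = 0.1, k1 = 0.5))

# Maximum likelihood with Gaussian errors; par = (k0, k1, log(sigma)),
# with log(sigma) optimised so that sigma stays positive
negloglik <- function(par) {
  mu <- 1 - exp(-(par[1] + par[2] * x))
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
}
fit_ml <- optim(c(0.1, 0.5, log(0.1)), negloglik, method = "L-BFGS-B",
                lower = c(0, 0, log(1e-3)), upper = c(5, 5, log(10)))

# The two sets of mean parameters should agree closely
rbind(ls = coef(fit_ls), ml = fit_ml$par[1:2])

# ML estimate of sigma versus sqrt(RSS / n) from the LS fit
c(ml = exp(fit_ml$par[3]), ls = sqrt(deviance(fit_ls) / length(y)))
```

For beta or binomial errors the variance generally changes with the mean, so the raw residuals (observed minus fitted) are not expected to share one common distribution.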


Best,

Gabor Grothendieck wrote:

For specific questions on the betareg package contact the maintainer.
If the likelihood based approaches are giving too much difficulty try
moving to a Bayesian framework (WinBUGS/R2WinBUGS, JAGS/r2jags, etc.)



Re: [R] Constrained non linear regression using ML

2010-03-17 Thread Corrado

Dear Gabor, dear R users,

I had already read the betareg documentation. As far as I can understand 
from the help, it does not allow for constrained regression.


Regards


Gabor Grothendieck wrote:

Check out the betareg package.



Re: [R] Constrained non linear regression using ML

2010-03-17 Thread Corrado

Dear Gabor,

1) The constraints are active, at least from a formal point of view.
3) I have tried several times to run betareg.fit on the data, and the 
only thing I can obtain is the very strange error:


Error in dimnames(x) <- dn :  length of 'dimnames' [2] not equal to 
array extent


The error is strange because the function dimnames is not 
called anywhere.  


Regards

Gabor Grothendieck wrote:

Try it anyways -- maybe none of your constraints are active.



Re: [R] Constrained non linear regression using ML

2010-03-17 Thread Corrado

Dear Arne, Gabor,

I solved the problem with betareg (downloaded the package). I ran it on 
my data, and unfortunately the constraint is definitely active: if I 
remove the active variables, I then remove the most significant variables!


Of course the error is important, not the distribution of the variable.

In this case, one of the assumptions is that the error may be 
distributed ~ beta. I think that betareg makes this assumption, am I right?


I am finding it difficult to solve two problems:

1) write the maximum likelihood function (what do you suggest?)
2) deal with the fact that a few factors actually have values of y (the 
response) at the extremes, that is 0 and 1. That means that the link 
function returns infinite values in that case 

3) the error is dependent on E(y).

PS: Additional silly question: what is the discrete equivalent of beta? 
binomial?
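
On point 2), a short illustration of the problem and of one commonly used workaround: compressing the response from [0, 1] into (0, 1) before fitting, a transformation often attributed to Smithson and Verkuilen (2006). The five values are made up, and whether shrinking the extremes is acceptable for these data is a modelling decision.

```r
y <- c(0, 0.02, 0.5, 0.97, 1)
qlogis(y)                      # -Inf and Inf at the extremes

# Compress [0, 1] into (0, 1); n is the sample size
n <- length(y)
y_sq <- (y * (n - 1) + 0.5) / n
y_sq
qlogis(y_sq)                   # finite everywhere
```

With a large sample the adjustment is tiny: for n = 40,000 an observed 0 becomes 0.5 / 40000.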


Arne Henningsen wrote:

On 17 March 2010 14:22, Gabor Grothendieck ggrothendi...@gmail.com wrote:
  

Contact the maintainer regarding problems with the package.  Not sure
if this is acceptable but if you get it to run you could consider just
dropping the variables from your model that correspond to active
constraints.

Also try the maxLik package.  You will have to define the likelihood
yourself but it does support constraints.



Yes. And specifying the likelihood function is probably (depending on
your distributional assumptions) not too complicated.

BTW: Even if your y follows a beta distribution, it does not mean that
your error term also follows a beta distribution. And it is the
distribution of the error term which is crucial for specifying the
likelihood function.

/Arne
  

--

Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] Constrained non linear regression using ML

2010-03-16 Thread Corrado

Dear R users,

I have to fit the non linear regression:

y ~ 1 - exp(-(k0 + k1*p1 + k2*p2 + ... + kn*pn))

where ki >= 0 for each i in [1 ... n] and the pi are in R+.

I am using, at the moment, nls, but I would rather use a Maximum 
Likelihood based algorithm. The error is not necessarily normally 
distributed.


y is approximately beta distributed, and the volume of data is medium to 
large (the y,pi may have ~ 40,000 elements).


I have studied the packages in the task views Optimisation and Robust 
Statistical Methods, but it did not look like what I was looking for was 
there. Maybe I am wrong.


The nearest thing was nlrob, but even that does not allow for 
constraints, as far as I can understand.


Any suggestion?

Regards

--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
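
A minimal sketch of one way the fit described above could be set up by maximum likelihood. It assumes (this is not stated in the post) that y is beta distributed with mean 1 - exp(-(k0 + k1*p1 + k2*p2)) and a common precision phi. The constraints ki >= 0 are imposed as box constraints through optim() with method "L-BFGS-B". The data are simulated, and the names negloglik, true_k, true_phi and fit are illustrative only.

```r
# Sketch only: the beta likelihood and the simulated data are assumptions.
set.seed(123)
n  <- 2000
p1 <- runif(n, 0, 2)
p2 <- runif(n, 0, 2)
true_k   <- c(0.1, 0.5, 0)   # k2 = 0 sits on the boundary of the constraint
true_phi <- 50
mu <- 1 - exp(-(true_k[1] + true_k[2] * p1 + true_k[3] * p2))
y  <- rbeta(n, mu * true_phi, (1 - mu) * true_phi)

# Negative log-likelihood; par = (k0, k1, k2, log(phi)).
negloglik <- function(par, y, p1, p2) {
  eta <- par[1] + par[2] * p1 + par[3] * p2
  mu  <- 1 - exp(-eta)
  mu  <- pmin(pmax(mu, 1e-8), 1 - 1e-8)   # keep the mean strictly inside (0, 1)
  phi <- exp(par[4])
  -sum(dbeta(y, mu * phi, (1 - mu) * phi, log = TRUE))
}

fit <- optim(par = c(0.1, 0.1, 0.1, 0),
             fn = negloglik, y = y, p1 = p1, p2 = p2,
             method = "L-BFGS-B",
             lower = c(0, 0, 0, -5),       # k0, k1, k2 >= 0
             upper = c(Inf, Inf, Inf, 10)) # log(phi) kept in a finite range

k_hat   <- fit$par[1:3]
phi_hat <- exp(fit$par[4])
print(round(c(k0 = k_hat[1], k1 = k_hat[2], k2 = k_hat[3], phi = phi_hat), 3))
```

Observations exactly equal to 0 or 1 have zero or infinite beta density, so they would need separate handling before a likelihood of this form could be used.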



[R] Distance between sets of points in transformed environmental space

2009-12-01 Thread Corrado
Dear friends,

I have several sets of points in a transformed environmental space. Each set 
of points can be represented as a cloud in the environmental space.

This space is spanned by n coordinates, corresponding to the first n PCs of 36 
PCs of some environmental variables (12 monthly minimum temperatures, 12 
monthly maximum temperature, 12 monthly precipitations).

I would like to calculate a distance or dissimilarity between each pair of 
sets of points.

Let's label two of those sets as X,Y, where x is in X and y is in Y. We are 
interested in defining a distance between X and Y. I have thought of using the 
following:

1) The Euclidean distance between the centroids of X and Y. Simple and 
effective but does not give much real information on the actual degree of 
overlapping.
2) The median of all the distances between all pairs of points (x,y). Same 
problem as (1), partially resolved.
3) The proportion of points of X U Y which fall outside the intersection of 
the convex or concave hulls (defined with a smoothing parameter) of X and Y, 
i.e. C(X) intersect C(Y). Very complicated, and does not necessarily lead to

What do you think? Are there any other approaches worth considering?  

Kind Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
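
Options (1) and (2) above can be sketched in a few lines of base R. The point sets are simulated, the rows are points and the columns are PC coordinates, and the function names centroid_distance and median_pairwise_distance are made up for this illustration.

```r
# Rows are points, columns are the n retained principal components (simulated here).
set.seed(1)
X <- matrix(rnorm(50 * 4), ncol = 4)
Y <- matrix(rnorm(80 * 4, mean = 1), ncol = 4)

# (1) Euclidean distance between the centroids of the two clouds.
centroid_distance <- function(X, Y) {
  sqrt(sum((colMeans(X) - colMeans(Y))^2))
}

# (2) Median of the Euclidean distances over all pairs (x, y), x in X, y in Y.
median_pairwise_distance <- function(X, Y) {
  # squared distances via |x|^2 + |y|^2 - 2 x.y, computed for all pairs at once
  d2 <- outer(rowSums(X^2), rowSums(Y^2), "+") - 2 * tcrossprod(X, Y)
  median(sqrt(pmax(d2, 0)))   # pmax() guards against tiny negative rounding errors
}

print(centroid_distance(X, Y))
print(median_pairwise_distance(X, Y))
```

Both measures assume the coordinates are on comparable scales, which is the standardisation point raised elsewhere in the thread.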



Re: [R] Distance between sets of points in transformed environmental space

2009-12-01 Thread Corrado
Thanks Mario! (Oppure grazie Mario?)

- Can those silhouette coefficients be used for distances between sets or only 
for distances point to set?

- Where did you get the other post you attached? It did not come up when I 
searched the mailing list!
 

Best,

On Tuesday 01 December 2009 10:31:47 Mario Valle wrote:
 silhouette coefficients?
 They measure, for each point, how similar it is to the other points of its
  cluster and how dissimilar it is from the points of other clusters.
 
 P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining,
  Addison-Wesley, 2006 page 541
 
 Hope it helps.
   mario
 
 Charlotte Maia wrote:
  Well, here's another naive post from me (hopefully better than the last
  one).
 
  Firstly I'm not sure computing euclidean distance is that simple. I
  would assume temperatures and precipitation would need to be
  standardised in some way.
 
  I think the notion of how far away something is, and how distinct
  location wise something is, are quite different, so maybe two
  measures?
 
  For distance per se, I think your first idea is the best.
  Plus simple is always good...
 
  For distinctness, given one of two sets, for each point, you could
  just compute the closest point to it. If the closest point is a member
  of the same set, we will call that a + point, if the closest point is
  a member of the other set, we will call it a - point. In principle the
  measure of distinctness would be the sum of the +'s, however there
  might need to be some scaling to take into account the number of
  points in each set.
 
  There are also a lot of fancy things out there, so someone will
  probably come up with a much fancier (and possibly better) idea than
  this.
 
  Well, that's just my rant, before I go to bed.
 
 
  kind regards
 





-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Concave hull

2009-11-26 Thread Corrado Topi

Dear David and other concave-hull-ists,

yes, I meant concave hulls indeed. I know about the algorithm mentioned 
(www.concavehull.com) but it is not open source, so you cannot integrate it 
in R, and it is apparently patented, so even if you find the description 
you cannot apply it to implement a solution (even if patenting algorithms 
is at least questionable and has a rather patchy validity).


Some questions / comments which apply to David's approach, and in general 
even to convex hulls (question 2):


1) How do you extend it to n dimensions (in R)?
2) How do you do set calculus (a horrible expression meaning union, 
intersection, difference, and particularly membership, and so on) on these 
hulls (in R)?


Finally, I am at the moment using a GIS to do it, but I did not find any 
command for concave hulls in GRASS. There is a rather long and convoluted way 
of doing them, but it is nearly impossible to automate (see 
http://grass.osgeo.org/wiki/Create_concave_hull). Extending it to the 
n-dimensional case does not sound right either, because a GIS is designed 
for working in 2D/3D.



Best,


On Nov 26 2009, David Winsemius wrote:



On Nov 25, 2009, at 7:51 PM, David Winsemius wrote:


Drats; Forgot the plot:

library(MASS)  # provides kde2d()

xx <- runif(100, -1, 1)
yy <- abs(xx) + rnorm(100, 0, .2)
plot(xx, yy, xlim = c(min(xx) - sd(xx), max(xx) + sd(xx)),
     ylim = c(min(yy) - sd(yy), max(yy) + sd(yy)))

# 2-d kernel density estimate on a grid padded by one sd on each side
dens2 <- kde2d(xx, yy, lims = c(min(xx) - sd(xx), max(xx) + sd(xx),
                                min(yy) - sd(yy), max(yy) + sd(yy)))

contour(dens2, add = TRUE)

#  You can pick a single contour if you like:

contour(dens2, levels = 0.05, col = "red", add = TRUE)
contour(dens2, levels = 0.10, col = "blue", add = TRUE)


And as a further note you can drop the bandwidth and lower the density  
level to get a tighter fit:


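library(MASS)  # provides kde2d() and bandwidth.nrd()

n <- 10000  # the sample size is garbled in the archived post; 10000 is an assumption
xx <- runif(n, -1, 1)
yy <- abs(xx) + rnorm(n, 0, .2)
plot(xx, yy, xlim = c(min(xx) - sd(xx), max(xx) + sd(xx)),
     ylim = c(min(yy) - sd(yy), max(yy) + sd(yy)), cex = .2)

# same padded grid as before, with the bandwidth cut to a quarter
dens2 <- kde2d(xx, yy,
               lims = c(min(xx) - sd(xx), max(xx) + sd(xx),
                        min(yy) - sd(yy), max(yy) + sd(yy)),
               h = c(bandwidth.nrd(xx)/4, bandwidth.nrd(xx)/4))

contour(dens2, add = TRUE)
#  You can pick a single contour if you like:

contour(dens2, levels = 0.05, col = "red", add = TRUE)
contour(dens2, levels = 0.10, col = "blue", add = TRUE)

contour(dens2, levels = 0.005, col = "red", add = TRUE)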


(More bat-like.)




--
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
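
On the membership question for the 2-d case: a rough sketch, assuming the "hull" is taken to be a single density contour from kde2d() as in David's example. The contour is extracted with contourLines() and membership is tested with a hand-written even-odd (ray casting) test. The helper names point_in_polygon and in_hull, the seed and the level 0.10 are choices made for this illustration, not part of the thread.

```r
library(MASS)   # kde2d()

# Even-odd (ray casting) test: is (px, py) inside the polygon with vertices (vx, vy)?
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (vy[i] > py) != (vy[j] > py)   # edge j -> i straddles the horizontal line at py
    if (crosses &&
        px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

set.seed(42)
xx <- runif(500, -1, 1)
yy <- abs(xx) + rnorm(500, 0, 0.2)

# Pad the grid so that the chosen contour closes inside it.
lims <- c(min(xx) - sd(xx), max(xx) + sd(xx),
          min(yy) - sd(yy), max(yy) + sd(yy))
dens <- kde2d(xx, yy, n = 100, lims = lims)

# Each element of 'rings' is one contour line at the chosen density level.
rings <- contourLines(dens$x, dens$y, dens$z, levels = 0.10)

# A point belongs to the "hull" if it lies inside at least one ring.
in_hull <- function(px, py) {
  any(vapply(rings,
             function(r) point_in_polygon(px, py, r$x, r$y),
             logical(1)))
}

print(in_hull(0, 0.1))   # a point near the vertex of the V-shaped cloud
print(in_hull(0, 1.0))   # a point in the notch of the V
```

Taking the union of the rings ignores holes; with nested rings, counting how many rings contain the point and testing for an odd count would be the analogous even-odd rule. This does not address the n-dimensional case.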



[R] Concave hull

2009-11-25 Thread Corrado
Dear friends,

Do you know how to calculate the CONCAVE hull of a set of points 
(2-dimensional or n-dimensional)? Is that possible in R? (With a smoothing 
parameter, of course.)

Best,
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] IRLS or other iteratively re weighted optimization algorithms with constraints in R

2009-10-06 Thread Corrado
Dear list,

is there an iteratively reweighted least squares based algorithm, or any other 
iteratively reweighted optimisation algorithm, for non linear (and possibly 
non parametric) optimisation problems with constraints available in R?

Regards
-- 
Corrado



[R] Problem with dist (bug?)

2009-10-02 Thread Corrado
Dear list,

using package proxy.

In one situation, the dissimilarity between two vectors based on 
method="correlation" returns a value of 1.9. That should not happen, should it?

The correlation is normally the cos() of the angle between the two vectors 


That dissimilarity

Any clue?

Package dist 0.4-3 on R 2.9.2 on Kubuntu 904 64 bit.

Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
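
For reference, a small sketch of the arithmetic behind a correlation-based dissimilarity, assuming it is defined as 1 - r with r the Pearson correlation. Whether proxy uses exactly this conversion is an assumption here and should be checked against the package documentation. Because r lies in [-1, 1], such a dissimilarity lies in [0, 2], so values above 1 arise whenever two vectors are negatively correlated. The vectors a and b are made up.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(5.2, 3.9, 3.1, 2.0, 0.8)   # made-up vector, almost perfectly anti-correlated with a

# Correlation-based dissimilarity under the assumed definition 1 - r.
r     <- cor(a, b)
d_cor <- 1 - r

# Cosine of the angle between two vectors.
cosine <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))

# The correlation is the cosine of the angle between the centred vectors,
# so the two dissimilarities agree only after centring.
d_cos_centred <- 1 - cosine(a - mean(a), b - mean(b))
d_cos_raw     <- 1 - cosine(a, b)

print(c(d_cor = d_cor, d_cos_centred = d_cos_centred, d_cos_raw = d_cos_raw))
```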



Re: [R] Problem with dist (bug?)

2009-10-02 Thread Corrado
Dear list,

here is the code that generates the problem:

library(proxy)
scot <- read.csv("scot.csv", header=TRUE)
scot24_climate <- scot24[,1105:1109]

# Scotland
dist_scot24_climate <-
dist(scot24_climate, method="correlation", diag=TRUE, upper=TRUE)

max(dist_scot24_climate)

is 1.9. I do not think it should be, because the value is usually the cos() of 
the angle between the 2 vectors. If you use method="cosine" you have 1.8, 
which I think it should not be.

Is there a problem with the way I use it, or is there a bug?

I have been able to reduce the scot.csv to under 200MB, but I thought of not 
posting it to the list 

 We need to see the data and the script that produced the error.

 On Fri, Oct 2, 2009 at 5:06 AM, Corrado ct...@york.ac.uk wrote:
  Dear list,
 
  using package proxy.
 


  In one situation, the dissimilarity between two vectors based on
  method=correlation returns a value of 1.9. That should not happen, should
  it?
 
  The correlation is normally the cos() of the angle between the two
  vectors 
 
  Any clue?
 
  Package dist 0.4-3 on R 2.9.2 on Kubuntu 904 64 bit.
 
  Regards
  --
  Corrado Topi
 
  Global Climate Change  Biodiversity Indicators
  Area 18,Department of Biology
  University of York, York, YO10 5YW, UK
  Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
 



-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Fetch large sized file from SQL

2009-10-02 Thread Corrado
You can try using RODBC, it allows you to connect to databases using the ODBC 
driver. 

I had some difficulties using it with the PostgreSQL driver in the past, 
because of some apparent incompatibility with the native PostgreSQL ODBC 
driver. I think the problems were solved by the new ODBC driver for 
PostgreSQL and the new revision of RODBC.

On Friday 02 October 2009 13:56:02 Dr. Alireza Zolfaghari wrote:
 Hi List,
 Does any one know what package I need to use in order to fetch/get a large
 sized dataframe from SQL? I have already used sqldf package which is good
 for fetching large sized csv files.

 Thanks
 Alireza




-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Fetch large sized file from SQL

2009-10-02 Thread Corrado
I think you can specify the number of rows to be loaded at a time.  It was 
quite a while ago. Try reading

?sqlQuery
?odbcConnect

I have loaded quite large tables.

On Friday 02 October 2009 14:59:59 Dr. Alireza Zolfaghari wrote:
 But the problem is that the dataframe size in sql is large, therefore odbc
 can sqlQuery() can not handel it.

 On Fri, Oct 2, 2009 at 2:14 PM, Corrado ct...@york.ac.uk wrote:
  You can try using RODBC, it allows you to connect to databases using the
  ODBC
  driver.
 
  I had some difficulties using it with the postgresSQL driver in the past,
  because of some apparent incompatibility with the native postrgesSQL ODBC
  driver. I think the problem where solved by the new ODBC driver for
  postgresSQL and the new revision for RODBC.
 
  On Friday 02 October 2009 13:56:02 Dr. Alireza Zolfaghari wrote:
   Hi List,
   Does any one know what package I need to use in order to fetch/get a
 
  large
 
   sized dataframe from SQL? I have already used sqldf package which is
   good for fetching large sized csv files.
  
   Thanks
   Alireza
  
 
  --
  Corrado Topi
 
  Global Climate Change  Biodiversity Indicators
  Area 18,Department of Biology
  University of York, York, YO10 5YW, UK
  Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] A point in a vector?

2009-09-30 Thread Corrado
Dear list,

I have a strange requirement. I have a vector, for example 
v <- c(0,0,0,0,1,2,4,6,8,8,8,8). I have a value, for example x <- 4.8. 

I would like to understand in which sub interval of v the value x lies. In this 
case, x would be in the sub interval [4,6], that is in the subinterval starting 
from element j=7 to the element j+1=8.

Can we do that with an R command?

Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
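
One base R function that seems to fit this requirement is findInterval(), which for a non-decreasing vector returns the index of the last element that is less than or equal to the value. A short sketch with the numbers from the post:

```r
v <- c(0, 0, 0, 0, 1, 2, 4, 6, 8, 8, 8, 8)   # must be sorted in non-decreasing order
x <- 4.8

# Index j such that v[j] <= x < v[j + 1]
j <- findInterval(x, v)

# End points of the sub interval that contains x
interval <- v[c(j, j + 1)]

print(j)
print(interval)
```

findInterval() returns 0 when x is below v[1] and length(v) when x is at or above the last element, so those two cases need a check before indexing with j and j + 1.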



Re: [R] Something wrong with my function Please Help

2009-09-29 Thread Corrado
Did you run debug over your function?

Load the library debug, and then run mtrace over your function.

library(debug)

? mtrace

hth

On Tuesday 29 September 2009 04:29:37 Chunhao Tu wrote:
 Hi R users,
 I try to build a function to compute odds ratio and relative risk however
 something wrong. I stuck for many hours but I really don't know how to
 solve it. Would someone please give me a hint?

  OR.RR <- function(x){

 +   x <- as.matrix(any(dim(x)==2))
 +   OR <- (x[1,1]*x[2,2])/(x[1,2]*x[2,1])
 +   RR <- (x[1,1]/(sum(x[1,])))/(x[2,1]/(sum(x[2,])))
 +   return(OR);return(RR)
 +   }

  tt <- matrix(data=1:4,nrow=2,ncol=2)
  OR.RR(tt)

 Error in OR.RR(tt) : subscript out of bounds

 Many Thanks
 Tu



-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
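
A possible corrected version of the quoted function, offered as a sketch. In the quoted code, as.matrix(any(dim(x)==2)) replaces x with a 1 x 1 logical matrix, which is why x[2,2] is out of bounds, and the second return() is never reached. The name or_rr and the choice to return a named vector are made up for this illustration.

```r
# Odds ratio and relative risk for a 2 x 2 table of counts.
or_rr <- function(x) {
  x <- as.matrix(x)
  stopifnot(all(dim(x) == 2))                 # check the shape instead of overwriting x
  OR <- (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])
  RR <- (x[1, 1] / sum(x[1, ])) / (x[2, 1] / sum(x[2, ]))
  c(OR = OR, RR = RR)                         # return both values together
}

tt  <- matrix(data = 1:4, nrow = 2, ncol = 2)
res <- or_rr(tt)
print(res)
```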



Re: [R] SAS user now converting to R - Help with Transpose

2009-09-28 Thread Corrado
I think you want to look at the command reshape, it may solve your problem.

Type ?reshape in the R console on your system.

On Monday 28 September 2009 18:35:25 Gabor Grothendieck wrote:
  I have a dataset that looks like this:
 
  Chemical Well1 Well2 Well3 Well4
  BOD 13.2 14.2 15.5 14.2
  O2 7.8 2.6 3.5 2.4
  TURB 10.2 14.6 18.5 17.3
  and so on with more chemicals
 
  I would like to transpose my data so that it looks like this:
  Chemical WellID Value
  BOD Well1 13.2
  BOD Well2 14.2
  BOD Well3 15.5
  BOD Well4 14.2
  O2 Well1 7.8
  O2 Well2 2.6
   and so on
 
  In sas I would code it like this:
  proc sort data=ds1; by chemical; run;
  Proc Transpose data=ds1 out=ds2;
  by chemical;
  var Well1 Well2 Well3 Well4;
  run;
  data ds3; set ds2;
  rename _name_ = WellID;
  rename col1 = value;
  run;
 
  How can I do this in R??  Any help is much appreciated.  Thanks!
  --
  View this message in context:
  http://www.nabble.com/SAS-user-now-converting-to-R---Help-with-Transpose-
 tp25645393p25645393.html Sent from the R help mailing list archive at
  Nabble.com.
 



-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
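
A sketch of how reshape() could be applied to the data in the quoted question. The data frame is typed in by hand from the three rows shown, and the object names ds1, ds3 and wells are only borrowed or made up for illustration.

```r
# The wide data from the question (first three chemicals only).
ds1 <- data.frame(
  Chemical = c("BOD", "O2", "TURB"),
  Well1 = c(13.2, 7.8, 10.2),
  Well2 = c(14.2, 2.6, 14.6),
  Well3 = c(15.5, 3.5, 18.5),
  Well4 = c(14.2, 2.4, 17.3),
  stringsAsFactors = FALSE
)

wells <- c("Well1", "Well2", "Well3", "Well4")

# One row per (Chemical, WellID) pair, with the measurement in Value.
ds3 <- reshape(ds1, direction = "long",
               varying = wells, v.names = "Value",
               timevar = "WellID", times = wells,
               idvar = "Chemical")

# Sort as PROC SORT / PROC TRANSPOSE would, and drop the generated row names.
ds3 <- ds3[order(ds3$Chemical, ds3$WellID), ]
rownames(ds3) <- NULL
print(ds3)
```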



Re: [R] define a new family (and a new link function) for gam in gam package

2009-09-21 Thread Corrado
Dear Simon,

I want  a simple generalised additive model for regression which I can 
customise (force /define a certain basis, certain order for the splines, knots 
vector, generating new link functions, and force the regression through 0,0).

What do you suggest?  


On Friday 18 September 2009 16:06:31 Simon Wood wrote:
  I am using gam in gam package (not in mgcv)  it is possible to force
  gam in mgcv to behave like gam in gam package?

 -- not *exactly*, no. But what do you want to do? (i.e. what feature of
 `gam' do you need?)



-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] define a new family (and a new link function) for gam in gam package

2009-09-18 Thread Corrado
Dear David,

I am using gam in the gam package (not in mgcv). Is it possible to force gam 
in mgcv to behave like gam in the gam package?

On Thursday 17 September 2009 23:00:17 David Winsemius wrote:
 On Sep 17, 2009, at 1:39 PM, Topi, Corrado wrote:
  Dear R list,
 
  is it possible to define a new family (and a new link function) for
  gam in gam package? How?
 
  I read the help for gam, family, gam.model, make.link but I did not
  find a solution.

 Wood provides an example for negbin with alternate links in package
 mgcv;

 library(mgcv)
 ?negbin
 negbin  # produces the code



-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] define a new family (and a new link function) for gam in gam package

2009-09-17 Thread Topi, Corrado

Dear R list,

is it possible to define a new family (and a new link function) for gam 
in gam package? How?


I read the help for gam, family, gam.model, make.link but I did not find 
a solution.


Regards
--
Corrado Topi

Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
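
As a rough illustration of what a user-defined link looks like in base R: the stats family constructors accept an object of class "link-glm". The sketch below builds one for mu = 1 - exp(-eta) and uses it with stats::glm() on simulated data. Whether gam() in the gam package accepts such a family is not verified here, and the names negexp_link, fit and fit0 are made up.

```r
# Link for mu = 1 - exp(-eta), i.e. eta = -log(1 - mu).
negexp_link <- structure(
  list(
    linkfun  = function(mu) -log(1 - mu),
    linkinv  = function(eta) 1 - exp(-eta),
    mu.eta   = function(eta) exp(-eta),      # d mu / d eta
    valideta = function(eta) TRUE,
    name     = "negexp"
  ),
  class = "link-glm"
)

# Simulated data with known coefficients (intercept 0.2, slope 0.5).
set.seed(1)
n <- 200
p <- runif(n, 0, 3)
y <- 1 - exp(-(0.2 + 0.5 * p)) + rnorm(n, sd = 0.02)

# Gaussian errors with the custom link; start values keep the first iteration well defined.
fit <- glm(y ~ p, family = gaussian(link = negexp_link), start = c(0.1, 0.1))
print(coef(fit))

# The same model without an intercept: "- 1" in the formula removes it.
fit0 <- glm(y ~ p - 1, family = gaussian(link = negexp_link), start = 0.1)
print(coef(fit0))
```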



[R] Help with error on function: Error in .... attempt to apply non-function

2009-09-15 Thread Corrado
Dear R gurus,

I wrote this function

http://scsys.co.uk:8002/33852?ln=onstore=onsubmit=Format+it!

for a small package I am preparing. 

Whenever I run the function I get the error

Error in Mspline(i = i, x = x, degree = kk, t = t) :  attempt to apply non-function

Could anyone point out what I am doing wrong?

kubuntu 904 64 bit, R 2.9.2

Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Help with error on function: Error in .... attempt to apply non-function

2009-09-15 Thread Corrado
Dear Duncan,

this is a reproducible example: it is the function copied straight from my 
Eclipse.

I found the mistake (thanks to Peter) 

On Tuesday 15 September 2009 11:15:29 Duncan Murdoch wrote:
 Corrado wrote:
  Dear R gurrus,
 
  I wrote this function
 
  http://scsys.co.uk:8002/33852?ln=onstore=onsubmit=Format+it!
 
  for a small package I am preparing.
 
  Whenever I run the function I get the error
 
  Error in Mspline(i = i, x = x, degree = kk, t = t) :  attempt to apply
  non- function
 
  Anyone could point me out what I am doing wrong?

 It would be a lot easier to do so if you gave us a reproducible example.
 But the usual cause for that is using () instead of [], or forgetting an
 operator.  I think you've done the second:  you have (k-1)(t[i+k]-t[i])
 where you should have (k-1)*(t[i+k]-t[i]).

 Duncan Murdoch



-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Help with error on function: Error in .... attempt to apply non-function (Solution)

2009-09-15 Thread Corrado
Dear friends,

the problem with the error has been solved (thanks to Peter).

The line 41 in http://scsys.co.uk:8002/33852 should be rewritten as

M <- k*((x-t[i])*m0+(t[i+k]-x)*m1)/((k-1)*(t[i+k]-t[i]))

On Tuesday 15 September 2009 11:32:53 Corrado wrote:
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
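
A tiny reproduction of the error, for the record. R reads two parenthesised expressions written side by side as a function call, so a missing * produces "attempt to apply non-function". The variable width is a made-up stand-in for t[i+k]-t[i].

```r
k     <- 3
width <- 0.3   # stands in for t[i + k] - t[i]

# Missing operator: (k - 1) evaluates to a number, which R then tries to call as a function.
msg <- tryCatch((k - 1)(width), error = function(e) conditionMessage(e))
print(msg)

# With the multiplication written out, the expression evaluates as intended.
ok <- (k - 1) * (width)
print(ok)
```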



[R] identical(length(x), 1) returns FALSE, but print(length(x)) is 1, length(x)==1 is TRUE, and is.integer(lenght(x)) is TRUE????

2009-09-15 Thread Corrado
Dear R,

the condition: 

identical(length(x),1) returns FALSE 

but

print(length(x))

returns 1 and:

is.vector(x) is TRUE.
is.integer(length(x)) is TRUE
length(x) ==1 is TRUE

I am puzzled.

Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] Function returns different results on the vector as a whole vs. on the values of the vector

2009-09-15 Thread Corrado
Dear R friends,

I have developed the function here below attached.

Strangely, when I apply it to a vector it behaves very differently than when I 
apply it separately to each value of the vector itself.

Is there any reason why?

Here is the function:

# TODO: Add comment
# 
# Author: ct529, 3 Sep 2009, 08:42:50,mspline.R
###

mspline <- function(i=1,x=0,k=1,t=c(0,1)){
# x is the variable
# i is the index of the member of the Mspline family
# t is the vector of knots. t[h] is the h-th knot.
# k is the Mspline degree

I <- i

if(identical(k,1)){

if( x<t[i+1] && x>=t[i] ){
td <- t[i+1]-t[i] 
M <- 1/td 
}else{
M <- 0
}

}else if (k>1) {

kk <- (k-1)

if (x>=t[i] && x<t[i+k]){
M <- k*((x-t[i])*mspline(i=I,x=x,k=kk,t=t)+(t[i+k]-x)*mspline(i=(I+1),x=x,k=kk,t=t))/((k-1)*(t[i+k]-t[i]))
} else if (x<t[i] || x>=t[i+k]){
M <- 0
}
}

return(M)   
}

For example:

source("./functions/mspline.R")

X <- seq(0,1,0.1)
Q <- c(0,0,0,0.3,0.5,0.6,1,1,1)
II <- c(1,2,3,4,5,6)

plot(c(0,1),c(0,24),type="p",col="white",cex=.4,pch=".")

for (h in II) {

y <- vector()

for (xi in X) {  # the loop variable name is lost in the archived post; xi is a stand-in

y <- append(y,mspline(i=h,x=xi,k=3,t=Q))

}

points(X,y,type="l",col="green")

}

works very differently from using a vectorial approach, that is substituting 
the inner for iteration with the expression:

y <- mspline(i=h,x=X,k=3,t=Q)

Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
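
A likely reason for the difference is that if() tests a single logical value. With a vector x, older versions of R used only the first element of the condition (with a warning), and R 4.2.0 and later raise an error. One way around it, sketched here, is to keep the function scalar and apply it element by element. The names mspline1 and mspline_vec are made up, and the recursion is a compact restatement of the one in the post.

```r
# M-spline value at a single point x, following the recursion in the post.
mspline1 <- function(i, x, k, t) {
  if (k == 1) {
    if (x >= t[i] && x < t[i + 1]) 1 / (t[i + 1] - t[i]) else 0
  } else if (x >= t[i] && x < t[i + k]) {
    k * ((x - t[i]) * mspline1(i, x, k - 1, t) +
         (t[i + k] - x) * mspline1(i + 1, x, k - 1, t)) /
      ((k - 1) * (t[i + k] - t[i]))
  } else {
    0
  }
}

# Apply the scalar function to every element of x.
mspline_vec <- function(i, x, k, t) {
  vapply(x, function(xi) mspline1(i, xi, k, t), numeric(1))
}

X <- seq(0, 1, 0.1)
Q <- c(0, 0, 0, 0.3, 0.5, 0.6, 1, 1, 1)

y_vec  <- mspline_vec(i = 2, x = X, k = 3, t = Q)           # vectorised call
y_loop <- sapply(X, function(xi) mspline1(2, xi, 3, Q))    # element-by-element reference
print(y_vec)
```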



Re: [R] identical(length(x), 1) returns FALSE, but print(length(x)) is 1, length(x)==1 is TRUE, and is.integer(lenght(x)) is TRUE????

2009-09-15 Thread Corrado
On Tuesday 15 September 2009 17:28:02 Gavin Simpson wrote:
 [note you don't give us your x so I'm making this up - This is what
 Duncan was going on about in an earlier thread, give us something we can
 just paste into R and it works]

Dear Gavin,

I do not understand what more information is needed! Take any vector of length 1, 
for example x <- 1, plus all the commands that were in my previous email.

What is the logic behind  

identical(length(x),1)

being false?

Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
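
The behaviour comes down to storage type: length() returns an integer, the literal 1 is a double, and identical() compares type as well as value. A short sketch:

```r
x <- 1

typeof(length(x))          # "integer"
typeof(1)                  # "double"

identical(length(x), 1)    # FALSE: same value, different type
identical(length(x), 1L)   # TRUE: 1L is an integer literal
length(x) == 1             # TRUE: == coerces before comparing

# A type-tolerant test that still returns a single TRUE/FALSE:
same <- isTRUE(all.equal(length(x), 1))
print(same)
```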



[R] Intercept=0 in gam from gam package

2009-09-10 Thread Corrado
Dear R list,

is it possible to force the intercept to assume the value of 0 (that is no 
intercept) in gam from gam package?

Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] Negative AIC

2009-09-10 Thread Corrado
Dear R list,

I just obtained a negative AIC for two models (-221.7E+4
 and -230.2E+4). Is that normal?

Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Negative AIC

2009-09-10 Thread Corrado
My worry is: can I compare negative AIC with positive AIC? does the comparison 
still hold?

On Thursday 10 September 2009 15:57:01 Ben Bolker wrote:
 Corrado-5 wrote:
  Dear R list,
 
  I just obtained a negative AIC for two models (-221.7E+4
   and -230.2E+4). Is that normal?

 It's not necessarily wrong.  See http://emdbolker.wikidot.com/faq



-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Negative AIC

2009-09-10 Thread Corrado
I think the problem is trying to compare different models trained on the same 
dataset.

1) If I compare for example gam (from gam package) with and without intercept, 
is that a valid comparison?

For example: model with intercept has explained dev 24%, with AIC -2217146, 
model without intercept has explained dev 85.5% with AIC 217488.1

The results sound incredibly strange, but there is actually no difference 
between the models except the removal of the intercept :(. So which model is 
better at fitting the data?

2) If I compare, for example, gam from the gam package with, say, gam from mgcv 
(using tpsp), then I get two very similar AIC values, but are they 
comparable?

gam from mgcv package: -2195000
gam from gam package: -2217000

3) I would like to compare those AIC to the AIC obtained by running BRT on the 
same dataset. I was thinking of simply recalculating manually the AIC using 
the formula:

AIC=2K+N*log(rss/N)

where K is the number of parameters of the regression (i.e. the coefficients 
that are not zero, I would think) and N is the number of samples.

What do you think? Would that be reasonable?
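
A minimal sketch of how the formula above relates to what AIC() returns for a Gaussian linear model. For lm the two differ only by the constant N*(1 + log(2*pi)), provided K counts the regression coefficients plus one for the residual variance. The helper aic_rss and the simulated data are hypothetical, and how to count K for a boosted tree model is left open here.

```r
# AIC up to an additive constant, from the residual sum of squares
aic_rss <- function(rss, n, k) 2 * k + n * log(rss / n)

set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)

fit_a <- lm(y ~ x1)
fit_b <- lm(y ~ x1 + x2)

k_a <- length(coef(fit_a)) + 1   # + 1 for the residual variance
k_b <- length(coef(fit_b)) + 1

manual_a <- aic_rss(sum(resid(fit_a)^2), n, k_a)
manual_b <- aic_rss(sum(resid(fit_b)^2), n, k_b)

# Constant dropped by the RSS formula; identical for every model on the same data
offset <- n * (1 + log(2 * pi))

same_value      <- isTRUE(all.equal(AIC(fit_a), manual_a + offset))
same_difference <- isTRUE(all.equal(AIC(fit_a) - AIC(fit_b), manual_a - manual_b))
```

Because the constant cancels in differences, mixing AIC() output from one model with hand-computed values for another is only safe if both are put on the same footing first.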

Regards

On Thursday 10 September 2009 16:39:32 Ben Bolker wrote:
   If all the models are fitted to the same data set, using the same
 modeling tools (you have to be careful e.g. comparing lmer models to
 glm models, because they use different additive constants), and
 everything seems to make sense (!!!), then yes.  I would be a little
 surprised, and think that something was wrong, if you have some AIC
 values that are on the order of -20,000 (as below) and others that are
 +20,000 ...

   Ben Bolker

 Corrado wrote:
  My worry is: can I compare negative AIC with positive AIC? does the
  comparison still hold?
 
  On Thursday 10 September 2009 15:57:01 Ben Bolker wrote:
  Corrado-5 wrote:
  Dear R list,
 
  I just obtained a negative AIC for two models (-221.7E+4
   and -230.2E+4). Is that normal?
 
  It's not necessarily wrong.  See http://emdbolker.wikidot.com/faq





[R] AIC and goodness of prediction - was: Re: goodness of prediction using a model (lm, glm, gam, brt,

2009-09-10 Thread Corrado
Dear Kingsford,

I apologise for breaking the thread, but I thought there were some more people 
who would be interested.

What you propose is what I am using at the moment: the sum of the squares of 
the residuals, plus variance / stdev. I am not really satisfied. I have also 
tried using R2, and it works well, but some people go a bit wild-eyed when 
they see a negative R2 (which is perfectly reasonable when you use R2 as a 
measure of goodness of fit for predictions on a dataset different from the 
training set).

I was then wondering whether it would make sense to use AIC: the K in the 
formula would still be the number of parameters of the trained model, the sum 
of squared residuals would be the sum of (predicted - observed)^2, and N would 
be the number of samples in the test dataset. I think it should work well.
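
A minimal sketch of the two test-set measures discussed in this thread, mean squared error and R2 computed on data the model has not seen. The data sets A and B, the variable names and the sine-shaped truth are all made up for illustration; B lies outside the range of A, so this is the extrapolation case.

```r
set.seed(3)
truth <- function(temp) sin(2 * pi * temp)

# Training set A and test set B on different intervals of the covariate
A <- data.frame(temp = runif(100, 0, 1))
A$Y <- truth(A$temp) + rnorm(100, sd = 0.2)
B <- data.frame(temp = runif(100, 1, 2))
B$Y <- truth(B$temp) + rnorm(100, sd = 0.2)

M    <- lm(Y ~ temp, data = A)     # train on A
pred <- predict(M, newdata = B)    # predict on B

# Mean squared prediction error on B
mse_B <- mean((B$Y - pred)^2)

# Out-of-sample R2: 1 - SS_res / SS_tot, with SS_tot taken around mean(B$Y)
r2_B <- 1 - sum((B$Y - pred)^2) / sum((B$Y - mean(B$Y))^2)
```

Here r2_B is negative, meaning the model predicts B worse than simply using the mean of Y on B would.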

What do you / other R list members think?

Regards

On Thursday 03 September 2009 15:06:14 Kingsford Jones wrote:
 There are many ways to measure prediction quality, and what you choose
 depends on the data and your goals.  A common measure for a
 quantitative response is mean squared error (i.e. 1/n * sum((observed
 - predicted)^2)) which incorporates bias and variance.  Common terms
 for what you are looking for are test error and generalization
 error.


 hth,
 Kingsford

 On Wed, Sep 2, 2009 at 11:56 PM, Corrado <ct...@york.ac.uk> wrote:
  Dear R-friends,
 
  How do you test the goodness of prediction of a model, when you predict
  on a set of data DIFFERENT from the training set?
 
  I explain myself: you train your model M (e.g. glm,gam,regression tree,
  brt) on a set of data A with a response variable Y. You then predict the
  value of that same response variable Y on a different set of data B (e.g.
  predict.glm, predict.gam and so on). Dataset A and dataset B are
  different in the sense that they contain the same variable, for example
  temperature, measured in different sites, or on a different interval
  (e.g. B is a subinterval of A for interpolation, or a different interval
  for extrapolation). If you have the measured values for Y on the new
  interval, i.e. B, how do you measure how good is the prediction, that is
  how well model fits the Y on B (that is, how well does it predict)?
 
  In other words:
 
  Y~T,data=A for training
  Y~T,data=B for predicting
 
  I have devised a couple of method based around 1) standard deviation 2)
  R^2, but I am unhappy with them.
 
  Regards





[R] Matrix regression

2009-09-07 Thread Corrado
Dear friends,

I would like to solve the following regression problem:

y = c1 x1 + c2 x2 + ... + cn xn

where y and the xi are all matrices and the ci are constants that need to be 
determined. y and the xi are distance matrices (symmetric). The ci should be 
constrained to be non-negative (positive or zero).
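
One possible approach, sketched under assumptions: since the matrices are symmetric, stack the lower triangle of each into a vector and solve a least-squares problem with box constraints using optim from base R. The simulated matrices, the helper names and the choice of optimiser are all illustrative and not a recommendation from the list.

```r
set.seed(11)
n_sites <- 30

# Hypothetical distance matrices built from random 2-D coordinates
make_dist <- function() as.matrix(dist(matrix(rnorm(n_sites * 2), ncol = 2)))
x1 <- make_dist()
x2 <- make_dist()
x3 <- make_dist()
y  <- 0.7 * x1 + 0 * x2 + 0.2 * x3   # known coefficients, noise-free for clarity

# Symmetric matrices: the lower triangle carries all the information
lower_tri <- function(m) m[lower.tri(m)]
X  <- cbind(lower_tri(x1), lower_tri(x2), lower_tri(x3))
yv <- lower_tri(y)

rss  <- function(b) sum((yv - X %*% b)^2)                      # objective
grad <- function(b) as.vector(-2 * crossprod(X, yv - X %*% b)) # its gradient

# L-BFGS-B supports box constraints; lower = 0 keeps every ci non-negative
fit <- optim(par = rep(0.5, 3), fn = rss, gr = grad,
             method = "L-BFGS-B", lower = 0)
ci <- fit$par
```

The entries of a distance matrix are not independent observations, so ordinary regression standard errors for the ci would not be trustworthy; permutation-based tests are the usual alternative for regression on distance matrices.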

Any suggestion? 

I will be more than happy to share the results of my quest with the list or 
developers.

Regards


Re: [R] Strange error returned or bug in gam in mgcv????

2009-09-02 Thread Corrado
Dear Gavin, Simon,

this is the result of str:

> str(dist_scot24_vector_with_climate)
'data.frame':   2265025 obs. of  14 variables:
 $ X       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ tetrad_i: Factor w/ 1505 levels "HP61A","HP61I",..: 1505 1504 1503 1502 1501 1500 1499 1498 1497 1496 ...
 $ tetrad_j: Factor w/ 1505 levels "HP61A","HP61I",..: 1505 1505 1505 1505 1505 1505 1505 1505 1505 1505 ...
 $ bray    : num  0 0.566 0.251 0.407 0.45 ...
 $ PC1     : num  -3.97 -3.14 -7.27 -5.77 -5.88 ...
 $ PC2     : num  3.26 2.87 3.19 2.96 2.97 ...
 $ PC3     : num  -0.16511 -0.28601 -0.00362 -0.11685 -0.09695 ...
 $ PC4     : num  -0.629 -0.696 -0.6 -0.683 -0.639 ...
 $ PC5     : num  0.2603 0.3818 -0.0148 0.0967 0.094 ...
 $ PC6     : num  -3.97 -3.97 -3.97 -3.97 -3.97 ...
 $ PC7     : num  3.26 3.26 3.26 3.26 3.26 ...
 $ PC8     : num  -0.165 -0.165 -0.165 -0.165 -0.165 ...
 $ PC9     : num  -0.629 -0.629 -0.629 -0.629 -0.629 ...
 $ PC10    : num  0.26 0.26 0.26 0.26 0.26 ...


It looks ok to me. What do you think?
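
The check suggested in the quoted reply below (data that look numeric but are stored as a factor) can also be done programmatically rather than by eye. A minimal sketch on a small made-up data frame; the column values are illustrative only.

```r
# Toy data frame in which PC2 looks numeric but is stored as a factor
df <- data.frame(
  bray = c(0, 0.566, 0.251),
  PC1  = c(-3.97, -3.14, -7.27),
  PC2  = factor(c("3.26", "2.87", "3.19"))
)

# Flag every column that is not numeric
is_num      <- vapply(df, is.numeric, logical(1))
non_numeric <- names(df)[!is_num]

# Convert via character: as.numeric() applied directly to a factor
# returns the integer level codes, not the printed values
codes_not_values <- as.numeric(df$PC2)
df$PC2 <- as.numeric(as.character(df$PC2))
```

Applied to the model covariates only, an empty non_numeric would support the reading that the columns themselves are fine.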

On Tuesday 01 September 2009 18:43:24 Gavin Simpson wrote:
 On Tue, 2009-09-01 at 17:55 +0100, Corrado wrote:
  Dear Simon,
 
  I have stored all information at the link:
 
  http://scsys.co.uk:8002/33309?hl=onsubmit=Format+it!

 You could have included that in your mail to the list - it is just plain
 text after all.

  I have the same problem if I do
  s(PC1) + ... + s(PC10) or
  s(PC1,PC2,PC3,PC4,PC5)+s(PC6,PC7,PC8,PC9,PC10) or
  s(PC1,PC2,PC3,PC6,PC7,PC8) .
 
  I have renamed PC1.1,PC2.1,PC3.1,PC4.1,PC5.1 to PC6,PC7,PC8,PC9,PC10 for
  simplicity.

 What does

 str(dist_scot24_vector_with_climate)

 show? I seem to recall getting similar errors when I'd done something
 silly in a data prep routine and had data in a data frame that wasn't
 numeric but looked like it was - a factor for example.

 If you can't do some quite simple things like the first of your three
 alternatives above, that suggests something amiss with the data. That'd
 be the first thing to check.

 HTH

 G

  Regards
 
  On Tuesday 01 September 2009 17:31:04 Simon Wood wrote:
   The basic problem is that you have requested a 10 dimensional thin
   plate spline, with a basis dimension of 196830. In reality it will not
   be possible to compute this, even if you have more than 196830 data. In
   any case it would be unlikely to provide a very useful model --- the
   simplest function that it can theoretically represent will have 3003
   degrees of freedom.
  
   That said the error message is obviously rather unhelpful... Can you
   tell me how many data you are actually trying to fit, and I'll try and
   track down exactly where it's failing, and put in a more informative
   message.
  
   best,
   Simon
  
   On Tuesday 01 September 2009 14:51, Corrado wrote:
Dear friends,
   
what is this error message in gam I cannot understand what it
means  is it a bug?
   
gam_bray_scot24_pc_0505gam(bray~s(PC1,PC2,PC3,PC4,PC5,
PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)
   
Error in if (length(data) != vl) { :
  missing value where TRUE/FALSE needed
Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec -> array
In addition: Warning message:
In array(0, n * k) : NAs introduced by coercion
Execution halted
   
Thanks in advance,
   
Best regards





[R] goodness of prediction using a model (lm, glm, gam, brt, regression tree .... )

2009-09-02 Thread Corrado
Dear R-friends,

How do you test the goodness of prediction of a model, when you predict on a 
set of data DIFFERENT from the training set?

To explain: you train your model M (e.g. glm, gam, regression tree, brt) 
on a set of data A with a response variable Y. You then predict the value of 
that same response variable Y on a different set of data B (e.g. with 
predict.glm, predict.gam and so on). Datasets A and B are different in the 
sense that they contain the same variable, for example temperature, measured 
at different sites, or on a different interval (e.g. B is a subinterval of A 
for interpolation, or a different interval for extrapolation). If you have the 
measured values for Y on the new interval, i.e. B, how do you measure how good 
the prediction is, that is, how well the model fits Y on B (how well it 
predicts)?

In other words:

Y~T,data=A for training
Y~T,data=B for predicting

I have devised a couple of methods based around 1) standard deviation and 
2) R^2, but I am unhappy with them.

Regards 


Re: [R] Google's R Style Guide

2009-09-01 Thread Corrado
Thanks Duncan, Spencer,

To clarify, the situation is:

1) I have no reason to choose S3 over S4 or vice versa, or any other coding 
convention.
2) Our group has not done any OO development in R and I would be the first, so 
I can set the standards.
3) I am starting from scratch with a new package, so I do not have any code I 
need to re-use.
4) I am an R OO newbie, so it is good for me to learn from the beginning what 
is better.

So the questions would be two:

1) What coding style guide should we / I follow? Is the Google style guide 
good, or is there something better / more prescriptive which would make our 
research group's life easier? 
2) What class type should I use? From what you two say, I should use S3 
because it is easier to use. What are the disadvantages? Is there a table of 
advantages / disadvantages for S3 and S4 classes?
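
To make the S3 / S4 trade-off concrete, here is a minimal sketch of the same toy object written both ways. The class names "landscape" and "Landscape", the constructor new_landscape_s3 and the generic nCells are hypothetical names chosen for illustration.

```r
library(methods)

# S3: the class is just an attribute; methods are found by naming convention
new_landscape_s3 <- function(cells) {
  structure(list(cells = cells), class = "landscape")
}
print.landscape <- function(x, ...) {
  cat("landscape with", length(x$cells), "cells\n")
  invisible(x)
}

# S4: formal definition with typed slots and an optional validity check
setClass("Landscape",
         representation(cells = "numeric"),
         validity = function(object) {
           if (any(object@cells < 0)) "cells must be non-negative" else TRUE
         })
setGeneric("nCells", function(x) standardGeneric("nCells"))
setMethod("nCells", "Landscape", function(x) length(x@cells))

a <- new_landscape_s3(c(1, 2, 3))
b <- new("Landscape", cells = c(1, 2, 3))

# S3 performs no checking: a nonsensical object is created without complaint
s3_accepts_anything <- inherits(new_landscape_s3("oops"), "landscape")

# S4 refuses a slot of the wrong type at construction time
s4_rejects <- inherits(try(new("Landscape", cells = "oops"), silent = TRUE),
                       "try-error")
```

The sketch shows the usual trade-off: S3 needs less code and is forgiving, while S4 costs more ceremony but catches malformed objects early.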

Thanks


[R] Strange error returned or bug in gam in mgcv????

2009-09-01 Thread Corrado
Dear friends,

What is this error message in gam? I cannot understand what it means. 
Is it a bug?

gam_bray_scot24_pc_0505gam(bray~s(PC1,PC2,PC3,PC4,PC5,
PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)

Error in if (length(data) != vl) { :
  missing value where TRUE/FALSE needed
Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec -> array
In addition: Warning message:
In array(0, n * k) : NAs introduced by coercion
Execution halted

Thanks in advance,

Best regards


Re: [R] Strange error returned or bug in gam in mgcv????

2009-09-01 Thread Corrado
Nope. Of course, it was just a copy and paste problem.

On Tuesday 01 September 2009 15:00:34 David Winsemius wrote:
 On Sep 1, 2009, at 9:51 AM, Corrado wrote:
  Dear friends,
 
  what is this error message in gam I cannot understand what it
  means 
  is it a bug?
 
  gam_bray_scot24_pc_0505gam(bray~s(PC1,PC2,PC3,PC4,PC5,
  PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)

 If the code was as posted, you have entered  where you probably
 wanted -.

  Error in if (length(data) != vl) { :
   missing value where TRUE/FALSE needed
  Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec -> array
 
  In addition: Warning message:
  In array(0, n * k) : NAs introduced by coercion
  Execution halted
 
  Thanks in advance,
 
  Best regards

 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT





Re: [R] Strange error returned or bug in gam in mgcv???? - additional information

2009-09-01 Thread Corrado
Here I pasted the code from when I opened the R shell, so that it is possible 
to see what is going on:

http://scsys.co.uk:8002/33309?hl=onsubmit=Format+it!

Thanks in advance


Re: [R] Strange error returned or bug in gam in mgcv???? - yet more additional information

2009-09-01 Thread Corrado
I am using mgcv 1.4-1.1 on Fedora 9 64 bit on an Opteron server with 8 GB of 
RAM.

On Tuesday 01 September 2009 15:19:28 Corrado wrote:
 Here I pasted the code from when I opened the R shell, so that it possible
 to see what is going on:

 http://scsys.co.uk:8002/33309?hl=onsubmit=Format+it!

 Thanks in advance





Re: [R] Strange error returned or bug in gam in mgcv????

2009-09-01 Thread Corrado
Dear Simon,

I have stored all information at the link:

http://scsys.co.uk:8002/33309?hl=onsubmit=Format+it!

I have the same problem if I do 
s(PC1) + ... + s(PC10) or 
s(PC1,PC2,PC3,PC4,PC5)+s(PC6,PC7,PC8,PC9,PC10) or
s(PC1,PC2,PC3,PC6,PC7,PC8) .

I have renamed PC1.1,PC2.1,PC3.1,PC4.1,PC5.1 to PC6,PC7,PC8,PC9,PC10 for 
simplicity.

Regards

On Tuesday 01 September 2009 17:31:04 Simon Wood wrote:
 The basic problem is that you have requested a 10 dimensional thin plate
 spline, with a basis dimension of 196830. In reality it will not be
 possible to compute this, even if you have more than 196830 data. In any
 case it would be unlikely to provide a very useful model --- the simplest
 function that it can theoretically represent will have 3003 degrees of
 freedom.
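
 The two figures in the paragraph above can be reproduced with a few lines of arithmetic. This sketch assumes that the mgcv version in use chose the default basis dimension as 10 * 3^(d - 1) and the penalty order m as the smallest integer with 2m > d + 1; both rules are assumptions, picked because they match the quoted numbers. The null space dimension choose(m + d - 1, d) is the standard count for a thin plate spline penalty.

```r
d <- 10                            # covariates inside the single s() term

# Assumed default basis dimension rule; reproduces the quoted 196830
k_default <- 10 * 3^(d - 1)

# Assumed rule for the penalty order: smallest m with 2m > d + 1
m <- floor((d + 1) / 2) + 1

# Dimension of the unpenalized (null) space of the thin plate penalty:
# polynomials of degree < m in d variables; reproduces the quoted 3003
null_dim <- choose(m + d - 1, d)

# Rows reported by str() elsewhere in this thread
n <- 2265025

# n * k is far beyond the largest representable integer, so converting
# it to an integer yields NA
too_big <- n * k_default > .Machine$integer.max
as_int  <- suppressWarnings(as.integer(n * k_default))
```

 Whether this overflow is what produced the "NAs introduced by coercion" warning inside array(0, n * k) is a guess, but it would be consistent with the error shown.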

 That said the error message is obviously rather unhelpful... Can you tell
 me how many data you are actually trying to fit, and I'll try and track
 down exactly where it's failing, and put in a more informative message.

 best,
 Simon

 On Tuesday 01 September 2009 14:51, Corrado wrote:
  Dear friends,
 
  what is this error message in gam I cannot understand what it means
   is it a bug?
 
  gam_bray_scot24_pc_0505gam(bray~s(PC1,PC2,PC3,PC4,PC5,
  PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)
 
  Error in if (length(data) != vl) { :
missing value where TRUE/FALSE needed
  Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec -> array
  In addition: Warning message:
  In array(0, n * k) : NAs introduced by coercion
  Execution halted
 
  Thanks in advance,
 
  Best regards





Re: [R] Google's R Style Guide

2009-08-29 Thread Corrado
I do not understand why one should prefer an S3 class over an S4 class, if 
S4 is more rigorous.

(The premise is that I am a newbie with OO programming in R, and would like to 
understand what the proper way to do OO programming in R is.)

Regards



On Saturday 29 August 2009 16:23:39 hadley wickham wrote:
  An opening curly brace should never go on its own line;
 
  I tend to do this:
 
  f <- function()
  {
    if (TRUE)
      {
        cat("TRUE!!\n")
      } else {
        cat("FALSE!!\n")
      }
  }
 
  (I don't usually put one-liners in if/else blocks; here I would have
  used ifelse)
 
  I haven't seen many others format code in this way. Is there an
  objective reason for this (such as the rule for the trailing }) or
  is this just aesthetics?

 It's probably just aesthetics.  I don't like it because it increases
 the number of lines without much real benefit - indenting already
 gives you all the hints you need.

 Hadley





Re: [R] Best R text editors?

2009-08-28 Thread Corrado
Eclipse + StatET (the R plugin)  both on Linux and Windows

On Thursday 27 August 2009 20:43:41 Jonathan Greenberg wrote:
 Quick informal poll: what is everyone's favorite text editor for working
 with R?  I'd like to hear from people who are using editors that have
 some level of direct R interface (e.g. Tinn-R, Komodo+SciViews).  Thanks!

 --j





Re: [R] Best R text editors?

2009-08-28 Thread Corrado
I am using 3.4.2, not 3.5, so I would not know. But it is worth visiting the 
StatET mailing list archive and subscribing to the mailing list.

On Friday 28 August 2009 09:22:14 [Ricardo Rodriguez] Your XEN ICT Team wrote:
 Hi!

 Corrado wrote:
  Eclipse + StatET (the R plugin)  both on Linux and Windows

 Please, does it work with Eclipse 3.5 Galileo on a Mac OS X (10.5.8) box?

 Thanks!





Re: [R] [Rd] Formulas in gam function of mgcv package

2009-08-26 Thread Corrado
Dear Simon,

thanks for your answer.

I am running the model with both s and te smoothing, to compare.

A few questions on your email:

1) Isotropic smoothness: my variables are centred and scaled. I assumed an 
isotropic smoother (that is, a smoother that treats all the variables in the 
same way) was good. What do you think? Is my understanding of isotropic 
smoothing wrong? 

2) s(x1, ..., xn): it does not contain (1), but I thought that it improves on 
(1) by being free to include some interaction, albeit not explicitly. Is my 
interpretation wrong?

3) te: I am confused! What does it mean that the function space for (4) is 
built up from the function spaces used in (3)? Does it mean that 
te(x1, ..., xn) is an expansion on the te(xi), including all the terms 
te(x1)*te(x2)*...*te(xj)*...*te(xn) of the different orders?

Example: in the case of 4 variables, including te(x1)*te(x2), te(x2)*te(x3), 
..., te(x1)*te(x2)*te(x3), ... up to te(x1)*te(x2)*te(x3)*te(x4).

Sorry for being particularly daft.

Regards


On Wednesday 26 August 2009 09:56:13 you wrote:
   I am trying to understand the relationships between:
  
   y~s(x1)+s(x2)+s(x3)+s(x4)
  
   and
  
   y~s(x1,x2,x3,x4)
  
   Does the latter contain the former? what about the smoothers of all
   interaction terms?

 The first says that you want a model
 E(y) = f_1(x_1) + f_2(x_2) + f_3(x_3) + f_4(x_4) (1)
 where the f_j are smooth functions. The additive decomposition is quite a
 strong assumption, since it assumes that the effect of x_j is not dependent
 on x_k unless j=k. The second model is just
 E(y) = f(x_1,x_2,x_3,x_4)  (2)
 where f is a smooth function. This looks very general, but actually `s'
 terms assume isotropic smoothness, which is also quite a strong assumption.

 Now if I simply state that f and the f_j are `smooth functions', and leave
 it at that, then (2) would of course contain (1), but to actually estimate
 the models I need to state, mathematically, what I mean by `smooth'. Once
 I've done that I've pretty much determined the function spaces in which f
 and the f_j will lie, and in general (2) will no longer strictly contain
 (1). mgcv's `s' terms use a thin plate spline measure of smoothness for
 multivariate smooths, and this means that (1) will not be strictly nested
 within (2), since e.g. a 4D thin plate spline can not generally represent
 exactly what the sum of 4 1D splines can represent.

 If you want to achieve exact nesting then using tensor product smooths with
 something like

 y~te(x1)+te(x2)+te(x3)+te(x4)   (3)

 y~te(x1,x2,x3,x4) (4)

 will do the trick (because the function space for (4) is built up from the
 function spaces used in (3)).
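
 A minimal sketch of the model types discussed above, cut down to two covariates so that it runs quickly. It assumes the mgcv package is installed; the simulated data and object names are illustrative only.

```r
library(mgcv)

set.seed(5)
n   <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + dat$x2^2 + dat$x1 * dat$x2 +
  rnorm(n, sd = 0.3)

# Additive model, as in (1): no interaction between x1 and x2
m_add <- gam(y ~ s(x1) + s(x2), data = dat)

# Single isotropic thin plate smooth, as in (2)
m_iso <- gam(y ~ s(x1, x2), data = dat)

# Tensor product smooth, as in (4): separate smoothness per covariate
m_te  <- gam(y ~ te(x1, x2), data = dat)

# Side-by-side comparison of effective degrees of freedom and AIC
cmp <- AIC(m_add, m_iso, m_te)
```

 The comparison table only ranks the three fits on this simulated data set; it says nothing about which structure suits the 36-covariate problem in the thread.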

 As to where all the 2 and 3 way interactions have gone in (4)... it's just
 like ANOVA - if you put in a 4 way interaction then the lower order
 interactions are not identifiable, unless you choose to add constraints to
 make them so. `mgcv' will allow you add main effects and interactions, and
 will handle the constraints automatically, but if this sort of functional
 ANOVA is a major component of what you want to do, then it is probably
 worth checking out the gss package and Chong Gu's book on smoothing spline
 ANOVA.

 best,
 Simon





Re: [R] [Rd] Formulas in gam function of mgcv package

2009-08-26 Thread Corrado
Dear Simon,

thanks again.

Concerning the whole 36 variables: well, I have run a principal components 
analysis, and I am only using part of them (I am running a test with the PCs 
which cover 95% of the variance and then 99%). :) So I will possibly 
end up with s(x1, ..., x8). I wonder if using isotropic smoothers on principal 
components is a good idea. The variance diminishes from component to 
component, so theoretically the wiggliness of the smoother should also become 
less and less. What do you think? Am I saying something stupid?

If that is the case, and if I want to include some interaction, then I have to 
include the interaction terms manually, like s(x1,x2). Is that right?

Sorry for the avalanche of questions, but I am trying to understand the 
principles underlying the working of gam in mgcv. It looks very powerful, 
particularly for exploring dependencies.

I have run te() instead of s(), but the predictive power seems to be less than 
with s() in this particular situation. At the same time, does te() include the 
interaction? I did not understand well your previous point on the interaction 
term in te(): is te(x1, ..., xn) built as an expansion from te(x1), ..., 
te(xn)? Then all the interaction terms should be included.

Finally, is it possible to incorporate both s() and te() terms in the formula?
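
In mgcv the two term types can appear in the same formula. A minimal sketch, assuming mgcv is installed and using made-up data with three covariates:

```r
library(mgcv)

set.seed(9)
n   <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + dat$x2 * dat$x3 + rnorm(n, sd = 0.2)

# One univariate s() term plus one tensor product te() term
fit <- gam(y ~ s(x1) + te(x2, x3), data = dat)

# Each smooth term in the fitted model, by label
term_labels <- vapply(fit$smooth, function(sm) sm$label, character(1))
n_terms     <- length(fit$smooth)
```

Keeping each covariate in only one term, as here, avoids the identifiability problems that arise when the same variable enters both a main effect and an interaction smooth.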

Machine learning: I am not too well versed in the area. Did you mean 
regression trees or maximum entropy models?

Best,


On Wednesday 26 August 2009 10:27:08 Simon Wood wrote:
 This will not work...

  2) y~s(x1, ..., x36)

 Estimating a 36 dimensional function reasonably well would require a
 tremendous quantity of data, but in any case the 36 dimensional TPS
 smoothness measure will involve such high order derivatives that it will no
 longer be practically useful: in fact you will not have enough data to
 estimate the unpenalized coefficients of the smoother (and if you did R
 would run out of memory first).

 In such a high dimensional situation, I think that GAMs are really only
 useful if you have some prior knowledge of which variables are likely to
 interact (and it's not too many of them). If there's no prior information
 saying roughly what sort of smooth additive structure might be useful then,
 I'm not sure that GAMs are the right way to go, and some sort of machine
 learning approach might be better.

 Then again, the real problem with
 y~s(x1, ..., x36)
 is that the data just won't contain enough information to estimate s, if
 all you can say is that s is smooth, but this also means that it's very
 unlikely that you really need to estimate s(x1, ..., x36) in order to
 predict well. In that case, starting from
 y ~ s(x1) + ... + s(x36)
 and building the model up might result in something that does a reasonable
 predictive job.

 On the subject of tensor product smoothing vs isotropic smoothing.
 Isotropic smooths are really only reasonable if you think  that the smooth
 should display approximately the same amount of wiggliness in all
 directions. If this is not the case then tensor product smoothing is a
 better bet. Centering and scaling alone is not enough to ensure that
 isotropy is reasonable (although in particular cases it may help, of
 course).

 best,
 Simon

  I am trying to build a predictive model. Since the variables are
  centred and scaled, I think I need an isotropic smooth. I am also
  interested in having the interactions between the variables included,
  that is not a purely additive model.
 
  It is not clear to me when should I give preference to tensor smooths,
  possibly because I have not understood well how they work.
 
  I am reading Wood(2003) as recommended and I have also read rather
  extensively Simon N. Wood. Generalized Additive Models: An Introduction,
  2006, but still I am stuck. Any additional suggestion or reading
  recommendation would be greatly appreciated.
 
  I have also some difficulties in understanding the values you have chosen
  for k in the first example (why 60?).
 
  Thanks
 
  Best,
 
  On Monday 24 August 2009 17:33:55 Gavin Simpson wrote:
   [Note R-Devel is the wrong list for such questions. R-Help is where
   this should have been directed - redirected there now]
  
   On Mon, 2009-08-24 at 17:02 +0100, Corrado wrote:
Dear R-experts,
   
I have a question on the formulas used in the gam function of the
mgcv package.
   
I am trying to understand the relationships between:
   
y~s(x1)+s(x2)+s(x3)+s(x4)
   
and
   
y~s(x1,x2,x3,x4)
   
Does the latter contain the former? what about the smoothers of all
interaction terms?
  
   I'm not 100% certain how this scales to smooths of more than 2
   variables, but Sections 4.10.2 and 5.2.2 of Simon Wood's book GAM: An
   Introduction with R (2006, Chapman & Hall/CRC) discuss this for smooths
   of 2 variables.
  
   Strictly y ~ s(x1) + s(x2) is not nested in y ~ s(x1, x2) as the bases
   used to produce the smoothers

Re: [R] [Rd] Formulas in gam function of mgcv package

2009-08-25 Thread Corrado
Dear Gavin / Rlings,

thanks for your kind answer and sorry for posting to the dev mailing list.

Concerning the specific of your answer:

I am working with 6 to 36 covariates, and they are all centred and scaled. I 
represented the problem with two variables to simplify the question.

So ideally, the situation is:

1) y ~ s(x1) + ... + s(x36)

vs.

2) y ~ s(x1, ..., x36)

I am trying to build a predictive model. Since the variables are centred 
and scaled, I think I need an isotropic smooth. I am also interested in 
including the interactions between the variables, that is, in a model that is 
not purely additive.

It is not clear to me when I should prefer tensor product smooths, 
possibly because I have not understood well how they work.

I am reading Wood (2003) as recommended and I have also read rather extensively 
Simon N. Wood, Generalized Additive Models: An Introduction, 2006, but I am 
still stuck. Any additional suggestion or reading recommendation would be 
greatly appreciated.

I also have some difficulty understanding the value you chose for k 
in the first example (why 60?).

Thanks

Best,



On Monday 24 August 2009 17:33:55 Gavin Simpson wrote:
 [Note R-Devel is the wrong list for such questions. R-Help is where this
 should have been directed - redirected there now]

 On Mon, 2009-08-24 at 17:02 +0100, Corrado wrote:
  Dear R-experts,
 
  I have a question on the formulas used in the gam function of the mgcv
  package.
 
  I am trying to understand the relationships between:
 
  y~s(x1)+s(x2)+s(x3)+s(x4)
 
  and
 
  y~s(x1,x2,x3,x4)
 
  Does the latter contain the former? what about the smoothers of all
  interaction terms?

 I'm not 100% certain how this scales to smooths of more than 2
 variables, but Sections 4.10.2 and 5.2.2 of Simon Wood's book GAM: An
 Introduction with R (2006, Chapman Hall/CRC) discuss this for smooths of
 2 variables.

 Strictly y ~ s(x1) + s(x2) is not nested in y ~ s(x1, x2) as the bases
 used to produce the smoothers in the two models may not be the same in
 both models. One option to ensure nestedness is to fit the more
 complicated model as something like this:

 ## if simpler model were: y ~ s(x1, k=20) + s(x2, k = 20)
 y ~ s(x1, k=20) + s(x2, k = 20) + s(x1, x2, k = 60)
                                   ^^^^^^^^^^^^^^^^^
 where the last term (^^^ above) has the same k as used in s(x1, x2)

 Note that these are isotropic smooths; are x1 and x2 measured in the
 same units etc.? Tensor product smooths may be more appropriate if not,
 and if we specify the bases when fitting models s(x1) + s(x2) *is*
 strictly nested in te(x1, x2), eg.

 y ~ s(x1, bs = "cr", k = 10) + s(x2, bs = "cr", k = 10)

 is strictly nested within

 y ~ te(x1, x2, k = 10)
 ## is the same as y ~ te(x1, x2, bs = "cr", k = 10)

 [Note that bs = "cr" is the default basis in te() smooths, hence we
 don't need to specify it, and k = 10 refers to each individual smooth in
 the te().]

 HTH

 G

  I have (tried to) read the manual pages of gam, formula.gam,
  smooth.terms, linear.functional.terms but could not understand properly.
 
  Regards



-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
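
For readers following this thread, the comparison between the additive model and the tensor product smooth can be tried on simulated data. This is only a sketch: the data, the test surface and the object names (dat, m_add, m_te) are made up for illustration, and method = "ML" is one reasonable choice for comparing fits, not something taken from the thread.

```r
library(mgcv)  # recommended package, normally installed with R

set.seed(1)
n  <- 400
x1 <- runif(n)
x2 <- runif(n)
# Hypothetical test surface that includes an x1:x2 interaction
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + 2 * x1 * x2 + rnorm(n, sd = 0.3)
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Purely additive model with cubic regression spline bases
m_add <- gam(y ~ s(x1, bs = "cr", k = 10) + s(x2, bs = "cr", k = 10),
             data = dat, method = "ML")

# Tensor product smooth built from the same marginal bases
m_te <- gam(y ~ te(x1, x2, bs = "cr", k = 10), data = dat, method = "ML")

# Compare the two fits
aic_tab <- AIC(m_add, m_te)
print(aic_tab)
```

A markedly lower AIC for m_te would suggest that the interaction matters. Note that the basis of a te() term grows as k^d for d covariates, so a single tensor product over 6 to 36 covariates is unlikely to be practical.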


[R] Distance between clusters

2009-03-02 Thread Corrado
Dear friends

I reformulate the question. I think I did not formulate it properly.

I have some data on some sites. I can define a dissimilarity between each pair 
of sites. Using this dissimilarity, I have clustered the sites using the 
hclust algorithm, with method ward. I then obtain 48 clusters, by cutting the 
tree using cutree with k=48. 

I would now like to estimate the distance between each pair of the 48 
resulting clusters. I have read the documentation, but I cannot find a 
solution.

Any clue on how I can do that?

This is a snippet of the code:

distPredTurn <- as.dist(dissimilarityMatrix)
hctr <- hclust(distPredTurn, "ward")
cutree(hctr, k = 48)


Regards
-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
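
One possible reading of "distance between clusters" is the mean dissimilarity between the members of each pair of clusters. The sketch below assumes that reading. The sites are simulated stand-ins, k = 5 replaces the 48 used in the post to keep the output small, and "ward.D" is the current name of the method that older R versions called "ward".

```r
set.seed(42)
# Hypothetical stand-in for the sites: 60 points in 2 dimensions
sites <- matrix(rnorm(120), ncol = 2)
distPredTurn <- dist(sites)

hctr   <- hclust(distPredTurn, method = "ward.D")
groups <- cutree(hctr, k = 5)          # the post uses k = 48

# Full dissimilarity matrix, so that blocks can be indexed by cluster
dm  <- as.matrix(distPredTurn)
ids <- sort(unique(groups))

# Entry [i, j] is the mean dissimilarity between members of cluster i and cluster j
clusterDist <- sapply(ids, function(j)
  sapply(ids, function(i) mean(dm[groups == i, groups == j])))
dimnames(clusterDist) <- list(ids, ids)

print(round(clusterDist, 2))
```

The diagonal holds the mean within-cluster dissimilarity (self-distances of zero included); as.dist(clusterDist) drops it if only the between-cluster values are wanted.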



Re: [R] Using very large matrix

2009-03-02 Thread Corrado
Thanks a lot!

Unfortunately, the R package I have to use for my research was only released 
for 32 bit R on 32 bit MS Windows, and only as closed source ... I normally use 
64 bit R on 64 bit Linux ... :) 

I tried to use the bigmemory package from CRAN on 32 bit Windows, but I had some 
serious problems.

Best,

On Thursday 26 February 2009 15:43:11 Jay Emerson wrote:
 Corrado,

 Package bigmemory has undergone a major re-engineering and will be
 available soon (available now in Beta version upon request).  The version
 currently on CRAN is probably of limited use unless you're in Linux.

 bigmemory may be useful to you for data management, at the very least,
 where

 x <- filebacked.big.matrix(8, 8, init=n, type="double")

 would accomplish what you want using filebacking (disk space) to hold
 the object.  But even this requires 64-bit R (Linux or Mac, or perhaps a
 Beta version of Windows 64-bit R that REvolution Computing is working on).

 Subsequent operations (e.g. extraction of a small portion for analysis) are
 then easy enough:

 y <- x[1,]

 would give you the first row of x as an object y in R.  Note that x is
 not itself an R matrix, and most existing R analytics can't work on x
 directly (and would max out the RAM if they tried, anyway).

 Feel free to email me for more information (and this invitation
 applies to anyone who is interested in this).

 Cheers,

 Jay

 #Dear friends,
 #
 #I have to use a very large matrix. Something of the sort of
 #matrix(8,8,n)  where n is something numeric of the sort 0.xx
 #
 #I have not found a way of doing it. I keep getting the error
 #
 #Error in matrix(nrow = 8, ncol = 8, 0.2) : too many elements specified
 #
 #Any suggestions? I have searched the mailing list, but to no avail.
 #
 #Best,
 #--
 #Corrado Topi
 #
 #Global Climate Change & Biodiversity Indicators
 #Area 18,Department of Biology
 #University of York, York, YO10 5YW, UK
 #Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk


-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
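
Before allocating, it can help to work out how much memory a dense matrix would need. The sketch below is only an illustration: the dimensions are hypothetical (the sizes in the original post are not recoverable from the archive), and the element limit mentioned in the comment applies to R versions without long vector support.

```r
# Hypothetical dimensions; not taken from the original post
n_row <- 80000
n_col <- 80000

n_elem <- n_row * n_col         # computed in double precision, so no integer overflow
gib    <- n_elem * 8 / 2^30     # a double takes 8 bytes

cat(sprintf("%.0f elements, about %.1f GiB as doubles\n", n_elem, gib))

# Before long vectors (R 3.0.0), an R vector could hold at most 2^31 - 1 elements
exceeds_classic_limit <- n_elem > .Machine$integer.max
print(exceeds_classic_limit)
```

If the element count exceeds that limit or the size exceeds the available RAM, a file-backed or sparse representation is the usual way out.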



[R] Using very large matrix

2009-02-25 Thread Corrado
Dear friends,

I have to use a very large matrix. Something of the sort of 
matrix(8,8,n)  where n is something numeric of the sort 0.xx

I have not found a way of doing it. I keep getting the error

Error in matrix(nrow = 8, ncol = 8, 0.2) : too many elements specified

Any suggestions? I have searched the mailing list, but to no avail. 

Best,
-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] Similarity between clusters generated by hclust + cutree

2009-02-25 Thread Corrado
Dear friends

I have clustered some objects using the hclust algorithm, with method ward. I 
then cut the tree into 48 classes with cutree. 

distPredTurn <- as.dist(resultMatrix)
hctr <- hclust(distPredTurn, "ward")
cutree(hctr, k = NC)

I would like to estimate the similarity between each pair of the 48 clusters, 
for example as (1-distance or dissimilarity) between the centroids.

Any clue on how I can do that?

Regards
-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
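
When only a dissimilarity matrix is available, centroids are not defined, so one workaround is to use medoids (the member of each cluster with the smallest total dissimilarity to the other members) in their place. The sketch below assumes that substitution. The data are simulated, k = 4 replaces the 48 classes of the post, and medoid_of is a hypothetical helper written for this example.

```r
set.seed(7)
sites <- matrix(runif(80), ncol = 2)   # hypothetical data: 40 sites
dm <- as.matrix(dist(sites))
dm <- dm / max(dm)                     # rescale to [0, 1] so that 1 - d is a similarity

hctr   <- hclust(as.dist(dm), method = "ward.D")
groups <- cutree(hctr, k = 4)          # the post uses 48 classes

# Medoid of cluster g: member with the smallest summed dissimilarity to its cluster
medoid_of <- function(g) {
  members <- which(groups == g)
  members[which.min(rowSums(dm[members, members, drop = FALSE]))]
}
medoids <- sapply(sort(unique(groups)), medoid_of)

# Similarity between clusters, taken as 1 - dissimilarity between their medoids
clusterSim <- 1 - dm[medoids, medoids]
print(round(clusterSim, 2))
```

If the original coordinates of the objects are available, true centroids can be computed from them and compared in the same way.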



Re: [R] Principal Component Analysis - Selecting components? + right choice?

2008-12-17 Thread Corrado
Hi,

I have been testing some of the suggested alternative approaches. The best PC 
set may not be the best predictor subset, but is it really true that this is 
generally not the case? If you have to explore data patterns and (potential) 
relationships between a response variable and a large set of candidate 
predictors, PCs still seem to be the best candidate for a relatively quick 
test. I think sometimes you have to make a trade-off against time (for example, 
computing time), and if any pattern emerges from the response vs. the first k 
PCs, then you investigate further ... am I completely wrong there? What 
alternative do you have that reduces the computational cost so drastically for 
exploratory purposes? 

Furthermore, is it really generally not the case that the best PC set, say, 
the top k PCs, contains the best predictor subset in linear regression, or does 
that happen only in specific situations (that is, the best PC set is generally 
a good set of predictors, but in some specific cases it is not)?
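
The extreme situation described by Hadi and Ling can be reproduced with a small simulation. This is a constructed case, not evidence about how often it happens in practice: the two predictors, the noise levels and the object names are all made up, and the response is deliberately built along the last principal direction.

```r
set.seed(123)
n <- 500
# Two strongly correlated hypothetical predictors
z  <- rnorm(n)
x1 <- z + rnorm(n, sd = 0.1)
x2 <- z + rnorm(n, sd = 0.1)
X  <- scale(cbind(x1, x2))

pca <- prcomp(X)

# Response constructed along the LAST principal direction (the low-variance one)
y <- as.vector(X %*% pca$rotation[, 2]) + rnorm(n, sd = 0.01)

# Variance in y explained by each PC on its own
r2_pc1 <- summary(lm(y ~ pca$x[, 1]))$r.squared
r2_pc2 <- summary(lm(y ~ pca$x[, 2]))$r.squared
print(c(PC1 = r2_pc1, PC2 = r2_pc2))
```

Here the first PC carries almost all of the predictor variance yet explains almost none of the response, which is why selecting PCs by variance alone gives no guarantee for prediction.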

Best,

On Thursday 11 December 2008 17:30:51 you wrote:
 Hi,

 It is generally not the case that the best PC set, say, the top k PCs
 (where k < p, p being the number of predictors) contain the best predictor
 subset in linear regression.  Hadi and Ling (Amer Stat, 1998) show that it
 is even possible to have an extreme situation where the first (p-1) PCs
 contribute nothing towards explaining the variation in the response, yet
 the last PC alone contributes everything.   Their theorem is that if the
 true vector of regression coefficients is in the direction of the j-th
 eigenvector (of the correlation matrix), then the j-th PC alone will
 contribute everything to the model fit, while the remaining PCs will
 contribute zilch.  They illustrate this phenomenon with a real data set
 from a classic text on regression, Draper and Smith.

 Ravi.
 ---

 Ravi Varadhan, Ph.D.
 Assistant Professor, The Center on Aging and Health
 Division of Geriatric Medicine and Gerontology
 Johns Hopkins University
 Ph: (410) 502-2619
 Fax: (410) 614-9625
 Email: rvarad...@jhmi.edu
 Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 ---


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of S Ellison
 Sent: Thursday, December 11, 2008 9:37 AM
 To: r-help@r-project.org; Corrado
 Subject: Re: [R] Principal Component Analysis - Selecting components? +
 right choice?

 If you're intending to create a model using PCs as predictors, select the
 PCs based on whether they contribute significantly to the model fit.

 In chemometrics (multivariate stats in chemistry, among other things), if
 we're expecting 3 or 4 PC's to be useful in a principal component
 regression, we'd generally start with at least the first half-dozen or so
 and let the model fit sort them out.

 The reason for not preselecting too rigorously early on is that there's no
 guarantee at all that the first couple of PC's are good predictors for what
 you're interested in. They're properties of the predictor set, not of the
 response set.

 Mind you, there used to be something of a gap between chemometrics and
 proper statistics; I'm sure chemometricians used to do things with data
 that would turn a statistician pale.

 You could also look for a PLS model, which (if I recall correctly) actually
 uses the response data to select the latent variables used for prediction.

 S

  Corrado ct...@york.ac.uk 11/12/2008 11:46:37 

 Dear R gurus,

 I have some climatic data for a region of the world. They are monthly
 averages for 1950-2000 of precipitation (12 months), minimum temperature (12
 months), and maximum temperature (12 months). I have scaled them to
 2 km x 2 km cells, and I have around 75,000 cells.

 I need to feed them into a statistical model as co-variates, to use them to
 predict a response variable.

 The climatic data are obviously correlated: precipitation for January is
 correlated to precipitation for February and so on ... even precipitation
 and temperature are heavily correlated. I did some correlation analysis and
 they are all strongly correlated.

 I thought of running PCA on them, in order to reduce the number of
 co-variates I feed into the model.

 I ran the PCA using prcomp, quite successfully. Now I need to use a
 criterion to select the right number of PCs (that is: is it 1, 2, 3, 4?).

 What criterion would you suggest?

 At the moment, I am using a criterion based on a threshold, but that is
 highly subjective, even if there are some rules of thumb (Jolliffe, Principal
 Component Analysis, 2nd Edition, Springer Verlag, 2002).

 Could you suggest something more rigorous?

 By the way, do you think I would have been better off by using something
 different from PCA?

 Best,
 --
 Corrado Topi

 Global Climate

[R] Principal Component Analysis - Selecting components? + right choice?

2008-12-11 Thread Corrado
Dear R gurus,

I have some climatic data for a region of the world. They are monthly averages 
for 1950-2000 of precipitation (12 months), minimum temperature (12 months), and 
maximum temperature (12 months). I have scaled them to 2 km x 2 km cells, and 
I have around 75,000 cells.

I need to feed them into a statistical model as co-variates, to use them to 
predict a response variable.

The climatic data are obviously correlated: precipitation for January is 
correlated to precipitation for February and so on ... even precipitation 
and temperature are heavily correlated. I did some correlation analysis and 
they are all strongly correlated.

I thought of running PCA on them, in order to reduce the number of co-variates 
I feed into the model.

I ran the PCA using prcomp, quite successfully. Now I need to use a criterion 
to select the right number of PCs (that is: is it 1, 2, 3, 4?).

What criterion would you suggest?

At the moment, I am using a criterion based on a threshold, but that is highly 
subjective, even if there are some rules of thumb (Jolliffe, Principal 
Component Analysis, 2nd Edition, Springer Verlag, 2002). 

Could you suggest something more rigorous?

By the way, do you think I would have been better off by using something 
different from PCA?

Best,
-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]
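
Two common rules for choosing the number of PCs can be computed directly from the prcomp output: a cumulative-variance threshold and the broken-stick model. The sketch below uses simulated data as a stand-in for the 36 monthly climate covariates. The 90% threshold and all object names are arbitrary choices for illustration, and neither rule looks at the response variable.

```r
set.seed(2008)
n <- 1000
# Hypothetical stand-in: 36 correlated covariates driven by 3 latent factors
latent   <- matrix(rnorm(n * 3), ncol = 3)
loadings <- matrix(rnorm(3 * 36), nrow = 3)
climate  <- latent %*% loadings + matrix(rnorm(n * 36, sd = 0.5), ncol = 36)

pca      <- prcomp(climate, center = TRUE, scale. = TRUE)
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

# Rule 1: smallest number of PCs reaching 90% cumulative variance
k_threshold <- which(cumsum(prop_var) >= 0.90)[1]

# Rule 2: broken-stick model; expected share of PC k is (1/p) * sum_{i=k}^{p} 1/i
p <- length(prop_var)
broken_stick <- rev(cumsum(rev(1 / seq_len(p)))) / p
above <- prop_var > broken_stick
# Count the leading PCs whose share exceeds the broken-stick expectation
k_broken_stick <- if (all(above)) p else which(!above)[1] - 1

print(c(threshold = k_threshold, broken_stick = k_broken_stick))
```

When the PCs are meant as predictors, selecting the number of components by cross-validated prediction error is an alternative that does use the response.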



[R] *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated on R 2.6.2 to 2.8.0: logging a bug?

2008-10-31 Thread Corrado
2ad38add9000-2ad38ae23000 r-xp  08:02 2044157/lib64/libncurses.so.5.6
2ad38ae23000-2ad38b022000 ---p 0004a000 08:02 2044157/lib64/libncurses.so.5.6
2ad38b022000-2ad38b027000 rw-p 00049000 08:02 2044157/lib64/libncurses.so.5.6
2ad38b027000-2ad38b04d000 r-xp  08:02 2044050/lib64/libpcre.so.0.0.1
2ad38b04d000-2ad38b24c000 ---p 00026000 08:02 2044050/lib64/libpcre.so.0.0.1
2ad38b24c000-2ad38b24d000 rw-p 00025000 08:02 2044050/lib64/libpcre.so.0.0.1
2ad38b24d000-2ad38b24e000 rw-p 2ad38b24d000 00:00 0
2ad38b24e000-2ad38b262000 r-xp  08:02 2044118/lib64/libz.so.1.2.3
2ad38b262000-2ad38b461000 ---p 00014000 08:02 2044118/lib64/libz.so.1.2.3
2ad38b461000-2ad38b462000 rw-p 00013000 08:02 2044118/lib64/libz.so.1.2.3
2ad38b462000-2ad38b464000 r-xp  08:02 2044015/lib64/libdl-2.7.so
2ad38b464000-2ad38b664000 ---p 2000 08:02 2044015/lib64/libdl-2.7.so
2ad38b664000-2ad38b665000 r--p 2000 08:02 2044015/lib64/libdl-2.7.so
2ad38b665000-2ad38b666000 rw-p 3000 08:02 2044015/lib64/libdl-2.7.so
2ad38b666000-2ad38b668000 rw-p 2ad38b666000 00:00 0
2ad38b668000-2ad38b6a7000 r--p  08:02 720800 /usr/share/locale/UTF-8/LC_CTYPE
2ad38b6a7000-2ad38b78b000 r--p  08:02 720801 /usr/share/locale/UTF-8/LC_COLLATE
2ad38b78b000-2ad38b78c000 r--p  08:02 892249 /usr/share/locale/en_GB.UTF-8/LC_TIME
2ad38b78c000-2ad38b78d000 r--p  08:02 892496 /usr/share/locale/en_GB.UTF-8/LC_PAPER
2ad38b78d000-2ad38b78e000 r--p  08:02 892500 /usr/share/locale/en_GB.UTF-8/LC_MAborted
[EMAIL PROTECTED]:~$ 

OS: Mandriva 2008.1 x86_64
Postgresql: 8.3.1 (PostGIS enabled)
R: from 2.6.2 from repository to 2.8.0 repackaged

Is it my doing, or R's doing?

Best,

-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]



[R] R base 2.7.2 packaged for Mandriva 2008.1 x86_64: anyone interested?

2008-09-04 Thread Corrado
I have packaged R base 2.7.2 for Mandriva 2008.1 x86_64 ... who should I send 
it to so that it can be made available to everybody? 

It is my first attempt and it works well on my computer, but it will need some 
testing.

Best,
-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]



[R] installing source package RGtk2 on mac os x 10.4.11

2008-05-25 Thread Corrado Giannasca
Dear All,

I'm trying to install the source package RGtk2  
(RGtk2_2.12.5-3.tar.gz) on an iMac running OS X 10.4.11:
   Model Name:                iMac
   Model Identifier:          iMac6,1
   Processor Name:            Intel Core 2 Duo
   Processor Speed:           2.16 GHz
   Number of Processors:      1
   Total Number of Cores:     2
   L2 Cache (per processor):  4 MB
   Memory:                    2 GB
   Bus Speed:                 667 MHz
   Boot ROM Version:          IM61.0093.B07
   SMC Version:               1.10f3

This is a step needed to use Rattle package, as detailed in:

http://datamining.togaware.com/survivor/Installation_Details.html:
...Mac/OSX: download the source package from ggobi,2.12 and run the  
command line install:
Mac/OSX: $ R CMD INSTALL RGtk2_2.8.6.tar.gz   (30 minutes)

You may not be able to compile RGtk2 via the R GUI on Mac/OSX as the  
GTK libraries can not be found when gcc is called. Once installed  
though, R will detect the package--don't try to load it within the  
GUI as GTK is not a native Mac/OSX application and it will fail. On  
Mac/OSX be sure to run R from the X11 environment. ...

The installation is performed in a Terminal window with the following  
command:

corradogiannasca$ R CMD INSTALL RGtk2_2.12.5-3.tar.gz

There is a long listing in output and the installation fails with the  
following statement:


/usr/bin/libtool: internal link edit command failed
make: *** [RGtk2.so] Error 1
chmod: /Library/Frameworks/R.framework/Versions/2.7/Resources/library/RGtk2/libs/i386/*: No such file or directory
ERROR: compilation failed for package 'RGtk2'
** Removing '/Library/Frameworks/R.framework/Versions/2.7/Resources/library/RGtk2'

CAN ANYONE HELP IN RESOLVING THIS ERROR?

Thank you very much for your support.

Corrado Giannasca

