[R] notation question

2011-07-19 Thread Carson Farmer
Dear list, I am currently writing up some of my R models in a more
formal sense for a paper, and I am having trouble with the notation.
Although this isn't really an 'R' question, it should help me to
understand a bit better what I am actually doing when fitting my
models!

Using the analysis of co-variance example from MASS (fourth edition, p
142), what is the correct notation for the formula Gas, ~ Insul/Temp
- 1? Obviously, if we fit it as two separate models (as in the
example above it), we would have something like y_i = \beta x_i for
each of the two models. So my question is, when we have a single model
with a k-level factor interaction term as in the equation above, what
is the correct/standard statistical (LaTeX style) notation?

Cheers,

Carson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] notation question

2011-07-19 Thread Carson Farmer
Thank you Rolf,
 Using the analysis of co-variance example from MASS (fourth edition, p
 142), what is the correct notation for the formula Gas, ~ Insul/Temp
    There shouldn't be a comma after ``Gas'' in that formula.
 - 1? Obviously, if we fit it as two separate models (as in the
 example above it), we would have something like y_i = \beta x_i for
 each of the two models.
    No.  y_i = alpha + beta x_i  .  Clearly you need an intercept.
    Do you really expect gas consumption to be nil when the average
    external temperature is 0 degrees C ?
Right, of course... both silly mistakes, my apologies!
 So my question is, when we have a single model
 with a k-level factor interaction term as in the equation above, what
 is the correct/standard statistical (LaTeX style) notation?
 The model is simply
    y_ij = alpha_i + beta_i x_ij
 where i = 1 (before) or 2 (after).  I.e. you are allowing a different slope
 and intercept
 for each of the scenarios (before and after).
 But this is the ``deterministic'' part of the model.  You should really
 include the random part:
    y_ij = alpha_i + beta_i x_ij + E_ij
 where the E_ij are independent random variables with mean 0 and common
 variance sigma^2.  (Often the E_ij are assumed to be Gaussian, mainly
 because if all you have is a hammer, then everything looks like a nail).
Perfect! Thanks for the clarification. I think I was previously trying
to be a bit more clever than necessary (and as a result not being very
clever at all :-p)

Carson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Deviance of zeroinfl/hurdle models

2011-07-12 Thread Carson Farmer
Dear list, I'm wondering if anyone can help me calculate the deviance
of either a zeroinfl or hurdle model from package pscl?
Even if someone could point me to the correct formula for calculating
the deviance, I could do the rest on my own.

I am trying to calculate a pseudo-R-squared measure based on the
R^{2}_{DEV} of [1], so I need to be able to calculate the deviance of
the full and null models. Does anyone have any suggestions?
Alternatively, does anyone have a suggestion for a better measure to
report (I'm aware that R^2 measures aren't really appropriate here),
preferably something that is easy enough to program or compute using
existing packages...

Thanks in advance,

Carson

[1] Cameron, A.C., Windmeijer, F.A.G., 1996. R^2 measures for count
data regression models with applications to health-care utilization.
J. Bus. Econom. Statist. 14, 209–220


-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 'constrained' negative.binomial model estimates

2011-05-26 Thread Carson Farmer
Hello list,

I am not sure if the terminology that I am using here is widely used,
however, I provide an example in the hopes that my problem will become
clear. My basic problem is that I am unsure of how to 'constrain' my
model estimates to reproduce the aggregate (by factor levels) observed
dependent variable for a negative.binomial model. I realize this
sounds rather vague, so I provide an example to illustrate:

When working with migration data, it is possible to model migration
flows via a simple Poisson GLM like the following:

 model1 - glm(flows ~ destination.pop+distance+origin-0, data=migration, 
 family=poisson())
 model2 - glm(flows ~ destination.pop+distance+origin-0, data=migration, 
 family=quasipoisson())

where origin is a factor of origins (1 - n). This has the effect of
'constraining' the predicted flows to reproduce the observed outgoing
flows from each origin:

 sums - aggregate(migration$flows, by=list(migration$origin), sum)
 sums1 - aggregate(predict(model1, type='response'), 
 by=list(migration$origin), sum)
 sums2 - aggregate(predict(model2, type='response'), 
 by=list(migration$origin), sum)

which works fine for both poisson and quasipoisson models. However, I
would also like to fit a negative.binomial model to this data to
(again) account for over-dispersion (which may still exist even after
the constraint is imposed), however, the effect of the 'constraint'
factor does not seem to work, due likely to the additional theta
parameter.

 model3 - glm.nb(flows ~ destination.pop+distance+origin-0, data=migration)
 sums3 - aggregate(predict(model3,type='response'), 
 by=list(migration$origin), sum)

 data.frame(origin=unique(migration$origin), observed=sums$x, 
 poisson=sums1$x,quasi=sums2$x,negbin=sums3$x)
  origin observed poisson   quasinegbin
1  1   676308  676308  676308  651113.7
2  2  1155811 1155811 1155811 1141729.4
3  3  1789112 1789112 1789112 1845908.1
4  4   942162  942162  942162  978599.6
5  5  2484387 2484387 2484387 2486435.1
6  6   819222  819222  819222  747022.1
7  7  1237079 1237079 1237079 1405065.0
8  8  1067069 1067069 1067069  963339.5
9  9  2143172 2143172 2143172 2139171.5

Can anyone tell me how I might go about doing this for the
negative.binomial model? Or perhaps better still, an alternative way
of enforcing this 'constraint'?

The following dataset is what I used for the above example (this data
comes from Fotheringham and O’Kelly, 1989. Spatial interaction models:
Formulations and applications. Dordrecht: Kluwer Academic Publishers):

 dput(migration)
structure(list(flows = c(283049L, 87267L, 29877L, 130830L, 21434L,
30287L, 21450L, 72114L, 180048L, 237229L, 60681L, 382565L, 53772L,
64645L, 43749L, 133122L, 79223L, 300345L, 286580L, 346407L, 287340L,
161645L, 97808L, 229764L, 26887L, 67280L, 281791L, 92308L, 49828L,
144980L, 113683L, 165405L, 198144L, 718673L, 551483L, 143860L,
316650L, 199466L, 89806L, 266305L, 17995L, 55094L, 230788L, 49892L,
252189L, 121366L, 25574L, 66324L, 35563L, 93434L, 178517L, 185618L,
192223L, 141679L, 158006L, 252039L, 30528L, 87987L, 172711L,
181868L, 89389L, 27409L, 134229L, 342948L, 110792L, 268458L,
394481L, 274629L, 279739L, 87938L, 289880L, 437255L), distance = c(219L,
1009L, 1514L, 974L, 1268L, 1795L, 2420L, 3174L, 219L, 831L, 1336L,
755L, 1049L, 1576L, 2242L, 2996L, 1009L, 831L, 505L, 1019L, 662L,
933L, 1451L, 2205L, 1514L, 1336L, 505L, 1370L, 888L, 654L, 946L,
1700L, 974L, 755L, 1019L, 1370L, 482L, 1144L, 2278L, 2862L, 1268L,
1049L, 662L, 888L, 484L, 662L, 1795L, 2380L, 1795L, 1576L, 933L,
654L, 1144L, 662L, 1278L, 1779L, 2420L, 2242L, 1451L, 946L, 2278L,
1795L, 1278L, 754L, 3174L, 2996L, 2205L, 1700L, 2862L, 2380L,
1779L, 754L), origin.pop = c(16.2876696349298, 16.2876696349298,
16.2876696349298, 16.2876696349298, 16.2876696349298, 16.2876696349298,
16.2876696349298, 16.2876696349298, 17.4279408399148, 17.4279408399148,
17.4279408399148, 17.4279408399148, 17.4279408399148, 17.4279408399148,
17.4279408399148, 17.4279408399148, 17.5110179983684, 17.5110179983684,
17.5110179983684, 17.5110179983684, 17.5110179983684, 17.5110179983684,
17.5110179983684, 17.5110179983684, 16.6083307371083, 16.6083307371083,
16.6083307371083, 16.6083307371083, 16.6083307371083, 16.6083307371083,
16.6083307371083, 16.6083307371083, 17.2140377110705, 17.2140377110705,
17.2140377110705, 17.2140377110705, 17.2140377110705, 17.2140377110705,
17.2140377110705, 17.2140377110705, 16.3878173980331, 16.3878173980331,
16.3878173980331, 16.3878173980331, 16.3878173980331, 16.3878173980331,
16.3878173980331, 16.3878173980331, 16.761264461712, 16.761264461712,
16.761264461712, 16.761264461712, 16.761264461712, 16.761264461712,
16.761264461712, 16.761264461712, 15.9304398925737, 15.9304398925737,
15.9304398925737, 15.9304398925737, 15.9304398925737, 15.9304398925737,
15.9304398925737, 15.9304398925737, 17.0532473904734, 17.0532473904734,
17.0532473904734, 17.0532473904734, 17.0532473904734, 17.0532473904734,

[R] rank of Matrix

2011-03-31 Thread Carson Farmer
Dear list,

Can anyone tell me how to obtain the rank of a sparse Matrix, for
example from package Matrix (class dgCMatrix)? Here is an example of
QR decomposition of a sparse matrix (from the sparseQR class help).

library(Matrix)
data(KNex)
mm - KNex$mm
str(mmQR - qr(mm))

Similarly, using the functions/classes from the relatively new
MatrixModels package:

library(MatrixModels)
str(trial - data.frame(counts=c(18,17,15,20,10,20,25,13,12),
outcome=gl(3,1,9,labels=LETTERS[1:3]),
treatment=gl(3,3,labels=letters[1:3])))
glmS - glm4(counts ~ 0+outcome + treatment, poisson, trial,
  verbose = TRUE, sparse = TRUE)
str(glmS)
str(X - glmS@pred@X)
str(QR - qr(X))

I'm not very familiar with matrix decomposition, but is the 'p' slot
from a sparseQR object the same as $pivot from base qr? Additionally,
how do I obtain the rank of the input matrix? qr from the base package
produces an object with several components, two of which are rank, and
pivot, is there an equivalent way of getting at these values/vectors
for sparse matrices?
Basically, I am trying to get at the variance-covariance matrix in
order to compute t-values for the coefficients of a (sparse) GLM (i.e.
a summary function for class glpModel). Am I going about this the
right way? Are they perhaps more efficient ways of doing this when
working with sparse matrices. Is there a preferred way of calculating
t-statistics? Note that 'rankMatrix' from package Matrix appears to
densify the matrix before use, so this is not a viable option.

I have asked a similar question before [1], though I am now trying to
'answer' it myself by digging in and actually coding a bit more
myself, however, as you can see I'm still a bit stuck :-(

Thanks for any suggestions,

Carson

[1] http://www.mail-archive.com/r-help@r-project.org/msg130145.html

-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rank of Matrix

2011-03-31 Thread Carson Farmer
Hmm, looks like I'm 'answering' my own question here...

library(Matrix)
data(KNex)
mm - KNex$mm
str(mmQR - qr(mm))
# new stuff:
R - mmQR@R
Rdiag - diag(R)
rank - sum(Rdiag  max(dim(mm))*.Machine$double.eps*abs(R[1,1])) #
this is the matlab default I think?
# 712

for comparison, rankMatrix from package Matrix gives the same answer,
but takes considerably more time and memory!
rankMatrix(mm)
# 712

Does the above make sense to others? Alternatively, is it possible to
derive rank from the model itself (I can't see how)?

library(MatrixModels)
trial - data.frame(counts=c(18,17,15,20,10,20,25,13,12),
                       outcome=gl(3,1,9,labels=LETTERS[1:3]),
                       treatment=gl(3,3,labels=letters[1:3]))
glmS - glm4(counts ~ 0+outcome + treatment, family=poisson, data=trial,
                     verbose = TRUE, sparse = TRUE)
str(glmS)


-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] model diagnostics for MatrixModels

2011-03-27 Thread Carson Farmer
Dear list,

I have been working with the MatrixModels package quite a bit this
week, and it is proving to be extremely valuable for my current work
(I am working with several factors with many levels, leading to a
sparse model matrix). However, as my knowledge of statistical theory
leaves much to be desired, there are certain aspects of model
evaluation etc that I am having trouble with. Has anyone developed any
examples of model diagnostics using the glpModel class (from the
MatrixModels package)? Something akin to 'summary', or perhaps helper
functions such as 'logLik', 'AIC', 'predict' or even a means of
running some of the tests from package 'lmtest'? The structure of the
'glpModel' class is pretty straight-forward, so I am able to extract
*some* of the relevant information to perform *some* of these tests
myself, however, an example of performing some 'standard' model
diagnostics or tests/comparisons would be very helpful (In this case I
am really only interested in tests etc relevant to GLMs). I'm not
against writing some of these functions myself, so any
tips/suggestions/resources/examples are appreciated. Furthermore, does
anyone know if there facilities to obtain, for example, the rank of
the (sparse) model matrix efficiently? Quite a few tests require rank,
but computing this separately using for example rankMatrix from
package Matrix uses up tonnes of memory, which is exactly what I was
trying to avoid by using sparse matrices and glm4 :-p

Thanks for any pointers,

Carson

 sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_IE.utf8   LC_NUMERIC=C
 [3] LC_TIME=en_IE.utf8LC_COLLATE=en_IE.utf8
 [5] LC_MONETARY=C LC_MESSAGES=en_IE.utf8
 [7] LC_PAPER=en_IE.utf8   LC_NAME=C
 [9] LC_ADDRESS=C  LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
 [1] MatrixModels_0.2-1 pscl_1.03.6gam_1.03   akima_0.5-4
 [5] coda_0.13-5mvtnorm_0.9-92 lmtest_0.9-27  zoo_1.6-4
 [9] Matrix_0.999375-48 lattice_0.19-17MASS_7.3-7

loaded via a namespace (and not attached):
[1] grid_2.12.2  tools_2.12.2


-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mixture models/latent class regression comparison

2011-02-28 Thread Carson Farmer
Dear list,

I have been comparing the outputs of two packages for latent class
regression, namely 'flexmix', and 'mmlcr'. What I have noticed is that
the flexmix package appears to come up with a much better fit than the
mmlcr package (based on logLik, AIC, BIC, and visual inspection). Has
anyone else observed such behaviour? Has anyone else been successful
in using the mmlcr package? I ask because I am interested in latent
class negative binomial regression, which the mmlcr package appears to
support, however, the results for basic Poisson latent class
regression appear to be inferior to the results from flexmix. Below is
a simple reproducible example to illustrate the comparison:

library(flexmix)
library(mmlcr)
data(NPreg) # from package flexmix
m1 - flexmix(yp ~ x, k=2, data=NPreg, model=FLXMRglm(family='poisson'))
NPreg$id - 1:200 # mmlcr requires an id column
m2 - mmlcr(outer=~1|id, components=list(list(formula=yp~x,
class=poisonce)), data=NPreg, n.groups=2)

# summary and coefficients for flexmix model
summary(m1)
summary(refit(m1))

# summary and coefficients for mmlcr model
summary(m2)
m2

Regards,

Carson

P.S. I have attached a copy of the mmlcr package with a modified
mmlcr.poisonce function due to errors in the version available here:
http://cran.r-project.org/src/contrib/Archive/mmlcr/. See also
http://jeldi.com/Members/jthacher/tips-and-tricks/programs/r/mmlcr
section Bugs? subsection Poisson.

-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mixture models/latent class regression comparison

2011-02-28 Thread Carson Farmer
Thanks for the reply Christian,

 I have never used mmlcr for this, but quite generally when fitting such
 models, the likelihood has often very many local optima. This means that the
 result of the EM (or a similar) algorithm depends on the initialisation,
 which in flexmix (and perhaps also in mmlcr) is done in a random fashion.
 This means that results may differ even if the same method is applied twice,
 and unfortunately, depending on the dataset, the result may be quite
 unstable. This may explain that the two functions give you strongly
 different results, not of course implying that one of them is generally
 better.
I though this might be the issue here. So is the solution to simply
run it many times and look for the best likelihood? I am still
confused as to why the mmlcr function consistently produces 'poorer'
results? Perhaps the flexmix package is being a bit more 'clever' in
terms of avoiding local optima? I will have to dig a bit deeper.

Cheers,

Carson

 Dear list,

 I have been comparing the outputs of two packages for latent class
 regression, namely 'flexmix', and 'mmlcr'. What I have noticed is that
 the flexmix package appears to come up with a much better fit than the
 mmlcr package (based on logLik, AIC, BIC, and visual inspection). Has
 anyone else observed such behaviour? Has anyone else been successful
 in using the mmlcr package? I ask because I am interested in latent
 class negative binomial regression, which the mmlcr package appears to
 support, however, the results for basic Poisson latent class
 regression appear to be inferior to the results from flexmix. Below is
 a simple reproducible example to illustrate the comparison:

 library(flexmix)
 library(mmlcr)
 data(NPreg) # from package flexmix
 m1 - flexmix(yp ~ x, k=2, data=NPreg, model=FLXMRglm(family='poisson'))
 NPreg$id - 1:200 # mmlcr requires an id column
 m2 - mmlcr(outer=~1|id, components=list(list(formula=yp~x,
 class=poisonce)), data=NPreg, n.groups=2)

 # summary and coefficients for flexmix model
 summary(m1)
 summary(refit(m1))

 # summary and coefficients for mmlcr model
 summary(m2)
 m2

 Regards,

 Carson

 P.S. I have attached a copy of the mmlcr package with a modified
 mmlcr.poisonce function due to errors in the version available here:
 http://cran.r-project.org/src/contrib/Archive/mmlcr/. See also
 http://jeldi.com/Members/jthacher/tips-and-tricks/programs/r/mmlcr
 section Bugs? subsection Poisson.

 --
 Carson J. Q. Farmer
 ISSP Doctoral Fellow
 National Centre for Geocomputation
 National University of Ireland, Maynooth,
 http://www.carsonfarmer.com/


 *** --- ***
 Christian Hennig
 University College London, Department of Statistical Science
 Gower St., London WC1E 6BT, phone +44 207 679 1698
 chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche




-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] negative binomial latent class regression in package flexmix

2011-02-23 Thread Carson Farmer
Hello list,

Has anyone had any luck creating an M-step driver for negative
binomial regression for use with package flexmix? I've had a look
here: http://cran.r-project.org/web/packages/flexmix/vignettes/flexmix-intro.pdf
as well as poking around in the flexmix source, but I haven't had much
luck getting anything to work. I can't figure out how to a) come up
with an initial estimate of theta, as I don't think this really
belongs in the M-step, and b) how to properly implement the M-step so
that refit() doesn't give me:
Error in .local(object, ...) :
  not implemented yet for restricted parameters.
(which I think is due to the additional theta parameter)? I realize
this is a bit of a vague question, but my hope is that there is
someone out there who has experience working with NB in a latent class
setting that can help me out here. Frankly I'm pretty stumped on this
one, and I don't think posting any code in this case will enlighten
any one!

Thanks in advance for any tips or pointers.

P.S. I'm not married to using flexmix, though I do like the
implementation and extensibility so far.

-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] spatial interaction (gravity) model as Poisson regression

2010-10-04 Thread Carson Farmer
Dear list,

I posted essentially this same question to the r-sig-geo mailing list
last week with no response :( Unfortunately I am no closer to reaching
a solution, so I now post it here (with some clarifications) in the
hope that someone following this list might have an answer for me:

Has anyone had much experience with spatial interaction (or gravity)
models, specifically in the form of Poisson regression? I'm a bit
unsure of how to operationalize this using glm(), and would appreciate
any pointers from those with more experience. I have searched the
archives extensively, and there are several mentions of this question,
but I have yet to find anything concrete that I can wrap my head
around... apologies if I have missed something.

Basically, the conventional origin constrained model would look
something like this:
T_{ij} = exp(\delta_{i} + \log{A_{j}} - \beta D_{ij}) ~ \varepsilon_{ij}
where \delta_{i} is a constant parameter specific to the ith zone,
A_{j} is the attractiveness of the jth location, and D_{ij} is the
distance between i and j. Note that \varepsilon_{ij} is just the
multiplicative error term of the flow from i to j, and \beta is the
distance decay parameter.

Similarly, the doubly constrained model follows the form:
T_{ij} = exp(\delta_{i} + \gamma_{j} - \beta D_{ij}) ~ \varepsilon_{ij}
where everything is defined as above, except exp(\gamma_{j}) is an
estimate of the attractiveness of location A_{j}.

Hopefully the above description makes things a bit clearer,
essentially my question is this:
What factors or in what form do I have to have my data in order to be
able to run such a model following the glm syntax?
I know this should be relatively straight-forward, I just can't seem
to get my head wrapped around it at all?
If it helps, I can provide some sample data to those who request it.

TIA, Carson

-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotting radial dendrograms

2010-06-17 Thread Carson Farmer
Dear list,

I am trying to plot a radial dendrogram using the ape package, which
requires my data to be of class 'phylo'. Currently I have my
dendrogram stored as an object of class 'dendrogram' which was
produced from an outside bit of C code, but was made into an object of
class 'igraph.eigenc' and converted to a dendrogram using
'as.dendrogram()' from the igraph package. I would like to find a way
to convert this dendrogram to a 'phylo' object, either directly, or
through conversion via 'hclust'. I have tried creating an 'hclust'
object 'manually' but keep running into Error: segfault from C stack
overflow when I try to plot it:

 result - list()
 result$merge-merges
 result$method - NULL
 result$dist.method - modularity
 result$height - seq(length(merges[,1]),1,-1)
 result$order - seq(length(merges[,1])+1)
 result$call - some call
 class(result) - hclust
 plot(result)

where merges is a two column matrix of cluster joins that looks
something like this:

 merges
 [,1] [,2]
[1,]02
[2,]31

The dendrogram object was created using something like this:

 result - list()
 class(result) - igraph.eigenc
 result$merges - merges
 result$membership - membership
 dend - as.dendrogram(result)

where merges is as above, and membership is:

 membership
[1] 0 0 0 0 0 2 2 2 2 2 1 1 1 1 1

Any thoughts on how to convert a dendrogram object to an hclust or
phylo object, or barring that, another way to plot a radial dendrogam
in R?

Regards,

Carson

-- 
Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth,
http://www.carsonfarmer.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] set values in data.frame to NA conditional on another data.frame

2008-07-02 Thread Carson Farmer

Hello List,

Is there a faster way to set values in one data.frame equal to NA 
conditional on the corresponding value in another data.frame? Currently 
I am using:

 b[is.na(a)] - NA
where 'a' and 'b' are data.frames of equal size/dimensions, and 'a' 
contains NAs but 'b' does not. This is extremely slow as is, as my 
data.frames are about 6000 columns by 6000 rows in size.

Are there any tricks to speed things up here?

Thank you in advance,

Carson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Replace values in data.frame conditional on another data.frame

2008-03-05 Thread Carson Farmer
Dear List,

I am looking for an efficient method for replacing values in a
data.frame conditional on the values of a separate data.frame. Here is
my scenario:
I have a data.frame (A) with say 1000 columns, and 365 rows. Each cell
in the data.frame has either valid value, or NA. I have an additional
data.frame (B) with the same number of rows and columns, with valid
values in all cells. What I would like to do, is replace the cells in B
with NA if and only if the corresponding cell in A is NA.
I have search extensively for a method to do this, and I'm sure that in
the end the solution will be embarrassingly simple, however, any help is
greatly appreciated!

Cheers,

Carson

Apologies if this has been posted twice, we are currently experiencing 
server problems...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fourier Analysis and Curve Fitting in R

2008-01-28 Thread Carson Farmer
Rolf Turner wrote:

 On 26/01/2008, at 10:54 AM, Carson Farmer wrote:

 Dear List,

 I am attempting to perform a harmonic analysis on a time series of snow
 depth, in which the annual curve is essentially asymmetric (i.e. snow
 accumulates slowly over time, and the subsequent melt occurs relatively
 rapidly).  I am trying to fit a curve to the data, however, the actual
 frequency is unknown.
 In general the actual frequency of the curve will indeed be close to 
 1/(1 year). However, because I intend to perform this analysis on many 
 regions, this will not always be the case. This is perhaps an 
 acceptable assumption however...
 Obviously there is something I am not understanding here.
 I would have thought that the ``actual frequency'' would
 be 1/(1 year) (period = 1 year) --- modulo the fact that
 the length of the year is constantly changing a tiny bit.
 (But I would've thought that this would have no practical
 impact in respect of any observed series.)

 My sampling interval is daily.
 What is your sampling interval, BTW? Day?  Week?  Month?
 I have been trying to follow the methods in Peter
 Bloomfields text Fourier Analysis of Time Series, but am having
 trouble implementing this in R.
 Yes it certainly would.
 Note that even though the ``actual frequency'' is (???) 1/(1 year),
 the representation of the mean function in terms of sinusoids
 will involve in theory infinitely many terms/frequencies since
 the mean function is clearly (!) not a sinusoid.

 Does anyone have any suggestions, or perhaps directions on how this
 might be done properly? Am I using the right methods for fitting an
 asymmetric curve?
 What I am really trying to do is fit a relatively smooth line to my 
 data which will preferentially weight the larger values. This method 
 needs to be able to fit through data gaps however, which is why I was 
 originally looking to fit sinusoids. A jpg of a single year of the 
 data is available here: 
 http://www.geog.uvic.ca/spar/carson/snowDepth.jpg to give you an 
 idea of the shape of my curve.
 Thank you again for your help,

 Carson

 I would have to know more about what you are *really* trying
 to do, and what the data are like, before I could make any
 useful suggestions.  Many modelling issues could come into
 play, and many modelling strategies are potentially applicable.

 cheers,

 Rolf Turner


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fourier Analysis and Curve Fitting in R

2008-01-25 Thread Carson Farmer
Dear List,

I am attempting to perform a harmonic analysis on a time series of snow 
depth, in which the annual curve is essentially asymmetric (i.e. snow 
accumulates slowly over time, and the subsequent melt occurs relatively 
rapidly).  I am trying to fit a curve to the data, however, the actual 
frequency is unknown. I have been trying to follow the methods in Peter 
Bloomfields text Fourier Analysis of Time Series, but am having 
trouble implementing this in R.

Does anyone have any suggestions, or perhaps directions on how this 
might be done properly? Am I using the right methods for fitting an 
asymmetric curve?

Any help is greatly appreciated,

Sincerely,

Carson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cycle Regression Analysis in R?

2008-01-10 Thread Carson Farmer
Hello R community,

Does anyone know of a package that will perform cycle regression 
analysis? I have searched the R-help archives etc. but have come up with 
nothing so far.
If I am unable to find an existing R package to do so, is there anyone 
familiar with fitting sine functions to data.  My problem is this:
I have a long time-series of daily SWE estimates (SWE = snow water 
equivalence, or the amount of water stored in a snowpack) which follows 
a sinusoidal pattern, and I need to estimate the parameters of the sine 
function that best fits this data. While there may be many contributing 
sine functions and/or linear trends, I am only interested in a single 
sine function that most closely fits the data (trends can be removed 
separately if need be).  Perhaps some sort of non-linear least squares 
method would be best?

Any help, or suggestions to get me on the right track are greatly 
appreciated.

Carson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.