[R] Instagram Analysis

2018-09-25 Thread Michael Haenlein
Dear all,

I'm looking for an R package that allows me to analyze Instagram.
Specifically I would like to download for a given account the list of other
accounts that either this account follows or that follow this account (the
followers and following numbers).

I know there is instaR but this package is quite old (August 2016) and
seems not to have been updated in the meantime. Is there a new package or
any other way to get this information in an easy way?

Thanks,

Michael

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Speeding up npreg

2018-02-03 Thread Michael Haenlein
Dear all,

I am using npreg from the np library to run a Kernel regression. My dataset
is relatively large and has about 3000 observations. The dependent variable
is continuous and I have a total of six independent variables -- two
continuous, two ordinal and two categorical.

The model converges without problems but it takes a very long time to do so
(nearly one hour).

Is there any way to speed up the npreg function to decrease the running
time? Or is there another function/ package for Kernel regression that may
be faster?

Any advice would be much appreciated

Thanks,

Michael

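One option that may shorten the bandwidth search is to reduce the number of optimizer restarts and loosen the convergence tolerances in npregbw. The sketch below is only an illustration on simulated data (the data frame `dat` and its variables are assumptions, not the poster's data), and the looser tolerances trade some accuracy of the selected bandwidths for speed.

```r
library(np)

# Simulated stand-in data: one continuous, one ordinal, one categorical predictor
set.seed(1)
n <- 200
dat <- data.frame(
  x1 = rnorm(n),
  x2 = ordered(sample(1:3, n, replace = TRUE)),
  x3 = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat$y <- dat$x1^2 + as.integer(dat$x2) + rnorm(n, sd = 0.3)

# nmulti = 1 uses a single start of the numerical search;
# larger tol / ftol stop the search earlier
bw <- npregbw(y ~ x1 + x2 + x3, data = dat,
              regtype = "lc", bwmethod = "cv.ls",
              nmulti = 1, tol = 0.1, ftol = 0.1)

# Fit the kernel regression with the bandwidths found above
fit <- npreg(bws = bw)
```

Because a single start can end in a local optimum, it seems sensible to compare the result against a full run at least once on a subsample.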


[R] Aggregation across two variables in data.table

2017-12-13 Thread Michael Haenlein
Dear all,

I have a data.frame that includes a series of demographic variables for a
set of respondents plus a dependent variable (Theta). For example:

   Age                                Education       Marital Familysize Income                Housing    Theta
1:  50                         Associate degree      Divorced          4   70K+    Owned with mortgage     9.14
2:  65                          Bachelor degree       Married          1 10-15K Owned without mortgage 7.345036
3:  33                          Bachelor degree       Married          2 30-40K    Owned with mortgage 7.974937
4:  69                          Bachelor degree Never married          1   70K+    Owned with mortgage 7.733053
5:  54 Some college, less than college graduate Never married          3 30-40K                 Rented 7.648642
6:  35                         Associate degree     Separated          2 10-15K                 Rented 7.496411

My objective is to calculate the average of Theta across all pairs of two
demographics.

For 1 demographic this is straightforward:

Demo_names <- c("Age", "Education", "Marital", "Familysize", "Income",
                "Housing")
means1 <- vector("list", length(Demo_names))
for (i in seq_along(Demo_names)) {
  Demo_tmp <- Demo_names[i]
  # mean of Theta within each level of the i-th demographic
  means1[[i]] <- data_tmp[, list(mean(Theta)), by = Demo_tmp]
}

Is there an easy way to extend this logic to more than 1 variable? I know
how to do this manually, e.g.,
data_tmp[,list(mean(Theta)),by=list(Marital, Education)]

But I don't know how to integrate this into a loop.

Thanks,

Michael

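One way to loop over all pairs is to generate them with combn and pass each pair to `by` as a character vector. This is a minimal sketch on toy data: the small `data_tmp` below and the result name `means2` are assumptions for illustration only.

```r
library(data.table)

# Toy stand-in for data_tmp (values are illustrative only)
data_tmp <- data.table(
  Marital   = c("Married", "Married", "Divorced", "Divorced"),
  Education = c("Bachelor", "Associate", "Bachelor", "Bachelor"),
  Housing   = c("Rented", "Owned", "Rented", "Rented"),
  Theta     = c(7.0, 8.0, 9.0, 10.0)
)
Demo_names <- c("Marital", "Education", "Housing")

# All unordered pairs of demographic names, one pair per list element
pairs <- combn(Demo_names, 2, simplify = FALSE)

# `by` accepts a character vector of column names, so a pair can be passed as is
means2 <- lapply(pairs, function(p) {
  data_tmp[, .(mean_theta = mean(Theta)), by = p]
})
names(means2) <- vapply(pairs, paste, character(1), collapse = " x ")
```

Each element of `means2` is a data.table with one row per observed combination of the two demographics, for example `means2[["Marital x Education"]]`.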


[R] Simulate data from Structural Equation Model

2017-03-16 Thread Michael Haenlein
Dear all,

I am looking for an R package or code that allows me to simulate data
consistent with a given structural equation model. Essentially my idea is
to define (a) the number of endogenous and exogenous latent variables, (b)
the strength of relationship between them and (c) the way of measurement
(number of indicators, distribution of indicators) and to obtain simulated
data consistent with this specification.

I know there is some literature on this topic (e.g., Mattson, S. (1997).
How to generate non-normal data for simulation of structural equation
models. Multivariate behavioral research, 32(4), 355 – 373), but I do not
know whether some of these approaches have already been implemented in R and/
or whether better methods exist.

Any help would be very much appreciated,

Thanks,

Michael


Michael Haenlein
Professor of Marketing
ESCP Europe

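One candidate worth checking is the simulateData function in the lavaan package, which generates data from a model written in lavaan syntax with fixed population values. The sketch below is an illustration only: the latent variable names, loadings and path coefficient are assumed values, not taken from the post.

```r
library(lavaan)

# Population model: fixed numbers are the assumed "true" parameter values
pop_model <- '
  # measurement model (three indicators per latent variable)
  xi  =~ 0.8*x1 + 0.7*x2 + 0.6*x3
  eta =~ 0.8*y1 + 0.7*y2 + 0.6*y3
  # structural model (effect of the exogenous on the endogenous latent variable)
  eta ~ 0.5*xi
'

set.seed(123)
sim_data <- simulateData(pop_model, sample.nobs = 500)
head(sim_data)
```

simulateData also has skewness and kurtosis arguments for non-normal indicators; whether they cover the distributions needed here would have to be checked against the lavaan documentation.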

[R] lm model with many categorical variables

2016-09-20 Thread Michael Haenlein
Dear all,

I am trying to estimate a lm model with one continuous dependent variable
and 11 independent variables that are all categorical, some of which have
many categories (several dozens in some cases).

I am not interested in statistical inference to a larger population. The
objective of my model is to find a way to best predict my continuous
variable within the sample.

When I run the lm model I evidently get many regression coefficients that
are not significant. Is there some way to automatically combine levels of a
categorical variable together if the regression coefficients for the
individual levels are not significant?

My idea is to find some form of grouping of the different categories that
allows me to work with fewer levels while keeping or even improving the
quality of predictions.

Thanks,

Michael



[R] Fractional Factorial Design on 4-level factor

2016-05-31 Thread Michael Haenlein
Dear all,

I am running a simulation experiment with 8 factors that each have 4
levels. Each combination is repeated 100 times. If I run a full factorial
this would mean 100*4^8 = 6,553,600 runs.

I am trying to reduce the number of scenarios to run using a fractional
factorial design. I'm interested in estimating the main effects of the 8
factors plus their 2-way interactions. Any higher level interactions are
not of interest to me. My plan is to use a standard OLS regression for
that, once the simulations are over.

I tried to use the FrF2 package to derive a fractional factorial design but
it seems that this is only working for factors on two levels. Any idea how
I could derive a fractional factorial design on factors with four levels?

Thanks for your help,

Michael



Michael Haenlein
Professor of Marketing
ESCP Europe

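As a quick sanity check on the size of the problem, the arithmetic can be written out in R. With 8 factors at 4 levels each, the full factorial has 4^8 distinct combinations, and a model with all main effects and 2-way interactions needs at least as many distinct design points as it has parameters. The variable names below are illustrative.

```r
n_factors <- 8
n_levels  <- 4
n_reps    <- 100

# Full factorial: every combination of levels, each repeated n_reps times
full_factorial <- n_levels^n_factors          # 65536 combinations
total_runs     <- n_reps * full_factorial     # 6553600 runs

# Parameters of a model with main effects and all 2-way interactions
main_df  <- n_factors * (n_levels - 1)                   # 8 * 3  = 24
inter_df <- choose(n_factors, 2) * (n_levels - 1)^2      # 28 * 9 = 252
min_runs <- 1 + main_df + inter_df                       # 277, including the intercept
```

So any fractional design that estimates all of these effects needs at least 277 distinct combinations, which is still a small fraction of the 65,536 in the full factorial.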


[R] (Small) programming job related to network analysis

2016-04-19 Thread Michael Haenlein
Dear all,

I am looking for help in programming three functions. Those functions
should simulate (social) networks according to the process described in:

(1) A.H. Dekker - "Realistic Social Networks for Simulation using Network
Rewiring" (
http://www.mssanz.org.au/MODSIM07/papers/13_s20/RealisticSocial_s20_Dekker_.pdf
)

(2) Konstantin Klemm and Víctor M. Eguíluz - "Growing scale-free networks
with small-world behavior" (http://ifisc.uib-csic.es/victor/Nets/sw.pdf)

(3) Petter Holme and Beom Jun Kim - "Growing Scale-Free Networks with
Tunable Clustering" (http://arxiv.org/pdf/cond-mat/0110452.pdf)

I am looking for three functions (e.g., sample_dekker, sample_klemm,
sample_holme) that generate an output similar to the functions sample_pa
and sample_smallworld in the R package igraph. The input should be the
number of nodes in the network (e.g., 1000) and any other parameters those
models require.

In case this is of relevance please get in touch with me by email to
discuss further details.

Thanks,

Michael



Michael Haenlein
Professor of Marketing
ESCP Europe


[R] Social Network Simulation

2016-04-16 Thread Michael Haenlein
Dear all,

I am trying to simulate a series of networks that have characteristics
similar to real life social networks. Specifically I am interested in
networks that have (a) a reasonable degree of clustering (as measured by
the transitivity function in igraph) and (b) a reasonable degree of degree
polarization (as measured by the average degree of the top 10% nodes with
highest degree divided by the overall average degree).

Right now I am using two functions from igraph (sample_pa and
sample_smallworld) but these are not ideal since they only allow me to vary
one of the two characteristics. Either the network has good clustering but
not enough polarization or the other way round.

I looked around and I found some network algorithms that solve the problem
(E.g., Jackson and Rogers, Meeting Strangers and Friends of Friends), but I
did not find them implemented in an R package. I also found the R package
NetSim which seems to be in this spirit, but I cannot get it to work.

Could anyone point me to an R library that I could check out? I do not care
much about the specific algorithm used as long as it allows me to vary
clustering and degree polarization in certain ranges.

Thanks,

Michael


Michael Haenlein
Professor of Marketing
ESCP Europe, Paris

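The two target characteristics can be computed for any candidate generator with a few lines of igraph code. In the sketch below, the helper name `degree_polarization` and the generator parameters (1000 nodes, m = 3, nei = 3, p = 0.05) are assumptions chosen for illustration.

```r
library(igraph)

# Average degree of the top `top_share` of nodes divided by the overall average degree
degree_polarization <- function(g, top_share = 0.10) {
  deg   <- sort(degree(g), decreasing = TRUE)
  n_top <- max(1, ceiling(top_share * length(deg)))
  mean(deg[seq_len(n_top)]) / mean(deg)
}

set.seed(42)
g_pa <- sample_pa(1000, m = 3, directed = FALSE)
g_sw <- sample_smallworld(dim = 1, size = 1000, nei = 3, p = 0.05)

# Side-by-side comparison of clustering and degree polarization
metrics <- data.frame(
  model        = c("sample_pa", "sample_smallworld"),
  clustering   = c(transitivity(g_pa), transitivity(g_sw)),
  polarization = c(degree_polarization(g_pa), degree_polarization(g_sw))
)
metrics
```

The same two functions can then be applied to the output of any other generator to see whether it reaches the desired ranges.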


[R] igraph -- Selecting closest neighbors

2015-10-17 Thread Michael Haenlein
Dear all,

I am looking for a function to select the N closest neighbors (in terms of
distance) of a vertex in igraph.

Assume for example N=7. If the vertex has 3 direct neighbors, I would like
the function to select those 3 plus a random 4 among the second degree
neighbors.

Is there some way to do this in an efficient way? I have been trying to
program something using ego() with varying levels of distance but I have
not managed to get a conclusive solution.

Thanks for your help,

Michael


Michael Haenlein
Professor of Marketing
ESCP Europe

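One possible approach is to compute the shortest-path distances from the vertex once and then sort by distance with a random tie-breaker, so that vertices at the same distance are picked at random. The function name `closest_n` is hypothetical, and the sketch assumes `v` is a single integer vertex id.

```r
library(igraph)

closest_n <- function(g, v, n) {
  # Shortest-path distance from v to every vertex in the graph
  d <- as.vector(distances(g, v = v))
  # Keep reachable vertices only and drop v itself
  candidates <- setdiff(which(is.finite(d)), v)
  # Sort by distance; the random second key breaks ties at random
  ord <- order(d[candidates], runif(length(candidates)))
  head(candidates[ord], n)
}

# Ring of 10 vertices: vertex 1 has neighbors 2 and 10, second-degree neighbors 3 and 9
g <- make_ring(10)
set.seed(1)
res <- closest_n(g, v = 1, n = 3)
res
```

For very large graphs the full distance vector may be wasteful; restricting the search with ego() up to a small order first would be a possible refinement.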


[R] Running R Remotely on LINUX

2015-04-14 Thread Michael Haenlein
Dear all,

I am used to running R locally on my Windows-based PC. Since some of my
computations are taking a lot of time I am now trying to move to a remote R
session on a LINUX server but I am having trouble getting things to work.

I am able to access the LINUX server using PuTTY and SSH. Once I have
access I can log in with my username and password (which is asked through
keyboard-interactive authentication). I can then open an R session.

Since I am not used to working with LINUX, I have several questions:

(1) Ideally I am looking for Windows-based software that would allow me
to work on R as I am used to with the difference that the computations are
run remotely on the LINUX server. Does software like this exist? Please
note that I do not think that I can install any software on the LINUX
server. But I can install stuff on my Windows-based PC.

(2) I am running an extensive simulation that takes about one week to run.
Right now it seems that when I log out of R on LINUX and close PuTTY, the R
session closes as well. Is there a way to let R run in the background for
the week and just check into the progress 1-2 times a day?

(3) Can I open several instances of R in parallel? On my PC I sometimes
have 2-3 windows open in parallel that work on different calculations to
save time. Not sure to which extent this is possible on LINUX.

I assume that these questions are very naïve. But since I’m only used to
working with Windows I’m quite stuck at the moment. Any help would be very
much appreciated!

Thanks in advance,

Michael




Michael Haenlein
Professor of Marketing
ESCP Europe


[R] Teaching materials for R course

2015-02-03 Thread Michael Haenlein
Dear all,

I am a professor at a business school and I would like to develop a course
about quantitative research using R.

My current plan is that the course should cover (a) an introduction
(assuming that students have never used R before), (b) basic econometric
analysis (e.g., regression, logit) as well as (c) structural equation
modelling.

Are there any textbooks and teaching materials (e.g., PowerPoint slides)
that one of you could recommend for me to have a look at?

Thanks,

Michael

Michael Haenlein
Professor of Marketing
ESCP Europe



[R] Bayesian multivariate linear regression

2014-09-02 Thread Michael Haenlein
Dear all,

I'm looking for a package that allows me to run a Bayesian multivariate
linear regression and extract predicted values. In essence I'm looking for
the equivalent of lm and predict.lm in a Bayesian framework.

I have found several libraries that allow running a Bayesian multivariate
linear regression (e.g., bayesm), but those do not seem to have a
prediction function. And the ones with prediction (e.g., MCMCpack) do not
support multiple dependent variables.

If you have any pointers, please let me know.

Best wishes,

Michael


Michael Haenlein
ESCP Europe
Paris, France



[R] boxcox alternative

2014-02-24 Thread Michael Haenlein
Dear all,

I am working with a set of variables that are very non-normally
distributed. To improve the performance of my model, I'm currently applying
a boxcox transformation to them. While this improves things, the
performance is still not great.

So my question: Are there any alternatives to boxcox in R? I would need a
model that estimates the best transformation automatically without input
from the user since my approach should be flexible enough to deal with any
kind of distribution. boxcox allows me to do this by picking the lambda
that leads to the best fit but I wonder whether there are other options
out there.

Thanks,

Michael


Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France



[R] R in remote mode

2014-01-22 Thread Michael Haenlein
Dear all,

I have written a simulation in R that has a significant running time
(probably 60-80 hours). While I can run the code on my laptop, it tends to
slow things down to a significant extent and it leads to a very high CPU
temperature overall.

Is there an easy and convenient way to run R remotely on some outside
server or PC? Any services that you are aware of? I know that there is a
way to run R on Amazon EC2 but I'm wondering whether there is something even
simpler. Ideally I am looking for a remote access to a PC where R is
already installed and where I can simply copy-paste my code and run it.

Please let me know in case you have any ideas,

Thanks in advance,

Michael


Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France



[R] Looking for consultant in mathematics/ statistics

2013-12-03 Thread Michael Haenlein
Dear all,

I am looking for a consultant who can help me to solve a mathematical/
statistical problem I have. The problem is more conceptual in nature (how
to solve a given problem analytically) than programming-related, although I
would also need some programming support later, once the analytic solution
has been found. My question relates to categorical variables that are
represented by underlying latent variables.

If you think you could help, please send me an email. I will then describe
the problem in more detail and we can agree on fees and timelines. My gut
feeling is that it will not take long (like a couple of hours), but I might
be wrong.

Looking forward to hearing from you,

Michael


Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France



[R] lp.transport in package lpSolve

2013-04-23 Thread Michael Haenlein
Dear all,

I'm working on a very complex linear optimization problem using the
lp.transport function in lpSolve. My PC has 10 cores, but by default R uses
only one of them.

Is there a straightforward way to make lp.transport use all cores
available? I had a look at High-performance and parallel computing in R (
http://cran.r-project.org/web/views/HighPerformanceComputing.html), but I
have the impression that using multiple cores would require me to change
the function underlying lp.transport. The problem is that I'm not sure
whether I'm able to make those adjustments.

Thanks,

Michael



[R] predict.lm

2013-04-08 Thread Michael Haenlein
Dear all,

I would like to use predict.lm to obtain a set of predicted values based on
a regression model I estimated.

When I apply predict.lm to two vectors that have the same values, the
predicted values will be identical. I know that my regression model is not
perfect and I would like to take account of the error inherent in the model
within my predictions. So, while I understand that the expected value of
both vectors should be the same (since they have the same value), I would
like to have different predictions to take account of the error inherent in
my model.

I assume I can probably use se.fit to achieve my objective of including
random error in my predictions but I don't really know how. Could anybody
give me a pointer on how this can be done?

Thanks,

Michael

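One common way to add model error to predictions is to draw each prediction from a normal distribution centred on the fitted value, with a standard deviation that combines the uncertainty of the fitted mean (se.fit) and the residual standard deviation. The sketch below uses simulated data (the data frame `dat` is an assumption) and ignores the extra uncertainty in the estimated residual variance.

```r
set.seed(1)
dat <- data.frame(x = runif(200))
dat$y <- 2 + 3 * dat$x + rnorm(200, sd = 0.5)
fit <- lm(y ~ x, data = dat)

# Two identical inputs: their expected values are the same
newdata <- data.frame(x = c(0.4, 0.4))
pred <- predict(fit, newdata, se.fit = TRUE)

# Predictive sd = sqrt(variance of the fitted mean + residual variance)
pred_sd <- sqrt(pred$se.fit^2 + pred$residual.scale^2)

# One random draw per row: same mean, different realised values
simulated <- rnorm(nrow(newdata), mean = pred$fit, sd = pred_sd)
simulated
```

Calling predict with interval = "prediction" gives the corresponding interval instead of random draws, which may be enough if only the spread is of interest.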


[R] Approximating discrete distribution by continuous distribution

2013-01-22 Thread Michael Haenlein
Dear all,

I have a discrete distribution showing how age is distributed across a
population using a certain set of bands:

Age <- matrix(c(74045062, 71978405, 122718362, 40489415), ncol=1,
dimnames=list(c("<18", "18-34", "35-64", "65+"),c()))
Age_dist <- Age/sum(Age)

For example I know that 23.94% of all people are between 0-18 years, 23.28%
between 18-34 years and so forth.

I would like to find a continuous approximation of this discrete
distribution in order to estimate the probability that a person is for
example 16 years old.

Is there some automatic way in R through which this can be done? I tried a
Kernel density estimation of the histogram but this does not seem to
provide what I'm looking for.

Thanks very much for your help,

Michael


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France

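One simple approach is to interpolate the cumulative distribution at the band edges with a monotone spline and read probabilities off as differences of that curve. The sketch below rests on two assumptions that are not in the post: the bands are taken to start at ages 0, 18, 35 and 65, and the open-ended last band is closed at an assumed maximum age of 100.

```r
counts <- c(74045062, 71978405, 122718362, 40489415)
edges  <- c(0, 18, 35, 65, 100)   # assumed band edges; 100 is an assumed maximum age

# Cumulative share of the population at each band edge
cdf_pts <- c(0, cumsum(counts) / sum(counts))

# Monotone (Hyman-filtered) cubic spline through the cumulative shares
cdf <- splinefun(edges, cdf_pts, method = "hyman")

# Estimated probability that a person is 16 years old (age in [16, 17))
p16 <- cdf(17) - cdf(16)
p16
```

By construction the curve reproduces the observed band shares exactly; how the mass is spread inside a band is determined by the spline and the assumed upper limit, so it should be read as a smooth guess rather than an estimate backed by data.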


[R] Regression with very high number of categorical variables

2012-05-08 Thread Michael Haenlein
Dear all,

I would like to run a simple regression model y~x1+x2+x3+...

The problem is that I have a lot of independent variables (xi) -- around
one hundred -- and that some of them are categorical with a lot of
categories (like, for example, ZIP code). One straightforward way would be
to (a) transform all categorical variables into 1/0 dummies and (b) enter
all the variables into an lm model. But I'm not sure whether this is very
efficient, especially since the analysis is exploratory in nature and I
expect that many of the xi will have no significant impact on y.

Is there an R library that can handle such a setting? I have read about
Hierarchical Bayesian variance components models that have been used with
ZIP data (www.jstor.org/stable/10.2307/4129723), but I'm not sure to which
extent there is a function in R to do that in a straightforward manner.

Thanks,

Michael



[R] Curve fitting, probably splines

2012-04-12 Thread Michael Haenlein
Dear all,

This is probably more related to statistics than to [R] but I hope someone
can give me an idea how to solve it nevertheless:

Assume I have a variable y that is a function of x: y=f(x). I know the
average value of y for different intervals of x. For example, I know that
in the interval [0;x1] the average y is y1, in the interval [x1;x2] the
average y is y2 and so forth.

I would like to find a line of minimum curvature so that the average values
of y in each interval correspond to y1, y2, ...

My idea was to use (cubic) splines. But the problem I have seems somewhat
different to what is usually done with splines. As far as I understand it,
splines help to find a curve that passes through a set of given points. But I don't
have any points, I only have average values of y per interval.

If you have any suggestions on how to solve this, I'd love to hear them.

Thanks very much in advance,

Michael

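One way to turn interval averages into points is to work with the integral of f instead of f itself. The integral of f up to each interval endpoint is known exactly (sum of average times width), so a spline can interpolate those cumulative values and its derivative then has the required interval averages. The breakpoints and averages below are assumed example values, and a natural spline is only one smooth choice, not necessarily the curve of minimum curvature.

```r
breaks <- c(0, 1, 2, 4, 6)   # assumed interval endpoints 0, x1, x2, ...
y_avg  <- c(2, 3, 5, 4)      # assumed interval averages y1, y2, ...

# Cumulative integral of f at the endpoints: sum of average * interval width
cum_int <- c(0, cumsum(y_avg * diff(breaks)))

# Interpolate the cumulative integral; its first derivative is the fitted curve
F_spline <- splinefun(breaks, cum_int, method = "natural")
f_hat    <- function(x) F_spline(x, deriv = 1)

# Check: the average of f_hat over each interval reproduces y_avg
avg_check <- diff(F_spline(breaks)) / diff(breaks)
avg_check
```

`curve(f_hat, 0, 6)` plots the resulting curve.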


[R] Consultant to program R-code dealing with social networks

2012-01-30 Thread Michael Haenlein
Dear all,

I am looking for a consultant/ programmer to program a relatively simple R
code for me.

Specifically, I have about 50 social networks. These networks have between
5,000 and 5 million nodes and between 30,000 and 70 million edges. The code
should (a) read one network into R, (b) draw a snowball sample of size x
out of the network (e.g., a snowball sample of 1,000 nodes), (c) determine
some basic network statistics for that sample and (d) save the sample and
network statistics into two files for further use.

Let me know by email in case you are interested so that we can speak about
the remaining details.

Thanks,

Michael




Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France



[R] Extract BIC for coxph

2011-12-20 Thread Michael Haenlein
Dear all,

is there a function similar to extractAIC based on which I can extract the
BIC (Bayesian Information Criterion) of a coxph model?
I found some functions that provide BIC in other packages, but none of them
seems to work with coxph.

Thanks,

Michael

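extractAIC takes a penalty argument k, so a BIC-type criterion can be obtained by setting k to the log of the sample size. For Cox models the number of events is often used as the sample size, but that is a convention and the total number of observations is sometimes used instead. The sketch below uses the lung data shipped with the survival package purely as an example.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# extractAIC returns c(edf, criterion); k = 2 would give the usual AIC
bic_events <- extractAIC(fit, k = log(fit$nevent))[2]   # penalty based on number of events
bic_obs    <- extractAIC(fit, k = log(fit$n))[2]        # penalty based on number of observations

c(events = bic_events, observations = bic_obs)
```

Whichever definition is chosen, it should be used consistently across the models being compared.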


[R] Regression model when dependent variable can only take positive values

2011-12-06 Thread Michael Haenlein
Dear all,

I would like to run a regression of the form lm(y ~ x1+x2) where the
dependent variable y can only take positive values. Assume, for example,
that y is the height of a person (measured in cm), x1 is the gender
(measured as a binary indicator with 0=male and 1=female) and x2 is the age
of the person (measured in years).

When I run a simple lm(y ~ x1+x2), I obtain an intercept value that is
negative. I interpret that in a way that a person who is male (x1=0) and
just born (x2=0), has a negative height. This evidently does not make
sense. I therefore assume that my estimates might be biased and that I need
to use some other form of estimation that takes account of the fact that
y>0 for all observations.

Could anybody please tell me which type of regression would be most
recommendable for this type of analysis?

Thanks very much in advance,

Michael

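One commonly used option for a strictly positive response is a generalized linear model with a log link, for example a Gamma GLM, whose fitted values are positive by construction. The sketch below uses simulated heights (all numbers are assumptions for illustration). Note that a negative intercept in lm often just reflects extrapolation of a straight line to x2=0 and is not necessarily a sign of bias.

```r
set.seed(1)
n  <- 300
x1 <- rbinom(n, 1, 0.5)                   # gender indicator (0 = male, 1 = female)
x2 <- runif(n, 0, 18)                     # age in years
mu <- exp(3.9 + 0.07 * x2 - 0.03 * x1)    # assumed true mean height in cm
y  <- rgamma(n, shape = 100, rate = 100 / mu)

# Linear model: predictions are not constrained to be positive
fit_lm <- lm(y ~ x1 + x2)

# Gamma GLM with log link: predictions are exp(linear predictor), hence positive
fit_gamma <- glm(y ~ x1 + x2, family = Gamma(link = "log"))

# Predicted height for a newborn male
new <- data.frame(x1 = 0, x2 = 0)
pred_newborn <- predict(fit_gamma, new, type = "response")
pred_newborn
```

Fitting lm(log(y) ~ x1 + x2) is a simpler alternative with a similar interpretation of the coefficients.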


[R] Time-dependent covariates in coxph model

2011-10-07 Thread Michael Haenlein
Dear all,

I have a question about time-dependent covariates in a coxph model.
Specifically I am wondering whether it is possible to give more recent
events a higher weight when constructing time-dependent covariates.

Assume I have a sample of cancer patients and I would like to predict
whether the number of treatments a patient received has an impact on
survival time. For each patient in my sample I know (a) the date when a
patient is diagnosed with cancer, (b) all the dates where a treatment took
place and (c) the date of death or, alternatively, the date where the
observation window ends.

Take the following example: Bob is diagnosed with cancer on 01/01/1990, has
three treatments (on 01/01/1993, 01/01/1995 and 01/01/1997) and dies on
01/01/1999. In order to incorporate the time-dependent covariates into my
model, I transform this into four separate datapoints:

(1) Start: 01.01.1990, End: 01.01.1993, Number of treatments: 0
(2) Start: 01.01.1993, End: 01.01.1995, Number of treatments: 1
(3) Start: 01.01.1995, End: 01.01.1997, Number of treatments: 2
(4) Start: 01.01.1997, End: 01.01.1999, Number of treatments: 3

The problem is that in this formulation all treatments count the same way,
no matter when they took place. I would like to introduce some form of
discount factor that takes account of the fact that the potential impact of
each treatment decays over time. If that discount factor is d, I would like
to model the following four datapoints:

(1) Start: 01.01.1990, End: 31.12.1992, Number of treatments: 0
(2) Start: 01.01.1993, End: 31.12.1994, Number of treatments: 1
(3) Start: 01.01.1995, End: 31.12.1996, Number of treatments: 1*d^2 + 1
(4) Start: 01.01.1997, End: 01.01.1999, Number of treatments: 1*d^4 + 1*d^2
+ 1

d^n hereby accounts for the fact that the treatment was already n years ago
at the start of the observation.

My question: Is it possible to include such a formulation in a coxph model?
Is there a way to estimate the optimal d, so that I can estimate how fast
the effect of a treatment decays over time, given the data I have?

Thanks,

Michael

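For a fixed d, the discounted covariate is easy to construct by hand. The sketch below computes it for Bob's example, with time measured in years since diagnosis. The function name `discounted_treatments` and the value d = 0.5 are assumptions for illustration.

```r
# Discounted number of treatments at the start of an interval:
# each past treatment contributes d^(years since that treatment)
discounted_treatments <- function(t_start, treat_times, d) {
  past <- treat_times[treat_times <= t_start]
  sum(d^(t_start - past))
}

treat_times <- c(3, 5, 7)    # treatments in 1993, 1995, 1997 (diagnosis in 1990 = time 0)
starts      <- c(0, 3, 5, 7) # start of each of the four intervals

x_disc <- sapply(starts, discounted_treatments, treat_times = treat_times, d = 0.5)
x_disc   # 0, 1, d^2 + 1, d^4 + d^2 + 1 evaluated at d = 0.5
```

Since coxph does not estimate d itself, one possible way to choose it would be a profile search: rebuild the covariate for a grid of d values, refit the model for each, and keep the d with the highest partial log-likelihood. That is a suggestion under the stated setup, not an established feature of the survival package.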


[R] Pearson chi-square test

2011-09-27 Thread Michael Haenlein
Dear all,

I have some trouble understanding the chisq.test function.
Take the following example:

set.seed(1)
A <- cut(runif(100),c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
B <- cut(runif(100),c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
C <- cut(runif(100),c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
x <- table(A,B)
y <- table(A,C)

When I calculate the test statistic by hand I get a value of approximately
75.9:
http://en.wikipedia.org/wiki/Pearson's_chi-square_test#Calculating_the_test-statistic
sum((x-y)^2/y)

But when I do chisq.test(x,y) I get a value of 12.2 while chisq.test(y,x)
gives a value of 10.3.

I understand that I must be doing something wrong here, but I'm not sure
what.

Thanks,

Michael



Re: [R] Pearson chi-square test

2011-09-27 Thread Michael Haenlein
Dear Michael,

Thanks very much for your answers!

The purpose of my analysis is to test whether the contingency table x is
different from the contingency table y.
Or, to put it differently, whether there is a significant difference between
the joint distribution AB and AC.

Based on your answer I'm wondering whether the best way to do this is really
a chisq.test?
Or is there perhaps a different function or package I should use
altogether?

Thanks,

Michael

-----Original Message-----
From: Meyners, Michael [mailto:meyner...@pg.com]
Sent: Tuesday, 27 September 2011 17:00
To: Michael Haenlein; r-help@r-project.org
Subject: RE: [R] Pearson chi-square test

Just for completeness: the manual calculation you'd want is most likely

sum((x-y)^2  / (x+y))

(that's one you can find on the Wikipedia link you provided). To get the
same from chisq.test, try something like

chisq.test(data.frame(x,y)[,c(3,6)])

(there are surely smarter ways, but at least it works here). Note that
something like

chisq.test(as.vector(x), as.vector(y))

will give a different test (i.e. based on a contingency table of x cross y).

M.

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of Meyners, Michael
> Sent: Tuesday, September 27, 2011 13:28
> To: Michael Haenlein; r-help@r-project.org
> Subject: Re: [R] Pearson chi-square test
>
> Not sure what you want to test here with two matrices, but reading the
> manual helps here as well:
>
> y   a vector; ignored if x is a matrix.
>
> x and y are matrices in your example, so it comes as no surprise that
> you get different results. On top of that, your manual calculation is
> not correct if you want to test whether two samples come from the same
> distribution (so don't be surprised if R still gives a different
> value...).
>
> HTH, Michael



  -Original Message-

  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-

  project.org] On Behalf Of Michael Haenlein

  Sent: Tuesday, September 27, 2011 12:45

  To: r-help@r-project.org

  Subject: [R] Pearson chi-square test

 

  Dear all,

 

  I have some trouble understanding the chisq.test function.

  Take the following example:

 

  set.seed(1)

  A - cut(runif(100),c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)

  B - cut(runif(100),c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)

  C - cut(runif(100),c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)

  x - table(A,B)

  y - table(A,C)

 

  When I calculate the test statistic by hand I get a value of

  approximately

  75.9:

  http://en.wikipedia.org/wiki/Pearson's_chi-

  square_test#Calculating_the_test-statistic

  sum((x-y)^2/y)

 

  But when I do chisq.test(x,y) I get a value of 12.2 while

  chisq.test(y,x)

  gives a value of 10.3.

 

  I understand that I must be doing something wrong here, but I'm not

  sure

  what.

 

  Thanks,

 

  Michael



[R] Pearson correlation of sum of variables

2011-09-25 Thread Michael Haenlein
Dear all,

This is more of a math-related question, but perhaps someone can help me nevertheless:

Assume I have two random variables: A and B.
Furthermore assume that I know the Pearson Correlation Coefficient between A
and B: cor(A,B)

I now define C = 1-(A+B).
Is there some way to determine cor(C,A) and cor(C,B)?
Or, to put it differently, what are cor(1-A-B, A) and cor(1-A-B, B)?

I know that the Pearson Correlation Coefficient is not additive, but
perhaps there is still some way to solve that.

Thanks very much in advance,

Michael
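
A sketch of the algebra, under the usual definitions: since Cov(C,A) = -Var(A) - Cov(A,B) and Var(C) = Var(A) + Var(B) + 2 Cov(A,B), the correlation is cor(C,A) = -(sd(A) + r sd(B)) / sqrt(sd(A)^2 + sd(B)^2 + 2 r sd(A) sd(B)) with r = cor(A,B), and symmetrically for B. So cor(A,B) alone is not enough; the two standard deviations are needed as well. The simulated A and B below are hypothetical and only serve to check the formula numerically:

```r
set.seed(42)
# Hypothetical correlated variables, used only to check the formula
A <- rnorm(1000)
B <- 0.5 * A + rnorm(1000)
C <- 1 - (A + B)

r  <- cor(A, B)
sA <- sd(A)
sB <- sd(B)

# Standard deviation of C = 1 - A - B (the constant 1 does not matter)
sC <- sqrt(sA^2 + sB^2 + 2 * r * sA * sB)

# Closed-form correlations from cor(A,B), sd(A) and sd(B)
cor_CA <- -(sA + r * sB) / sC
cor_CB <- -(sB + r * sA) / sC

c(formula = cor_CA, direct = cor(C, A))
c(formula = cor_CB, direct = cor(C, B))
```

If A and B happen to have equal variances, the expression simplifies to cor(C,A) = cor(C,B) = -sqrt((1 + r) / 2).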



[R] Cannot allocate vector of size x

2011-09-21 Thread Michael Haenlein
Dear all,

I am running a simulation in which I randomly generate a series of vectors
to test whether they fulfill a certain condition. In most cases, there is no
problem. But from time to time, the (randomly) generated vectors are too
large for my system and I get the error message "Cannot allocate vector of
size x".

The problem is that in those cases my simulation stops and I have to restart
it manually. What I would like to do is simply ignore the error (or perhaps
report that it occurred) and then continue with another (randomly) generated
vector.

So my question: Is there a way to prevent R from stopping in such a case, so
that it simply starts over with a new vector as if nothing had happened?
I hope I'm making myself clear here ...

Thanks,

Michael
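
One common pattern for this is to wrap each run in tryCatch(), record the failure, and move on. The sketch below is only an illustration: run_once() is a hypothetical stand-in for one simulation step and fails on purpose for some inputs to mimic the allocation error.

```r
# Hypothetical stand-in for one simulation run; it fails on purpose for
# some inputs to mimic "cannot allocate vector of size ...".
run_once <- function(i) {
  if (i %% 3 == 0) stop("cannot allocate vector of size x")
  i^2
}

n_runs  <- 6
results <- rep(NA_real_, n_runs)

for (i in seq_len(n_runs)) {
  results[i] <- tryCatch(
    run_once(i),
    error = function(e) {
      # Report the failure, then return NA so the loop carries on
      message("run ", i, " failed: ", conditionMessage(e))
      NA_real_
    }
  )
}

results
```

Failed runs show up as NA in results, so they can be counted or redrawn afterwards. Note that an out-of-memory condition can leave little room to recover, so this may not rescue every case.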



[R] Binary optimization problem in R

2011-09-19 Thread Michael Haenlein
Dear all,

I would like to solve a problem similar to a multiple knapsack problem and
am looking for a function in R that can help me.

Specifically, my situation is as follows: I have a list of n items which I
would like to allocate to m groups of fixed size. Each item has a certain
profit value and this profit depends on the group the item is in. My
problem is to allocate the items to groups so that the overall profit is
maximized while respecting the fixed size of each group.

Take the following example with 20 items (n=20) and 5 groups (m=5):

set.seed(1)
profits <- matrix(runif(100), nrow=20)
size <- c(2,3,4,5,6)

The matrix profits describes the profit of each item when it is allocated
to a certain group. For example, when item 1 is allocated to group 1 it
generates a profit of 0.26550866. However, when item 1 is allocated to group
2 it generates a profit of 0.93470523. The vector size describes the size
of each group. So group 1 can contain 2 items, group 2 3 items, group 3 4
items, etc.

I think this is probably something that could be done with constrOptim() but
I'm not exactly sure how.

Any help is very much appreciated!

Thanks very much in advance,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
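
The problem as described has the shape of a transportation problem: each item supplies one unit, each group demands exactly as many units as its size. A minimal sketch, assuming the lpSolve package is installed (this is a different technique from the constrOptim() idea mentioned above, not a confirmed answer from the list):

```r
library(lpSolve)  # assumption: lpSolve is installed

set.seed(1)
profits <- matrix(runif(100), nrow = 20)  # items in rows, groups in columns
size    <- c(2, 3, 4, 5, 6)               # required group sizes (sum to 20)

# Each item is placed exactly once; each group receives exactly `size` items.
res <- lp.transport(cost.mat  = profits,
                    direction = "max",
                    row.signs = rep("==", nrow(profits)),
                    row.rhs   = rep(1, nrow(profits)),
                    col.signs = rep("==", ncol(profits)),
                    col.rhs   = size)

res$status                                           # 0 means a solution was found
assignment <- apply(res$solution, 1, which.max)      # group index for each item
table(assignment)                                    # should reproduce `size`
res$objval                                           # total profit
```

Because the group sizes add up to the number of items, equality constraints are used on both sides; with spare capacity the column signs would become "<=".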



[R] Allocation of data points to groups based on membership probabilities

2011-09-15 Thread Michael Haenlein
Dear all,

I have a matrix that provides, for a series of data points, the probability
that each of these points belongs to a certain group.
Take the following example, which represents 20 data points and their group
membership probability to five groups (A-E):

set.seed(1)
probs <- matrix(runif(100), nrow=20,
                dimnames=list(c(), c("A","B","C","D","E")))

In addition, I know how large each group should be.
Assume, for example, that the group sizes in the aforementioned example are
5, 4, 1, 6, 4 for A, B, C, D and E respectively.

I would like to allocate individuals to the groups so that
(a) each group has the size it is supposed to have and
(b) all data points are part of the group where they have a high probability
of belonging.

For some data points this allocation is straightforward, because one group
membership probability is much larger than the others.
But for others two or more probabilities are very similar, which means that a
data point could be allocated to either one or the other group.

I guess it should be possible to write some iterative code or an
optimization routine that can do what I would like to do, but I do not know
how.

Does anyone have an idea how this could be done?

Thanks very much in advance,

Michael Haenlein



Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
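
One simple iterative approach is a greedy pass: visit all (data point, group) pairs from the highest probability downwards and accept a pair whenever the point is still unassigned and the group still has room. This is only a heuristic sketch in base R; it respects the group sizes but is not guaranteed to maximize the total probability (an assignment or transportation solver would be needed for that).

```r
set.seed(1)
probs <- matrix(runif(100), nrow = 20,
                dimnames = list(NULL, c("A", "B", "C", "D", "E")))
sizes <- c(A = 5, B = 4, C = 1, D = 6, E = 4)   # required group sizes

# All (row, column) pairs, ordered from highest to lowest probability
ord <- order(probs, decreasing = TRUE)
idx <- arrayInd(ord, dim(probs))

assignment <- rep(NA_character_, nrow(probs))
remaining  <- sizes

for (k in seq_len(nrow(idx))) {
  i <- idx[k, 1]
  g <- colnames(probs)[idx[k, 2]]
  # Accept the pair only if the point is free and the group has capacity
  if (is.na(assignment[i]) && remaining[g] > 0) {
    assignment[i] <- g
    remaining[g]  <- remaining[g] - 1
  }
}

table(factor(assignment, levels = names(sizes)))   # matches `sizes`
```

Since the sizes add up to the number of data points, every point ends up in exactly one group.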



[R] Convert continuous variable into discrete variable

2011-07-15 Thread Michael Haenlein
Dear all,

I have a continuous variable that can take on values between 0 and 100, for
example: x <- runif(100, 0, 100)

I also have a second variable that defines a series of thresholds, for
example: y <- c(3, 4.5, 6, 8)

I would like to convert my continuous variable into a discrete one using the
threshold variables:

If x is between 0 and 3 the discrete variable should be 1
If x is between 3 and 4.5 the discrete variable should be 2
If x is between 4.5 and 6 the discrete variable should be 3
If x is between 6 and 8 the discrete variable should be 4
If x is larger than 8 the discrete variable should be 5

Is there a straightforward way of doing this (besides working with several
if statements in a row)?

Thanks,

Michael
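
Two base R functions cover this directly, findInterval() and cut(). A short sketch using the x and y from the example above; the two differ only in how a value exactly equal to a threshold is classified:

```r
set.seed(1)
x <- runif(100, 0, 100)
y <- c(3, 4.5, 6, 8)

# findInterval() counts how many thresholds are <= x, so adding 1 gives 1..5.
# Intervals are closed on the left: [3, 4.5) maps to 2.
d1 <- findInterval(x, y) + 1

# cut() with explicit breaks; intervals are closed on the right: (3, 4.5] maps to 2.
d2 <- cut(x, breaks = c(0, y, Inf), labels = FALSE, include.lowest = TRUE)

table(d1)
all(d1 == d2)   # identical unless some x equals a threshold exactly
```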



[R] Help with estimating copulas

2011-06-29 Thread Michael Haenlein
Dear all,

I am looking to hire a consultant/ adviser who can help me to get my head
around copulas. For a person familiar with the topic (
http://en.wikipedia.org/wiki/Copula_(statistics)) who knows the copula
package or similar (http://www.jstatsoft.org/v21/i04/paper) I think the job
should not take more than a couple of hours, one day maximum.

Please contact me in case you are interested so that I can provide you with
additional details on the problem I'd like to solve.

Thanks,

Michael


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France



[R] Help to improve existing R-Code

2011-05-27 Thread Michael Haenlein
Dear all,

I have written a relatively brief R-Code to run a series of simulations.
Currently the code runs for a very long time (up to several days, depending
on the conditions), and I suspect this is because it is not written very
efficiently. I am, for example, relying on several for(...) loops, which
could probably be done much faster using a different way of programming.

I am looking for a consultant who could help me to improve my code. The idea
is that I send the code to the person, s/he works on improving it and then
sends the improved version back to me. I think for an experienced programmer
the job should not take more than 2-3 days (probably less), but this is to
be decided once the person has looked at the code.

In case you are interested, please send me a brief message so that I can
provide you with more details,

Thanks,

Michael



Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France



Re: [R] Help to improve existing R-Code

2011-05-27 Thread Michael Haenlein
I'm looking to hire someone -- sorry for not having been more precise!
Michael

On Fri, May 27, 2011 at 1:23 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 11-05-27 3:23 AM, Michael Haenlein wrote:

 Dear all,

 I have written a relatively brief R-Code to run a series of simulations.
 Currently the code runs for a very long time (up to several days, depending
 on the conditions) and I expect this to be the case because it might not be
 very efficiently written. I am, for example, relying on several for(...)
 loops which could probably be done much faster using a different way of
 programming.

 I am looking for a consultant who could help me to improve my code. The idea
 is that I send the code to the person, s/he works on improving it and then
 sends the improved version back to me. I think for an experienced programmer
 the job should not take more than 2-3 days (probably less), but this is to
 be decided once the person has looked at the code.


 Your message is ambiguous:  are you asking for someone to volunteer 2-3
 days to help you, or are you trying to hire someone?

 Duncan Murdoch

  In case you are interested, please send me a brief message so that I can
 provide you with more details,

 Thanks,

 Michael



 Michael Haenlein
 Professor of Marketing
 ESCP Europe
 Paris, France



[R] Powerful PC to run R

2011-05-13 Thread Michael Haenlein
Dear all,

I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
i7 CPU, M620, 2.67 GHz, 8 GB RAM). The problem is that some of my
calculations run for several days, sometimes even weeks (mainly simulations
over a large parameter space). Depending on the external conditions, my
laptop sometimes shuts down due to overheating.

I'm now thinking about buying a more powerful desktop PC or laptop. Can
anybody advise me on the best configuration to run R as fast as possible? I
will use this PC exclusively for R so any other factors are of limited
importance.

Thanks,

Michael


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France



[R] Total effect of X on Y under presence of interaction effects

2011-05-11 Thread Michael Haenlein
Dear all,

This is probably more a statistics question than an R question, but perhaps
there is somebody who can help me nevertheless.

I'm running a regression with four predictors (a, b, c, d) and all their
interaction effects using lm. Based on theory I assume that a influences y
positively. In my output (see below) I see, however, a negative regression
coefficient for a. But several of the interaction effects of a with b, c and
d have positive signs. I don't really understand this. Do I have to add up
the coefficient for the main effect and the ones of all interaction effects
to get a total effect of a on y? Or am I doing something wrong here?

Thanks very much for your answer in advance,

Regards,

Michael


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France



Call:
lm(formula = y ~ a * b * c * d)

Residuals:
    Min      1Q  Median      3Q     Max
-44.919  -5.184   0.294   5.232 115.984

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  27.3067     0.8181  33.379  < 2e-16 ***
a           -11.0524     2.0602  -5.365 8.25e-08 ***
b            -2.5950     0.4287  -6.053 1.47e-09 ***
c           -22.0025     2.8833  -7.631 2.50e-14 ***
d            20.5037     0.3189  64.292  < 2e-16 ***
a:b          15.1411     1.1862  12.764  < 2e-16 ***
a:c          26.8415     7.2484   3.703 0.000214 ***
b:c           8.3127     1.5080   5.512 3.61e-08 ***
a:d           6.6221     0.8061   8.215 2.33e-16 ***
b:d          -2.0449     0.1629 -12.550  < 2e-16 ***
c:d          10.0454     1.1506   8.731  < 2e-16 ***
a:b:c         1.4137     4.1579   0.340 0.733862
a:b:d        -6.1547     0.4572 -13.463  < 2e-16 ***
a:c:d       -20.6848     2.8832  -7.174 7.69e-13 ***
b:c:d        -3.4864     0.6041  -5.772 8.05e-09 ***
a:b:c:d       5.6184     1.6539   3.397 0.000683 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.913 on 12272 degrees of freedom
Multiple R-squared: 0.8845, Adjusted R-squared: 0.8844
F-statistic:  6267 on 15 and 12272 DF,  p-value: < 2.2e-16
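
In a model with interactions, the coefficient on a by itself is the slope of a only when b, c and d are all zero. The slope at other values is the sum of the main-effect coefficient and every interaction term containing a, each multiplied by the corresponding values of b, c and d. A minimal sketch with simulated data (the data and the helper effect_of_a() are hypothetical; only the model formula comes from the post):

```r
set.seed(1)
n   <- 500
dat <- data.frame(a = runif(n), b = runif(n), c = runif(n), d = runif(n))
dat$y <- with(dat, 1 - 2 * a + 3 * a * b + rnorm(n))   # made-up data-generating process

fit <- lm(y ~ a * b * c * d, data = dat)

# Slope of y with respect to a, evaluated at given values of b, c and d
effect_of_a <- function(fit, b, c, d) {
  co <- coef(fit)
  unname(co["a"] +
         co["a:b"] * b + co["a:c"] * c + co["a:d"] * d +
         co["a:b:c"] * b * c + co["a:b:d"] * b * d + co["a:c:d"] * c * d +
         co["a:b:c:d"] * b * c * d)
}

effect_of_a(fit, b = 0,   c = 0,   d = 0)     # equals coef(fit)["a"]
effect_of_a(fit, b = 0.5, c = 0.5, d = 0.5)   # slope at mid-range values
```

So a negative main-effect coefficient does not contradict a positive effect of a over the range of b, c and d that actually occurs in the data; evaluating the slope at typical values (for example the means) shows which case applies.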



[R] Testing equality of coefficients in coxph model

2011-04-12 Thread Michael Haenlein
Dear all,

I'm running a coxph model of the form:
coxph(Surv(Start, End, Death.ID) ~ x1 + x2 + a1 + a2 + a3)

Within this model, I would like to compare the influence of x1 and x2 on the
hazard rate.
Specifically I am interested in testing whether the estimated coefficient
for x1 is equal (or not) to the estimated coefficient for x2.

I was thinking of using a Chow test for this, but the Chow test appears to
work for linear regression only (see: http://en.wikipedia.org/wiki/Chow_test).
Another option I was thinking of is to estimate an alternative model in
which the coefficients for x1 and x2 are constrained to be equal and to
compare the fit of this constrained model with that of the unconstrained
one. But again I'm not sure how this can be done using coxph.

Could anyone help me out on this please?

Thanks,

Michael



Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
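
Both ideas can be sketched with the survival package. A Wald test uses the estimated coefficients and their covariance matrix; the constrained model can be fitted by replacing x1 + x2 with I(x1 + x2), which forces a common coefficient, and compared via a likelihood-ratio test. The data below are simulated and hypothetical, and for brevity the sketch uses Surv(time, status) rather than the (Start, End, Death.ID) form from the post:

```r
library(survival)

# Simulated data; variable names and true effects are hypothetical.
set.seed(1)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), a1 = rnorm(n))
d$time   <- rexp(n, rate = exp(0.5 * d$x1 + 0.2 * d$x2))
d$status <- 1   # every subject has an observed event in this toy example

# Unconstrained model
fit_u <- coxph(Surv(time, status) ~ x1 + x2 + a1, data = d)

# Wald test of H0: beta_x1 = beta_x2 (x1 and x2 are coefficients 1 and 2)
b <- coef(fit_u)
V <- vcov(fit_u)
z <- unname((b[1] - b[2]) / sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2]))
p_wald <- 2 * pnorm(-abs(z))

# Constrained model: I(x1 + x2) gives x1 and x2 one shared coefficient
fit_c <- coxph(Surv(time, status) ~ I(x1 + x2) + a1, data = d)

# Likelihood-ratio test with 1 degree of freedom
lr   <- 2 * (fit_u$loglik[2] - fit_c$loglik[2])
p_lr <- pchisq(lr, df = 1, lower.tail = FALSE)

c(wald = p_wald, lr = p_lr)
```

The comparison is only meaningful if x1 and x2 are measured on comparable scales, since the test is about the raw coefficients.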



[R] Effect size in multiple regression

2011-03-26 Thread Michael Haenlein
Dear all,

is there a convenient way to determine the effect size for a regression
coefficient in a multiple regression model?
I have a model of the form lm(y ~ A*B*C*D) and would like to determine
Cohen's f2 (http://en.wikipedia.org/wiki/Effect_size) for each predictor
without having to do it manually.

Thanks,

Michael



Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
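
Cohen's local f2 for a predictor can be computed from two nested fits as (R2_full - R2_reduced) / (1 - R2_full). A small base R sketch follows; the helper cohens_f2() and the simulated data are hypothetical. With interactions in the model, "removing a predictor" is taken here to mean dropping the predictor together with every interaction that contains it, which is one possible convention among several:

```r
# Local effect size from two nested lm fits
cohens_f2 <- function(full, reduced) {
  r2_full    <- summary(full)$r.squared
  r2_reduced <- summary(reduced)$r.squared
  (r2_full - r2_reduced) / (1 - r2_full)
}

# Simulated example data (hypothetical)
set.seed(1)
n   <- 200
dat <- data.frame(A = rnorm(n), B = rnorm(n))
dat$y <- with(dat, 1 + 0.5 * A + 0.3 * B + 0.2 * A * B + rnorm(n))

full    <- lm(y ~ A * B, data = dat)
reduced <- lm(y ~ B, data = dat)     # drops A and the A:B interaction

f2_A <- cohens_f2(full, reduced)
f2_A
```

For lm(y ~ A*B*C*D) the same helper can be applied once per predictor, each time with a reduced formula that omits the terms of interest.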



[R] System of related regression equations

2011-02-22 Thread Michael Haenlein
Dear all,

I would like to estimate a system of regression equations of the following
form:

y1 = a1 + b1 x1 + b2 x2 + e1
y2 = a2 + c1 y1 + c2 x2 + c3 x3 + e2

Specifically the dependent variable in Equation 1 appears as an independent
variable in Equation 2. Additionally some independent variables that appear
in Equation 1 are also included in Equation 2.

I assume that I cannot estimate these two regressions separately using lm.
Is there an efficient way to estimate these equations?

Thanks very much in advance,

Michael



Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France
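
Whether separate lm fits are acceptable depends on the error terms: if e1 and e2 are uncorrelated, the system is recursive and equation-by-equation OLS is fine. If they are correlated, y1 is endogenous in the second equation, and x1 (which appears only in the first equation) can serve as an instrument. The following is a hand-rolled two-stage least squares sketch in base R on simulated, hypothetical data. It yields consistent point estimates, but the standard errors printed by the second-stage lm are not valid; a dedicated package such as systemfit would be the usual route for proper inference.

```r
set.seed(1)
n  <- 5000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)

# A shared shock u makes e1 and e2 correlated (hypothetical setup)
u  <- rnorm(n)
e1 <- u + rnorm(n)
e2 <- u + rnorm(n)

y1 <- 1 + 1.0 * x1 + 0.5 * x2 + e1
y2 <- 2 + 0.8 * y1 + 0.3 * x2 - 0.4 * x3 + e2   # true c1 = 0.8

# Equation 1 contains only exogenous regressors, so OLS is appropriate
eq1 <- lm(y1 ~ x1 + x2)

# Equation 2, stage 1: project y1 on all exogenous variables
stage1 <- lm(y1 ~ x1 + x2 + x3)
y1_hat <- fitted(stage1)

# Equation 2, stage 2: use the projection in place of y1
eq2_2sls <- lm(y2 ~ y1_hat + x2 + x3)

# Naive OLS for comparison (biased upwards here because of the shared shock)
eq2_ols <- lm(y2 ~ y1 + x2 + x3)

coef(eq2_2sls)["y1_hat"]
coef(eq2_ols)["y1"]
```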



[R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
Dear all,

I'm struggling with predicting expected time until death for a coxph and
survreg model.

I have two datasets. Dataset 1 includes a certain number of people for which
I know a vector of covariates (age, gender, etc.) and their event times
(i.e., I know whether they have died and when if death occurred prior to the
end of the observation period). Dataset 2 includes another set of people for
which I only have the covariate vector. I would like to use Dataset 1 to
calibrate either a coxph or survreg model and then use this model to
determine an expected time until death for the individuals in Dataset 2.
For example, I would like to know when a person in Dataset 2 will die, given
his/ her age and gender.

I checked predict.coxph and predict.survreg as well as the document "A
Package for Survival Analysis in S" written by Terry M. Therneau, but I have
to admit that I'm a bit lost here.

Could anyone give me some advice on how this could be done?

Thanks very much in advance,

Michael



Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France



Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
Thanks very much for your answers, David and Mattia.

I understand that the baseline hazard in a Cox model is unknown and that
this makes the calculation of expected survival difficult.
Does this change when I move to a survreg model instead?

I think I'm OK with estimating a Cox model (or a survreg model) as I've done
so in the past.
But I'm lost with the different options in the prediction part (e.g.,
linear, quantile, risk, expected, ...).
Is there any document that explains what these options mean?

Sorry in case these questions are naive ... hope they're not too stupid ;-)


On Thu, Nov 11, 2010 at 5:03 PM, Mattia Prosperi ahn...@gmail.com wrote:

 Indeed, from the predict() function of the coxph you cannot get
 directly time predictions, but only linear and exponential risk
 scores. This is because, in order to get the time, a baseline hazard
 has to be computed and it is not straightforward since it is implicit
 in the Cox model.

 2010/11/11 David Winsemius dwinsem...@comcast.net:
 
  The first step would be creating a Surv-object, followed by running a
  regression that created a coxph-object, using dataset1 as input. So you
  should be looking at:

  ?Surv
  ?coxph

  There are worked examples in the help pages. You would then run predict() on
  the coxph fit with dataset2 as the newdata argument. The default output is
  the linear predictor for the log-hazard relative to a mean survival estimate
  but other sorts of estimates are possible. The survfit function provides
  a survival curve suitable for plotting.

  (You may want to inquire at a local medical school to find statisticians who
  have experience with this approach. This is ordinary biostatistics these
  days.)

  --
  David.
 
 
 
  David Winsemius, MD
  West Hartford, CT
 


Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
Thanks for the comment, James!

The problem is that my initial sample (Dataset 1) is truncated. That means I
only observe time to death for those individuals who actually died before
the end of my observation period. It is my understanding that this type of
truncation creates a bias when I use a "normal" regression analysis. Hence
my idea to use some form of survival model.

I had another look at predict.survreg and I think the option "response"
could work for me.
When I run the following code I get ptime = 290.3648.
I assume this means that an individual with ph.ecog=2 can be expected to
live another 290.3648 days before death occurs (days is the time scale of
the time variable).
Could someone confirm whether this makes sense?

lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')
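
For orientation, a sketch of how the main predict.survreg types relate to one another for the default Weibull fit used above. As far as the Weibull parameterization in survreg goes, type "response" is exp(linear predictor), which is a characteristic time on the original scale rather than the mean; the mean and median follow from it together with the fitted scale parameter. The variable names below are hypothetical:

```r
library(survival)

fit <- survreg(Surv(time, status) ~ ph.ecog, data = lung)   # Weibull by default
nd  <- data.frame(ph.ecog = 2)

lp   <- predict(fit, newdata = nd, type = "lp")               # linear predictor (log-time scale)
resp <- predict(fit, newdata = nd, type = "response")         # exp(lp), on the time scale
med  <- predict(fit, newdata = nd, type = "quantile", p = 0.5) # predicted median survival time

# Weibull mean implied by the fit: exp(lp) * gamma(1 + scale)
mean_time <- unname(resp) * gamma(1 + fit$scale)

c(response = unname(resp), median = unname(med), mean = mean_time)
```

Which of these is the right summary depends on the question; with right-skewed survival times the median is often easier to interpret than the mean, and all three rest on the Weibull assumption.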



On Thu, Nov 11, 2010 at 5:26 PM, James C. Whanger james.whan...@gmail.com wrote:

 Michael,

 You are looking to compute an estimated time to death -- rather than the
 odds of death conditional upon time.  Thus, you will want to use time to
 death as your dependent variable rather than a dichotomous outcome (
 0=alive, 1=death).   You can accomplish this with a straight forward
 regression analysis.

 Best,

 Jim





 --
 *James C. Whanger
 Research Consultant
 2 Wolf Ridge Gap
 Ledyard, CT  06339

 Phone: 860.389.0414*




Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
David, Mattia, James -- thanks so much for all your helpful comments!
I now have a much better understanding of how to calculate what I'm
interested in ... and what the risks are of doing so.
Thanks and all the best,
Michael


On Thu, Nov 11, 2010 at 7:33 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:

  Thanks for the comment, James!

  The problem is that my initial sample (Dataset 1) is truncated. That means I
  only observe time to death for those individuals who actually died before
  the end of my observation period. It is my understanding that this type of
  truncation creates a bias when I use a "normal" regression analysis. Hence
  my idea to use some form of survival model.

  I had another look at predict.survreg and I think the option "response"
  could work for me.
  When I run the following code I get ptime = 290.3648.
  I assume this means that an individual with ph.ecog=2 can be expected to
  live another 290.3648 days before death occurs (days is the time scale of
  the time variable).

 It is a prediction under specific assumptions underpinning a parametric
 estimate.

  Could someone confirm whether this makes sense?


 You ought to confirm that it makes sense by comparing to your data:
 require(Hmisc); require(survival)
 # <your code>

 > describe(lung[lung$status==1 & lung$ph.ecog==2, "time"])
 lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
       n missing  unique    Mean
       6       0       6   293.7

            92 105 211 292 511 551
 Frequency   1   1   1   1   1   1
 %          17  17  17  17  17  17

 > ?lung

 So status==1 is a censored case and the observed times are status==2
 > describe(lung[lung$status==2 & lung$ph.ecog==2, "time"])
 lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85

 lowest :  11  12  13  26  30, highest: 524 533 654 707 814

 And the mean time to death (in a group that had only 6 censored individuals
 at times from 92 to 551) was 226, and the median time to death among 44
 individuals is 178, with a right-skewed distribution. You need to decide
 whether you want to make that particular prediction when you know that you
 forced a specific distributional form on the regression machinery by
 accepting the default.




 lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
 ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')






 David Winsemius, MD
 West Hartford, CT





[R] Aggregating data from two data frames

2010-09-08 Thread Michael Haenlein
Dear all,

I'm working with two data frames.

The first frame (agg_data) consists of two columns. agg_data[,1] is a unique
ID for each row and agg_data[,2] contains a continuous variable.

The second data frame (geo_data) consists of several columns. One of these
columns (geo_data$ZCTA) corresponds to the unique ID in the first data
frame. The problem is that only a subset of the unique IDs present in the
first data frame also appears in the second data frame.

What I would like to do is to add another column to the second data frame
(geo_data) that includes the value of the continuous variable from the first
frame that corresponds to the unique ID. To put it differently, I want R to
look at each row in the second data frame, look for the unique ID
(geo_data$ZCTA), look for the same unique ID in the first data frame and
then paste the value from the continuous variable as a new column into the
second data frame. I hope I'm somewhat clear here ...

Is there a convenient way of doing this?

Thanks very much in advance,

Michael
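
The lookup described above is what match() does: for each geo_data$ZCTA it returns the row position of the same ID in agg_data, which can then be used to index the value column. A minimal sketch with made-up data (the ZCTA codes, the value column and the state column are hypothetical):

```r
# Hypothetical example data
agg_data <- data.frame(ZCTA  = c("10001", "10002", "10003", "10004"),
                       value = c(1.5, 2.5, 3.5, 4.5))
geo_data <- data.frame(ZCTA  = c("10003", "10001", "10003"),
                       state = c("NY", "NY", "NY"))

# Position of each geo_data ID within agg_data (NA if an ID is not found)
pos <- match(geo_data$ZCTA, agg_data[, 1])

# New column holding the matching continuous value; row order is unchanged
geo_data$value <- agg_data[pos, 2]

geo_data
```

merge(geo_data, agg_data, by = "ZCTA", all.x = TRUE) gives the same information but may reorder the rows, whereas the match() version keeps geo_data in its original order.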



[R] Collinearity in Moderated Multiple Regression

2010-08-03 Thread Michael Haenlein
Dear all,

I have one dependent variable y and two independent variables x1 and x2
which I would like to use to explain y. x1 and x2 are design factors in an
experiment and are not correlated with each other. For example assume that:

x1 <- rbind(1,1,1,2,2,2,3,3,3)
x2 <- rbind(1,2,3,1,2,3,1,2,3)
cor(x1,x2)

The problem is that I do not only want to analyze the effect of x1 and x2 on
y but also of their interaction x1*x2. Evidently this interaction term has a
substantial correlation with both x1 and x2:

x3 <- x1*x2
cor(x1,x3)
cor(x2,x3)

I therefore expect that a simple regression of y on x1, x2 and x1*x2 will
lead to biased results due to multicollinearity. For example, even when y is
completely random and unrelated to x1 and x2, I obtain a substantial R2 for
a simple linear model which includes all three variables. This evidently
does not make sense:

y <- rnorm(9)
model <- lm(y ~ x1 + x2 + x1*x2)
summary(model)

Is there some function within R or in some separate library that allows me
to estimate such a regression without obtaining inconsistent results?

Thanks for your help in advance,

Michael


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France



Re: [R] Collinearity in Moderated Multiple Regression

2010-08-03 Thread Michael Haenlein
Thanks for your comment!
Actually, they are continuous variables which have a very low correlation --
I just wanted to make the whole story easier for explanation.

My general question is: Does R offer an alternative to lm for situations
where there is substantial collinearity between the independent variables?
I have found the perturb package, but this seems to be focused on
identifying collinearity not on dealing with it.

Thanks,

Michael


On Tue, Aug 3, 2010 at 3:25 PM, Nikhil Kaza nikhil.l...@gmail.com wrote:

 Are x1 and x2 factors (dummy variables)? cor does not make sense in
 this case.

 Nikhil Kaza
 Asst. Professor,
 City and Regional Planning
 University of North Carolina

 nikhil.l...@gmail.com




Re: [R] Collinearity in Moderated Multiple Regression

2010-08-03 Thread Michael Haenlein
Thanks very much -- it seems that Ridge Regression can do what I'm looking
for!
Best,
Michael



-Original Message-
From: Nikhil Kaza [mailto:nikhil.l...@gmail.com] 
Sent: Tuesday, August 03, 2010 16:21
To: haenl...@gmail.com
Cc: r-help@r-project.org (r-help@R-project.org)
Subject: Re: [R] Collinearity in Moderated Multiple Regression

My usual strategy for dealing with multicollinearity is to drop the offending
variable or transform one of them. I would also check the vif functions in
car and Design.

I think you are looking for lm.ridge in MASS package.
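
A minimal sketch of what a call to lm.ridge could look like for the moderated model, using simulated data. The coefficients, sample size and penalty grid below are arbitrary assumptions for illustration only:

```r
library(MASS)  # lm.ridge() is part of the MASS package

set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
# Simulated response with main effects and an interaction (made-up values)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + 0.2 * x1 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

# Fit the moderated model over a grid of ridge penalties
fit <- lm.ridge(y ~ x1 * x2, data = d, lambda = seq(0, 10, by = 0.5))

# Pick the penalty with the smallest generalized cross-validation score
best <- fit$lambda[which.min(fit$GCV)]

# Refit at that penalty and extract coefficients on the original scale
ridge_coef <- coef(lm.ridge(y ~ x1 * x2, data = d, lambda = best))
ridge_coef
```

With lambda = 0 the fit corresponds to ordinary least squares, so comparing the two sets of coefficients shows how much shrinkage the penalty introduces.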


Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.l...@gmail.com

On Aug 3, 2010, at 9:51 AM, haenl...@gmail.com wrote:

 I'm sorry -- I think I chose a bad example. Let me start over again:

 I want to estimate a moderated regression model of the following form:
 y = a*x1 + b*x2 + c*x1*x2 + e

 Based on my understanding, including an interaction term (x1*x2) into 
 the regression in addition to x1 and x2 leads to issues of 
 multicollinearity, as x1*x2 is likely to covary to some degree with x1 
 (and x2). One recommendation I have seen in this context is to use 
 mean centering, but apparently this does not solve the problem (see: 
 Echambadi, Raj and James D. Hess (2007), Mean-centering does not 
 alleviate collinearity problems in moderated multiple regression 
 models, Marketing Science, 26 (3), 438-45). So my question is: Which R 
 function can I use to estimate this type of model?

 Sorry for the confusion caused due to my previous message,

 Michael






 On Aug 3, 2010 3:42pm, David Winsemius dwinsem...@comcast.net wrote:
 I think you are attributing to collinearity a problem that is due 
 to your small sample size. You are predicting 9 points with 3 
 predictor terms, and incorrectly concluding that there is some 
 inconsistency because you get an R^2 that is above some number you 
 deem surprising. (I got values between 0.2 and 0.4 on several runs.)



 Try:

 x1
 x2
 x3


 y
 model
 summary(model)



 # Multiple R-squared: 0.04269



 --

 David.
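
The following is not the code from the reply above, which did not survive in the archive. It is only a rough sketch of the small-sample point: it regresses pure noise on x1, x2 and their interaction for the 3 x 3 design, once with 9 observations and once with the design replicated 100 times. The helper name null_r2 and the number of simulation runs are arbitrary assumptions:

```r
set.seed(42)

# R-squared of y ~ x1 * x2 when y is pure noise, for a 3 x 3 design
# replicated `reps` times (reps = 1 gives the 9-point example)
null_r2 <- function(reps) {
  d <- expand.grid(x1 = 1:3, x2 = 1:3)
  d <- d[rep(seq_len(nrow(d)), reps), ]
  d$y <- rnorm(nrow(d))
  summary(lm(y ~ x1 * x2, data = d))$r.squared
}

r2_small <- replicate(500, null_r2(1))    # n = 9
r2_large <- replicate(500, null_r2(100))  # n = 900

# Average R-squared under the null for each sample size
c(n9 = mean(r2_small), n900 = mean(r2_large))
```

If the sample size is what drives the result, the average R-squared for n = 9 should come out far larger than for n = 900, even though the design and the degree of collinearity are identical in both cases.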





Re: [R] Collinearity in Moderated Multiple Regression

2010-08-03 Thread Michael Haenlein


[R] Time-dependent covariates in survreg function

2010-07-28 Thread Michael Haenlein
Dear all,

I'm asking this question again as I didn't get a reply last time:

I'm doing a survival analysis with time-dependent covariates. Until now,
I have used a simple Cox model for this, specifically the coxph function
from the survival library. Now, I would like to try out an accelerated
failure time model with a parametric specification as implemented for
example in the survreg function.

Two questions: First, can survreg handle time-dependent covariates?
The description for this function does not make reference to them. And
second, in case survreg cannot deal with time-dependent covariates, is there
a similar function in some other package that can?

Thanks very much,

Michael
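
For context, the setup used with coxph so far can be sketched as below. It relies on the counting-process layout, with one row per subject and interval. The data are simulated and the names long, tstart, tstop, status and x are hypothetical. The sketch does not settle whether survreg accepts the same layout, which is the open question here:

```r
library(survival)

set.seed(1)
n  <- 200
t1 <- runif(n, 1, 5)        # time at which the covariate changes
t2 <- t1 + runif(n, 1, 5)   # end of follow-up

# One row per subject and interval (tstart, tstop]; x holds the covariate
# value that applies during that interval
long <- rbind(
  data.frame(id = 1:n, tstart = 0,  tstop = t1, status = 0, x = rnorm(n)),
  data.frame(id = 1:n, tstart = t1, tstop = t2,
             status = rbinom(n, 1, 0.7), x = rnorm(n))
)

# The three-argument form of Surv() marks the data as counting-process input
fit <- coxph(Surv(tstart, tstop, status) ~ x, data = long)
coef(fit)
```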


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France



[R] Time-dependent covariates in survreg function

2010-07-26 Thread Michael Haenlein
Dear all,

I'm doing a survival analysis with time-dependent covariates. Until now, I
have used a simple Cox model for this, specifically the coxph function from
the survival library. Now, I would like to try out an accelerated failure
time model with a parametric specification as implemented for example in the
survreg function.

Two questions: First, can survreg handle time-dependent covariates? The
description for this function does not make reference to them. And second,
in case survreg cannot deal with time-dependent covariates, is there a
similar function in some other package that can?

Thanks very much,

Michael



[R] Equivalent to go-to statement

2010-07-24 Thread Michael Haenlein
Dear all,

I'm working with a code that consists of two parts: In Part 1 I'm generating
a random graph using the igraph library (which represents the relationships
between different nodes) and a vector (which represents a certain
characteristic for each node):

library(igraph)
g <- watts.strogatz.game(1,100,5,0.05)
z <- rlnorm(100,0,1)

In Part 2 I'm iteratively changing the elements of z in order to reach a
certain value of a certain target variable. I'm doing this using a while
statement:

while (target_variable < threshold) {
  ## adapt z
}

The problem is that in some rare cases this iterative procedure can take
very long (a couple of million of iterations), depending on the specific
structure of the graph generated in Part 1. I therefore would like to change
Part 2 of my code in the sense that once a certain threshold number of
iterations has been achieved, the iterative process in Part 2 stops and goes
back to Part 1 to generate a new graph structure. So my idea is as follows:

- Run Part 1 and generate g and z
- Run Part 2 and iteratively modify z to maximize the target variable
- If Part 2 can be completed in fewer than X steps, then go to Part 3
- If Part 2 takes more than X steps then go back to Part 1 and start again

I think that R does not have a function like go-to or go-back.

Does anybody know of a convenient way of doing this?

Thanks very much for your help,

Michael
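
The four steps above could be sketched with an outer repeat loop and break, which together play the role of a go-to. Everything numeric here is a stand-in: the graph is left out, the target is simply sum(z), and max_iter, threshold and the update rule are arbitrary assumptions:

```r
set.seed(123)
max_iter  <- 1000   # X: iteration budget for Part 2 (arbitrary here)
threshold <- 200    # value the target variable has to reach (arbitrary here)
attempts  <- 0

repeat {
  # Part 1: generate a fresh starting point (stand-in for g and z)
  attempts <- attempts + 1
  z <- rlnorm(100, 0, 1)

  # Part 2: adapt z until the target is reached or the budget runs out
  iter <- 0
  while (sum(z) < threshold && iter < max_iter) {
    i <- sample(length(z), 1)
    z[i] <- z[i] * 1.01        # placeholder for the real update of z
    iter <- iter + 1
  }

  # Leave the outer loop only if Part 2 reached the target within the
  # budget; otherwise start again at Part 1
  if (sum(z) >= threshold) break
}

# Part 3 continues here with the accepted z
attempts
```

Wrapping Part 1 and Part 2 in functions and calling them from the repeat loop would keep the same control flow while making each part easier to test.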



[R] Consultant for Mathematica - R translation

2010-07-20 Thread Michael Haenlein
Dear all,

I have a very short code written in Mathematica which I would need to get
translated for use in R.

I'm not an expert in Mathematica (which is why I would not
feel comfortable with doing the translation myself), but the code is very
short (probably 30-40 lines) and looks quite simple from my perspective.

Anyone who would be interested in taking over this job, please get in touch
with me so that we can agree on terms & conditions.

Thanks,

Michael



[R] Printing status updates in while-loop

2010-07-14 Thread Michael Haenlein
Dear all,

I'm using a while loop in the context of an iterative optimization
procedure. Within my while loop I have a counter variable that helps me to
determine how long the loop has been running. Before the loop I initialize
it as counter <- 0 and the last statement within my loop is counter <-
counter + 1.

I'd like to print out the current status of counter while the loop is
running to know where the optimization routine is standing. I tried to do so
by adding print(counter) within the while loop. This does however not seem
to work as instead of printing regular updates all print commands are
executed only after the loop is finished.

Is there some easy way to print regular status updates while the while loop
is still running?

Thanks,

Michael
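
A minimal sketch of one common remedy, assuming the delay comes from console output being buffered (as can happen in the Windows Rgui). The loop body and the limit of 5 iterations are placeholders:

```r
counter <- 0
while (counter < 5) {
  # ... one step of the optimization would go here ...
  counter <- counter + 1
  cat("iteration", counter, "\n")
  flush.console()   # push any buffered output to the console right away
}
```

In the Windows Rgui, buffering can also be switched off from the Misc menu ("Buffered output"). For long loops it may be preferable to print only every k-th iteration, for example inside if (counter %% 1000 == 0).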



[R] Import graph object

2010-07-14 Thread Michael Haenlein
Dear all,

I have a txt file of the following format that describes the relationships
between a network of a certain number of nodes.

{4, 2, 3}
{3, 4, 1}
{4, 2, 1}
{2, 1, 3}
{2, 3}
{}
{2, 5, 1}
{3, 5, 4}
{3, 4}
{2, 5, 3}

For example the first line {4, 2, 3} implies that there is a connection
between Node 1 and Node 4, a connection between Node 1 and Node 2 and a
connection between Node 1 and Node 3. The second line {3, 4, 1} implies that
there is a connection between Node 2 and Node 3 as well as Node 4 and Node
1. Note that some of the nodes can be isolated (i.e., not have any
connections to any other node) which is then indicated by {}. Also note that
the elements in each row are not necessarily ordered (i.e., {4, 2, 3}
instead of {2, 3, 4}). I would like to (a) read the txt file into R and (b)
convert it to an adjacency matrix. For example the adjacency matrix
corresponding to the aforementioned example is as follows:

0 1 1 1 0 0 0 0 0 0
1 0 1 1 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 1 0 0 1 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0
0 1 1 0 1 0 0 0 0 0

Is there any convenient way of doing this?

Thanks,

Michael
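
A rough base R sketch of both steps, using the ten example lines above as an inline character vector. In practice the vector would come from readLines() on the txt file; the object names txt, neighbours and adj are arbitrary:

```r
# The example rows; with a real file: txt <- readLines("path/to/file.txt")
txt <- c("{4, 2, 3}", "{3, 4, 1}", "{4, 2, 1}", "{2, 1, 3}", "{2, 3}",
         "{}", "{2, 5, 1}", "{3, 5, 4}", "{3, 4}", "{2, 5, 3}")

# Pull all integers out of each line; "{}" yields an empty vector
neighbours <- lapply(regmatches(txt, gregexpr("[0-9]+", txt)), as.integer)

# Row i gets a 1 in every column listed on line i
n   <- length(txt)
adj <- matrix(0L, nrow = n, ncol = n)
for (i in seq_len(n)) {
  adj[i, neighbours[[i]]] <- 1L
}
adj
```

The result reproduces the matrix shown above row by row. Note that it is not symmetric; if the connections are meant to be undirected, something like (adj | t(adj)) * 1L would symmetrize it.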



[R] Batch file export

2010-07-13 Thread Michael Haenlein
Dear all,

I have a code that generates data vectors within R. For example assume:
z <- rlnorm(1000, meanlog = 0, sdlog = 1)

Every time a vector has been generated I would like to export it into a csv
file. So my idea is something as follows:

for (i in 1:100) {
z <- rlnorm(1000, meanlog = 0, sdlog = 1)
write.csv(z, "c:/z_i.csv")
}

Where z_i.csv is a filename that is related to the run (e.g. z_001.csv,
z_002.csv, ...).

Could anyone please advise me on the most convenient way of doing this?

Thanks very much in advance,

Michael
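
One way to build the run-specific filename is sprintf() with a zero-padded counter. This sketch writes to tempdir() so that it runs anywhere; out_dir is an assumed name and would be replaced by the real target folder:

```r
out_dir <- tempdir()   # assumption: swap in the real folder, e.g. "c:/"

for (i in 1:100) {
  z <- rlnorm(1000, meanlog = 0, sdlog = 1)
  # "%03d" pads the run number with zeros: z_001.csv, z_002.csv, ...
  fname <- file.path(out_dir, sprintf("z_%03d.csv", i))
  write.csv(z, fname, row.names = FALSE)
}
```

paste0("z_", i, ".csv") would also work, but without the zero padding the files sort as z_1, z_10, z_100, z_11, and so on.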



[R] Convert Mathematica code into R

2010-07-12 Thread Michael Haenlein
Dear all,

I have a reasonably short piece of code written in Mathematica 5.2 which I
would like to convert to R. The problem is that I'm not familiar with
Mathematica. I would, however, also be OK with some interface that allows me
to run Mathematica from within R and use the Mathematica output for
further analysis within R. Any advice on how to conveniently convert the
code or on how to run Mathematica from within R?

Thanks,

Michael



[R] Interpreting output of coxph with frailty.gamma

2010-04-26 Thread Michael Haenlein
Dear all,

this is probably a very silly question, but could anyone tell me what the
different parameters in a coxph model with a frailty.gamma term mean?

Specifically I have two questions:

(1) Compared to a normal coxph model, it seems that I obtain two standard
errors [se(coef) and se2].
What is the difference between those?

(2) Again compared to a normal coxph model, the z/p-values are replaced by
a chi-squared test (Chisq, DF, p).
What is the reason for this? Does a standard z-test not work once a frailty
term is included?

Thanks very much for your help in advance,

Michael




Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France



[R] Convert character string to top levels + NAN

2010-04-22 Thread Michael Haenlein
Dear all,

I have several character strings with a high number of different levels.
unique(x) gives me values in the range of 100-200.
This creates problems as I would like to use them as predictors in a coxph
model.

I therefore would like to convert each of these strings to a new string
(x_new).
x_new should be equal to x for the top n categories (i.e. the top n levels
with the highest occurrence) and NAN elsewhere.
For example, for n=3 x_new would have three levels: The three most common
levels of x + NAN.

Is there some convenient way of doing this?

Thanks in advance,

Michael
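
A small sketch of the recoding in base R, using a made-up vector x and n = 3. It uses NA as the missing marker, on the assumption that this is what NAN stands for above:

```r
x <- c("a", "b", "a", "c", "d", "a", "b", "c", "e", "b")
n <- 3

# Names of the n most frequent values of x
top <- names(sort(table(x), decreasing = TRUE))[seq_len(n)]

# Keep x where it is one of the top n values, NA elsewhere
x_new <- ifelse(x %in% top, x, NA)
x_new
```

With its default settings coxph drops rows that contain NA in a predictor, so replacing NA by a label such as "other" may be closer to the intent if those observations should stay in the model.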


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France



[R] Help for programming a short code in R

2010-02-14 Thread Michael Haenlein
Dear all,

I'm looking for a person who could help me to program a short code in R. The
code involves Bayesian analysis so some familiarity with WinBUGS or another
package/ software dealing with Bayesian estimation would be helpful. 

I have an academic paper in which the code is described (Abe, M. (2009),
Counting your customers one by one: A hierarchical Bayes extension to the
Pareto/NBD model, Marketing science, Vol. 28 No. 3, pp. 541 - 53) as well
as one of the datasets mentioned in this manuscript to test the code. My
assumption is that the job does not take very long -- although I cannot give
a precise estimate of the number of hours required.

If anyone is interested, please let me know and I can send you an electronic
copy of the manuscript mentioned above.

Best,

Michael




Michael Haenlein
Professor of Marketing
ESCP Europe - The School of Management for Europe
