Re: [R] mixed-effects models with (g)lmer in R and model selection

2016-02-19 Thread Bert Gunter
Absolutely!  Even more, consult a local expert in applying mixed effects
models. The op's strategy sounded to me like a prescription to produce
irreproducible results (due to over fitting).

Cheers,
Bert



On Friday, February 19, 2016, Don McKenzie  wrote:

> This is a complicated and subtle statistical issue, not an R question, the
> latter being the purpose of this list.  There are people on the list who
> could give you literate answers,
> to be sure, but a statistically oriented list would be a better match.
>
> e.g.,
>
> http://stats.stackexchange.com/
>
>
> > On Feb 19, 2016, at 5:01 AM, Wilbert Heeringa  > wrote:
> >
> > Dear all,
> >
> > Mixed-effects models are wonderful for analyzing data, but it is always a
> > hassle to find the best model, i.e. the model with the lowest AIC,
> > especially when the number of predictor variables is large.
> >
> > Presently when trying to find the right model, I perform the following
> > steps:
> >
> >   1.
> >
> >   Start with a model containing all predictors. Assuming dependent
> >   variable X and predictors A, B, C, D, E, I start with: X~A+B+C+D+E
> >   2.
> >
> >   Lmer warns that is has dropped columns/coefficients. These are
> variables
> >   which have a *perfect* correlation with any of the other variables or
> >   with a combination of variables. With summary() it can be found which
> >   columns have been dropped. Assume predictor D has been dropped, I
> continue
> >   with this model: X~A+B+C+E
> >   3.
> >
> >   Subsequently I need to check whether there are variables (or groups of
> >   variables) which *strongly* corrrelate to each other. I included the
> >   function vif.mer (developed by Austin F. Frank and available at:
> >   https://raw.github.com/aufrank/R-hacks/master/mer-utils.R) in my
> script,
> >   and when applying this function to my reduced model, I got vif values
> for
> >   each of the variables. When vif>5 for a predictor, it probably should
> be
> >   removed. In case multiple variables have a vif>5, I first remove the
> >   predictor with the highest vif, then re-run lmer en vif.mer. I remove
> again
> >   the predictor with highest vif (if one or more predictors have still a
> >   vif>5), and I repeat this until none of the remaining predictors has a
> >   vif>5. In case I got a warning "Model failed to converge" in the larger
> >   model(s), this warning does not appear any longer in the 'cleaned'
> model.
> >   4.
> >
> >   Assume the following predictors have survived: A, B en E. Now I want to
> >   find the combination of predictors that gives the smallest AIC. For
> three
> >   predictors it is easy to try all combinations, but if it would have
> been 10
> >   predictors, manually trying all combinations would be time-consuming.
> So I
> >   used the function fitLMER.fnc from the LMERConvenienceFunctions
> package.
> >   This function back fit fixed effects, forward fit random effects, and
> >   re-back fit fixed effects. I consider the model given by fitLMER.fnc
> as the
> >   right one.
> >
> > I am not an expert in mixed-effects models and have struggled with model
> > selection. I found the procedure which I decribed working, but I would
> > really be appreciate to hear whether the procedure is sound, or whether
> > there are better alternatives.
> >
> > Best,
> >
> > Wilbert
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org  mailing list -- To UNSUBSCRIBE and
> more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> __
> R-help@r-project.org  mailing list -- To UNSUBSCRIBE and
> more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mixed-effects models with (g)lmer in R and model selection

2016-02-19 Thread Jianling Fan
Hello, Wilbert,

You did give a good procedure for lme model selection! thanks! I learn some.
I am also working on similar problem recently, maybe you can take a
look at "glmmLasso" package, which allows model selection in
generalized linear mixed effects models using the LASSO shrinkage
method.


Regards,

Jianling

On 19 February 2016 at 07:01, Wilbert Heeringa  wrote:
> Dear all,
>
> Mixed-effects models are wonderful for analyzing data, but it is always a
> hassle to find the best model, i.e. the model with the lowest AIC,
> especially when the number of predictor variables is large.
>
> Presently when trying to find the right model, I perform the following
> steps:
>
>1.
>
>Start with a model containing all predictors. Assuming dependent
>variable X and predictors A, B, C, D, E, I start with: X~A+B+C+D+E
>2.
>
>Lmer warns that is has dropped columns/coefficients. These are variables
>which have a *perfect* correlation with any of the other variables or
>with a combination of variables. With summary() it can be found which
>columns have been dropped. Assume predictor D has been dropped, I continue
>with this model: X~A+B+C+E
>3.
>
>Subsequently I need to check whether there are variables (or groups of
>variables) which *strongly* corrrelate to each other. I included the
>function vif.mer (developed by Austin F. Frank and available at:
>https://raw.github.com/aufrank/R-hacks/master/mer-utils.R) in my script,
>and when applying this function to my reduced model, I got vif values for
>each of the variables. When vif>5 for a predictor, it probably should be
>removed. In case multiple variables have a vif>5, I first remove the
>predictor with the highest vif, then re-run lmer en vif.mer. I remove again
>the predictor with highest vif (if one or more predictors have still a
>vif>5), and I repeat this until none of the remaining predictors has a
>vif>5. In case I got a warning "Model failed to converge" in the larger
>model(s), this warning does not appear any longer in the 'cleaned' model.
>4.
>
>Assume the following predictors have survived: A, B en E. Now I want to
>find the combination of predictors that gives the smallest AIC. For three
>predictors it is easy to try all combinations, but if it would have been 10
>predictors, manually trying all combinations would be time-consuming. So I
>used the function fitLMER.fnc from the LMERConvenienceFunctions package.
>This function back fit fixed effects, forward fit random effects, and
>re-back fit fixed effects. I consider the model given by fitLMER.fnc as the
>right one.
>
> I am not an expert in mixed-effects models and have struggled with model
> selection. I found the procedure which I decribed working, but I would
> really be appreciate to hear whether the procedure is sound, or whether
> there are better alternatives.
>
> Best,
>
> Wilbert
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to create an executable file from R GUI?

2016-02-19 Thread Adrian Dușa
Your function, buildGui(), what does it use, Tcl/Tk or something else?
If it's Tcl/Tk, I believe you need a normal R console opened. My .bat file
only works for a command line, which is fine if the user interface opens up
in a webpage, but I'm pretty sure it doesn\t work with Tcl/Tk.
What kind of window does your function return?
Adrian

On Fri, Feb 19, 2016 at 4:35 PM, Zahra Samadi  wrote:

> Adriana,
> My GUI file is a function returning a window. This function is named
> buildGui(). How should I create this batch file using the piece of code
> you've written?
>
> --
> * From: * Adrian Dușa ;
> * To: * Greg Snow <538...@gmail.com>;
> * Cc: * simon0...@yahoo.com ; r-help@r-project.org <
> r-help@r-project.org>;
> * Subject: * Re: [R] How to create an executable file from R GUI?
> * Sent: * Thu, Feb 18, 2016 9:03:52 PM
>
> Simon, Greg,
>
> That is the very reason why I've given up on Tck/Tk, in favor of shiny.
> The user interface opens up in a webpage, without opening the normal R
> console (it only opens a Terminal window).
>
> To exemplify, package QCAGUI has a function called runGUI(), and on
> Windows it's a simple matter of creating a .bat file,
> which for my user interface it only contains this:
>
> CLS
>
> TITLE QCA Qualitative Comparative Analysis
>
> C:/PROGRA~1/R/R-3.2.3/bin/R.exe --slave --no-restore -e
> "setwd('D:/');QCAGUI::runGUI()"
>
>
> The double click on the .bat file, and that's it.
> I hope it helps,
> Adrian
>
>
>
> On Thu, Feb 18, 2016 at 7:24 PM, Greg Snow <538...@gmail.com> wrote:
>
>> To give a full answer we need some more detail from you.  For example
>> what operating system are you on? what do you mean by "users click on
>> it"? and at what point do you want them to click (after running R,
>> when looking at the desktop, etc.)
>>
>> But to help get you started you may want to look at the help page
>> `?Startup` which tells you all the things that R does as it starts up
>> and how to have it run commands automatically as it is starting up.
>>
>> I have created some GUI examples in the past that clients then wanted
>> to have on their own computer to play with and demonstrate to others.
>> I usually would install R on their machine for them and create a
>> shortcut on the desktop (these were all MS Windows computers) that
>> pointed to the standard R executable, but started in a specific
>> directory/folder.  Then in that folder I created a ".Rprofile" file
>> with the commands to load in the appropriate data and packages and run
>> the gui demonstration.  The user could then double click on the
>> shortcut on the desktop and 2 windows would pop up (the regular R
>> interface and my gui demo), I instructed the client to just minimize
>> and ignore the regular R window and they were then able to use my demo
>> and then close everything when they were finished.  You could do
>> something similar (but exactly how will differ between Windows, Mac,
>> and Linux computers).
>>
>> On Thu, Feb 18, 2016 at 9:27 AM, simon0098--- via R-help
>>  wrote:
>> > Hi,
>> > I've created a GUI using RGtk2 package. How can I make an executable
>> file from my R script so that users click on it and the GUI appears for
>> them?
>> >
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> 538...@gmail.com
>>
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr.90
> 050663 Bucharest sector 5
> Romania
>



-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How a clustering algorithm in R can end up with negative silhouette values?

2016-02-19 Thread ABABAEI, Behnam
Hi Sarah,

Thank you for the response. But it is said in its description that after each 
run (sample), each observation in the whole dataset is assigned to the closest 
cluster. So how is it possible for one observation to be wrongly allocated, 
even with clara?

Behnam

Behnam



On Fri, Feb 19, 2016 at 11:48 AM -0800, "Sarah Goslee" 
> wrote:

That means that points have been assigned to the wrong groups. This
may readily happen with a clustering method like cluster::clara() that
uses a subset of the data to cluster a dataset too large to analyze as
a unit. Negative silhouette numbers strongly suggest that your
clustering parameters should be changed.

Sarah

On Fri, Feb 19, 2016 at 6:33 AM, ABABAEI, Behnam
 wrote:
> Hi,
>
>
> We know that clustering methods in R assign observations to the closest 
> medoids. Hence, it is supposed to be the closest cluster each observation can 
> have. So, I wonder how it is possible to have negative values of silhouette , 
> while we are supposedly assign each observation to the closest cluster and 
> the formula in silhouette method cannot get negative?
>
>
> Behnam.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mixed-effects models with (g)lmer in R and model selection

2016-02-19 Thread Don McKenzie
This is a complicated and subtle statistical issue, not an R question, the 
latter being the purpose of this list.  There are people on the list who could 
give you literate answers,
to be sure, but a statistically oriented list would be a better match.

e.g., 

http://stats.stackexchange.com/


> On Feb 19, 2016, at 5:01 AM, Wilbert Heeringa  wrote:
> 
> Dear all,
> 
> Mixed-effects models are wonderful for analyzing data, but it is always a
> hassle to find the best model, i.e. the model with the lowest AIC,
> especially when the number of predictor variables is large.
> 
> Presently when trying to find the right model, I perform the following
> steps:
> 
>   1.
> 
>   Start with a model containing all predictors. Assuming dependent
>   variable X and predictors A, B, C, D, E, I start with: X~A+B+C+D+E
>   2.
> 
>   Lmer warns that is has dropped columns/coefficients. These are variables
>   which have a *perfect* correlation with any of the other variables or
>   with a combination of variables. With summary() it can be found which
>   columns have been dropped. Assume predictor D has been dropped, I continue
>   with this model: X~A+B+C+E
>   3.
> 
>   Subsequently I need to check whether there are variables (or groups of
>   variables) which *strongly* corrrelate to each other. I included the
>   function vif.mer (developed by Austin F. Frank and available at:
>   https://raw.github.com/aufrank/R-hacks/master/mer-utils.R) in my script,
>   and when applying this function to my reduced model, I got vif values for
>   each of the variables. When vif>5 for a predictor, it probably should be
>   removed. In case multiple variables have a vif>5, I first remove the
>   predictor with the highest vif, then re-run lmer en vif.mer. I remove again
>   the predictor with highest vif (if one or more predictors have still a
>   vif>5), and I repeat this until none of the remaining predictors has a
>   vif>5. In case I got a warning "Model failed to converge" in the larger
>   model(s), this warning does not appear any longer in the 'cleaned' model.
>   4.
> 
>   Assume the following predictors have survived: A, B en E. Now I want to
>   find the combination of predictors that gives the smallest AIC. For three
>   predictors it is easy to try all combinations, but if it would have been 10
>   predictors, manually trying all combinations would be time-consuming. So I
>   used the function fitLMER.fnc from the LMERConvenienceFunctions package.
>   This function back fit fixed effects, forward fit random effects, and
>   re-back fit fixed effects. I consider the model given by fitLMER.fnc as the
>   right one.
> 
> I am not an expert in mixed-effects models and have struggled with model
> selection. I found the procedure which I decribed working, but I would
> really be appreciate to hear whether the procedure is sound, or whether
> there are better alternatives.
> 
> Best,
> 
> Wilbert
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why CLARA clustering method does not give the same classes as when I do clustering manually?

2016-02-19 Thread Sarah Goslee
clara() is a version of pam() adapted to use large datasets.

pam() uses the entire dataset, and should give results identical to
your manual procedure, or nearly so. clara() works on subsets of the
data, so it may give a slightly different result each time you run it.

The default parameters for clara() are very small, so you can get
substantially different results from run to run on a large dataset if
you don't change them.

Sarah

On Fri, Feb 19, 2016 at 6:30 AM, ABABAEI, Behnam
 wrote:
> Hi,
>
>
> I am using CLARA (in 'cluster' package). This method is supposed to assign 
> each observation to the closest 'medoid'. But when I calculate the distance 
> of medoids and observations manually and assign them manually, the results 
> are slightly different (1-2 percent of occurrence probability). Does anyone 
> know how clara calculates dissimilarities and why I get different clustering 
> results?
>
>
> Behnam.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to create an executable file from R GUI?

2016-02-19 Thread Zahra Samadi
Adriana, 
My GUI file is a function returning a window. This function is named 
buildGui(). How should I create this batch file using the piece of code you've 
written?


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How a clustering algorithm in R can end up with negative silhouette values?

2016-02-19 Thread Sarah Goslee
Ah, my guess about the confusion was wrong, then. You're
misunderstanding silhouette() instead.

>From ?silhouette:

 Observations with a large s(i) (almost 1) are very well clustered,
 a small s(i) (around 0) means that the observation lies between
 two clusters, and observations with a negative s(i) are probably
 placed in the wrong cluster.


In more detail, they're looking at different things.
clara() assigns each point to a cluster based on the distance to the
nearest medoid.

silhouette() does something different: instead of comparing the
distances to the closest medoid and the next closest medoid, which is
what you seem to be assuming, silhouette() looks at the mean distance
to ALL other points assigned to that cluster, vs the mean distance to
all points in other clusters. The distance to the medoid is
irrelevant, except as it is one of the points in that cluster.

So a negative silhouette value is entirely possible, and means that
the cluster produced doesn't represent the dataset very well.



On Fri, Feb 19, 2016 at 3:04 PM, ABABAEI, Behnam
 wrote:
> Sarah,
> sorry for taking up your time.
>
> I totally agree with you about how it works. But please let's take a look at 
> this part of the description:
>
> "Once k representative objects have been selected from the sub-dataset, each 
> observation of the entire dataset is assigned to the nearest medoid. The mean 
> (equivalent to the sum) of the dissimilarities of the observations to their 
> closest medoid is used as a measure of the quality of the clustering. The 
> sub-dataset for which the mean (or sum) is minimal, is retained. A further 
> analysis is carried out on the final partition."
>
> It says each observation is finally assigned to the closest medoid. The whole 
> clustering process may be imperfect in terms of isolation of clusters, but 
> each observation is already assigned to the closest one and according to the 
> silhouette formula, the silhouette value cannot be negative, as a must be 
> always less than b.
>
> Regards,
> Behnam.
>
> 
> From: Sarah Goslee 
> Sent: 19 February 2016 20:58
> To: ABABAEI, Behnam
> Cc: r-help@r-project.org
> Subject: Re: [R] How a clustering algorithm in R can end up with negative 
> silhouette values?
>
> You need to think more carefully about the details of the clara() method.
>
> The algorithm draws repeated samples of sampsize from the larger
> dataset, as specified by the arguments to the function.
> It clusters each sample in turn, and saves the best one.
> It uses the medoids from the best one to assign all of the points to a 
> cluster.
>
> But because the clustering is based on a subsample, it may not be
> representative of the dataset as a whole, and may not provide a good
> clustering overall. Just because it clusters the subsample well,
> doesn't mean it clusters the entirety. The details section of the help
> describes this, and the book references goes into more detail.
>
> Sarah
>
>
>
> On Fri, Feb 19, 2016 at 2:55 PM, ABABAEI, Behnam
>  wrote:
>> Hi Sarah,
>>
>> Thank you for the response. But it is said in its description that after
>> each run (sample), each observation in the whole dataset is assigned to the
>> closest cluster. So how is it possible for one observation to be wrongly
>> allocated, even with clara?
>>
>> Behnam
>>
>> Behnam
>>
>>
>>
>>
>> On Fri, Feb 19, 2016 at 11:48 AM -0800, "Sarah Goslee"
>>  wrote:
>>
>> That means that points have been assigned to the wrong groups. This
>> may readily happen with a clustering method like cluster::clara() that
>> uses a subset of the data to cluster a dataset too large to analyze as
>> a unit. Negative silhouette numbers strongly suggest that your
>> clustering parameters should be changed.
>>
>> Sarah
>>
>> On Fri, Feb 19, 2016 at 6:33 AM, ABABAEI, Behnam
>>  wrote:
>>> Hi,
>>>
>>>
>>> We know that clustering methods in R assign observations to the closest
>>> medoids. Hence, it is supposed to be the closest cluster each observation
>>> can have. So, I wonder how it is possible to have negative values of
>>> silhouette , while we are supposedly assign each observation to the closest
>>> cluster and the formula in silhouette method cannot get negative?
>>>
>>>
>>> Behnam.
>>>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How a clustering algorithm in R can end up with negative silhouette values?

2016-02-19 Thread Sarah Goslee
You need to think more carefully about the details of the clara() method.

The algorithm draws repeated samples of sampsize from the larger
dataset, as specified by the arguments to the function.
It clusters each sample in turn, and saves the best one.
It uses the medoids from the best one to assign all of the points to a cluster.

But because the clustering is based on a subsample, it may not be
representative of the dataset as a whole, and may not provide a good
clustering overall. Just because it clusters the subsample well,
doesn't mean it clusters the entirety. The details section of the help
describes this, and the book references goes into more detail.

Sarah



On Fri, Feb 19, 2016 at 2:55 PM, ABABAEI, Behnam
 wrote:
> Hi Sarah,
>
> Thank you for the response. But it is said in its description that after
> each run (sample), each observation in the whole dataset is assigned to the
> closest cluster. So how is it possible for one observation to be wrongly
> allocated, even with clara?
>
> Behnam
>
> Behnam
>
>
>
>
> On Fri, Feb 19, 2016 at 11:48 AM -0800, "Sarah Goslee"
>  wrote:
>
> That means that points have been assigned to the wrong groups. This
> may readily happen with a clustering method like cluster::clara() that
> uses a subset of the data to cluster a dataset too large to analyze as
> a unit. Negative silhouette numbers strongly suggest that your
> clustering parameters should be changed.
>
> Sarah
>
> On Fri, Feb 19, 2016 at 6:33 AM, ABABAEI, Behnam
>  wrote:
>> Hi,
>>
>>
>> We know that clustering methods in R assign observations to the closest
>> medoids. Hence, it is supposed to be the closest cluster each observation
>> can have. So, I wonder how it is possible to have negative values of
>> silhouette , while we are supposedly assign each observation to the closest
>> cluster and the formula in silhouette method cannot get negative?
>>
>>
>> Behnam.
>>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How a clustering algorithm in R can end up with negative silhouette values?

2016-02-19 Thread Sarah Goslee
That means that points have been assigned to the wrong groups. This
may readily happen with a clustering method like cluster::clara() that
uses a subset of the data to cluster a dataset too large to analyze as
a unit. Negative silhouette numbers strongly suggest that your
clustering parameters should be changed.

Sarah

On Fri, Feb 19, 2016 at 6:33 AM, ABABAEI, Behnam
 wrote:
> Hi,
>
>
> We know that clustering methods in R assign observations to the closest 
> medoids. Hence, it is supposed to be the closest cluster each observation can 
> have. So, I wonder how it is possible to have negative values of silhouette , 
> while we are supposedly assign each observation to the closest cluster and 
> the formula in silhouette method cannot get negative?
>
>
> Behnam.
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problems with the deSolve package

2016-02-19 Thread Alexandre Suire
Hello R-users,

I'm trying to build a SIR-like model under R, 
using the "deSolve" package. I'm trying to do simulations of its dynamic
 over time, with three differential equations. I'm also looking to 
calculate the equilibrium state. 

So far, my code looks like this 

library(deSolve)
#This is my system, with three differential equations
system<-function(times, init, parameters){
with(as.list(c(init, parameters)),{
di1_dt=(alpha1*(N-i1-i2-i12)*(i1+i12))+(beta2*i12+gamma1*(N-i1-i2-i12))-(beta1*i1)-(delta*alpha2*i1*(i2+i12))
di2_dt=(alpha2*(N-i1-i2-i12)*(i2+i12))+(beta1*i12+gamma2*(N-i1-i2-i12))-(beta2*i2)-(delta*alpha1*i2*(i1+i12))
di12_dt=(delta*alpha2*i1*(i12+i2))+(delta*alpha1*i2*(i12*i1))+(delta*gamma1*i1)+(delta*gamma2*i2)-((beta1+beta2)*i12)
return(list(c(di1_dt,di2_dt,di12_dt)))
})
}

# Initials values and parameters
init<-c(i1=10, i2=10, i12=0) 
parameters<-c(alpha1=0.7, alpha2=0.5, beta1=0.5, beta2=0.3, gamma1=0.5, 
gamma2=0.5, delta=0.5, N=100) 
times<-seq(0,200, by=1)
simul <- as.data.frame(ode(y = init, times = times, func = system, parms = 
parameters, method="ode45"))
simul$time <- NULL
head(simul,200) 

#Plotting the results
matplot(times,
 simul, type = "l", xlab = "Time", ylab = "i1 i2 i12", main = 
"Simulation", lwd = 2, lty = 2, bty = "l", col=c("darkblue", 
"darkred","mediumseagreen"))
legend("bottomright", c("i1", "i2","i12"), lty=2,lwd=2, col = c("darkblue", 
"darkred", "mediumseagreen"))

At
 first, I just tried studying with only the first two equations, and it 
seems to work completely fine, but when I wanted to add the 3rd 
equation, I sometimes get this message, even when I juggle the 
parameters, when i launch the line:
#simul <- as.data.frame(ode(y = init, times = times, func = system, parms = 
parameters))
Warning messages:
1: In lsode(y, times, func, parms, mf = 10, ...) :
  an excessive amount of work (> maxsteps ) was done, but integration was not 
successful - increase maxsteps
2: In lsode(y, times, func, parms, mf = 10, ...) :
  Returning early. Results are accurate, as far as they go

Have I overlooked something ? I tried to use methods="ode45" and 
methods="adams", without any sucess.
Thank you for your time
Alex
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading a datetime vector

2016-02-19 Thread D Wolf via R-help
Hello Jim,
I ran str() on the vector and it returned character:str("DF_exp.xlsx") chr 
"DF_exp.xlsx"
I tried df2_TZ$DateTimeStamp <- 
strptime(as.Date(as.character(df2_TZ$DateTimeStamp, format = "%m/%d/%Y %H:%M", 
tz = "GMT"))), which produced an error: Error in charToDate(x) :   character 
string is not in a standard unambiguous formatIn Excel, the column is formatted 
to m/d/ h:mm
Removing %S from these linesdf2_TZ$DateTimeStamp = 
as.POSIXct(df2_TZ$DateTimeStamp, format="%m/%d/%Y %H:%M", tz="GMT") 
df2_TZ$DateTimeStamp = as.POSIXct(as.character(df2_TZ$DateTimeStamp), format = 
"%m/%d/%Y %H:%M")
made the column NA

Thank You,Doug Wolfinger


 

On Friday, February 19, 2016 1:35 AM, Jim Lemon  
wrote:
 

 Hi Doug,For one thing, you may be using the wrong format. Your example format 
has no seconds field. The other thing to watch is whether the data are in 
%m/%d/%Y or %d/%m/%Y date format. If the latter, you would probably get that 
error on dates like 19/02/2016.
Jim

On Fri, Feb 19, 2016 at 8:12 AM, D Wolf via R-help  wrote:

Hello,I am trying to read a data frame column named DateTimeStamp. The time is 
in GMT in this format: 1/4/2013 23:30
require(xlsx)
df2_TZ = read.xlsx2("DF_exp.xlsx", sheetName = "Sheet1")

It's good to that line. But these three lines, which makes the dataframe, 
converts the column's values to NA:df2_TZ$DateTimeStamp = 
as.POSIXct(df2_TZ$DateTimeStamp, format="%m/%d/%Y %H:%M:%S", tz="GMT")

and... df2_TZ$DateTimeStamp = as.POSIXct(as.character(df2_TZ$DateTimeStamp), 
format = "%m/%d/%Y %H:%M:%S")

and...df2_TZ$DateTimeStamp = as.Date(df2_TZ$DateTimeStamp, format = "%m/%d/%Y 
%H:%M:%S")

This line returns and error...df2_TZ$DateTimeStamp = 
as.POSIXct(as.Date(df2_TZ$DateTimeStamp), format = "%m/%d/%Y %H:%M:%S")
"Error in charToDate(x) :   character string is not in a standard unambiguous 
format"
Additionally, I need to convert from GMT to North American time zones, and I 
think the advice on this page would be good for that: 
http://blog.revolutionanalytics.com/2009/06/converting-time-zones.html
My ultimate goal is to write an R program that finds data in another variable 
in df2_TZ that corresponds to a date and time that match up with the date and 
time in another data frame. For now, any help reading the column would be much 
appreciated.
Thank You,Doug
        [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How a clustering algorithm in R can end up with negative silhouette values?

2016-02-19 Thread ABABAEI, Behnam
Hi,


We know that clustering methods in R assign observations to the closest 
medoids. Hence, it is supposed to be the closest cluster each observation can 
have. So, I wonder how it is possible to have negative values of silhouette , 
while we are supposedly assign each observation to the closest cluster and the 
formula in silhouette method cannot get negative?


Behnam.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mixed-effects models with (g)lmer in R and model selection

2016-02-19 Thread Wilbert Heeringa
Dear all,

Mixed-effects models are wonderful for analyzing data, but it is always a
hassle to find the best model, i.e. the model with the lowest AIC,
especially when the number of predictor variables is large.

Presently when trying to find the right model, I perform the following
steps:

   1.

   Start with a model containing all predictors. Assuming dependent
   variable X and predictors A, B, C, D, E, I start with: X~A+B+C+D+E
   2.

   Lmer warns that is has dropped columns/coefficients. These are variables
   which have a *perfect* correlation with any of the other variables or
   with a combination of variables. With summary() it can be found which
   columns have been dropped. Assume predictor D has been dropped, I continue
   with this model: X~A+B+C+E
   3.

   Subsequently I need to check whether there are variables (or groups of
   variables) which *strongly* corrrelate to each other. I included the
   function vif.mer (developed by Austin F. Frank and available at:
   https://raw.github.com/aufrank/R-hacks/master/mer-utils.R) in my script,
   and when applying this function to my reduced model, I got vif values for
   each of the variables. When vif>5 for a predictor, it probably should be
   removed. In case multiple variables have a vif>5, I first remove the
   predictor with the highest vif, then re-run lmer en vif.mer. I remove again
   the predictor with highest vif (if one or more predictors have still a
   vif>5), and I repeat this until none of the remaining predictors has a
   vif>5. In case I got a warning "Model failed to converge" in the larger
   model(s), this warning does not appear any longer in the 'cleaned' model.
   4.

   Assume the following predictors have survived: A, B en E. Now I want to
   find the combination of predictors that gives the smallest AIC. For three
   predictors it is easy to try all combinations, but if it would have been 10
   predictors, manually trying all combinations would be time-consuming. So I
   used the function fitLMER.fnc from the LMERConvenienceFunctions package.
   This function back fit fixed effects, forward fit random effects, and
   re-back fit fixed effects. I consider the model given by fitLMER.fnc as the
   right one.

I am not an expert in mixed-effects models and have struggled with model
selection. I found the procedure which I decribed working, but I would
really be appreciate to hear whether the procedure is sound, or whether
there are better alternatives.

Best,

Wilbert

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Why CLARA clustering method does not give the same classes as when I do clustering manually?

2016-02-19 Thread ABABAEI, Behnam
Hi,


I am using CLARA (in 'cluster' package). This method is supposed to assign each 
observation to the closest 'medoid'. But when I calculate the distance of 
medoids and observations manually and assign them manually, the results are 
slightly different (1-2 percent of occurrence probability). Does anyone know 
how clara calculates dissimilarities and why I get different clustering results?


Behnam.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf --Warning message:

2016-02-19 Thread Gabor Grothendieck
sqldf does not use Tk so you can ignore this.

On Fri, Feb 19, 2016 at 12:32 PM, Divakar Reddy
 wrote:
> Dear R users,
>
> I'm getting Waring message while trying to load "sqldf" package in R3.2.3
> and assuming that we can ignore this as it's WARNING Message and not an
> error message.
> Can you guide me if my assumption is wrong?
>
>
>> library(sqldf);
> Loading required package: gsubfn
> Loading required package: proto
> Loading required package: RSQLite
> Loading required package: DBI
> Warning message:
> no DISPLAY variable so Tk is not available
>
>> version   _
> platform   x86_64-redhat-linux-gnu
> version.string R version 3.2.3 (2015-12-10)
>>
>
> Thanks,
> Divakar
> Phoenix,USA
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading a datetime vector

2016-02-19 Thread Jeff Newmiller
This is a mailing list. I don't know how you are interacting with it... using a 
website rather than an email program can lead to some confusion since there can 
be many ways to accomplish the task of interacting with the mailing list. My 
email program has a "reply-all" button when I am looking at an email. It also 
has an option to write the email in plain text, which often prevents the 
message from getting corrupted (recipient not seeing what you sent to the list).

Using the str function on a literal string (the name of a file) will indeed 
tell you that you gave it a character string. Specifying a column in your data 
might tell you something more interesting... e.g.

str( df2_TZ$DateTimeStamp )

If that says you have character data then Jim Lemon's suggestion would be a 
good next thing to look at.  If it is factor data then you should use the 
as.character function on the data column and then follow Jim's suggestion. If 
it is numeric then you probably need to convert it using an appropriate origin 
(e.g. as described at [1] or [2]).

I have had best luck setting the default timezone string when converting to 
POSIXt types... e.g.

# specify timezone assumed by input data
Sys.setenv( TZ="GMT" )
testdtm <- as.POSIXct( "1/1/2016 00:00", format = "%m/%d/%Y %H:%M" )
# inspect the result
testdtm
str( testdtm )
# view data from a different timezone
Sys.setenv( TZ="Etc/GMT+8" )
# no change to the underlying data, but it prints out differently now because 
the tz attribute is "" which implies using the default TZ
testdtm

[1] http://blog.mollietaylor.com/2013/08/date-formats-in-r.html
[2] https://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf

-- 
Sent from my phone. Please excuse my brevity.

On February 19, 2016 7:48:31 AM PST, D Wolf  wrote:
>Hello Jeff,
>I ran str() on the vector and it returned character.>
>str("DF_exp.xlsx") chr "DF_exp.xlsx"
>This is my first thread on this forum, and I'm not sure how to reply to
>the thread instead of just sending the reply to your email account; I
>don't see a 'reply' link in the thread.I've read this page and I don't
>think it advises on how to reply in the thread: R: Posting Guide: How
>to ask good questions that prompt useful answers
>
>|   |
>|   |  |   |   |   |   |   |
>| R: Posting Guide: How to ask good questions that prompt ...Posting
>Guide: How to ask good questions that prompt useful answers This guide
>is intended to help you get the most out of the R mailing lists, and to
>avoid embarra... |
>|  |
>| View on www.r-project.org | Preview by Yahoo |
>|  |
>|   |
>
>
>Thank You,Doug Wolfinger
> 
>
>On Friday, February 19, 2016 12:51 AM, Jeff Newmiller
> wrote:
> 
>
>You are being rather scattershot in your explanation, so I suspect you
>are not being systematic in your troubleshooting. Use the str function
>to examine the data column after you pull it in from excel. It may be
>numeric, factor, or character, and the approach depends on which that
>function returns. 
>-- 
>Sent from my phone. Please excuse my brevity.
>
>On February 18, 2016 1:12:40 PM PST, D Wolf via R-help
> wrote:
>Hello,I am trying to read a data frame column named DateTimeStamp. The
>time is in GMT in this format: 1/4/2013 23:30
>require(xlsx)
>df2_TZ = read.xlsx2("DF_exp.xlsx", sheetName = "Sheet1")
>
>It's good to that line. But these three lines, which makes the
>dataframe, converts the column's values to NA:df2_TZ$DateTimeStamp =
>as.POSIXct(df2_TZ$DateTimeStamp, format="%m/%d/%Y %H:%M:%S", tz="GMT")
>
>and... df2_TZ$DateTimeStamp =
>as.POSIXct(as.character(df2_TZ$DateTimeStamp), format = "%m/%d/%Y
>%H:%M:%S")
>
>and...df2_TZ$DateTimeStamp = as.Date(df2_TZ$DateTimeStamp, format =
>"%m/%d/%Y %H:%M:%S")
>
>This line returns and error...df2_TZ$DateTimeStamp =
>as.POSIXct(as.Date(df2_TZ$DateTimeStamp), format = "%m/%d/%Y %H:%M:%S")
>"Error in charToDate(x) :   character string is not in a standard
>unambiguous format"
>Additionally, I need to convert from GMT to North American time zones,
>and I think the advice on this page would
>be good for
>that: http://blog.revolutionanalytics.com/2009/06/converting-time-zones.html
>My ultimate goal is to write an R program that finds data in another
>variable in df2_TZ that corresponds to a date and time that match up
>with the date and time in another data frame. For now, any help reading
>the column would be much appreciated.
>Thank You,Doug
> [[alternative HTML version deleted]]
>
>
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 

[R] sqldf --Warning message:

2016-02-19 Thread Divakar Reddy
Dear R users,

I'm getting Waring message while trying to load "sqldf" package in R3.2.3
and assuming that we can ignore this as it's WARNING Message and not an
error message.
Can you guide me if my assumption is wrong?


> library(sqldf);
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: DBI
Warning message:
no DISPLAY variable so Tk is not available

> version   _
platform   x86_64-redhat-linux-gnu
version.string R version 3.2.3 (2015-12-10)
>

Thanks,
Divakar
Phoenix,USA

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MACRO-LOOP in R

2016-02-19 Thread Thierry Onkelinx
Please keep the mailinglist in cc.

You should be able to solve this after reading the helpfiles of
?dplyr::mutate and ?tidyr::spread

Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-02-18 18:44 GMT+01:00 Amoy Yang :

> So simply. What happens if I have multiple columns (b, d and e) to be
> transposed and renamed as you did for b? Thanks again for helps!
>
> x<-data.frame(
>  a=c(1,2,3),
>  b=c("1","2","3"),
>  d=c(5,6,7),
>  e=c("n1","n2","n3"));
> x; str(x)
>
>
> On Thursday, February 18, 2016 10:37 AM, Thierry Onkelinx <
> thierry.onkel...@inbo.be> wrote:
>
>
> R has no concept of macro language like SAS because it doesn't need it.
> Here is a solution for your problem.
>
> library(dplyr)
> library(tidyr)
> x %>%
>   mutate(b = paste0("b_", b)) %>%
>   spread(key = b, value = a)
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2016-02-18 17:17 GMT+01:00 Amoy Yang via R-help :
>
>  I am doing the data transpose with rename as shown below (step1 ~ step4)
> 1. Is any way in R similar to PROC TRANSPOSE used in SAS?2. How to use
> MACRO-LOOP to simplify the following procedure?
> THANK YOU FOR HELPS!
> # create data for test
> x<-data.frame(
>  a=c(1,2,3),
>  b=c("1","2","3"));
> x; str(x)# step1: parse out to 3 tabs
> x1<-x[x$a == 1,]; x1
> x2<-x[x$a == 2,]; x2
> x3<-x[x$a == 3,]; x3# step2: remove column a in each tab
> x1$a<-NULL; x1
> x2$a<-NULL; x2
> x3$a<-NULL; x3# step3: rename column b to b1, b2 and b3 by y1, y2 and y3
> names(x1)[names(x1)=="b"]<-"b_1"; x1
> names(x2)[names(x2)=="b"]<-"b_2"; x2
> names(x3)[names(x3)=="b"]<-"b_3"; x3# setp4: set x1, x3 and x3 together
> x123=cbind(x1,x2,x3); x123
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> 
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.