I'd like to reinforce Rod Little's warning about MCMC for models with collinear parameters. These can give you seriously wrong MCMC results because the Gibbs sampler gets stuck buzzing around in one bit of the joint distribution and never learns that there are other bits it should be representing. I don't think the SAS implementation of MI has any way of examining the MCMC iterations to see what the chains look like. Does anyone else know if this is possible? The chains will often show high autocorrelations when this is happening.

Even if the iterations looked OK, I would be doubtful of the MCMC results if EM has failed. Why not re-express your data in some way that avoids the correlation? If you have continuous variables you could take their mean and difference, or if you have categorical data you could do some recoding to avoid the collinearity, though I hope you don't have categorical data, since MI does not handle this kind of thing very well.

Gillian Raab
Napier University
Edinburgh
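To make the autocorrelation point above concrete, here is a minimal Python sketch. It is a toy illustration only and is not PROC MI's algorithm: it assumes a standard bivariate normal target with a hypothetical correlation of 0.99 and runs a plain two-variable Gibbs sampler on it. For this target the lag-1 autocorrelation of each chain is rho squared in theory, so strongly correlated variables give a chain that moves very slowly. The standardised mean and difference of two equal-variance variables are uncorrelated, so the re-expressed problem is sampled with rho = 0. The function names are made up for the example.

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_draws=20000, seed=0):
    """Gibbs-sample a standard bivariate normal with correlation rho.

    Each full conditional is normal: x | y ~ N(rho * y, 1 - rho**2),
    and symmetrically for y | x.
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho**2)
    x = y = 0.0
    draws = np.empty((n_draws, 2))
    for t in range(n_draws):
        x = rng.normal(rho * y, sd)  # update x given the current y
        y = rng.normal(rho * x, sd)  # update y given the new x
        draws[t] = (x, y)
    return draws


def lag1_autocorr(chain):
    """Sample lag-1 autocorrelation of a one-dimensional chain."""
    c = chain - chain.mean()
    return float(np.dot(c[:-1], c[1:]) / np.dot(c, c))


# Hypothetical near-collinear case: two variables with correlation 0.99.
raw = gibbs_bivariate_normal(rho=0.99)
acf_raw = lag1_autocorr(raw[:, 0])

# Mean and difference of two equal-variance variables are uncorrelated,
# so after standardising, the re-expressed target has rho = 0.
reexpressed = gibbs_bivariate_normal(rho=0.0)
acf_reexpressed = lag1_autocorr(reexpressed[:, 0])

print(f"lag-1 autocorrelation, original variables: {acf_raw:.3f}")
print(f"lag-1 autocorrelation, mean and difference: {acf_reexpressed:.3f}")
```

A high lag-1 autocorrelation in the first chain, against one near zero in the second, is the kind of signal a trace or autocorrelation plot of the iterations would reveal.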
________________________________

From: [email protected] on behalf of Rod Little
Sent: Wed 22/06/2005 00:56
To: Howells, William
Cc: [email protected]
Subject: RE: [Impute] convergence of EM under collinearity

Dear Paul: with a uniform prior on the mean and covariance matrix the ML estimate is the posterior mode, so I am not sure what is the default prior that is the basis for the posterior mode statement.

MCMC convergence is much harder to determine than EM convergence, since it is convergence in the stochastic sense rather than in the sense of the maximum value of the likelihood function. If ML is having problems converging then so will MCMC.

Including close to collinear covariates may not hurt you much for imputation, but it doesn't help much either. I'd check whether the imputations look reasonable -- a good idea any time MI is applied -- I would not assume the program produces sensible imputations, particularly when multicollinearity is an issue.

On Tue, 21 Jun 2005, Howells, William wrote:

> This is just off the top of my head, but I recall that MCMC is a better
> method because it allows for uncertainty in the parameters themselves
> (mean and variance), not only at the observation level. MCMC uses the
> initial EM estimates as starting values so it might be interesting to
> see what happens with the means over a large number of iterations, say
> 1000, 5000 or 10,000, if you have the computing power and time. PROC MI
> will plot the means and variances over the iterations and you can check
> if they "stabilize" (timeplot option on the MCMC statement). What to
> conclude if MCMC converges but EM does not, or how that relates to
> collinearity in the model, I'm not sure.
>
> Bill Howells, MS
> Wash U Med School, St Louis
>
> ________________________________
>
> From: [email protected] [mailto:[email protected]] On Behalf Of Paul von Hippel
> Sent: Tuesday, June 21, 2005 3:00 PM
> To: [email protected]
> Subject: [Impute] convergence of EM under collinearity
>
> I am using SAS PROC MI with the default settings. When I include
> nearly-collinear variables in my imputation model, I commonly get
> messages like the following:
>
> WARNING: The EM algorithm (MLE) fails to converge after 200 iterations.
> You can increase the number of iterations (MAXITER= option) or increase
> the value of the convergence criterion (CONVERGE=option).
> NOTE: The EM algorithm (posterior mode) converges in 141 iterations.
>
> The messages go away if I remove some of the nearly-collinear variables,
> but I would like to keep those variables since I need them for analysis.
>
> Looking at the messages, I would say that SAS's implementation of
> the EM algorithm has two stages: the first stage estimates the MLE; the
> second stage estimates the posterior mode. It also appears that the
> second stage converged even though the first stage didn't. (Possibly the
> second stage benefited from a default prior.)
>
> I don't find any of this discussed in the documentation. Would you agree
> with my interpretation?
>
> Also, would you expect it's safe to use the imputed data despite the
> warning? Since the second stage of EM converged, I'm thinking that the
> imputed data may be okay.
>
> Many thanks for any advice.
> Paul von Hippel
>
> Paul von Hippel
> Department of Sociology / Initiative in Population Research
> Ohio State University
>
> The materials in this message are private and may contain Protected
> Healthcare Information. If you are not the intended recipient, be advised
> that any unauthorized use, disclosure, copying or the taking of any action in
> reliance on the contents of this information is strictly prohibited.
> If you have received this email in error, please immediately notify the
> sender via telephone or return mail.
>

___________________________________________________________________________________
Roderick Little
Richard D. Remington Collegiate Professor of Biostatistics
U-M School of Public Health Tel (734) 936 1003
M4045 SPH II Fax (734) 763 2215
1420 Washington Hgts email [email protected]
Ann Arbor, MI 48109-2029 http://www.sph.umich.edu/~rlittle/

_______________________________________________
Impute mailing list
[email protected]
http://lists.utsouthwestern.edu/mailman/listinfo/impute

This message is intended for the addressee(s) only and should not be read, copied or disclosed to anyone else outwith the University without the permission of the sender. It is your responsibility to ensure that this message and any attachments are scanned for viruses or other defects. Napier University does not accept liability for any loss or damage which may result from this email or any attachment, or for errors or omissions arising after it was sent. Email is not a secure medium. Email entering the University's system is subject to routine monitoring and filtering by the University.

From WhitesideMansellLeanne <@t> uams.edu Wed Jun 22 21:12:50 2005
From: WhitesideMansellLeanne <@t> uams.edu (Whiteside-Mansell, Leanne )
Date: Sun Jun 26 08:25:03 2005
Subject: [Impute] convergence of EM under collinearity
Message-ID: <[email protected]>

I'd be interested in a discussion on what constitutes 'checking to see if the imputations seem reasonable' ...

Thanks,
Leanne

Leanne Whiteside-Mansell, Ed. D.
Associate Professor
UAMS College of Medicine
Partners For Inclusive Communities
Arkansas' University Center on Developmental Disabilities
2001 Pershing Circle, Suite 300
North Little Rock, AR 72114
http://www.uams.edu/partners/
Phone: 501-682-9933
Fax: 501-682-9901
Cell: 501-256-4199

As required by UAMS Administrative Guide 3.1.38: The information contained in this message is confidential and is intended for the addressee only. If you have received this message in error or there are any problems, please notify the sender immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden.

From JUDKIND1 <@t> westat.com Thu Jun 23 08:51:58 2005
From: JUDKIND1 <@t> westat.com (David Judkins)
Date: Sun Jun 26 08:25:03 2005
Subject: [Impute] convergence of EM under collinearity
Message-ID: <[email protected]>

This is a bit off target, but for those really interested in using collinear covariates, tree-based nonparametric modeling algorithms such as MART (Friedman) and GBM (Ridgeway) are not bothered by collinear covariates. Last summer, we also saw a Bayesian version called BART (Hill and McCulloch). In 2001, Paul Zador, Barnali Das, and myself reported on using MART in imputation (JSM Proceedings).
David Judkins
Senior Statistician
Westat
1650 Research Boulevard
Rockville, MD 20850
(301) 315-5970
[email protected]

From f.harrell <@t> vanderbilt.edu Thu Jun 23 10:51:53 2005
From: f.harrell <@t> vanderbilt.edu (Frank E Harrell Jr)
Date: Sun Jun 26 08:25:03 2005
Subject: [Impute] convergence of EM under collinearity
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

To me this issue is worthy of new research. As David said, methods other than MCMC and EM do not have difficulties with collinearity, especially with algebraic collinearity such as putting age and the square of age in a model. Since I like using simple truncated power basis regression splines routinely, whose terms can be quite collinear, I would very much like to see new developments in MCMC and EM that do not require me to translate the model into a less interpretable form.

Frank Harrell

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
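
For readers who want to see how strong the algebraic collinearity between age and the square of age can be, the following Python sketch may help. It is an illustration under stated assumptions, not anything taken from the thread: the age range of 20 to 80 is hypothetical, and centering age before squaring is one common re-expression, which is precisely the sort of translation the message above would prefer not to need. The helper names are invented for the example.

```python
import numpy as np

# Hypothetical ages on an even grid from 20 to 80 (an assumption for
# illustration only; no data from the thread are used).
age = np.linspace(20.0, 80.0, 601)


def corr(a, b):
    """Pearson correlation between two one-dimensional arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def design_condition_number(*columns):
    """Condition number of [1, columns...] after scaling each column
    to unit length, a rough measure of how collinear the design is."""
    X = np.column_stack([np.ones_like(columns[0]), *columns])
    X = X / np.linalg.norm(X, axis=0)
    return float(np.linalg.cond(X))


# Raw terms: age and age squared are almost perfectly correlated.
raw_corr = corr(age, age**2)
raw_cond = design_condition_number(age, age**2)

# Centering before squaring removes the linear association when the
# ages are symmetric about their mean, as they are on this grid.
centered = age - age.mean()
centered_corr = corr(centered, centered**2)
centered_cond = design_condition_number(centered, centered**2)

print(f"corr(age, age^2)           = {raw_corr:.3f}")
print(f"corr(centered, centered^2) = {centered_corr:.3f}")
print(f"condition number, raw      = {raw_cond:.1f}")
print(f"condition number, centered = {centered_cond:.1f}")
```

The centered quadratic spans the same model space as the raw one, so fitted values are unchanged; only the coefficients, and their interpretation, differ.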
