Hi all,
Given their usefulness, perhaps we should be using nonparametric
bootstraps more often at key decision points in model development,
especially now that the ready availability of computing power has made
this realistic. (I suspect many of us already do, but it's a point
worth making all the same.) Although Nick's point about the amount of
time required is well taken, a large computing grid offsets this to
some extent, especially for models with short run times, and as
previous posters have pointed out, bootstraps can deliver very useful
information, especially when used in combination with other useful PsN
tools like case-deletion diagnostics.
Before we get too far away from it, one point needs correction:
previous posts have implied that PsN uses an R script to generate its
results. This is not strictly true. The bundled bootstrap.R script is
not run by default, and it does little more than generate some very
basic histograms of the parameter distributions, using the raw PsN
results as input. As Jakob rightly points out, it contains at least one
bug (the means and medians are flipped), has some counter-intuitive
default settings with respect to which runs it includes in the plots it
creates, and has compatibility issues with recent R releases. (The
cdd.R and llp.R scripts are similarly simplistic.) There are plans to
replace them all with Xpose functions further down the line, as
mentioned elsewhere in this thread; in the meantime, see the sketch
below for the histogram part.
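A minimal corrected sketch in R, assuming raw_results1.csv contains a
'minimization_successful' flag and one column per parameter (actual
column names will vary by model and PsN version):

raw <- read.csv("raw_results1.csv")
ok  <- raw[raw$minimization_successful == 1, ]  # keep runs that minimized

plot_param <- function(x, name) {
  hist(x, breaks = 30, main = name, xlab = name)
  abline(v = mean(x, na.rm = TRUE),   col = "blue", lty = 1)  # mean: solid
  abline(v = median(x, na.rm = TRUE), col = "red",  lty = 2)  # median: dashed
  legend("topright", c("mean", "median"), col = c("blue", "red"),
         lty = 1:2, bty = "n")
}

plot_param(ok$THETA1, "THETA1")  # THETA1 as an example; substitute your own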
PsN's output table files (bootstrap_results.csv and raw_results1.csv),
on the other hand, are created internally and seem reasonably robust.
The runs included when generating the bootstrap_results.csv file can
(and should) be pre-specified using the "-skip_*" flags passed to the
bootstrap command at runtime, or specified in PsN's configuration file,
psn.conf (whichever option is chosen, it's important to know, at
minimum, which flags are in effect and what they imply about the
results, linking in with what Marc wrote). Alternatively, the raw
bootstrap data in raw_results1.csv can be analyzed by hand if a finer
degree of control is called for.
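For example, a minimal sketch assuming the PsN columns
'minimization_successful' and 'estimate_near_boundary' are present
(names may differ between PsN versions; note too that read.csv mangles
names like OMEGA(1,1) into OMEGA.1.1.):

raw <- read.csv("raw_results1.csv")

# make the same kind of exclusions the -skip_* flags would, but explicitly
keep <- raw$minimization_successful == 1 & raw$estimate_near_boundary == 0
boot <- raw[keep, ]

# nonparametric 95% percentile intervals; 'params' is whatever subset of
# parameter columns you care about (these names are illustrative)
params <- c("THETA1", "THETA2", "OMEGA.1.1.")
ci <- sapply(boot[params], quantile, probs = c(0.025, 0.5, 0.975),
             na.rm = TRUE)
print(t(ci))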
Best
Justin
On 7/11/11 7:37 AM, Nick Holford wrote:
Leonid,
With regard to discarding runs at the boundary, what I had in mind was
runs which had reached the maximum number of iterations, but I realized
later that Jakob was referring to NONMEM's often irritating messages
that usually just mean the initial estimate changed a lot or a variance
was getting close to zero.
There are of course some cases where the estimate is truly at a
user-defined constraint. Assuming the user has thought carefully about
these constraints, I would interpret a run that finished at such a
constraint boundary as showing that NONMEM was stuck in a local minimum
(probably because of the constraint boundary), and that if the
constraint were relaxed, perhaps a more useful estimate would be
obtained. In those cases, I think one can make an argument for
discarding runs with parameters at this kind of boundary, as well as
those which reached an iteration limit.
In general I agree with your remarks (echoing those from Marc
Gastonguay) that one needs to think about the way each bootstrap run
behaved. But some things, like non-convergence and a failed covariance
step, are ignorable because they don't influence the bootstrap
distribution. There is also a need to recognize that bootstraps can be
seriously time-consuming, and the effort required to understand all the
ways runs might finish is usually not worth it, given the purposes of
doing a bootstrap.
The most important reason for doing a bootstrap is to get more robust
estimates of the parameters; this was the main reason these re-sampling
procedures were initially developed. The bootstrap estimate of the
parameters will usually be pretty insensitive to the margins of the
distribution, which is where the questionable run results are typically
located.
A secondary, semi-quantitative reason is to get a confidence interval,
which may be helpful for model selection. This may be influenced by the
questionable runs, but that is just part of the uncertainty that the
confidence interval is used to describe.
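To illustrate with a toy example in R (simulated numbers, not from any
real model): percentile summaries are driven by the body of the
bootstrap distribution, so a handful of extreme runs barely move the
median, while the interval edges may shift a little, which is exactly
the uncertainty the interval is there to capture.

set.seed(1)
est <- c(rnorm(980, mean = 1, sd = 0.1),  # "clean" bootstrap estimates
         runif(20, 5, 10))                # a few runs stuck at odd values
summarise <- function(x) quantile(x, c(0.025, 0.5, 0.975))
rbind(all_runs   = summarise(est),
      clean_only = summarise(est[1:980]))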
Nick
On 10/07/2011 11:13 p.m., Leonid Gibiansky wrote:
I thought the original post said "results at a boundary should NOT be
discarded" and that Nick's reply was just a typo. If it was not a typo,
I would disagree and argue that all results should be included: each
data set is a particular realization, and we should be able to use all
of them. If some realizations are so special that the model behaves in
an unusual way (with any definition of unusual: non-convergence,
failure of the covariance step, parameter estimates at the boundary,
etc.), we either need to accept those as is, or work with each of those
special data sets one by one to push them toward parameter estimates
that we can accept, or change the bootstrap procedure (adding
stratification by covariates, by dose level, by route of
administration, etc.) so that all data sets behave similarly.
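PsN's bootstrap has a -stratify_on option for this, if memory serves;
rolling your own in R is also straightforward. A minimal sketch of
subject-level resampling stratified by dose, assuming a NONMEM-style
data frame 'dat' with ID and DOSE columns (illustrative names) and one
dose level per subject:

stratified_boot <- function(dat, id = "ID", stratum = "DOSE") {
  subj <- unique(dat[, c(id, stratum)])            # one row per subject
  # resample subjects with replacement within each stratum,
  # preserving the original stratum sizes
  picked <- do.call(rbind, lapply(split(subj, subj[[stratum]]),
    function(s) s[sample(nrow(s), replace = TRUE), , drop = FALSE]))
  # rebuild the data set, giving each resampled subject a new unique ID
  do.call(rbind, lapply(seq_len(nrow(picked)), function(i) {
    rows <- dat[dat[[id]] == picked[[id]][i], , drop = FALSE]
    rows[[id]] <- i
    rows
  }))
}

newdat <- stratified_boot(dat)  # one bootstrap data set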
Leonid
--------------------------------------
Leonid Gibiansky, Ph.D.
President, QuantPharm LLC
web: www.quantpharm.com
e-mail: LGibiansky at quantpharm.com
tel: (301) 767 5566
On 7/10/2011 2:57 PM, Stephen Duffull wrote:
Nick, Jakob, Marc et al
Thanks for your helpful comments. I agree with you that any results
that are at a boundary should be discarded from the bootstrap
distribution. On the whole, the sentiments in this thread align with
anecdotal findings from my own experience. But I was just wondering:
how do you define your boundaries for variance and covariance
parameters (e.g. OMEGA terms)?
For variance terms, lower boundaries seem reasonably straightforward
(e.g. 1E-5 seems close to zero). Upper boundaries are of course open:
for the variance of a log-normal ETA, would 1E+5 or 1E+4 be large
enough to be considered close to a boundary? At what value would you
discard the result? At what correlation value would you discard a
result as being close to 1 (>0.99, >0.97, ...)?
Clearly, if this were for regulatory work, you could define these a
priori, having chosen some arbitrary cut-off. But the devil here lies
in the non-regulatory work, where you may not have defined these
boundaries a priori.
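For concreteness, one way to flag such runs in R from
raw_results1.csv, with the cut-offs left explicit so they can be
debated (the column names here are illustrative):

raw <- read.csv("raw_results1.csv")

# variance near its lower, or some arbitrary upper, "boundary"
flag_var <- raw$OMEGA.1.1. < 1e-5 | raw$OMEGA.1.1. > 1e+4

# correlation implied by the covariance term, close to +/-1
corr <- raw$OMEGA.2.1. / sqrt(raw$OMEGA.1.1. * raw$OMEGA.2.2.)
flag_corr <- abs(corr) > 0.99

table(flag_var, flag_corr)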
Steve
--
Professor Stephen Duffull
Chair of Clinical Pharmacy
School of Pharmacy
University of Otago
PO Box 56 Dunedin
New Zealand
E: stephen.duff...@otago.ac.nz
P: +64 3 479 5044
F: +64 3 479 7034
W: http://pharmacy.otago.ac.nz/profiles/stephenduffull
Design software: www.winpopt.com
--
Justin Wilkins, PhD
Exprimo NV
Tel: +41 (0) 81 599 23 82
Mobile: +41 (0) 76 561 09 49
E-mail: justin.wilk...@exprimo.com
Web: www.exprimo.com