Hi all,

Given their usefulness, maybe we should be trying to use nonparametric bootstraps more often at key decision points in model development, especially now that the ready availability of computing power has made this realistic. (I guess many of us already do this, but it's a point worth making all the same.) Although Nick's point about the amount of time required is well taken, a large computing grid offsets this to some extent, especially for models with short run times, and, as previous posters have pointed out, bootstraps can deliver very useful information, particularly in combination with other PsN tools such as case-deletion diagnostics.

Before we get too far away from it, one point needs correction - previous posts have implied that PsN uses an R script to generate its results. This is not strictly true. The bundled bootstrap.R script is not run by default, and does little more than generate some very basic histograms of the parameter distributions, using the raw PsN results as input. As Jakob rightly points out, it contains at least one bug (means and medians are flipped), has some counter-intuitive default settings with respect to the runs it includes in the plots it creates, and has compatibility issues with recent R releases. (The cdd.R and llp.R scripts are similarly simplistic.) There are plans to replace them all with Xpose functions further down the line, as mentioned elsewhere in this thread.
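For anyone who wants those plots in the meantime, without waiting for the Xpose replacements, something along these lines is all that's really needed - a minimal sketch, assuming the usual raw_results1.csv layout with one column per THETA/OMEGA/SIGMA (check your own header, as the naming may differ between PsN versions):

## Minimal sketch: histograms of the bootstrap parameter distributions,
## with the median and mean marked the right way around.
raw <- read.csv("raw_results1.csv")

## keep only the parameter columns (names assumed: THETA*, OMEGA*, SIGMA*)
pars <- raw[, grepl("^(THETA|OMEGA|SIGMA)", names(raw)), drop = FALSE]

pdf("bootstrap_histograms.pdf")
for (p in names(pars)) {
  hist(pars[[p]], breaks = 30, main = p, xlab = p)
  abline(v = median(pars[[p]], na.rm = TRUE), lty = 1)  # median: solid line
  abline(v = mean(pars[[p]],   na.rm = TRUE), lty = 2)  # mean: dashed line
}
dev.off()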

PsN's output table files (bootstrap_results.csv and raw_results1.csv), on the other hand, are created internally and seem reasonably robust. The runs included when bootstrap_results.csv is generated can (and should) be controlled using the "-skip_*" flags passed to the bootstrap command at runtime, or set in PsN's configuration file, psn.conf; whichever option is chosen, it's important to know - at minimum - which flags are in effect and what they imply about the results, linking in with what Marc wrote. Alternatively, the raw bootstrap data in raw_results1.csv can be analyzed by hand if a finer degree of control is called for.
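If you do go the by-hand route, the core of it is just a filter and a quantile call. Here is a rough sketch; the status-column names and the convention that the first row holds the fit to the original data set are assumptions based on my own raw_results1.csv files, so check your header before relying on any of it:

## By-hand summary of raw_results1.csv (column names assumed; check the header)
raw  <- read.csv("raw_results1.csv")
boot <- raw[-1, ]    # assuming row 1 is the fit to the original data set

## choose (and report!) your own inclusion rule - this one keeps runs that
## minimized successfully and were not flagged as near a boundary
keep <- boot$minimization_successful == 1 & boot$estimate_near_boundary == 0
boot <- boot[which(keep), ]

## medians and percentile 95% confidence intervals for every parameter column
pars <- grepl("^(THETA|OMEGA|SIGMA)", names(boot))
ci   <- t(apply(boot[, pars, drop = FALSE], 2, quantile,
                probs = c(0.025, 0.5, 0.975), na.rm = TRUE))
print(round(ci, 4))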

Best
Justin

On 7/11/11 7:37 AM, Nick Holford wrote:
Leonid,

With regard to discarding runs at the boundary, what I had in mind was runs which had reached the maximum number of iterations, but I realized later that Jakob was referring to NONMEM's often irritating boundary messages, which usually just mean that an initial estimate changed a lot or a variance was getting close to zero.

There are of course some cases where the estimate is truly at a user-defined constraint. Assuming that the user has thought carefully about these constraints, I would interpret a run that finished at such a constraint boundary as showing that NONMEM was stuck in a local minimum (probably because of the constraint boundary), and that if the constraint were relaxed then perhaps a more useful estimate would be obtained.

In those cases, I think one can make an argument for discarding runs with parameters at this kind of boundary, as well as those which reached an iteration limit.

In general I agree with your remarks (echoing those from Marc Gastonguay) that one needs to think about the way each bootstrap run behaved. But some things, like non-convergence and a failed covariance step, are ignorable because they don't influence the bootstrap distribution.

There is also the need to recognize that bootstraps can be seriously time-consuming, and the effort required to understand all the ways that runs might finish is usually not worth it, given the purposes of doing a bootstrap.

The most important reason for doing a bootstrap is to get more robust estimates of the parameters. This was the main reason why these re-sampling procedures were initially developed. The bootstrap estimate of the parameters will usually be pretty insensitive to the margins of the distribution where the questionable run results are typically located.

A secondary semi-quantitative reason is to get a confidence interval which may be helpful for model selection. This may be influenced by the questionable runs but that is just part of the uncertainty that the confidence interval is used to define.
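For what it's worth, checking how much the questionable runs actually move things only takes a couple of lines - a rough sketch, where the status column and the choice of THETA1 are illustrative rather than taken from any particular PsN version:

## Rough sensitivity check: bootstrap median and 95% CI for one parameter,
## with and without the runs that failed to minimize
raw <- read.csv("raw_results1.csv")[-1, ]   # drop the fit to the original data
ok  <- raw$minimization_successful == 1
rbind(all_runs       = quantile(raw$THETA1,     c(0.025, 0.5, 0.975), na.rm = TRUE),
      converged_only = quantile(raw$THETA1[ok], c(0.025, 0.5, 0.975), na.rm = TRUE))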

Nick

On 10/07/2011 11:13 p.m., Leonid Gibiansky wrote:
I thought that the original post was "results at a boundary should NOT be discarded", and that Nick's reply was just a typo. If it was not a typo, I would disagree and argue that all results should be included: each data set is a particular realization, and we should be able to use all of them. If some realizations are so special that the model behaves in an unusual way (with any definition of unusual: non-convergence, failure of the covariance step, parameter estimates at the boundary, etc.), we either need to accept those as they are, work with each of those special data sets one by one to arrive at parameter estimates that we can accept, or change the bootstrap procedure (adding stratification by covariates, by dose level, by route of administration, etc.) so that all data sets behave similarly.
Leonid

--------------------------------------
Leonid Gibiansky, Ph.D.
President, QuantPharm LLC
web:    www.quantpharm.com
e-mail: LGibiansky at quantpharm.com
tel:    (301) 767 5566



On 7/10/2011 2:57 PM, Stephen Duffull wrote:
Nick, Jakob, Marc et al

Thanks for your helpful comments. I agree with you that any results that are at a boundary should be discarded from the bootstrap distribution.

On the whole, I think the sentiments in this thread align with anecdotal findings from my own experience. But I was just wondering how you define your boundaries for variance and covariance parameters (e.g. OMEGA terms)?

For variance terms, lower boundaries seem reasonably straightforward (e.g. 1E-5 seems close to zero). Upper boundaries are of course open; for the variance of a log-normal ETA, would 1E+5 or 1E+4 be large enough to be considered close to a boundary? At what value would you discard the result? At what correlation value would you discard a result (>0.99, >0.97, ...) as being close to 1? Clearly, if this were regulatory work, you could define these a priori, having chosen some arbitrary cut-off. But the devil here lies in non-regulatory work, where you may not have defined these boundaries a priori.
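To make that concrete, applying whatever cut-offs one settles on to the bootstrap output is simple enough; a sketch against raw_results1.csv follows, where both the cut-off values and the OMEGA column names (a 2x2 block) are purely illustrative rather than recommendations:

## Flag bootstrap runs whose OMEGA estimates sit near a chosen boundary
## (cut-offs and column names are illustrative only)
raw <- read.csv("raw_results1.csv")[-1, ]

var_lo  <- 1e-5    # variance "close to zero"
var_hi  <- 1e+4    # variance "implausibly large"
cor_max <- 0.99    # correlation "close to 1"

corr21 <- raw$OMEGA.2.1. / sqrt(raw$OMEGA.1.1. * raw$OMEGA.2.2.)

flagged <- raw$OMEGA.1.1. < var_lo | raw$OMEGA.1.1. > var_hi |
           raw$OMEGA.2.2. < var_lo | raw$OMEGA.2.2. > var_hi |
           abs(corr21) > cor_max

table(flagged)     # how many runs would this set of cut-offs discard?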

Steve
--
Professor Stephen Duffull
Chair of Clinical Pharmacy
School of Pharmacy
University of Otago
PO Box 56 Dunedin
New Zealand
E: stephen.duff...@otago.ac.nz
P: +64 3 479 5044
F: +64 3 479 7034
W: http://pharmacy.otago.ac.nz/profiles/stephenduffull

Design software: www.winpopt.com




--

Justin Wilkins, PhD
Exprimo NV

Tel:    +41 (0) 81 599 23 82
Mobile:         +41 (0) 76 561 09 49
E-mail:         justin.wilk...@exprimo.com
Web:    www.exprimo.com


