Hi all,
Given their usefulness, perhaps we should be using nonparametric
bootstraps more often at key decision points in model development,
especially now that the ready availability of computing power has made
this realistic. (I suspect many of us already do, but it's a point
worth making all the same.) Although Nick's point about the amount of
time required is well taken, a large computing grid offsets this to
some extent, especially for models with short run times, and as
previous posters have pointed out, bootstraps can deliver very useful
information, especially when used in combination with other useful PsN
tools like case-deletion diagnostics.
Before we get too far away from it, one point needs correction:
previous posts have implied that PsN uses an R script to generate its
results. This is not strictly true. The bundled bootstrap.R script is
not run by default, and it does little more than generate some very
basic histograms of the parameter distributions, using the raw PsN
results as input. As Jakob rightly points out, it contains at least one
bug (the means and medians are flipped), has some counter-intuitive
default settings with respect to which runs it includes in the plots it
creates, and has compatibility issues with recent R releases. (The
cdd.R and llp.R scripts are similarly simplistic.) There are plans to
replace them all with Xpose functions further down the line, as
mentioned elsewhere in this thread; in the meantime, see the sketch
below for the histogram part.
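A minimal corrected sketch in R, assuming raw_results1.csv contains a
'minimization_successful' flag and one column per parameter (actual
column names will vary by model and PsN version):

raw <- read.csv("raw_results1.csv")
ok  <- raw[raw$minimization_successful == 1, ]  # keep runs that minimized

plot_param <- function(x, name) {
  hist(x, breaks = 30, main = name, xlab = name)
  abline(v = mean(x, na.rm = TRUE),   col = "blue", lty = 1)  # mean: solid
  abline(v = median(x, na.rm = TRUE), col = "red",  lty = 2)  # median: dashed
  legend("topright", c("mean", "median"), col = c("blue", "red"),
         lty = 1:2, bty = "n")
}

plot_param(ok$THETA1, "THETA1")  # THETA1 as an example; substitute your own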
PsN's output table files (bootstrap_results.csv and raw_results1.csv),
on the other hand, are created internally and seem reasonably robust.
The runs included when generating the bootstrap_results.csv file can
(and should) be pre-specified using the "-skip_*" flags passed to the
bootstrap command at runtime, or specified in PsN's configuration file,
psn.conf (whichever option is chosen, it's important to know, at
minimum, which flags are in effect and what they imply about the
results, linking in with what Marc wrote). Alternatively, the raw
bootstrap data in raw_results1.csv can be analyzed by hand if a finer
degree of control is called for.
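For example, a minimal sketch assuming the PsN columns
'minimization_successful' and 'estimate_near_boundary' are present
(names may differ between PsN versions; note too that read.csv mangles
names like OMEGA(1,1) into OMEGA.1.1.):

raw <- read.csv("raw_results1.csv")

# make the same kind of exclusions the -skip_* flags would, but explicitly
keep <- raw$minimization_successful == 1 & raw$estimate_near_boundary == 0
boot <- raw[keep, ]

# nonparametric 95% percentile intervals; 'params' is whatever subset of
# parameter columns you care about (these names are illustrative)
params <- c("THETA1", "THETA2", "OMEGA.1.1.")
ci <- sapply(boot[params], quantile, probs = c(0.025, 0.5, 0.975),
             na.rm = TRUE)
print(t(ci))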
Best
Justin
On 7/11/11 7:37 AM, Nick Holford wrote:
Leonid,
With regard to discarding runs at the boundary, what I had in mind was
runs which had reached the maximum number of iterations, but I realized
later that Jakob was referring to NONMEM's often irritating messages
that usually just mean the initial estimate changed a lot or a variance
was getting close to zero.
There are of course some cases where the estimate is truly at a
user-defined constraint. Assuming the user has thought carefully about
these constraints, I would interpret a run that finished at such a
constraint boundary as showing that NONMEM was stuck in a local minimum
(probably because of the constraint boundary), and that if the
constraint were relaxed, perhaps a more useful estimate would be
obtained. In those cases, I think one can make an argument for
discarding runs with parameters at this kind of boundary, as well as
those which reached an iteration limit.
In general I agree with your remarks (echoing those from Marc
Gastonguay) that one needs to think about the way each bootstrap run
behaved. But some things, like non-convergence and a failed covariance
step, are ignorable because they don't influence the bootstrap
distribution. There is also a need to recognize that bootstraps can be
seriously time-consuming, and the effort required to understand all the
ways runs might finish is usually not worth it, given the purposes of
doing a bootstrap.
The most important reason for doing a bootstrap is to get more robust
estimates of the parameters; this was the main reason these re-sampling
procedures were initially developed. The bootstrap estimate of the
parameters will usually be pretty insensitive to the margins of the
distribution, which is where the questionable run results are typically
located.
A secondary, semi-quantitative reason is to get a confidence interval,
which may be helpful for model selection. This may be influenced by the
questionable runs, but that is just part of the uncertainty that the
confidence interval is used to describe.
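To illustrate with a toy example in R (simulated numbers, not from any
real model): percentile summaries are driven by the body of the
bootstrap distribution, so a handful of extreme runs barely move the
median, while the interval edges may shift a little, which is exactly
the uncertainty the interval is there to capture.

set.seed(1)
est <- c(rnorm(980, mean = 1, sd = 0.1),  # "clean" bootstrap estimates
         runif(20, 5, 10))                # a few runs stuck at odd values
summarise <- function(x) quantile(x, c(0.025, 0.5, 0.975))
rbind(all_runs   = summarise(est),
      clean_only = summarise(est[1:980]))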
Nick
On 10/07/2011 11:13 p.m., Leonid Gibiansky wrote:
I thought the original post said "results at a boundary should NOT be
discarded" and that Nick's reply was just a typo. If it was not a typo,
I would disagree and argue that all results should be included: each
data set is a particular realization, and we should be able to use all
of them. If some realizations are so special that the model behaves in
an unusual way (with any definition of unusual: non-convergence,
failure of the covariance step, parameter estimates at the boundary,
etc.), we either need to accept those as is, or work with each of those
special data sets one by one to push them toward parameter estimates
that we can accept, or change the bootstrap procedure (adding
stratification by covariates, by dose level, by route of
administration, etc.) so that all data sets behave similarly.
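PsN's bootstrap has a -stratify_on option for this, if memory serves;
rolling your own in R is also straightforward. A minimal sketch of
subject-level resampling stratified by dose, assuming a NONMEM-style
data frame 'dat' with ID and DOSE columns (illustrative names) and one
dose level per subject:

stratified_boot <- function(dat, id = "ID", stratum = "DOSE") {
  subj <- unique(dat[, c(id, stratum)])            # one row per subject
  # resample subjects with replacement within each stratum,
  # preserving the original stratum sizes
  picked <- do.call(rbind, lapply(split(subj, subj[[stratum]]),
    function(s) s[sample(nrow(s), replace = TRUE), , drop = FALSE]))
  # rebuild the data set, giving each resampled subject a new unique ID
  do.call(rbind, lapply(seq_len(nrow(picked)), function(i) {
    rows <- dat[dat[[id]] == picked[[id]][i], , drop = FALSE]
    rows[[id]] <- i
    rows
  }))
}

newdat <- stratified_boot(dat)  # one bootstrap data set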
Leonid
--------------------------------------
Leonid Gibiansky, Ph.D.
President, QuantPharm LLC
web: www.quantpharm.com
e-mail: LGibiansky at quantpharm.com
tel: (301) 767 5566
On 7/10/2011 2:57 PM, Stephen Duffull wrote:
Nick, Jakob, Marc et al
Thanks for your helpful comments. I agree with you that any results
that are at a boundary should be discarded from the bootstrap
distribution. On the whole, the sentiments in this thread align with
anecdotal findings from my own experience. But I was just wondering:
how do you define your boundaries for variance and covariance
parameters (e.g. OMEGA terms)?
For variance terms, lower boundaries seem reasonably straightforward
(e.g. 1E-5 seems close to zero). Upper boundaries are of course open:
for the variance of a log-normal ETA, would 1E+5 or 1E+4 be large
enough to be considered close to a boundary? At what value would you
discard the result? At what correlation value would you discard a
result as being close to 1 (>0.99, >0.97, ...)?
Clearly, if this were for regulatory work, you could define these a
priori, having chosen some arbitrary cut-off. But the devil here lies
in the non-regulatory work, where you may not have defined these
boundaries a priori.
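For concreteness, one way to flag such runs in R from
raw_results1.csv, with the cut-offs left explicit so they can be
debated (the column names here are illustrative):

raw <- read.csv("raw_results1.csv")

# variance near its lower, or some arbitrary upper, "boundary"
flag_var <- raw$OMEGA.1.1. < 1e-5 | raw$OMEGA.1.1. > 1e+4

# correlation implied by the covariance term, close to +/-1
corr <- raw$OMEGA.2.1. / sqrt(raw$OMEGA.1.1. * raw$OMEGA.2.2.)
flag_corr <- abs(corr) > 0.99

table(flag_var, flag_corr)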
Steve
--
Professor Stephen Duffull
Chair of Clinical Pharmacy
School of Pharmacy
University of Otago
PO Box 56 Dunedin
New Zealand
E: stephen.duff...@otago.ac.nz
P: +64 3 479 5044
F: +64 3 479 7034
W: http://pharmacy.otago.ac.nz/profiles/stephenduffull
Design software: www.winpopt.com
--
Justin Wilkins, PhD
Exprimo NV
Tel: +41 (0) 81 599 23 82
Mobile: +41 (0) 76 561 09 49
E-mail: justin.wilk...@exprimo.com
Web: www.exprimo.com