Dear Martin and NMusers,

With reference to Martin's statement that “A greater problem with these
plots is the commonly held expectation that for a "good model" a smooth or
regression line should align with the line of unity. Though this seems
intuitive it is a flawed assumption”, I would like to defend that
expectation: in a relevant number of cases it is reasonable to expect the
plot of observations (y-axis) versus predictions (x-axis) to have a
regression line that follows the line of unity.

First, to be clear, I do not disagree with anything said in the classic
Karlsson and Savic 2007 paper. With any model where the random effects
enter nonlinearly, the plot of observations (y-axis) versus PRED (x-axis)
can show trends that look like model misspecification, even if the
data-generating model for the observations has exactly the same parameter
values as the diagnostics-generating model. This is because PRED signifies
the prediction for the median individual, with random effect values at
zero, which is different from the mean prediction. And the local regression
line, as the name implies, trends around the local mean of the observed
data.
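
As a minimal illustration of this median-versus-mean distinction (with
assumed example values for the typical clearance and the IIV magnitude),
consider a lognormally distributed clearance:

# Minimal sketch with assumed values: under lognormal IIV, the eta = 0
# prediction corresponds to the median individual, while the population
# mean is shifted upward by a factor of exp(omega^2/2).
set.seed(1)
omega <- 0.6                # assumed IIV standard deviation
tvcl  <- 5                  # assumed typical value of clearance
cl    <- tvcl * exp(rnorm(1e6, 0, omega))
median(cl)                  # ~ tvcl; this is what PRED reflects
mean(cl)                    # ~ tvcl * exp(omega^2/2); what a local mean tracks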

So basically we need a PRED-like data item that reflects the expected mean
population prediction, integrated over the possible individual random
effect values. Luckily for us, this data item exists, and it is called
EPRED. The data item was not available at the time the Karlsson and Savic
paper was published; it is available now. EPRED solves the problems caused
by model nonlinearity and high inter-individual variability by integrating
over the random effects, given the parameter estimates. I do note that it
does not solve the problems of censoring and dose adaptation.
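
To make the concept concrete, here is a minimal sketch of an EPRED-like
Monte Carlo calculation for a one-compartment IV bolus model (the dose and
all parameter values are assumptions for illustration only; the actual
EPRED is computed internally by NONMEM from the model and its estimates):

# Minimal sketch of an EPRED-like calculation: average the population
# prediction over simulated random effects. All values are assumed.
set.seed(1)
nsim <- 1000
dose <- 100; tvcl <- 5; tvv <- 50
omega_cl <- 0.3; omega_v <- 0.2
tobs <- 1:24
# PRED: prediction for the median individual (etas at zero)
pred <- dose / tvv * exp(-tvcl / tvv * tobs)
# EPRED-like quantity: mean prediction integrated over the etas
sims <- replicate(nsim, {
  cl <- tvcl * exp(rnorm(1, 0, omega_cl))
  v  <- tvv  * exp(rnorm(1, 0, omega_v))
  dose / v * exp(-cl / v * tobs)
})
epred <- rowMeans(sims)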

So, because the local regression line reflects mean values and the EPRED
data item reflects mean values, I contend that in the absence of censoring
and dose adaptation, the plot of observations (y-axis) versus EPRED
(x-axis) can be expected to have a regression line that mostly agrees with
the line of unity, with some caveats (see below). This expectation holds
even if the observations are not symmetrically distributed around the mean,
because the local regression simply follows the mean. Moreover, if the
model accounts for censoring and dose adaptation, it would be possible to
manually code and calculate the simulation-predicted population mean values
(e.g. simulating 1000 datasets and, for each observation, taking the mean
simulated value in a way that accounts for censoring and dose adaptation),
and use those on the x-axis; a sketch follows below. Also to note: in this
NMusers message group I have focused on the EPRED data item, which is
NONMEM-specific, but the general concept is software-agnostic. Having Monte
Carlo-generated population mean predictions on the x-axis should result in
the plot of observations (y-axis) versus predictions (x-axis) trending
along the line of unity.
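
As a minimal sketch of the censoring-aware calculation (the LLOQ, the dose
and parameter values, and the LLOQ/2 substitution are all assumptions for
illustration; the substitution should mirror however the censored
observations were handled in the observed data):

# Minimal sketch: simulation-based mean predictions that account for
# LLOQ censoring. The LLOQ/2 substitution is one assumed convention;
# apply the same handling as was used for the observed DVs.
set.seed(2)
nsim <- 1000
dose <- 100; tvcl <- 5; tvv <- 50; omega_cl <- 0.3
lloq <- 0.2
tobs <- 1:48
sims <- replicate(nsim, {
  cl   <- tvcl * exp(rnorm(1, 0, omega_cl))
  conc <- dose / tvv * exp(-cl / tvv * tobs)
  ifelse(conc < lloq, lloq / 2, conc)
})
epred_cens <- rowMeans(sims)   # censoring-consistent x-axis values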

Caveat 1: Because of random variability, the regression line cannot be
expected to always go perfectly through the line of unity. This should come
as no surprise; e.g. it is also not expected that a VPC will have the
observed data percentiles always falling perfectly in the middle of the
simulation-generated confidence intervals for the prediction intervals.

Caveat 2: For small datasets, there may be additional bias in the plot of
observations (y-axis) versus EPRED (x-axis) even when the data-generating
model is exactly the same as the diagnostics-generating model, because the
data-generating model is not necessarily the one that best agrees with the
data. Illustrative example: suppose we simulate a dataset of 10 individual
concentration profiles at steady state with high drug accumulation, so that
the concentrations are highly dependent on the clearance parameter. It is
entirely possible that the 10 simulated clearance random effect (eta)
values will have a mean that is above or below zero to some relevant
extent, thus greatly affecting the steady-state predictions. As a result,
there could be an apparent, systematic disagreement between the simulated
data (observations, y-axis) and the EPRED (x-axis), because the clearance
random effects trend above or below zero due to random variability. This
problem could be remedied by fitting a model to the simulated data and
using that model for generating the diagnostics. At larger dataset sizes
the problem disappears, because it becomes less and less likely for the
mean of the random effects to deviate from zero to a relevant extent. The
same caveat also exists for the VPC diagnostic: if one simulates a small
dataset as observations and then produces a VPC from the same simulation
model (without fitting the model to the previously simulated data), there
may be apparent misspecification in the resulting VPC figure.
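
To put a rough number on this caveat (the omega value and n are assumptions
for illustration): with n subjects, the mean of the sampled etas has
standard deviation omega/sqrt(n), so with omega = 0.3 and n = 10 the
"population" clearance of a simulated dataset easily deviates by around 10%
from the simulation value purely by chance:

# Minimal sketch with assumed omega and n: chance deviation of the
# eta mean in a small simulated dataset.
set.seed(3)
omega <- 0.3; n <- 10
omega / sqrt(n)                       # sd of the eta mean, ~0.095
mean_etas <- replicate(1e4, mean(rnorm(n, 0, omega)))
quantile(exp(mean_etas), c(0.1, 0.9)) # typical multiplicative shift in CL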

Supplemental remark 1: To illustrate how the loess follows the mean even if
the data are not symmetrically distributed, the following R code snippet
may be relevant. It simulates 100 lognormally distributed observations at
each of 10 x-positions, and then compares the smoothing curves from the
"loess" and "mgcv::gam" methods to the theoretically expected mean value.
There is a close agreement between the smoothing curves and the
analytically calculated mean value, exp(omega^2/2).
library(tidyverse)
with(list(omega = 0.6),
     map_dfr(1:100, ~ tibble(x = 1:10, y = exp(rnorm(10, 0, omega)))) %>%
       mutate(theoretical = exp(omega^2 / 2)) %>%  # analytical mean of exp(N(0, omega^2))
       ggplot(aes(x, y)) +
       geom_point() +
       geom_smooth(method = "loess", col = 3) +
       geom_smooth(method = mgcv::gam, formula = y ~ s(x, bs = "cs"), col = 4) +
       geom_line(aes(y = theoretical), col = 2))
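When run, both smoothers (loess, col = 3; GAM, col = 4) should track the
analytical mean line (col = 2) at exp(0.6^2/2) ≈ 1.20, despite the clearly
right-skewed data.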

P.S. The usual disclaimer: the opinions expressed in this message are mine
alone, and not necessarily those of my employer.

Best wishes,
Pyry Välitalo
PK Assessor at Finnish Medicines Agency

On Fri, 18 Aug 2023 at 10:59, Martin Bergstrand <
martin.bergstr...@pharmetheus.com> wrote:

> Dear Joga and all,
>
> Joga makes a valuable point that all pharmacometricians should be aware
> of. Standard methodology for regression assumes that the x-variable is
> without error (loess, linear regression, etc.). Note that it is the same
> for NLME models, i.e. we generally assume that our independent variables,
> e.g. time and covariates, are without error.
>
> For DV vs. PRED plots it is common practice, even among those who do not
> know why, to plot PRED on the x-axis and DV on the y-axis. A greater
> problem with these plots is the commonly held expectation that for a "good
> model" a smooth or regression line should align with the line of unity.
> Though this seems intuitive it is a flawed assumption. This issue was
> clearly pointed out by Mats Karlsson and Rada Savic in their 2007 paper
> titled "Diagnosing Model Diagnostics". For simple well-behaved examples
> you will see an alignment around the line of unity for DV vs. PRED plots.
> However, there are several factors that contribute to an expected
> deviation from the line of unity:
> (1) Censoring (e.g. censoring of observations < LLOQ)
> - In this case DVs are capped at LLOQ but PRED values are not. This makes
> it perfectly expected that there will be a deviation from alignment around
> the line of unity in the lower range.
> (2) Strong non-linearities
> - The more nonlinear the modelled system is, the greater the expected
> deviation from the line of unity, especially in combination with
> significant ETA correlations.
> (3) High variability
> - Higher between-/within-subject variability (e.g. IIV and RUV) that
> isn't normally distributed (e.g. exponential distributions) will result in
> an expected deviation from the line of unity. Note: this is a form of
> non-linearity so it may fall under the above category.
> (4) Adaptive designs (e.g. TDM dosing)
> - Listed in the original paper by Karlsson & Savic but I have not been
> able to recreate an issue in this case.
>
> I am rather sure that many thousands of modeling hours have been spent
> trying to correct for perceived model misspecifications that are not
> really there. This is why I recommend relying primarily on
> simulation-based model diagnostics (e.g. VPCs) and, as far as possible,
> accounting for censoring that affects the original dataset. As pointed out
> by Karlsson & Savic, a simulation/re-estimation based approach can also be
> used to investigate the expected behavior of DV vs. PRED plots for a
> particular model and dataset (e.g. mirror plots in Xpose). Note that to my
> knowledge there is as yet no automated way to handle censoring in this
> context (clearly doable if anyone wants to develop a nifty implementation
> of that).
>
> If we leave the DV vs. PRED plot case, there are many other instances
> where we use scatter plots for which it is much less clear what can be
> considered the independent variable, and yet other cases where the
> assumption that the x-variable is without error is violated in a way that
> makes the results hard to interpret. One instance of the latter is when
> exposure-response is studied by plotting observed PD response versus
> observed trough plasma concentrations. This is already a way too long
> email so I will not deep dive into that problem as well.
>
> Best regards,
>
>
> Martin Bergstrand, Ph.D.
>
> Principal Consultant
>
> Pharmetheus AB
>
> martin.bergstr...@pharmetheus.com
>
> www.pharmetheus.com
>
>
> On Thu, Aug 17, 2023 at 12:44 PM Gobburu, Joga <jgobb...@rx.umaryland.edu>
> wrote:
>
>> Dear Friends – Observations versus population predicted is considered a
>> standard diagnostic plot in our field. I used to place observations on
>> the x-axis and predictions on the y-axis. Then I was pointed to a
>> publication from ISOP (
>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5321813/figure/psp412161-fig-0001/)
>> which recommended plotting predictions on the x-axis and observations on
>> the y-axis. To the best of my knowledge, there was no justification
>> provided. It did question my decades-old practice, so I did some thinking
>> and digging. Thought to share it here so others might benefit from it. If
>> this is obvious to you all, then I can say I am caught up!
>>
>>
>>
>>    1. We write our models as observed = predicted + random error, which
>>    can be interpreted to be in the form y = f(x) + random error (though
>>    technically it is not). Hence predicted goes on the x-axis, as it is
>>    free of random error. It is considered a correlation plot, which makes
>>    plotting either way acceptable. This is not as critical as the next
>>    one.
>>    2. However, there is a statistical reason why it is important to keep
>>    predictions on the x-axis. Invariably we always add a loess trend line
>>    to these diagnostic plots. To demonstrate the impact, I took a simple
>>    IV bolus single-dose dataset and compared both approaches. The results
>>    are available at this link:
>>    https://github.com/jgobburu/public_didactic/blob/main/iv_sd.html.pdf.
>>    I used Pumas software, but the scientific underpinning is agnostic to
>>    software. See the two plots on pages 5 and 6. The interpretation of
>>    the bias between the two approaches is different. This is the
>>    statistical reason why it matters to plot predictions on the x-axis.
>>
>>
>>
>> Joga Gobburu
>>
>> University of Maryland
>>
>
