Hi Joga, Wilbert,
It is indeed an interesting aspect. I was prompted to think about this
during my master's research (with dense time series), and I found it
helpful to think in terms of orthogonal regression. One can compare
the expressions in the Wikipedia entries at
https://en.wikipedia.org/wiki/Deming_regression#Orthogonal_regression
and https://en.wikipedia.org/wiki/Simple_linear_regression to see how the
assumed direction of the error changes the fitted line. [Side note: as the
authors of the paper you referenced, Wilbert, point out, the correlation
coefficient r is symmetric in x and y and is not affected.] A good example
of how the fit changes can be
found in the last figure of this blog
<https://www.r-bloggers.com/2018/10/about-a-curious-feature-and-interpretation-of-linear-regressions/>:
basically, ordinary linear regression passes through the middle of the
cloud in the y-direction, also at its edges, while orthogonal regression
passes through the cloud balanced perpendicular to the fitted line.
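
To make the comparison concrete, here is a minimal sketch in Python (NumPy only). It is not taken from the blog or the paper; the simulated data, the seed and the function names are assumptions for illustration. It fits y on x, x on y, and the orthogonal line (Deming regression with equal error variances) to the same noisy cloud, and checks that r does not depend on the orientation.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the ordinary least-squares fit of y on x (error assumed vertical)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sum(xc * xc))

def orthogonal_slope(x, y):
    """Slope of the orthogonal fit (Deming regression with equal error variances)."""
    xc, yc = x - x.mean(), y - y.mean()
    sxx, syy, sxy = np.sum(xc * xc), np.sum(yc * yc), np.sum(xc * yc)
    return float((syy - sxx + np.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy))

# Hypothetical data: a true 1:1 relation with equal noise added to both variables.
rng = np.random.default_rng(1)
truth = rng.uniform(0.0, 10.0, 500)
x = truth + rng.normal(0.0, 1.5, truth.size)
y = truth + rng.normal(0.0, 1.5, truth.size)

slope_y_on_x = ols_slope(x, y)          # noise in x flattens this line
slope_x_on_y = 1.0 / ols_slope(y, x)    # x-on-y fit drawn in the same y-vs-x plane
slope_orth = orthogonal_slope(x, y)     # lies between the two OLS lines

r_xy = float(np.corrcoef(x, y)[0, 1])
r_yx = float(np.corrcoef(y, x)[0, 1])

print(f"y on x: {slope_y_on_x:.2f}  orthogonal: {slope_orth:.2f}  x on y: {slope_x_on_y:.2f}")
print(f"r(x, y) = {r_xy:.3f}, r(y, x) = {r_yx:.3f}")
```

The two ordinary fits bracket the orthogonal one, so which variable is treated as the noisy one changes the line, while the correlation coefficient stays the same.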
But in the end it also comes down to the general convention in
regression of putting the independent variable, assumed to be without
error, on the x-axis and the dependent variable on the y-axis. From this
it follows that it is best to put the observations on the y-axis (*). We
therefore have two reasons to put observed on the y-axis and predicted
on the x-axis.
Hope this helps,
Jeroen
(*) Whether or not the predictions are without (residual) error is a
matter of debate and depends on the situation. Going from PRED, to PRED
when the model has a covariate, to post-hoc predictions, the amount of
randomness in the predictions increases. The observed values will
nevertheless retain the most randomness and are therefore expected on
the y-axis.
http://pd-value.com
[email protected]
@PD_value
+31 6 23118438
-- More value out of your data!
On 18-08-2023 08:07, Wilbert de Witte wrote:
Hi Joga,
Fully agree on this. Unfortunately, it is still often shown the other
way around, which is confusing at the very least.
There is a publication on this very topic here
<https://www.sciencedirect.com/science/article/abs/pii/S0304380008002305> that
arrives at the same conclusion and can be helpful.
Best,
Wilbert
On Thu 17 Aug 2023 at 19:47, Gobburu, Joga
<[email protected]> wrote:
Dear James – how have you been?
Yes, you said it most eloquently. It's not about plotting per se
but “the problem is really that the loess line is fitting noise in
the wrong direction if the observed is actually on the x-axis”.
Thank you…J
*From: *James G Wright <[email protected]>
*Date: *Thursday, August 17, 2023 at 7:16 AM
*To: *Gobburu, Joga <[email protected]>,
[email protected] <[email protected]>
*Subject: *Re: [NMusers] Observed (yaxis) vs Predicted (xaxis)
Diagnostic Plot - Scientific basis.
So whichever axis the observed data is plotted on is parallel to
the direction of noise (random residual error). When you fit the
loess line, I think it will generally assume noise is vertical
i.e. parallel to the y-axis. So the problem is really that the
loess line is fitting noise in the wrong direction if the observed
is actually on the x-axis ... which means you are right, the
observed needs to go on the y-axis and deviations need to be
interpreted parallel to the y-axis.
Kind regards, James
https://product.popypkpd.com/
PS Of course, if you were to fit a loess line with horizontal
noise and observed data on the x-axis, you should reach identical
conclusions to the conventional vertical noise and observed data
on the y-axis.
On 17/08/2023 11:35, Gobburu, Joga wrote:
Dear Friends – Observations versus population predicted is
considered a standard diagnostic plot in our field. I used to
place observations on the x-axis and predictions on the y-axis.
Then I was pointed to a publication from ISOP
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5321813/figure/psp412161-fig-0001/)
which recommended plotting predictions on the x-axis and
observations on the y-axis. To the best of my knowledge, no
justification was provided. It made me question my decades-old
practice, so I did some thinking and digging. I thought I would share
it here so others might benefit from it. If this is obvious to
you all, then I can say I am caught up!
1. We write our models as observed = predicted + random
error, which can be read as having the form y =
f(x) + random error (although technically it does not). Hence
predicted goes on the x-axis, as it is free of random
error. The plot is also considered a correlation plot, which
would make plotting either way acceptable, so this reason is
not as critical as the next one.
2. However, there is a statistical reason why it is important
to keep predictions on the x-axis. We almost always add
a loess trend line to these diagnostic plots. To
demonstrate the impact, I took a simple IV bolus single-dose
dataset and compared both approaches. The results are
available at this link:
https://github.com/jgobburu/public_didactic/blob/main/iv_sd.html.pdf.
I used Pumas software, but the scientific underpinning is
software-agnostic. See the two plots on pages 5 and 6.
The bias is interpreted differently under the two
approaches. This is the statistical reason why it
matters to plot predictions on the x-axis.
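
The effect in point 2 can also be reproduced with a small simulation. The sketch below is in Python (NumPy only) and is not the Pumas analysis linked above. The data are simulated from a correctly specified model, and binned means stand in for the loess line; all names and values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data from a correct model: observed = predicted + random error.
pred = rng.uniform(1.0, 10.0, 5000)            # predictions, free of residual error
obs = pred + rng.normal(0.0, 2.0, pred.size)   # observations carry all the noise

def binned_means(x, y, edges):
    """Mean of y within bins of x: a crude stand-in for a loess trend of y on x."""
    idx = np.digitize(x, edges) - 1            # points outside the edges are ignored
    return np.array([y[idx == k].mean() for k in range(len(edges) - 1)])

edges = np.linspace(1.0, 10.0, 4)              # three bins: low, middle, high

# Observed on the y-axis: trend of obs against pred, minus the line of identity.
bias_obs_on_y = binned_means(pred, obs, edges) - binned_means(pred, pred, edges)

# Observed on the x-axis: trend of pred against obs, minus the line of identity.
bias_obs_on_x = binned_means(obs, pred, edges) - binned_means(obs, obs, edges)

print("observed on y-axis, trend minus identity:", np.round(bias_obs_on_y, 2))
print("observed on x-axis, trend minus identity:", np.round(bias_obs_on_x, 2))
```

With observed on the y-axis the trend stays close to the line of identity. With observed on the x-axis the trend sits above the identity line at low values and below it at high values, which would look like model misspecification even though the simulated model is correct.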
Joga Gobburu
University of Maryland
--
James G Wright PhD,
Scientist, Wright Dose Ltd
Tel: UK (0)772 5636914