Dear all,

We have identified the cause of the problem: model misspecification. The 
logarithmic function fits poorly for small values of x near zero. We solved 
the problem by shifting the x-axis, as follows:
C=THETA(1)
B=THETA(2)
S=THETA(3) ;Shift of the x-axis
F=C+B*LOG(FACTOR1+S)
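
For anyone who wants to see why the shift helps, here is a minimal Python sketch (it is not part of the NONMEM run). It compares F=C+B*LOG(x) with the shifted form F=C+B*LOG(x+S) near zero. C and B are the curve-estimation values quoted in the original post below; the shift S=1.0 is a purely hypothetical value chosen for illustration, since the estimated THETA(3) is not reported here. The function names are also made up for this sketch.

```python
import math

# Curve-estimation values quoted in the original post (not NONMEM estimates).
C = -0.4465
B = 1.0266
# Hypothetical shift, for illustration only; the fitted THETA(3) is not given.
S = 1.0


def f_log(x, c=C, b=B):
    """Unshifted model: F = C + B*LOG(x). Undefined at x = 0."""
    return c + b * math.log(x)


def f_shifted(x, c=C, b=B, s=S):
    """Shifted model: F = C + B*LOG(x + S). Finite at x = 0 when S > 0."""
    return c + b * math.log(x + s)


def slope_log(x, b=B):
    """dF/dx of the unshifted model; grows without bound as x -> 0."""
    return b / x


def slope_shifted(x, b=B, s=S):
    """dF/dx of the shifted model; never exceeds B/S for x >= 0."""
    return b / (x + s)


if __name__ == "__main__":
    # Compare both models and their slopes from very small to large x.
    for x in (0.001, 0.01, 0.1, 1.0, 10.0, 100.0):
        print(
            f"x={x:>7}: log={f_log(x):8.3f}  shifted={f_shifted(x):8.3f}  "
            f"slope_log={slope_log(x):10.3f}  slope_shifted={slope_shifted(x):6.3f}"
        )
```

The unshifted curve drops steeply as x approaches zero and is undefined at x=0, while the shifted curve stays finite there. Since a large share of the data lies at small x, this is one plausible way to picture the misspecification described above.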

Thanks!
Matthew


From: HUI, Ka Ho
Sent: Thursday, May 19, 2016 4:18 PM
To: [email protected]
Subject: Failure to arrive at expected parameter estimates

Dear all,

I have some data x (input) and y (output), with 'inverse' heteroscedasticity, 
where variance is greater for smaller x.
The data file is attached (data.txt).
After filtering out all data with FILTER1=1 and FILTER2=1, the binned data plot 
looks like this (Question.jpg).
Most data points are at small x: 43.3% lie between 0 and 10, 12.9% between 10 
and 20, 9% between 20 and 30, and the remaining 34.8% at larger x, where the 
data are sparser.

Blue points are the means; red and purple points show the 5th and 95th 
percentiles in each bin. Green points are the SDs in each bin. Curve estimation 
has been performed: the equation for the means is shown as equation (1), and 
that for the SDs is shown at the bottom.

Here is a template for our first control stream, written according to the 
results of curve estimation for means:
$INPUT ID DV FILTER1 FILTER2 FACTOR1 MDV
$DATA data.txt IGNORE=@ IGNORE=(FILTER1.EQ.1,FILTER2.EQ.1)
$PRED
                C=THETA(1)
                B=THETA(2)
                F=C+B*LOG(FACTOR1) ;Relationship as shown in equation (1)
                Y=F+EPS(1)
                DUMMY=ETA(1)
$THETA
                (-20, -0.5, 20) ;C, curve estimation result is -0.4465
                (-20, 1, 20) ;B, curve estimation result is 1.0266
$OMEGA
                0 FIXED
$SIGMA
                2
$EST METHOD=1 INTERACTION MAXEVAL=9999 PRINT=1
$COV
$TABLE ...

The fitted parameters are illustrated by equation (3), which is clearly biased 
downward for x > 100. The bias was also visible in the residual plots.

To also account for the heteroscedasticity, we tried a second control stream, 
written according to the results of curve estimation for the SDs:
$INPUT ID DV FILTER1 FILTER2 FACTOR1 MDV
$DATA data.txt IGNORE=@ IGNORE=(FILTER1.EQ.1,FILTER2.EQ.1)
$PRED
                C=THETA(1)
                B=THETA(2)
                C_SD=THETA(3)
                B_SD=THETA(4)
                W=C_SD*B_SD**FACTOR1 ;Relationship as shown in the equation at the bottom
                F=C+B*LOG(FACTOR1)
                Y=F+(W*EPS(1)) ;Variance depends on FACTOR1
                DUMMY=ETA(1)
$THETA
                (-20, -0.5, 20) ;C, curve estimation result is -0.4465
                (-20, 1, 20) ;B, curve estimation result is 1.0266
                (-20, 0.72, 20) ;C_SD, curve estimation result is 0.7529
                (-20, 1, 20) ;B_SD, curve estimation result is 0.9962
$OMEGA
                0 FIXED
$SIGMA
                1 FIXED
$EST METHOD=1 INTERACTION MAXEVAL=9999 PRINT=1
$COV
$TABLE ...

The fitted parameters are illustrated by equation (2), which is still biased.
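
As a quick sanity check of the residual-error model used in this control stream, the following Python sketch evaluates W=C_SD*B_SD**FACTOR1 at a few x values, using the curve-estimation results quoted in the $THETA comments. The function names and the chosen x values are hypothetical, and this only illustrates the formula; it says nothing about what NONMEM actually estimated.

```python
# Curve-estimation results quoted in the $THETA comments (not NONMEM estimates).
C_SD = 0.7529
B_SD = 0.9962


def residual_sd(factor1, c_sd=C_SD, b_sd=B_SD):
    """SD of the residual error: W = C_SD * B_SD**FACTOR1."""
    return c_sd * b_sd ** factor1


def residual_variance(factor1, sigma=1.0, c_sd=C_SD, b_sd=B_SD):
    """Variance of W*EPS(1) when Var(EPS(1)) = sigma (1 FIXED in $SIGMA)."""
    return residual_sd(factor1, c_sd, b_sd) ** 2 * sigma


if __name__ == "__main__":
    # Illustrative x values only; they are not taken from data.txt.
    for x in (0, 10, 50, 100, 200):
        print(f"FACTOR1={x:>3}: SD={residual_sd(x):.4f}  Var={residual_variance(x):.4f}")
```

Because B_SD is below 1, W decreases steadily as FACTOR1 grows, which matches the 'inverse' heteroscedasticity described at the top of this post.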

Most data points are concentrated at small x, which may have contributed to 
the bias at large x. However, the fitted equations (2) and (3) in fact 
over-estimate the means even at small x, so we do not understand why the fits 
arrived at these parameters. We tried different initial estimates, but in vain.

It would be great if someone could offer any advice!

Thanks!
Matthew
