Dear R Help Team,

I get some weird results when I use the lm function with weights. The issue can be reproduced with the example below:
The input data are as follows (the weights are intentionally designed to reflect some structure in the data):

> df
          y           x       weight
 1.51156139  0.55209240 2.117337e-34
-0.63653132 -0.12599316 2.117337e-34
 0.37782776  0.42095384 4.934135e-31
 3.03792318  1.40315446 2.679495e-24
 1.53646523  0.46076858 2.679495e-24
-2.37727874 -0.73963576 6.244160e-21
 0.37183065  0.20407468 1.455107e-17
-1.53917553 -0.95519361 1.455107e-17
 1.10926675  0.03897129 3.390908e-14
-0.37786333 -0.17523593 3.390908e-14
 2.43973603  0.97970095 7.902000e-11
-0.35432394 -0.03742559 7.902000e-11
 2.19296613  1.00355263 4.289362e-04
 0.49845532  0.34816207 4.289362e-04
 1.25005260  0.76306225 5.000000e-01
 0.84360691  0.45152356 5.000000e-01
 0.29565993  0.53880068 5.000000e-01
-0.54081334 -0.28104525 5.000000e-01
 0.83612836 -0.12885659 9.995711e-01
-1.42526769 -0.87107631 9.999998e-01
 0.10204789 -0.11649899 1.000000e+00
 1.14292898  0.37249631 1.000000e+00
-3.02942081 -1.28966997 1.000000e+00
-1.37549764 -0.74676145 1.000000e+00
-2.00118016 -0.55182759 1.000000e+00
-4.24441674 -1.94603608 1.000000e+00
 1.17168144  1.00868008 1.000000e+00
 2.64007761  1.26333069 1.000000e+00
 1.98550114  1.18509599 1.000000e+00
-0.58941683 -0.61972416 9.999998e-01
-4.57559611 -2.30914920 9.995711e-01
-0.82610544 -0.39347576 9.995711e-01
-0.02768220  0.20076910 9.995711e-01
 0.78186399  0.25690215 9.995711e-01
-0.88314153 -0.20200148 5.000000e-01
-4.17076452 -2.03547588 5.000000e-01
 0.93373070  0.54190626 4.289362e-04
-0.08517734  0.17692491 4.289362e-04
-4.47546619 -2.14876688 4.289362e-04
-1.65509103 -0.76898087 4.289362e-04
-0.39403030 -0.12689705 4.289362e-04
 0.01203300 -0.18689898 1.841442e-07
-4.82762639 -2.31391121 1.841442e-07
-0.72658380 -0.39751171 3.397282e-14
-2.35886866 -1.01082109 0.000000e+00
-2.03762707 -0.96439902 0.000000e+00
 0.90115123  0.60172286 0.000000e+00
 1.55999194  0.83433953 0.000000e+00
 3.07994058  1.30942776 0.000000e+00
 1.78871462  1.10605530 0.000000e+00

Running a simple linear model returns:

> lm(y~x,data=df)

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x
   -0.04173      2.03790

and

> max(resid(lm(y~x,data=df)))
[1] 1.14046

*HOWEVER, if I use the weighted model, then:*

lm(formula = y ~ x, data = df, weights = df$weights)

Coefficients:
(Intercept)            x
   -0.05786      1.96087

and

> max(resid(lm(y~x,data=df,weights=df$weights)))
[1] 60.91888

As you can see, the coefficient estimates are nearly the same, but the resid() function returns a giant residual (I have some cases where the value is much, much higher). Further, if I calculate the residuals simply as

predict(lm(y~x,data=df,weights=df$weights))-df$y

then I get the true values of the residuals.

Thanks. Please do not hesitate to contact me for more details.

Regards,
Hamed.
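P.S. The comparison described above can also be written as a small self-contained script. This is only a sketch on simulated data, not the data from this post: the data frame sim, its weight column w, and the helper compare_residuals() are names made up for illustration. It puts the values returned by resid() next to residuals recomputed directly from the fitted coefficients, so the two can be compared row by row.

```r
# Sketch only: the simulated data and the names below (sim, w,
# compare_residuals) are illustrative assumptions, not taken from the post.
compare_residuals <- function(data) {
  # Weighted least-squares fit; `w` is looked up inside `data`
  fit <- lm(y ~ x, data = data, weights = w)
  b <- coef(fit)

  # Residuals recomputed by hand from the estimated coefficients
  manual <- data$y - (b[["(Intercept)"]] + b[["x"]] * data$x)
  reported <- unname(resid(fit))

  data.frame(
    w        = data$w,
    reported = reported,
    manual   = manual,
    gap      = abs(reported - manual)
  )
}

set.seed(1)
n <- 50
sim <- data.frame(x = rnorm(n))
sim$y <- 2 * sim$x + rnorm(n, sd = 0.5)
# Weights spanning many orders of magnitude, loosely mimicking the post
sim$w <- 10^seq(-34, 0, length.out = n)

out <- compare_residuals(sim)
# Sort by weight to see how the gap relates to the size of the weight
print(out[order(out$w), c("w", "gap")])
```

Sorting by w makes it easy to check whether the gap between the two residual columns depends on how small the weight is.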