[ https://issues.apache.org/jira/browse/MATH-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150921#comment-17150921 ]
David Hudson commented on MATH-1428:
------------------------------------
I encountered this issue recently, also with a dataset having multiple dummy
variables.
It turned out that the columns were not linearly independent. After removing one of
the dummy variables, different column orders produced a stable output as
expected.
It's worth noting that I compared the results against some Python libraries
(sklearn/statsmodels), and these gave the correct results for things like the
intercept and the regular variables even with the dependent columns.
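For anyone who wants to check for dependent columns before running the
regression, here is a rough sketch in plain Java. It does not use any Commons
Math calls; the class name, method names, toy data and the 1e-9 tolerance are
all made up for illustration. It assumes the regression is run with the default
intercept, so a column of ones is prepended before the rank is computed. If the
rank is lower than the number of columns, the design matrix is rank deficient,
which is what a full set of dummies plus an intercept produces.

```java
import java.util.Arrays;

class DummyTrapCheck {

    /** Numerical rank by Gaussian elimination with partial pivoting. */
    static int rank(double[][] matrix, double tol) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        // Work on a copy so the caller's data is left untouched.
        double[][] a = new double[rows][];
        for (int i = 0; i < rows; i++) {
            a[i] = matrix[i].clone();
        }
        int rank = 0;
        for (int c = 0; c < cols && rank < rows; c++) {
            // Use the largest remaining entry in this column as the pivot.
            int pivot = rank;
            for (int r = rank + 1; r < rows; r++) {
                if (Math.abs(a[r][c]) > Math.abs(a[pivot][c])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][c]) <= tol) {
                continue; // this column depends on the ones already processed
            }
            double[] tmp = a[pivot];
            a[pivot] = a[rank];
            a[rank] = tmp;
            for (int r = rank + 1; r < rows; r++) {
                double f = a[r][c] / a[rank][c];
                for (int k = c; k < cols; k++) {
                    a[r][k] -= f * a[rank][k];
                }
            }
            rank++;
        }
        return rank;
    }

    /** Prepends the column of ones that an intercept term contributes. */
    static double[][] withIntercept(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = new double[x[i].length + 1];
            out[i][0] = 1.0;
            System.arraycopy(x[i], 0, out[i], 1, x[i].length);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy data (hypothetical): one regular regressor plus all three dummies.
        double[][] allDummies = {
            {0.5, 1, 0, 0},
            {1.5, 0, 1, 0},
            {2.5, 0, 0, 1},
            {3.5, 1, 0, 0},
            {4.5, 0, 1, 0},
            {5.5, 0, 0, 1},
        };
        // Same data with the last dummy column dropped.
        double[][] oneDropped = new double[allDummies.length][];
        for (int i = 0; i < allDummies.length; i++) {
            oneDropped[i] = Arrays.copyOf(allDummies[i], 3);
        }
        // 5 columns but rank 4: the dummies sum to the intercept column.
        System.out.println(rank(withIntercept(allDummies), 1e-9));
        // 4 columns and rank 4: full column rank once one dummy is removed.
        System.out.println(rank(withIntercept(oneDropped), 1e-9));
    }
}
```

On real data the tolerance would probably need to be scaled to the magnitude of
the columns. A fixed value is used here only to keep the sketch short.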
> OLSMultipleLinearRegression estimates different residuals with different
> order of input
> ----------------------------------------------------------------------------------------
>
> Key: MATH-1428
> URL: https://issues.apache.org/jira/browse/MATH-1428
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 3.4.1
> Environment: Windows 7 64-bit, JDK 1.8, IntelliJ IDEA
> Reporter: butchild
> Priority: Major
> Labels: ols, regression, residuals
>
> I have a regression job with 31 X variables, 30 of which are dummies.
> The data has 800+ rows.
> I'm using OLSMultipleLinearRegression to do the regression.
> I found that if I change the order of the 800+ rows, the residuals I get from
> ols.estimateResiduals()
> are different, and the correlation between the two sets of residuals is close
> to, but not exactly, 100% (for example 99.8%).
> My data is below in the Docs Text area.
> The fields of each column are:
> sig,y,x1,x2,...,xn
--
This message was sent by Atlassian Jira
(v8.3.4#803005)