Robust locally weighted regression (Loess / Lowess)
---------------------------------------------------

                 Key: MATH-278
                 URL: https://issues.apache.org/jira/browse/MATH-278
             Project: Commons Math
          Issue Type: New Feature
            Reporter: Eugene Kirpichov


Attached is a patch, with tests, that implements the robust Loess procedure
for smoothing univariate scatterplots with local linear regression
(http://en.wikipedia.org/wiki/Local_regression), as described by William
Cleveland:
http://www.math.tau.ac.il/~yekutiel/MA%20seminar/Cleveland%201979.pdf

(The patch also fixes one missing-javadoc checkstyle warning in the
AbstractIntegrator class, so that the code with my patch applied generates no
checkstyle warnings at all.)

I propose including the procedure in commons-math because commons-math
currently has no method for robust smoothing of noisy data: there is
interpolation (which is practically unusable for noisy data) and there is
regression, which has quite different goals.
Loess builds a smooth curve, with a controllable degree of smoothness, that
approximates the overall shape of the data.

I tried to follow the code requirements as strictly as possible: the tests
cover the code completely, there are no checkstyle warnings, etc. I wrote the
code entirely from scratch, without borrowing any third-party licensed code.


The method is fairly computationally intensive: 10000 points with a bandwidth
of 0.3 and 4 robustness iterations take about 3.7 s on my machine, and in
general the complexity is O(robustnessIters * n^2 * bandwidth). I don't know
how to optimize it further; all the implementations I have found use exactly
the same algorithm as mine for the one-dimensional case.
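
To make the cost estimate concrete, here is a minimal, self-contained sketch of
robust Loess with local linear fits. It is not the code from the attached
patch: the class name LoessSketch, the method smooth, the ACCURACY cutoff and
the handling of degenerate windows are all assumptions made for illustration.
It assumes the x values are sorted in ascending order, uses tricube
neighbourhood weights and bisquare robustness weights, and fits each point from
a window of about bandwidth * n neighbours, which is where the
O(robustnessIters * n^2 * bandwidth) cost comes from.

```java
import java.util.Arrays;

/** Illustrative sketch of robust Loess; not the attached patch. */
public final class LoessSketch {
    /** Assumed absolute cutoff below which a quantity is treated as zero. */
    private static final double ACCURACY = 1e-12;

    private LoessSketch() { }

    /** Tricube weight: (1 - |u|^3)^3 for |u| < 1, otherwise 0. */
    public static double tricube(double u) {
        double a = Math.abs(u);
        if (a >= 1) {
            return 0.0;
        }
        double t = 1 - a * a * a;
        return t * t * t;
    }

    /** Bisquare weight: (1 - u^2)^2 for |u| < 1, otherwise 0. */
    public static double bisquare(double u) {
        if (Math.abs(u) >= 1) {
            return 0.0;
        }
        double t = 1 - u * u;
        return t * t;
    }

    /** Smooths y over x (x sorted ascending); returns fitted values at each x. */
    public static double[] smooth(double[] x, double[] y,
                                  double bandwidth, int robustnessIters) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have equal length");
        }
        int n = x.length;
        double[] fit = new double[n];
        if (n == 0) {
            return fit;
        }
        // Number of neighbours used in each local fit.
        int k = Math.min(n, Math.max(2, (int) Math.ceil(bandwidth * n)));
        double[] robust = new double[n];
        Arrays.fill(robust, 1.0);

        // One initial pass plus robustnessIters reweighted passes.
        for (int iter = 0; iter <= robustnessIters; iter++) {
            int left = 0;
            int right = k - 1;
            for (int i = 0; i < n; i++) {
                // Slide the window so it holds the k nearest neighbours of x[i].
                while (right < n - 1 && x[right + 1] - x[i] < x[i] - x[left]) {
                    left++;
                    right++;
                }
                double h = Math.max(x[i] - x[left], x[right] - x[i]);
                double sw = 0, sd = 0, sy = 0, sdd = 0, sdy = 0;
                for (int j = left; j <= right; j++) {
                    double d = x[j] - x[i]; // centred on x[i] for stability
                    double w = robust[j] * (h > 0 ? tricube(d / h) : 1.0);
                    sw += w;
                    sd += w * d;
                    sy += w * y[j];
                    sdd += w * d * d;
                    sdy += w * d * y[j];
                }
                if (sw <= 0) {
                    continue; // assumption: keep the previous estimate
                }
                double meanD = sd / sw;
                double meanY = sy / sw;
                double var = sdd / sw - meanD * meanD;
                double cov = sdy / sw - meanD * meanY;
                // Absolute cutoff: assumes x is not on a very small scale.
                double slope = var > ACCURACY ? cov / var : 0.0;
                fit[i] = meanY - slope * meanD; // value of the local line at x[i]
            }
            if (iter == robustnessIters) {
                break;
            }
            // Robustness weights from residuals scaled by 6 * median |residual|.
            double[] absRes = new double[n];
            for (int i = 0; i < n; i++) {
                absRes[i] = Math.abs(y[i] - fit[i]);
            }
            double[] sorted = absRes.clone();
            Arrays.sort(sorted);
            double s = (n % 2 == 1) ? sorted[n / 2]
                                    : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
            if (s < ACCURACY) {
                break; // fit is already (numerically) exact
            }
            for (int i = 0; i < n; i++) {
                robust[i] = bisquare(absRes[i] / (6 * s));
            }
        }
        return fit;
    }
}
```

A call such as LoessSketch.smooth(x, y, 0.3, 4) would correspond to the
bandwidth and iteration count quoted above. The inner loop over the window is
the dominant cost, so run time grows roughly with the square of the number of
points for a fixed bandwidth.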

Some TODOs, in steeply increasing order of complexity:
 - Make the weight function customizable: according to Cleveland, this is
needed only in some exotic cases, for example where the desired approximation
is discontinuous.
 - Make the degree of the locally fitted polynomial customizable: currently the
algorithm performs only linear local regression; it might be useful to support
quadratic regression as well. Higher degrees are not worth it, according to
Cleveland.
 - Generalize the algorithm to the multidimensional case: this will require A
LOT of hard work.
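
As a rough idea of what the first TODO could look like, the sketch below passes
the weight function as a parameter to a single local linear fit. Everything
here is hypothetical and not part of the patch: the class WeightedLocalFit, the
method fitAt, the TRICUBE and BOXCAR constants, and the use of
java.util.function.DoubleUnaryOperator (which needs Java 8 or later) are
assumptions chosen only to illustrate the idea.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative sketch of a local linear fit with a pluggable weight function. */
public final class WeightedLocalFit {
    /** Tricube weight: (1 - |u|^3)^3 for |u| < 1, otherwise 0. */
    public static final DoubleUnaryOperator TRICUBE = u -> {
        double a = Math.abs(u);
        if (a >= 1) {
            return 0.0;
        }
        double t = 1 - a * a * a;
        return t * t * t;
    };

    /** Uniform ("boxcar") weight: 1 for |u| <= 1, otherwise 0. */
    public static final DoubleUnaryOperator BOXCAR =
        u -> Math.abs(u) <= 1 ? 1.0 : 0.0;

    private WeightedLocalFit() { }

    /**
     * Fits a weighted straight line around x0 and returns its value at x0.
     * Each point gets the weight weight(|x[j] - x0| / h), where h > 0 is the
     * window half-width.
     */
    public static double fitAt(double[] x, double[] y, double x0, double h,
                               DoubleUnaryOperator weight) {
        double sw = 0, sd = 0, sy = 0, sdd = 0, sdy = 0;
        for (int j = 0; j < x.length; j++) {
            double d = x[j] - x0;
            double w = weight.applyAsDouble(d / h);
            sw += w;
            sd += w * d;
            sy += w * y[j];
            sdd += w * d * d;
            sdy += w * d * y[j];
        }
        if (sw <= 0) {
            throw new IllegalArgumentException("no points with positive weight");
        }
        double meanD = sd / sw;
        double meanY = sy / sw;
        double var = sdd / sw - meanD * meanD;
        double cov = sdy / sw - meanD * meanY;
        // Assumed cutoff: fall back to a weighted mean if x barely varies.
        double slope = var > 1e-12 ? cov / var : 0.0;
        return meanY - slope * meanD;
    }
}
```

With this shape, switching from WeightedLocalFit.TRICUBE to
WeightedLocalFit.BOXCAR (or any other function of the scaled distance) needs no
change to the fitting code itself.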
