You're right, a and b were switched in the error-term computation when I copied this to the PR. As a result, significantly more points were treated as outliers (though enough were retained to typically give a reasonable regression). Unfortunately, even with this fix, it's still pretty sensitive to multiple outliers...
I'm trying a simpler approach: just assume the points in the top quantile are outliers. We have enough data to make this pretty robust. Running experiments now. (As for computing h, I used SageMath.)
[ Full content available at: https://github.com/apache/beam/pull/6375 ]
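For illustration only, here is a rough sketch of what "assume the top quantile is outliers" could look like. This is not the code in the PR. It assumes "top quantile" means the points with the largest absolute residuals from an initial least-squares fit, and the names `fit_line`, `trimmed_fit`, and `outlier_fraction` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def trimmed_fit(xs, ys, outlier_fraction=0.1):
    """Fit once, drop the worst-fitting fraction of points, then refit.

    Assumption: outliers are the points with the largest absolute
    residuals under the initial fit. `outlier_fraction` is a
    hypothetical tuning parameter, not a value taken from the PR.
    """
    a, b = fit_line(xs, ys)
    residuals = [abs(y - (a + b * x)) for x, y in zip(xs, ys)]
    # Always keep at least two points so the refit is defined.
    n_keep = max(2, len(xs) - int(len(xs) * outlier_fraction))
    # Indices of the n_keep points with the smallest residuals.
    keep = sorted(range(len(xs)), key=residuals.__getitem__)[:n_keep]
    return fit_line([xs[i] for i in keep], [ys[i] for i in keep])
```

One caveat with this sketch: the initial fit is itself pulled toward the outliers, so it only works when the outliers still stand out after that first pass, which is more likely when there is plenty of data.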
