[
https://issues.apache.org/jira/browse/HIVEMALL-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714425#comment-15714425
]
ASF GitHub Bot commented on HIVEMALL-22:
----------------------------------------
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/6
[WIP] Implement SST-based change-point detector
This PR is based on [a pending
PR](https://github.com/myui/hivemall/pull/356) by @takuti that was sent before
Hivemall entered the Apache Incubator.
See [JIRA](https://issues.apache.org/jira/browse/HIVEMALL-22) for tracking the status of this issue.
---
### Sample table
| time | x |
| --: | --: |
| 1 | 182.478 |
| 2 | 176.231 |
| 3 | 183.917 |
| 4 | 177.798 |
| 5 | 165.469 |
| ... | ... |
(14398 points from the [Twitter
data](https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series))
### Usage
``` sql
create temporary function sst as
'hivemall.anomaly.SingularSpectrumTransformUDF';
```
``` sql
SELECT
time,
-- x is double or array<double>
-- sst(x) AS res
sst(x, "-th 0.005") AS res
FROM
twitter_timeseries
ORDER BY time ASC
;
```
### Results
```
7551 {"changepoint_score":0.00453049288071683,"is_changepoint":false}
7552 {"changepoint_score":0.004711244102524104,"is_changepoint":false}
7553 {"changepoint_score":0.004814871928978115,"is_changepoint":false}
7554 {"changepoint_score":0.004968089640799422,"is_changepoint":false}
7555 {"changepoint_score":0.005709056330104878,"is_changepoint":true}
7556 {"changepoint_score":0.0044279766655132,"is_changepoint":false}
7557 {"changepoint_score":0.0034694956722586268,"is_changepoint":false}
7558 {"changepoint_score":0.002549056569322694,"is_changepoint":false}
7559 {"changepoint_score":0.0017395109108403473,"is_changepoint":false}
7560 {"changepoint_score":0.0010629833145070489,"is_changepoint":false}
```
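
The flags in the rows above are consistent with a plain comparison of each score against the `-th` threshold. The following Python sketch post-processes a few of those rows to check this. It assumes the rows are whitespace-separated and that the struct column happens to parse as JSON; `parse_row` is a hypothetical helper, and the strict `>` comparison is an assumption (the sample rows cannot distinguish `>` from `>=`).

```python
import json

THRESHOLD = 0.005  # the value passed as "-th 0.005" in the query

# Three rows copied from the results above (time, then the struct column).
rows = [
    '7554\t{"changepoint_score":0.004968089640799422,"is_changepoint":false}',
    '7555\t{"changepoint_score":0.005709056330104878,"is_changepoint":true}',
    '7556\t{"changepoint_score":0.0044279766655132,"is_changepoint":false}',
]

def parse_row(line):
    """Split one output row into (time, score, flag). Hypothetical helper."""
    time, payload = line.split(None, 1)
    rec = json.loads(payload)
    return int(time), rec["changepoint_score"], rec["is_changepoint"]

for line in rows:
    t, score, flag = parse_row(line)
    # Assumed rule: a point is flagged when its score exceeds the threshold.
    assert flag == (score > THRESHOLD)
```
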
With the naive SVD-based implementation, the elapsed time was about 20 sec. for
the 14398 samples (vs. 10 sec. with ChangeFinder).
### Observations
The change-point scores are much more stable than ChangeFinder's, and they
always fall in [0, 1]. However, the scores are still quite noisy, so too many
change-points are detected. Smoothing the scores, as ChangeFinder does, is
important in practice.
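
One simple way to smooth the scores is a trailing moving average applied before thresholding. The sketch below is only an illustration of that idea in Python, not part of this PR; `smooth_scores` and the `window` parameter are hypothetical names, and the window size is an arbitrary choice.

```python
from collections import deque

def smooth_scores(scores, window=5):
    """Trailing moving average over the last `window` scores.

    The first few outputs average over fewer values because the
    buffer is not yet full.
    """
    buf = deque(maxlen=window)  # drops the oldest score automatically
    out = []
    for s in scores:
        buf.append(s)
        out.append(sum(buf) / len(buf))
    return out

# Usage: smooth first, then apply the same threshold as "-th 0.005".
raw = [0.0045, 0.0047, 0.0048, 0.0050, 0.0057, 0.0044, 0.0035]
flags = [s > 0.005 for s in smooth_scores(raw, window=3)]
```

A larger window suppresses more isolated spikes but also delays and flattens genuine change-points, so the window size and the threshold need to be tuned together.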
In terms of running time, the naive SVD-based implementation is clearly
inefficient, so the efficient Lanczos-based variant should be supported.
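
To make the cost discussion concrete, here is a minimal pure-Python sketch of the SST scoring idea, restricted to a single leading singular vector (r = 1). It uses plain power iteration instead of a full SVD or Lanczos, so it is a simplified stand-in and not the code in this PR. The function names, the window length `w`, the column count `n` and the `lag` are all illustrative assumptions.

```python
import math

def hankel_columns(x, end, w, n):
    """Return the n overlapping length-w windows that end at index `end` (exclusive)."""
    return [x[end - w - k:end - k] for k in range(n - 1, -1, -1)]

def top_left_singular_vector(cols, iters=100):
    """Power iteration on H H^T, where the columns of H are `cols`."""
    w = len(cols[0])
    # Start from the first non-zero column; fall back to a uniform vector.
    start = next((c for c in cols if any(v != 0.0 for v in c)), [1.0] * w)
    norm = math.sqrt(sum(v * v for v in start))
    u = [v / norm for v in start]
    for _ in range(iters):
        coeffs = [sum(c[i] * u[i] for i in range(w)) for c in cols]    # H^T u
        nxt = [sum(a * c[i] for a, c in zip(coeffs, cols)) for i in range(w)]  # H (H^T u)
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        u = [v / norm for v in nxt]
    return u

def sst_score(x, t, w=8, n=8, lag=4):
    """Score in [0, 1]: 1 - cos^2 between past and current leading directions.

    Requires w + n - 1 <= t and t + lag <= len(x).
    """
    u = top_left_singular_vector(hankel_columns(x, t, w, n))
    q = top_left_singular_vector(hankel_columns(x, t + lag, w, n))
    cos = sum(a * b for a, b in zip(u, q))
    return min(1.0, max(0.0, 1.0 - cos * cos))  # clamp rounding error

# Usage: a level shift at index 40 scores higher than a flat region.
series = [1.0] * 40 + [5.0] * 40
flat, shifted = sst_score(series, 20), sst_score(series, 40)
```

Because the score is one minus a squared cosine between unit vectors, it stays in [0, 1] by construction. With r = 1 the leading direction is poorly defined when the top two singular values are close, which is one reason practical implementations keep several singular vectors.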
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-hivemall JIRA-22/pr-356
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hivemall/pull/6.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6
----
commit 3ebd771ee4bebf14769b7c240f8b28b9d5d10e86
Author: Takuya Kitazawa <[email protected]>
Date: 2016-09-26T08:12:01Z
Implement initial SST-based change-point detector
commit bde06e0952445bf60a9aef4bca182c0afe87e250
Author: Takuya Kitazawa <[email protected]>
Date: 2016-09-27T05:06:20Z
Rename SSTChangePoint -> SingularSpectrumTransform
commit 2bfd1270b1e9b79185a41cbe2568f2ce968d4a71
Author: Takuya Kitazawa <[email protected]>
Date: 2016-09-28T02:16:56Z
Add references for the original SST papers
commit 998203d5e8623d6282c2b187df24e4da7d41c16b
Author: Takuya Kitazawa <[email protected]>
Date: 2016-09-28T10:49:48Z
Support implicit-Krylov-approximation-based efficient SST
commit cc34435155e86718acb49fa42208aff730bb756c
Author: myui <[email protected]>
Date: 2016-12-02T07:55:23Z
Merge branch 'sst-changepoint' of https://github.com/takuti/hivemall into
JIRA-22/pr-356
----
> Review and merge pending Pull Requests before entering Incubator
> ----------------------------------------------------------------
>
> Key: HIVEMALL-22
> URL: https://issues.apache.org/jira/browse/HIVEMALL-22
> Project: Hivemall
> Issue Type: New Feature
> Reporter: Makoto Yui
> Assignee: Makoto Yui
>
> Need to review and merge pending Pull Requests in
> https://github.com/myui/hivemall/pulls
> * Feature Selection
> https://github.com/myui/hivemall/pull/385
> * SST change point detection
> https://github.com/myui/hivemall/pull/356
> * Checkstyle
> https://github.com/myui/hivemall/pull/343
> * System Test
> https://github.com/myui/hivemall/pull/336
> * Kernelized Passive Aggressive
> https://github.com/myui/hivemall/pull/304
> * Separate Optimizer
> https://github.com/myui/hivemall/pull/285