[ 
https://issues.apache.org/jira/browse/HIVEMALL-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724925#comment-15724925
 ] 

ASF GitHub Bot commented on HIVEMALL-22:
----------------------------------------

GitHub user myui reopened a pull request:

    https://github.com/apache/incubator-hivemall/pull/11

    [WIP] Support Feature Selection UDFs

    This PR is based on [a pending PR](https://github.com/myui/hivemall/pull/356) by @takuti that was submitted before Hivemall entered the Apache Incubator.
    
    See [JIRA](https://issues.apache.org/jira/browse/HIVEMALL-22) for tracking the status of this issue.
    
    ---
    
    ### Sample table
    
    | time | x |
    | --: | --: |
    | 1 | 182.478 |
    | 2 | 176.231 |
    | 3 | 183.917 |
    | 4 | 177.798 |
    | 5 | 165.469 |
    | ... | ... |
    
    (14398 points from [Twitter data](https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series))
    
    ### Usage
    
    ``` sql
    create temporary function sst as 'hivemall.anomaly.SingularSpectrumTransformUDF';
    ```
    
    ``` sql
    SELECT
      time,
      -- x is double or array<double>
      -- sst(x) AS res
      sst(x, "-th 0.005") AS res
    FROM
      twitter_timeseries
    ORDER BY time ASC
    ;
    ```
    ### Results
    
    ```
    7551    {"changepoint_score":0.00453049288071683,"is_changepoint":false}
    7552    {"changepoint_score":0.004711244102524104,"is_changepoint":false}
    7553    {"changepoint_score":0.004814871928978115,"is_changepoint":false}
    7554    {"changepoint_score":0.004968089640799422,"is_changepoint":false}
    7555    {"changepoint_score":0.005709056330104878,"is_changepoint":true}
    7556    {"changepoint_score":0.0044279766655132,"is_changepoint":false}
    7557    {"changepoint_score":0.0034694956722586268,"is_changepoint":false}
    7558    {"changepoint_score":0.002549056569322694,"is_changepoint":false}
    7559    {"changepoint_score":0.0017395109108403473,"is_changepoint":false}
    7560    {"changepoint_score":0.0010629833145070489,"is_changepoint":false}
    ```
    
    With the naive SVD-based implementation, the elapsed time was about 20 seconds for the 14398 samples (vs. 10 seconds with ChangeFinder).
    ### Observations
    
    The change-point scores are much more stable than those of ChangeFinder, and they always fall in [0, 1]. However, because the scores are still quite noisy, too many change-points are detected. Smoothing the scores, as ChangeFinder does, is important in practice.
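    
    As a rough illustration of such smoothing, the query below applies a trailing moving average to the raw scores with a Hive window function. It is only a sketch: it assumes that `res` is a struct exposing a `changepoint_score` field (as the output above suggests), the 5-row window is an arbitrary choice, and a plain moving average stands in for whatever smoothing ChangeFinder actually performs.
    
    ``` sql
    SELECT
      time,
      res.changepoint_score AS raw_score,
      -- trailing moving average over the current row and the 4 preceding rows
      -- (the window size is a hypothetical value, not taken from this PR)
      AVG(res.changepoint_score) OVER (
        ORDER BY time ASC
        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
      ) AS smoothed_score
    FROM (
      SELECT
        time,
        sst(x, "-th 0.005") AS res
      FROM
        twitter_timeseries
    ) t
    ORDER BY time ASC
    ;
    ```
    
    A threshold could then be applied to `smoothed_score` instead of the raw score, which should reduce the number of isolated detections.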
    
    In terms of running time, the naive SVD-based implementation is clearly inefficient, so the more efficient Lanczos-based variant should be supported.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/myui/incubator-hivemall JIRA-22/pr-356

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/11.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11
    
----
commit 3ebd771ee4bebf14769b7c240f8b28b9d5d10e86
Author: Takuya Kitazawa <[email protected]>
Date:   2016-09-26T08:12:01Z

    Implement initial SST-based change-point detector

commit bde06e0952445bf60a9aef4bca182c0afe87e250
Author: Takuya Kitazawa <[email protected]>
Date:   2016-09-27T05:06:20Z

    Rename SSTChangePoint -> SingularSpectrumTransform

commit 2bfd1270b1e9b79185a41cbe2568f2ce968d4a71
Author: Takuya Kitazawa <[email protected]>
Date:   2016-09-28T02:16:56Z

    Add references for the original SST papers

commit 998203d5e8623d6282c2b187df24e4da7d41c16b
Author: Takuya Kitazawa <[email protected]>
Date:   2016-09-28T10:49:48Z

    Support implicit-Krylov-approximation-based efficient SST

commit cc34435155e86718acb49fa42208aff730bb756c
Author: myui <[email protected]>
Date:   2016-12-02T07:55:23Z

    Merge branch 'sst-changepoint' of https://github.com/takuti/hivemall into JIRA-22/pr-356

----


> Review and merge pending Pull Requests before entering Incubator
> ----------------------------------------------------------------
>
>                 Key: HIVEMALL-22
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-22
>             Project: Hivemall
>          Issue Type: New Feature
>            Reporter: Makoto Yui
>            Assignee: Makoto Yui
>
> Need to review and merge pending Pull Requests in
> https://github.com/myui/hivemall/pulls
> * Feature Selection
> https://github.com/myui/hivemall/pull/385
> * SST change point detection
> https://github.com/myui/hivemall/pull/356
> * Checkstyle
> https://github.com/myui/hivemall/pull/343
> * System Test
> https://github.com/myui/hivemall/pull/336
> * Kernelized Passive Aggressive
> https://github.com/myui/hivemall/pull/304
> * Separate Optimizer
> https://github.com/myui/hivemall/pull/285



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
