[ 
https://issues.apache.org/jira/browse/MADLIB-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407138#comment-16407138
 ] 

Jingyi Mei commented on MADLIB-1216:
------------------------------------


pca|pca.sql_in seems to be the slowest test regardless of platform/database 
version. This install check test calls pca_train and pca_sparse_train multiple 
times, so it is not trivial to remove these tests and move them to tinc without 
losing code coverage.
We refactored pca_project today; after refactoring, the runtime went from 
~37s to ~23s.
We modified decision tree to use a smaller array dataset, which reduced the 
run time from ~30s to ~9s.
We modified random forest to use fewer trees, which reduced the run time from 
~14s to ~9s.
We modified elastic net to not test cross validation, which reduced the run 
time by ~20s.
We spiked on svm; it is also hard to cut down its runtime without losing code 
coverage.
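
For a rough sense of the combined effect, the approximate figures above can be 
added up. This is only a sketch: the dictionary names and the helper function 
are made up for illustration, the numbers are the rounded values quoted in this 
comment, and the 37s/23s pair is assumed to refer to the pca install check as a 
whole.

```python
# Approximate before/after runtimes in seconds, as quoted in this comment.
# The "pca" label is an assumption about which test the 37s/23s figures cover.
reported = {
    "pca": (37, 23),
    "decision tree": (30, 9),
    "random forest": (14, 9),
}

# Elastic net is reported only as a saving (~20s), not as before/after values.
extra_savings = {"elastic net": 20}


def savings_by_test(before_after, extra):
    """Return seconds saved per test, largest saving first."""
    saved = {name: before - after for name, (before, after) in before_after.items()}
    saved.update(extra)
    return dict(sorted(saved.items(), key=lambda kv: kv[1], reverse=True))


savings = savings_by_test(reported, extra_savings)
total_saved = sum(savings.values())

for name, secs in savings.items():
    print(f"{name}: ~{secs}s saved")
print(f"total: ~{total_saved}s saved")
```

With these rounded inputs the total comes to roughly one minute, with decision 
tree and elastic net contributing the most.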

> Fix slowest 3 Install Check on Greenplum
> ----------------------------------------
>
>                 Key: MADLIB-1216
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1216
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Infrastructure: Automated Tests
>            Reporter: Jingyi Mei
>            Assignee: Jingyi Mei
>            Priority: Major
>             Fix For: v1.14
>
>
> We want to find out which are the slowest n install check tests (say n=3) on 
> Greenplum, so that we can reduce the total install check runtime.
> Acceptance
> 1) Run install check on Greenplum and confirm that it runs faster than before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
