[jira] [Commented] (MADLIB-974) Path - performance testing

Jim Nasby (JIRA) Mon, 22 Aug 2016 16:50:46 -0700

    [ https://issues.apache.org/jira/browse/MADLIB-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431805#comment-15431805 ]


Jim Nasby commented on MADLIB-974:
----------------------------------

BTW, it would be nice to show time per unit of each independent variable (such 
as time per row) to get a better idea of how performance scales.
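
One reading of this suggestion is to divide each measured run time by the value 
of the variable being varied. A minimal Python sketch of that, assuming the 
timings have already been collected as (value, seconds) pairs; the function name 
time_per_unit is hypothetical and not part of MADlib:

```python
def time_per_unit(measurements):
    """Return (value, seconds, seconds per unit) for each measurement.

    `measurements` is an iterable of (value, seconds) pairs, where `value`
    is the independent variable being varied (for example, row count).
    """
    results = []
    for value, seconds in measurements:
        if value <= 0:
            raise ValueError("independent variable must be positive")
        # Normalize the run time by the size of the varied variable.
        results.append((value, seconds, seconds / value))
    return results
```

If the seconds-per-unit column stays roughly flat as the variable grows, the 
scaling is close to linear; if it rises, the scaling is worse than linear.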

> Path - performance testing
> --------------------------
>
>                 Key: MADLIB-974
>                 URL: https://issues.apache.org/jira/browse/MADLIB-974
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Xiaocheng Tang
>             Fix For: v1.9.1
>
>         Attachments: Benchmarking Param Design Doc - PATH.pdf, Ecommerce data 
> set for path test 3.csv
>
>
> Story
> As a developer, I want to do performance testing on the Path algorithm so 
> that I can understand and communicate scale effects to users.
> The proposed matrix for the 1st set of tests is:
> 1) overall data size, i.e., number of rows in data sets = 1M, 10M, 100M
> 2) number of partitions = 1k, 10k, 100k
> 3) number of matches per partition = 1k, 10k, 100k
> The proposed matrix for the 2nd set of tests is:
> 4) match "thickness", i.e., number of rows in match = 1, 1k, 10k
> 5) number of symbols =  5, 15, 25
> Acceptance
> 1) Please plot performance curves.  There is no need to run all permutations; 
> keep the size of the test matrix reasonable. 
> E.g., when plotting the effect of number of partitions (#2 above), can fix 
> data size at 10M (say) and number of matches per partition to 1k (say).
> Other
> 1) Can use attached data set as a baseline for duplication/fabrication.
> 2) Another useful data set is at 
> http://csr.lanl.gov/data/auth/
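
The acceptance note in the quoted story (vary one factor while holding the 
others fixed) can be sketched in Python as below. The levels come from the first 
proposed test matrix; the baseline is an assumption: the story only suggests 10M 
rows and 1k matches per partition "(say)", and the 1k partitions baseline is 
picked here purely for illustration. The names LEVELS, BASELINE and 
one_factor_at_a_time are hypothetical.

```python
# Levels from the first proposed test matrix in the story.
LEVELS = {
    "rows": [1_000_000, 10_000_000, 100_000_000],
    "partitions": [1_000, 10_000, 100_000],
    "matches_per_partition": [1_000, 10_000, 100_000],
}

# Assumed baseline: values held fixed while another factor is varied.
BASELINE = {
    "rows": 10_000_000,
    "partitions": 1_000,
    "matches_per_partition": 1_000,
}


def one_factor_at_a_time(levels, baseline):
    """Return configs that vary one factor at a time around the baseline."""
    configs = []
    for factor, values in levels.items():
        for value in values:
            config = dict(baseline)
            config[factor] = value
            # The baseline itself recurs once per factor; keep one copy.
            if config not in configs:
                configs.append(config)
    return configs


if __name__ == "__main__":
    for config in one_factor_at_a_time(LEVELS, BASELINE):
        print(config)
```

With these levels and this baseline the sketch yields 7 configurations, against 
27 for the full cross product of the three factors.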



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
