[ https://issues.apache.org/jira/browse/MADLIB-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431805#comment-15431805 ]
Jim Nasby commented on MADLIB-974:
----------------------------------
BTW, it would be nice to show run time against each independent variable
(such as row count) to get a better idea of how performance scales.
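
One way to summarize timings against a single independent variable is the
slope of a least-squares fit of log(time) on log(size): a slope near 1
suggests roughly linear scaling, near 2 roughly quadratic. The sketch below
assumes that approach. The helper name `scaling_exponent` is hypothetical,
and the timings are synthetic values generated from a known power law purely
to exercise the function; they are not MADlib measurements.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Row counts taken from the proposed test matrix.
row_counts = [1_000_000, 10_000_000, 100_000_000]

# SYNTHETIC timings from t = c * n**1.5 (assumed constants, not measurements).
synthetic_times = [2e-9 * n ** 1.5 for n in row_counts]

exponent = scaling_exponent(row_counts, synthetic_times)
print(f"estimated scaling exponent: {exponent:.3f}")
```

With real measurements, the same function could be applied once per
independent variable (rows, partitions, matches per partition) while the
others are held fixed.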
> Path - performance testing
> --------------------------
>
> Key: MADLIB-974
> URL: https://issues.apache.org/jira/browse/MADLIB-974
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Assignee: Xiaocheng Tang
> Fix For: v1.9.1
>
> Attachments: Benchmarking Param Design Doc - PATH.pdf, Ecommerce data set for path test 3.csv
>
>
> Story
> As a developer, I want to do performance testing on the Path algorithm so
> that I can understand and communicate scale effects to users.
> The proposed matrix for the 1st set of tests is:
> 1) overall data size, i.e., number of rows in data sets = 1M, 10M, 100M
> 2) number of partitions = 1k, 10k, 100k
> 3) number of matches per partition = 1k, 10k, 100k
> The proposed matrix for the 2nd set of tests is:
> 4) match "thickness", i.e., number of rows in match = 1, 1k, 10k
> 5) number of symbols = 5, 15, 25
> Acceptance
> 1) Please plot performance curves. To keep the size of the test matrix
> reasonable, there is no need to run all permutations.
> E.g., when plotting the effect of number of partitions (#2 above), fix the
> data size at, say, 10M rows and the number of matches per partition at 1k.
> Other
> 1) Can use attached data set as a baseline for duplication/fabrication.
> 2) Another useful data set is at
> http://csr.lanl.gov/data/auth/
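
The reduced matrix described under Acceptance (vary one factor, hold the
others fixed) can be sketched as below. The levels come from the first set of
tests in the description. The baseline of 10M rows and 1k matches per
partition follows the Acceptance example; the baseline of 1k partitions and
the names `LEVELS`, `BASELINE`, `one_factor_at_a_time` and `unique_runs` are
assumptions made for illustration. The sketch does not check whether a given
combination can actually be realized in a generated data set.

```python
# Levels from the proposed matrix for the 1st set of tests.
LEVELS = {
    "rows": [1_000_000, 10_000_000, 100_000_000],
    "partitions": [1_000, 10_000, 100_000],
    "matches_per_partition": [1_000, 10_000, 100_000],
}

# Values held fixed while another factor varies.
# The partitions baseline (1k) is an assumption, not stated in the issue.
BASELINE = {
    "rows": 10_000_000,
    "partitions": 1_000,
    "matches_per_partition": 1_000,
}


def one_factor_at_a_time(levels, baseline):
    """Map each factor to the configs that sweep it with the rest at baseline."""
    return {
        factor: [{**baseline, factor: value} for value in values]
        for factor, values in levels.items()
    }


def unique_runs(sweeps):
    """Count distinct configs across all sweeps (the baseline is shared)."""
    seen = {
        tuple(sorted(config.items()))
        for configs in sweeps.values()
        for config in configs
    }
    return len(seen)


sweeps = one_factor_at_a_time(LEVELS, BASELINE)
for factor, configs in sweeps.items():
    for config in configs:
        print(factor, config)
print("distinct runs:", unique_runs(sweeps))
```

Each sweep yields the points for one performance curve, and because the
baseline configuration is shared between sweeps, the number of distinct runs
is smaller than the full set of permutations.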