[
https://issues.apache.org/jira/browse/MADLIB-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank McQuillan updated MADLIB-974:
-----------------------------------
Description:
Story
As a developer, I want to do performance testing on the Path algorithm so that
I can understand and communicate scale effects to users.
The proposed matrix for the 1st set of tests is:
1) overall data size, i.e., number of rows in data sets = 1M, 10M, 100M
2) number of partitions = 1k, 10k, 100k
3) number of matches per partition = 1k, 10k, 100k
The proposed matrix for the 2nd set of tests is:
4) match "thickness", i.e., number of rows in match = 1, 1k, 10k
5) number of symbols = 5, 15, 25
Acceptance
1) Please plot performance curves. To keep the size of the test matrix
reasonable, it is not necessary to run all permutations.
E.g., when plotting the effect of the number of partitions (#2 above), fix the
data size at 10M (say) and the number of matches per partition at 1k (say).
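A rough sketch of how the reduced test matrix could be enumerated, varying one
factor at a time while the others stay at a baseline. The factor names, the
helper function, and the baseline for the number of partitions are assumptions
made for illustration; only the levels and the 10M / 1k example baselines come
from the description above.

```python
# Hypothetical sketch: one-factor-at-a-time plan for the 1st set of tests.
# Factor names are illustrative, not part of any MADlib interface.

FACTORS = {
    "num_rows": [1_000_000, 10_000_000, 100_000_000],
    "num_partitions": [1_000, 10_000, 100_000],
    "matches_per_partition": [1_000, 10_000, 100_000],
}

# 10M rows and 1k matches per partition follow the example in the ticket;
# the baseline of 1k partitions is an assumption.
BASELINE = {
    "num_rows": 10_000_000,
    "num_partitions": 1_000,
    "matches_per_partition": 1_000,
}


def one_factor_plan(factors, baseline):
    """Return (varied_factor, config) pairs, changing one factor per run."""
    plan = []
    for name, levels in factors.items():
        for level in levels:
            config = dict(baseline)  # copy so the baseline is not mutated
            config[name] = level
            plan.append((name, config))
    return plan


if __name__ == "__main__":
    for name, config in one_factor_plan(FACTORS, BASELINE):
        print(name, config)
```

With three factors at three levels each, this yields 9 runs instead of the 27
of a full factorial design; each group of three runs gives one performance
curve.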
Other
1) The attached data set can be used as a baseline for duplication/fabrication.
2) Another useful data set is at http://csr.lanl.gov/data/auth/
was:
Story
As a user, I want to define symbols so that I can define a regular expression
of symbols to identify sequences of events that I care about.
Partition:
1) Limited to 1 match per partition in this story.
2) Note that the match in the data might not span the whole partition; that is,
the matched rows could be just a subset of the rows in the partition.
Window:
1) Support multiple windows per partition.
Other:
1) Need to define interface for this feature.
2) If the story https://issues.apache.org/jira/browse/MADLIB-917 is done first,
then this story could actually be called: "Path - pattern match (multiple
matches per partition, multiple windows per match)".
> Path - performance testing
> --------------------------
>
> Key: MADLIB-974
> URL: https://issues.apache.org/jira/browse/MADLIB-974
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Assignee: Rahul Iyer
> Fix For: v1.9
>