[ https://issues.apache.org/jira/browse/MADLIB-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-974:
-----------------------------------
    Description: 
Story

As a developer, I want to performance-test the Path algorithm so that I can 
understand how it scales and communicate those effects to users.

The proposed matrix for the 1st set of tests is:

1) overall data size, i.e., number of rows in the data set = 1M, 10M, 100M
2) number of partitions = 1k, 10k, 100k
3) number of matches per partition = 1k, 10k, 100k
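Read literally, not every cell of this matrix can be populated. Assume that rows are spread evenly across partitions and that each match covers at least one row without overlapping other matches. Neither assumption is stated in the story. Under them, a partition must hold at least as many rows as it has matches. A minimal Python sketch of that check:

```python
from itertools import product

# Levels from the 1st test matrix above.
ROWS = [1_000_000, 10_000_000, 100_000_000]
PARTITIONS = [1_000, 10_000, 100_000]
MATCHES_PER_PARTITION = [1_000, 10_000, 100_000]


def rows_per_partition(rows: int, partitions: int) -> int:
    """Average partition size, assuming rows are spread evenly (an assumption)."""
    return rows // partitions


def is_feasible(rows: int, partitions: int, matches: int, thickness: int = 1) -> bool:
    """True if `matches` non-overlapping matches of `thickness` rows each
    fit inside one evenly sized partition."""
    return matches * thickness <= rows_per_partition(rows, partitions)


def feasible_combinations():
    """Yield every (rows, partitions, matches) triple that can actually be built."""
    for rows, parts, matches in product(ROWS, PARTITIONS, MATCHES_PER_PARTITION):
        if is_feasible(rows, parts, matches):
            yield rows, parts, matches


if __name__ == "__main__":
    combos = list(feasible_combinations())
    print(f"{len(combos)} of 27 combinations are feasible")
    for rows, parts, matches in combos:
        print(rows, parts, matches, rows_per_partition(rows, parts))
```

Under these assumptions, 10 of the 27 combinations are feasible. For example, 1M rows over 100k partitions leaves only 10 rows per partition, which is too few for 1k matches.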

The proposed matrix for the 2nd set of tests is:

4) match "thickness", i.e., number of rows in a match = 1, 1k, 10k
5) number of symbols = 5, 15, 25
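For the 2nd set of tests, the synthetic data needs a controllable number of symbols and a match of known thickness. The Python sketch below is one hypothetical way to build a single partition. Symbols are letters `A`, `B`, …. Background rows use every symbol except the last. One contiguous run of the last symbol, `thickness` rows long, is planted at a random offset. The planted run is a stand-in for whatever pattern the real tests use; it is not MADlib's data format.

```python
import random
import string


def make_partition(n_rows: int, n_symbols: int, thickness: int, seed: int = 0) -> list[str]:
    """Build one synthetic partition as a list of symbol labels.

    Background rows draw uniformly from all but the last symbol; a run of
    `thickness` rows labelled with the last symbol is planted at a random
    offset, so the partition contains exactly one run of that symbol.
    """
    if not 2 <= n_symbols <= len(string.ascii_uppercase):
        raise ValueError("n_symbols must be between 2 and 26")
    if not 1 <= thickness <= n_rows:
        raise ValueError("thickness must be between 1 and n_rows")
    rng = random.Random(seed)  # seeded so test runs are reproducible
    symbols = string.ascii_uppercase[:n_symbols]
    background, marker = symbols[:-1], symbols[-1]
    rows = [rng.choice(background) for _ in range(n_rows)]
    start = rng.randrange(n_rows - thickness + 1)
    rows[start:start + thickness] = [marker] * thickness
    return rows
```

Keeping the marker symbol out of the background guarantees that the match thickness is exactly the requested value. This keeps factor 4 independent of factor 5.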

Acceptance

1) Please plot performance curves. There is no need to run all permutations; 
skipping some keeps the size of the test matrix reasonable. 
For example, when plotting the effect of the number of partitions (#2 above), 
the data size can be fixed at, say, 10M rows and the number of matches per 
partition at, say, 1k.
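The acceptance criterion describes a one-factor-at-a-time design: vary one factor while holding the others at a baseline. The Python sketch below enumerates those runs. The 10M-row and 1k-matches baseline follows the example above. The 10k-partition baseline is an assumed middle value. `time_run` wraps a hypothetical `run` callable that would build the data and execute the Path query.

```python
import time
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Config:
    rows: int
    partitions: int
    matches_per_partition: int


# Levels from the 1st test matrix.
LEVELS = {
    "rows": [1_000_000, 10_000_000, 100_000_000],
    "partitions": [1_000, 10_000, 100_000],
    "matches_per_partition": [1_000, 10_000, 100_000],
}

# 10M rows and 1k matches per partition follow the example in Acceptance;
# 10k partitions is an assumed baseline, not taken from the story.
BASELINE = Config(rows=10_000_000, partitions=10_000, matches_per_partition=1_000)


def one_factor_sweeps(baseline: Config, levels: dict[str, list[int]]) -> dict[str, list[Config]]:
    """For each factor, list the configs that vary only that factor while
    every other factor stays at its baseline value."""
    return {
        name: [replace(baseline, **{name: value}) for value in values]
        for name, values in levels.items()
    }


def time_run(run: Callable[[Config], None], config: Config) -> float:
    """Wall-clock seconds for one call of `run`, a hypothetical callable that
    prepares the data set for `config` and executes the Path query on it."""
    start = time.perf_counter()
    run(config)
    return time.perf_counter() - start
```

Each sweep yields one performance curve: runtime against the varied factor. The baseline configuration appears in every sweep, so the 27-cell grid shrinks to 7 distinct runs. The 2nd set of tests (thickness, number of symbols) can be swept the same way.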

Other

1) The attached data set can be used as a baseline from which to duplicate or 
fabricate larger data sets.

2) Another useful data set is at 
http://csr.lanl.gov/data/auth/
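Item 1 above suggests growing the attached data set into larger test sets. The attached file's schema isn't shown here, so this Python sketch assumes that each row is a dict with a partition column named `partition_id`. That is a hypothetical name. The sketch repeats the baseline until it reaches a target row count. It also suffixes each copy's partition ids, so copies form new, distinct partitions rather than inflating existing ones.

```python
from collections.abc import Iterator


def fabricate(baseline: list[dict], target_rows: int,
              partition_key: str = "partition_id") -> Iterator[dict]:
    """Yield `target_rows` rows by repeating `baseline`, tagging each copy's
    partition ids with a copy number so copies stay in separate partitions."""
    if not baseline:
        raise ValueError("baseline must not be empty")
    emitted = 0
    copy_no = 0
    while emitted < target_rows:
        for row in baseline:
            if emitted == target_rows:
                return
            new_row = dict(row)  # shallow copy; the baseline is left untouched
            new_row[partition_key] = f"{row[partition_key]}_{copy_no}"
            yield new_row
            emitted += 1
        copy_no += 1
```

Every copy adds as many partitions as the baseline has, so row count and partition count grow together here. Varying them independently, as the 1st test matrix requires, would need a different relabelling scheme. Inside a database, the same idea would more likely be written as an `INSERT ... SELECT` joined against `generate_series`.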



  was:
Story

As a user, I want to define symbols and then write a regular expression over 
those symbols, so that I can identify the sequences of events that I care about.

Partition:
1) This story is limited to one match per partition.

2) Note that the match in the data might not span the whole partition; that is, 
the matched rows could be just a subset of the rows in the partition.
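The symbol-then-regex idea can be illustrated with a conceptual sketch in plain Python. It is not MADlib's implementation or interface; the story notes that the interface still needs to be defined. Each hypothetical symbol is a predicate on a row. An ordered partition is encoded as one character per row, and a regular expression is searched over that string. The match covers only rows 1-4 of a 6-row partition, which is the "subset of the partition" case from item 2.

```python
import re
from typing import Optional

# Hypothetical symbol definitions: each symbol is a predicate on one row.
# A row gets the first symbol whose predicate it satisfies, so the
# catch-all "O" must come last (dicts keep insertion order).
SYMBOLS = {
    "L": lambda row: row["event"] == "login",
    "F": lambda row: row["event"] == "fail",
    "O": lambda row: True,
}


def encode(rows: list[dict]) -> str:
    """Turn an ordered partition into a string with one symbol per row."""
    return "".join(
        next(sym for sym, pred in SYMBOLS.items() if pred(row)) for row in rows
    )


def find_match(rows: list[dict], pattern: str) -> Optional[tuple[int, int]]:
    """Return (start, end) row offsets of the first match, or None.
    Only one match per partition is reported, as in this story."""
    m = re.search(pattern, encode(rows))
    return None if m is None else (m.start(), m.end())


partition = [{"event": e} for e in ["view", "fail", "fail", "fail", "login", "view"]]
print(encode(partition))                # OFFFLO
print(find_match(partition, r"F{3}L"))  # (1, 5): rows 1-4 only
```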

Window:
1) Support multiple windows per partition.

Other:
1) The interface for this feature still needs to be defined.
2) If the story https://issues.apache.org/jira/browse/MADLIB-917 is done first, 
then this story could instead be called "Path - pattern match (multiple 
matches per partition, multiple windows per match)".


> Path - performance testing
> --------------------------
>
>                 Key: MADLIB-974
>                 URL: https://issues.apache.org/jira/browse/MADLIB-974
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>             Fix For: v1.9
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
