[ 
https://issues.apache.org/jira/browse/MADLIB-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-995:
-----------------------------------
    Description: 
Story

As a data scientist, I want to be able to define multiple symbols that result 
in overlapping partitions.

See
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
for a description of what a symbol is.

Currently in 1.9, overlapping partitions are not supported. The default is 
non-overlapping, where the path algo begins the next pattern search at the row 
that follows the last pattern match (like how grep works in UNIX).

In the case of overlapping, the path algo needs to find every occurrence of the 
pattern in the partition, regardless of whether it might have been part of a 
previously found match. This means one row can match multiple symbols in a 
given matched pattern, so there is a dependency on 
https://issues.apache.org/jira/browse/MADLIB-943

Add an optional BOOLEAN parameter called "overlapping_patterns" to the 
interface. The default is FALSE.
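
A minimal Python sketch of the two search modes follows. It assumes that each row in an ordered partition has already been mapped to exactly one single-character symbol, and that the concatenated symbols are matched against a regular expression. The function name `find_pattern_matches` is hypothetical and is not MADlib's API. Its `overlapping_patterns` argument mirrors the proposed parameter and defaults to False.

```python
import re
from typing import List, Tuple


def find_pattern_matches(
    symbols: str, pattern: str, overlapping_patterns: bool = False
) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, 0-based and inclusive, one per match.

    Hypothetical helper: `symbols` holds one single-character symbol per row
    of an ordered partition (a simplifying assumption, not MADlib's design).
    """
    regex = re.compile(pattern)
    matches: List[Tuple[int, int]] = []
    pos = 0
    while pos < len(symbols):
        if overlapping_patterns:
            # Overlapping: try a match anchored at every row, regardless of
            # whether that row already belongs to an earlier match.
            m = regex.match(symbols, pos)
            if m and m.end() > m.start():
                matches.append((m.start(), m.end() - 1))
            pos += 1
        else:
            # Non-overlapping (grep-like): find the next match, then resume
            # the search at the row that follows the end of that match.
            m = regex.search(symbols, pos)
            if m is None:
                break
            if m.end() > m.start():
                matches.append((m.start(), m.end() - 1))
                pos = m.end()
            else:
                pos = m.start() + 1  # skip zero-length matches
    return matches


print(find_pattern_matches("ABABA", "ABA"))
print(find_pattern_matches("ABABA", "ABA", overlapping_patterns=True))
```

With the symbols `ABABA` and the pattern `ABA`, the non-overlapping mode returns only rows 0-2, while the overlapping mode also returns rows 2-4, so row 2 appears in two matches. This sketch reports one greedy match per starting row; whether the path function should also report shorter matches that share a start row is not specified in this story.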

(While you are at it, please also fix the docs to indicate that the 
"persist_rows" parameter is optional, with a default of FALSE.)

Acceptance



  was:
Story

As a data scientist, I want to be able to define multiple symbols that result 
in overlapping partitions.

See
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
for a description of what a symbol is.

Currently in 1.9, overlapping partitions are not supported. The default is 
non-overlapping, where the path algo begins the next pattern search at the row 
that follows the last pattern match (like how grep works in UNIX).

In the case of overlapping, the path algo needs to find every occurrence of the 
pattern in the partition, regardless of whether it might have been part of a 
previously found match. This means one row can match multiple symbols in a 
given matched pattern, so there is a dependency on 
https://issues.apache.org/jira/browse/MADLIB-943

Acceptance




> Path - overlapping partitions
> -----------------------------
>
>                 Key: MADLIB-995
>                 URL: https://issues.apache.org/jira/browse/MADLIB-995
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
