[https://issues.apache.org/jira/browse/MADLIB-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Frank McQuillan updated MADLIB-995:
-----------------------------------
Description:
Story
As a data scientist, I want to be able to define multiple symbols that result
in overlapping pattern matches within a partition.
See
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
for a description of what a symbol is.
Currently in 1.9, overlapping matches are not supported. The default is
non-overlapping: the path algorithm begins the next pattern search at the row
that follows the last pattern match (like grep in UNIX).
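A minimal sketch of the current non-overlapping behavior, assuming each row of an ordered partition has already been reduced to a one-character symbol and the pattern is an ordinary regular expression over those characters. This is an illustration only, not MADlib's implementation; the function name and the symbols V and B are hypothetical. Python's `re.finditer` behaves like grep here: after each match, scanning resumes at the row just past the end of that match.

```python
import re


def non_overlapping_matches(symbols: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) row offsets of non-overlapping matches.

    `symbols` holds one character per row of an ordered partition and
    `end` is exclusive. After each match, the search resumes at `end`,
    so no row can belong to two matches.
    """
    return [(m.start(), m.end()) for m in re.finditer(pattern, symbols)]


# Hypothetical partition: V = view, B = buy (symbol names are illustrative).
print(non_overlapping_matches("VVVV", "VV"))   # [(0, 2), (2, 4)]
print(non_overlapping_matches("VVVB", "V+B"))  # [(0, 4)]
```

Note that rows 1 and 2 of "VVVV" also form a "VV" pair, but it is never reported because row 1 was consumed by the first match.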
In the overlapping case, the path algorithm needs to find every occurrence of
the pattern in the partition, regardless of whether it overlaps a previously
found match. This means one row can match multiple symbols in a given matched
pattern, so there is a dependency on
https://issues.apache.org/jira/browse/MADLIB-943 . There is a (small) chance
that this story is a no-op once
https://issues.apache.org/jira/browse/MADLIB-943 is done.
We need to add a new optional BOOLEAN parameter called "overlapping_patterns"
to the interface. Default is FALSE.
(While you are at it, please fix the docs to indicate that the "persist_rows"
parameter is optional, with default FALSE.)
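A minimal sketch of how an "overlapping_patterns" flag could switch between the two modes, under the same assumptions as above (one character per row, pattern as a regular expression). Only the parameter name and its FALSE default come from this story; `find_matches` is a hypothetical helper, Python `False` stands in for SQL FALSE, and the overlapping mode here takes one greedy match per starting row. Whether the real feature should also report shorter matches from the same start row is not settled by the story.

```python
import re


def find_matches(symbols: str, pattern: str,
                 overlapping_patterns: bool = False) -> list[tuple[int, int]]:
    """Return (start, end) row offsets of pattern matches in one partition.

    overlapping_patterns=False (default): grep-like, each new search
    resumes after the previous match.
    overlapping_patterns=True: attempt a match anchored at every row, so
    a row may belong to several matches (one greedy match per start row).
    """
    regex = re.compile(pattern)
    if not overlapping_patterns:
        return [(m.start(), m.end()) for m in regex.finditer(symbols)]
    matches = []
    for start in range(len(symbols)):
        m = regex.match(symbols, start)  # match must begin exactly at `start`
        if m:
            matches.append((m.start(), m.end()))
    return matches


print(find_matches("VVVV", "VV"))                              # [(0, 2), (2, 4)]
print(find_matches("VVVV", "VV", overlapping_patterns=True))   # [(0, 2), (1, 3), (2, 4)]
print(find_matches("VVVB", "V+B", overlapping_patterns=True))  # [(0, 4), (1, 4), (2, 4)]
```

Keeping FALSE as the default means existing callers see exactly the current grep-like output unless they opt in.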
Acceptance
was:
Story
As a data scientist, I want to be able to define multiple symbols that result
in overlapping pattern matches within a partition.
See
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
for a description of what a symbol is.
Currently in 1.9, overlapping matches are not supported. The default is
non-overlapping: the path algorithm begins the next pattern search at the row
that follows the last pattern match (like grep in UNIX).
In the overlapping case, the path algorithm needs to find every occurrence of
the pattern in the partition, regardless of whether it overlaps a previously
found match. This means one row can match multiple symbols in a given matched
pattern, so there is a dependency on
https://issues.apache.org/jira/browse/MADLIB-943 .
We need to add an optional BOOLEAN parameter called "overlapping_patterns"
to the interface. Default is FALSE.
(While you are at it, please fix the docs to indicate that the "persist_rows"
parameter is optional, with default FALSE.)
Acceptance
> Path - overlapping partitions
> -----------------------------
>
> Key: MADLIB-995
> URL: https://issues.apache.org/jira/browse/MADLIB-995
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Fix For: v1.9.1