[ 
https://issues.apache.org/jira/browse/MADLIB-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399790#comment-15399790
 ] 

Frank McQuillan commented on MADLIB-995:
----------------------------------------

This seems to work from my testing.  For the example in the attachment:

{code:sql}
DROP TABLE IF EXISTS path_output, path_output_tuples;

SELECT madlib.path(
     'weblog',                     -- Name of the table
     'path_output',                -- Table name to store the path results
     NULL,                         -- No partitions
     'event_timestamp ASC',        -- Time asc
     $$ FEMALE:=gender='Female',
        UNKNOWN:=gender='Unknown',
        MALE:=gender='Male'
     $$,                           -- Definition of various symbols used in the pattern definition
     '(UNKNOWN)(FEMALE)(UNKNOWN)',
     NULL,                         -- No agg
     TRUE,                         -- Persist matches
     TRUE                          -- Overlapping patterns
     );

SELECT * FROM path_output_tuples ORDER BY match_id, event_timestamp ASC;
{code}

produces:

{code}
   event_timestamp   | user_id | age_group | income_group | gender  | region  | household_size | click_event | purchase_event | revenue | margin | symbol  | match_id
---------------------+---------+-----------+--------------+---------+---------+----------------+-------------+----------------+---------+--------+---------+----------
 2012-04-15 07:02:00 |  100821 |         1 |            4 | Unknown | West    |              3 |           1 |              1 |     118 |     39 | UNKNOWN |        1
 2012-04-15 08:51:00 |  102201 |         3 |            3 | Female  | East    |              3 |           0 |              0 |       0 |      0 | FEMALE  |        1
 2012-04-15 09:28:00 |  101121 |         2 |            2 | Unknown | West    |              4 |           1 |              1 |     103 |     32 | UNKNOWN |        1
 2012-04-15 09:28:00 |  101121 |         2 |            2 | Unknown | West    |              4 |           1 |              1 |     103 |     32 | UNKNOWN |        2
 2012-04-15 10:19:00 |  103711 |         4 |            3 | Female  | Central |              5 |           0 |              0 |       0 |      0 | FEMALE  |        2
 2012-04-15 11:40:00 |  100821 |         1 |            4 | Unknown | West    |              3 |           0 |              0 |       0 |      0 | UNKNOWN |        2
 2012-04-16 02:12:00 |  100821 |         1 |            4 | Unknown | West    |              3 |           1 |              1 |     153 |     26 | UNKNOWN |        3
 2012-04-16 04:20:00 |  102201 |         3 |            3 | Female  | East    |              3 |           0 |              0 |       0 |      0 | FEMALE  |        3
 2012-04-16 05:38:00 |  101121 |         2 |            2 | Unknown | West    |              4 |           1 |              0 |       0 |      0 | UNKNOWN |        3
 2012-04-16 20:46:00 |  101121 |         2 |            2 | Unknown | West    |              4 |           1 |              1 |     131 |     28 | UNKNOWN |        4
 2012-04-16 21:11:00 |  101331 |         2 |            4 | Female  | East    |              5 |           1 |              1 |     127 |     27 | FEMALE  |        4
 2012-04-16 22:35:00 |  101121 |         2 |            2 | Unknown | West    |              4 |           0 |              0 |       0 |      0 | UNKNOWN |        4
(12 rows)
{code}

as expected.

> Path - overlapping partitions
> -----------------------------
>
>                 Key: MADLIB-995
>                 URL: https://issues.apache.org/jira/browse/MADLIB-995
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9.1
>
>         Attachments: Ecommerce data set for path test 3.csv, 
> path-overlapping-patterns.ipynb
>
>
> Story
> As a data scientist, I want to be able to define multiple symbols that result 
> in overlapping partitions.
> See
> http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
> for a description of what a symbol is.
> Currently in 1.9, overlapping partitions are not supported. The default is 
> non-overlapping, where the path algo begins the next pattern search at the 
> row that follows the last pattern match (like how grep works in UNIX).
> In the case of overlapping, the path algo needs to find every occurrence of 
> the pattern in the partition, regardless of whether it might have been part 
> of a previously found match. This means one row can match multiple symbols in 
> a given matched pattern so there is a dependency on 
> https://issues.apache.org/jira/browse/MADLIB-943 .  There is a (small) chance 
> that this story is a no-op once 
> https://issues.apache.org/jira/browse/MADLIB-943 is done.
> Need to add a new optional BOOLEAN parameter to the interface called 
> "overlapping_patterns".  Default is FALSE.
> (While you are at it please fix the docs to indicate that the "persist_rows" 
> param is optional with default FALSE.)
> Acceptance
> The attached data set and query should produce the following output:
> Event Timestamp | User ID | Age Group | Income Group | Gender  | Region  | Household Size | Click Event | Purchase Event | Revenue | Margin | Match ID
> 4/15/12 7:02    |  100821 |         1 |            4 | Unknown | West    |              3 |           1 |              1 |     118 |     39 | 1
> 4/15/12 8:51    |  102201 |         3 |            3 | Female  | East    |              3 |           0 |              0 |       0 |      0 | 1
> 4/15/12 9:28    |  101121 |         2 |            2 | Unknown | West    |              4 |           1 |              1 |     103 |     32 | 1,2
> 4/15/12 10:19   |  103711 |         4 |            3 | Female  | Central |              5 |           0 |              0 |       0 |      0 | 2
> 4/15/12 11:40   |  100821 |         1 |            4 | Unknown | West    |              3 |           0 |              0 |       0 |      0 | 2
> 4/16/12 2:12    |  100821 |         1 |            4 | Unknown | West    |              3 |           1 |              1 |     153 |     26 | 3
> 4/16/12 4:20    |  102201 |         3 |            3 | Female  | East    |              3 |           0 |              0 |       0 |      0 | 3
> 4/16/12 5:38    |  101121 |         2 |            2 | Unknown | West    |              4 |           1 |              0 |       0 |      0 | 3
> 4/16/12 20:46   |  101121 |         2 |            2 | Unknown | West    |              4 |           1 |              1 |     131 |     28 | 4
> 4/16/12 21:11   |  101331 |         2 |            4 | Female  | East    |              5 |           1 |              1 |     127 |     27 | 4
> 4/16/12 22:35   |  101121 |         2 |            2 | Unknown | West    |              4 |           0 |              0 |       0 |      0 | 4
> There are 4 pattern matches.  The 1st and the 2nd overlap.
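
To illustrate the difference between the default non-overlapping search and the overlapping search described in the story, here is a minimal Python sketch. It is not MADlib's implementation. It abbreviates each row to one letter (U for UNKNOWN, F for FEMALE) and assumes the eleven rows listed in the acceptance table are consecutive within a single partition; the function name find_matches is hypothetical.

```python
import re
from typing import List


def find_matches(symbols: str, pattern: str, overlapping: bool = False) -> List[int]:
    """Return the start index of every occurrence of pattern in symbols.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    if overlapping:
        # A zero-width lookahead consumes no characters, so the scan
        # advances one position at a time and reports every occurrence,
        # even one that shares rows with an earlier match.
        regex = re.compile("(?=" + re.escape(pattern) + ")")
    else:
        # A plain match consumes the matched rows, so the next search
        # starts at the row following the last match (grep-like behavior).
        regex = re.compile(re.escape(pattern))
    return [m.start() for m in regex.finditer(symbols)]


# Symbol sequence for the acceptance-table rows, in timestamp order
# (assumption: no other rows fall between them).
SYMBOLS = "UFUFUUFUUFU"
PATTERN = "UFU"  # stands in for (UNKNOWN)(FEMALE)(UNKNOWN)

non_overlapping = find_matches(SYMBOLS, PATTERN)
overlapping = find_matches(SYMBOLS, PATTERN, overlapping=True)

print(non_overlapping)  # [0, 5, 8]
print(overlapping)      # [0, 2, 5, 8]
```

Under these assumptions the overlapping search reports four matches, and the second one starts on the row that ends the first, which corresponds to the 9:28 row carrying Match ID 1,2 above.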



