I looked at the paper. While we can argue about the performance side, I think Scala pattern matching is semantically much more expressive. Time will tell.
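To make the expressiveness point concrete, here is a minimal, self-contained sketch of the guarded sequence match quoted further down. The `PageView` and `Session` case classes, the `SessionMatch` object, and the meaning of the 600 threshold (time units between the two views) are assumptions for illustration only; the thread does not define them. Note that the recursion is not tail-recursive, so this shows the semantics and says nothing about the performance question.

```scala
// Hypothetical types for illustration; the thread does not define them.
final case class PageView(ts: Long, page: String)
final case class Session(id: String, views: List[PageView])

object SessionMatch {
  // Emit a Session at every position where a homepage view is immediately
  // followed by a products-landing view more than 600 time units later,
  // then continue scanning after the matched pair.
  def matchSessions(id: String, views: List[PageView]): List[Session] =
    views match {
      case Nil => Nil
      case PageView(ts1, "company.com>homepage") ::
           PageView(ts2, "company.com>plus>products landing") :: tail
          if ts2 > ts1 + 600 =>
        Session(id, views) :: matchSessions(id, tail)
      // No match at this position: drop one view and try again.
      case _ :: rest => matchSessions(id, rest)
    }
}
```

Calling `SessionMatch.matchSessions("u1", views)` on a `List[PageView]` returns one `Session` per matching position. The guard (`if ts2 > ts1 + 600`) is what plays the role of the predicate in an nPath `SYMBOLS` clause.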
On Tue, Mar 1, 2016 at 9:07 AM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Alex,
>
> We went down this path already :) This is the reason we are trying other
> approaches. The recursion makes it very inefficient in some cases.
> For details, this paper describes it very well:
> https://people.cs.umass.edu/%7Eyanlei/publications/sase-sigmod08.pdf
> which is the same paper referenced in the Flink ticket.
>
> Please let me know if I overlooked something. Thank you for sharing this!
>
> Best Regards,
>
> Jerry
>
> On Tue, Mar 1, 2016 at 11:58 AM, Alex Kozlov <ale...@gmail.com> wrote:
>
>> For the purpose of full disclosure, I think Scala offers a much more
>> efficient pattern matching paradigm. Using nPath is like using assembler
>> to program distributed systems. I cannot say much here today, but the
>> pattern would look like:
>>
>> def matchSessions(h: Seq[Session[PageView]], id: String,
>>                   p: Seq[PageView]): Seq[Session[PageView]] = {
>>   p match {
>>     case Nil => Nil
>>     // homepage followed by the products landing page, with a guard on the timestamps
>>     case PageView(ts1, "company.com>homepage") ::
>>          PageView(ts2, "company.com>plus>products landing") :: tail
>>         if ts2 > ts1 + 600 =>
>>       matchSessions(h, id, tail).+:(new Session(id, p))
>>     case _ => matchSessions(h, id, p.tail)
>>   }
>> }
>>
>> Look for Scala case statements with guards and upcoming book releases.
>>
>> http://docs.scala-lang.org/tutorials/tour/pattern-matching
>>
>> https://www.safaribooksonline.com/library/view/scala-cookbook/9781449340292/ch03s14.html
>>
>> On Tue, Mar 1, 2016 at 8:34 AM, Henri Dubois-Ferriere <henr...@gmail.com>
>> wrote:
>>
>>> fwiw Apache Flink just added CEP. Queries are constructed
>>> programmatically rather than in SQL, but the underlying functionality is
>>> similar.
>>>
>>> https://issues.apache.org/jira/browse/FLINK-3215
>>>
>>> On 1 March 2016 at 08:19, Jerry Lam <chiling...@gmail.com> wrote:
>>>
>>>> Hi Herman,
>>>>
>>>> Thank you for your reply!
>>>> This functionality usually finds its place in financial services, which
>>>> use CEP (complex event processing) for correlation and pattern matching.
>>>> Many commercial products have this, including Oracle and Teradata Aster Data
>>>> MR Analytics. I do agree the syntax is a bit awkward, but once you understand
>>>> it, it is actually very compact for expressing something that is very
>>>> complex. Esper has this feature partially implemented (
>>>> http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html
>>>> ).
>>>>
>>>> I found that the Teradata Analytics documentation describes its usage
>>>> best. For example (note that nPath is similar to match_recognize):
>>>>
>>>> SELECT last_pageid, MAX( count_page80 )
>>>> FROM nPath(
>>>>   ON ( SELECT * FROM clicks WHERE category >= 0 )
>>>>   PARTITION BY sessionid
>>>>   ORDER BY ts
>>>>   PATTERN ( 'A.(B|C)*' )
>>>>   MODE ( OVERLAPPING )
>>>>   SYMBOLS ( pageid = 50 AS A,
>>>>             pageid = 80 AS B,
>>>>             pageid <> 80 AND category IN (9,10) AS C )
>>>>   RESULT ( LAST ( pageid OF ANY ( A,B,C ) ) AS last_pageid,
>>>>            COUNT ( * OF B ) AS count_page80,
>>>>            COUNT ( * OF ANY ( A,B,C ) ) AS count_any )
>>>> )
>>>> WHERE count_any >= 5
>>>> GROUP BY last_pageid
>>>> ORDER BY MAX( count_page80 )
>>>>
>>>> The above means:
>>>> Find user click-paths starting at pageid 50 and passing exclusively
>>>> through either pageid 80 or pages in category 9 or category 10. Find the
>>>> pageid of the last page in the path and count the number of times page 80
>>>> was visited. Report the maximum count for each last page, and sort the
>>>> output by the latter. Restrict to paths containing at least 5 pages. Ignore
>>>> pages in the sequence with category < 0.
>>>>
>>>> If this query is written in pure SQL (if possible at all), it requires
>>>> several self-joins. The interesting thing about this feature is that it
>>>> integrates SQL+Streaming+ML in one (and potentially graph processing too).
>>>>
>>>> Best Regards,
>>>>
>>>> Jerry
>>>>
>>>>
>>>> On Tue, Mar 1, 2016 at 9:39 AM, Herman van Hövell tot Westerflier <
>>>> hvanhov...@questtec.nl> wrote:
>>>>
>>>>> Hi Jerry,
>>>>>
>>>>> This is not on any roadmap. I (briefly) browsed through this, and it
>>>>> looks like some sort of window function with very awkward syntax. I
>>>>> think Spark provides better constructs for this using
>>>>> dataframes/datasets/nested data...
>>>>>
>>>>> Feel free to submit a PR.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Herman van Hövell
>>>>>
>>>>> 2016-03-01 15:16 GMT+01:00 Jerry Lam <chiling...@gmail.com>:
>>>>>
>>>>>> Hi Spark developers,
>>>>>>
>>>>>> Would you consider adding support for "Pattern matching
>>>>>> in sequences of rows"? More specifically, I'm referring to this:
>>>>>> http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf
>>>>>>
>>>>>> This is a very cool/useful feature for pattern matching over live
>>>>>> stream/archived data. It is sort of related to machine learning because
>>>>>> it is usually used in clickstream analysis or path analysis. It is also
>>>>>> related to streaming because of the nature of the processing (mostly time
>>>>>> series data). It is SQL because there is a good way to express and
>>>>>> optimize the query.
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Jerry
>>>>>>
>>
>> --
>> Alex Kozlov
>> (408) 507-4987
>> (650) 887-2135 efax
>> ale...@gmail.com
>>
>

--
Alex Kozlov
(408) 507-4987
(650) 887-2135 efax
ale...@gmail.com