There are definitely pros and cons for Scala vs SQL-style CEP. Scala might be more powerful, but the target audience is very different.
How much is a CEP-style SQL syntax used in practice? I've never seen it come up so far.

On Tue, Mar 1, 2016 at 9:35 AM, Alex Kozlov <ale...@gmail.com> wrote:
> Looked at the paper: while we can argue about the performance side, I think
> Scala pattern matching is semantically much more expressive. Time will
> tell.
>
> On Tue, Mar 1, 2016 at 9:07 AM, Jerry Lam <chiling...@gmail.com> wrote:
>> Hi Alex,
>>
>> We have already gone down this path :) That is the reason we are trying
>> other approaches. The recursion makes it very inefficient in some cases.
>> For details, this paper describes it very well:
>> https://people.cs.umass.edu/%7Eyanlei/publications/sase-sigmod08.pdf
>> which is the same paper referenced in the Flink ticket.
>>
>> Please let me know if I have overlooked something. Thank you for sharing this!
>>
>> Best Regards,
>>
>> Jerry
>>
>> On Tue, Mar 1, 2016 at 11:58 AM, Alex Kozlov <ale...@gmail.com> wrote:
>>> For the purpose of full disclosure, I think Scala offers a much more
>>> efficient pattern matching paradigm. Using nPath is like using assembler
>>> to program distributed systems. I can't say much more here today, but the
>>> pattern would look like:
>>>
>>>   def matchSessions(h: Seq[Session[PageView]], id: String,
>>>                     p: List[PageView]): Seq[Session[PageView]] = {
>>>     p match {
>>>       case Nil => Nil
>>>       // homepage view followed by a products-landing view more than
>>>       // 600 time units later: emit a session and skip past the pair
>>>       case PageView(ts1, "company.com>homepage") ::
>>>            PageView(ts2, "company.com>plus>products landing") :: tail
>>>            if ts2 > ts1 + 600 =>
>>>         new Session(id, p) +: matchSessions(h, id, tail)
>>>       // no match at the head: advance by one page view
>>>       case _ => matchSessions(h, id, p.tail)
>>>     }
>>>   }
>>>
>>> Look for Scala case statements with guards and upcoming book releases.
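A self-contained version of the matcher Alex sketches above can make the idea easier to try out. This is a minimal sketch, not his code: the fields of PageView and Session (a Long timestamp and a String page label) are assumptions, because the thread never defines those types, and the first parameter, which the original never reads, is reused here as a hypothetical accumulator so that the scan is tail-recursive. Tail recursion only avoids stack growth; it does not change the matching strategy Jerry's efficiency concern is about.

```scala
import scala.annotation.tailrec

// Hypothetical types: the thread does not define PageView or Session, so the
// field names and types (timestamp as Long, page label as String) are assumptions.
final case class PageView(ts: Long, page: String)
final case class Session[A](id: String, events: List[A])

object SessionMatcher {
  // Page labels taken from the snippet in the thread.
  val Home = "company.com>homepage"
  val Landing = "company.com>plus>products landing"

  // Scans a time-ordered click list for a homepage view immediately followed
  // by a products-landing view more than 600 time units later (the guard from
  // the original snippet). Each match emits a Session holding the clicks from
  // the match onward, and scanning resumes after the matched pair.
  // `acc` is an assumed accumulator standing in for the unused `h` parameter.
  @tailrec
  def matchSessions(acc: Vector[Session[PageView]], id: String,
                    p: List[PageView]): Vector[Session[PageView]] =
    p match {
      case Nil => acc
      case PageView(ts1, Home) :: PageView(ts2, Landing) :: tail if ts2 > ts1 + 600 =>
        matchSessions(acc :+ Session(id, p), id, tail)
      case _ :: rest => matchSessions(acc, id, rest) // no match here: advance by one
    }
}
```

For example, a homepage view at ts = 0 followed by a products-landing view at ts = 700 yields one session, while the same pair only 100 units apart yields none.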
>>>
>>> http://docs.scala-lang.org/tutorials/tour/pattern-matching
>>>
>>> https://www.safaribooksonline.com/library/view/scala-cookbook/9781449340292/ch03s14.html
>>>
>>> On Tue, Mar 1, 2016 at 8:34 AM, Henri Dubois-Ferriere <henr...@gmail.com> wrote:
>>>> FWIW, Apache Flink just added CEP. Queries are constructed
>>>> programmatically rather than in SQL, but the underlying functionality is
>>>> similar.
>>>>
>>>> https://issues.apache.org/jira/browse/FLINK-3215
>>>>
>>>> On 1 March 2016 at 08:19, Jerry Lam <chiling...@gmail.com> wrote:
>>>>> Hi Herman,
>>>>>
>>>>> Thank you for your reply!
>>>>> This functionality usually finds its place in financial services, which
>>>>> use CEP (complex event processing) for correlation and pattern matching.
>>>>> Many commercial products have it, including Oracle and Teradata Aster
>>>>> Data MR Analytics. I do agree the syntax is a bit awkward, but once you
>>>>> understand it, it is actually very compact for expressing something very
>>>>> complex. Esper has this feature partially implemented
>>>>> (http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html).
>>>>>
>>>>> I found the Teradata Analytics documentation the best description of
>>>>> how to use it.
>>>>> For example (note that nPath is similar to MATCH_RECOGNIZE):
>>>>>
>>>>>   SELECT last_pageid, MAX( count_page80 )
>>>>>   FROM nPath(
>>>>>     ON ( SELECT * FROM clicks WHERE category >= 0 )
>>>>>     PARTITION BY sessionid
>>>>>     ORDER BY ts
>>>>>     PATTERN ( 'A.(B|C)*' )
>>>>>     MODE ( OVERLAPPING )
>>>>>     SYMBOLS ( pageid = 50 AS A,
>>>>>               pageid = 80 AS B,
>>>>>               pageid <> 80 AND category IN (9,10) AS C )
>>>>>     RESULT ( LAST ( pageid OF ANY ( A,B,C ) ) AS last_pageid,
>>>>>              COUNT ( * OF B ) AS count_page80,
>>>>>              COUNT ( * OF ANY ( A,B,C ) ) AS count_any )
>>>>>   )
>>>>>   WHERE count_any >= 5
>>>>>   GROUP BY last_pageid
>>>>>   ORDER BY MAX( count_page80 )
>>>>>
>>>>> The above means:
>>>>> Find user click paths that start at pageid 50 and then pass exclusively
>>>>> through either pageid 80 or pages in category 9 or 10. Find the pageid of
>>>>> the last page in each path and count the number of times page 80 was
>>>>> visited. Report the maximum count for each last page, and sort the output
>>>>> by that count. Restrict to paths containing at least 5 pages, and ignore
>>>>> pages in the sequence with category < 0.
>>>>>
>>>>> If this query were written in pure SQL (if that is possible at all), it
>>>>> would require several self-joins. The interesting thing about this
>>>>> feature is that it integrates SQL, streaming, and ML in one (and
>>>>> potentially graph processing too).
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Jerry
>>>>>
>>>>> On Tue, Mar 1, 2016 at 9:39 AM, Herman van Hövell tot Westerflier <hvanhov...@questtec.nl> wrote:
>>>>>> Hi Jerry,
>>>>>>
>>>>>> This is not on any roadmap. I briefly browsed through this, and it
>>>>>> looks like some sort of window function with very awkward syntax. I
>>>>>> think Spark provides better constructs for this using
>>>>>> DataFrames/Datasets/nested data...
>>>>>>
>>>>>> Feel free to submit a PR.
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Herman van Hövell
>>>>>>
>>>>>> 2016-03-01 15:16 GMT+01:00 Jerry Lam <chiling...@gmail.com>:
>>>>>>> Hi Spark developers,
>>>>>>>
>>>>>>> Would you consider adding support for "Pattern matching in sequences
>>>>>>> of rows"? More specifically, I'm referring to this:
>>>>>>> http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf
>>>>>>>
>>>>>>> This is a very cool and useful feature for pattern matching over live
>>>>>>> streams and archived data. It is sort of related to machine learning,
>>>>>>> because it is usually used in clickstream analysis or path analysis. It
>>>>>>> is also related to streaming because of the nature of the processing
>>>>>>> (mostly time-series data). And it fits SQL because SQL offers a good
>>>>>>> way to express and optimize the query.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Jerry
>>>
>>> --
>>> Alex Kozlov
>>> (408) 507-4987
>>> (650) 887-2135 efax
>>> ale...@gmail.com
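To make the semantics of Jerry's nPath example concrete, and to illustrate Herman's point that ordinary grouping and sorting constructs can express it, here is a minimal sketch that evaluates the same query over plain Scala collections. It is neither Aster's implementation nor a Spark API: the Click record is hypothetical, and the sketch assumes that OVERLAPPING mode starts a match at every A row and that the (B|C)* part is matched greedily, i.e. extended through as many consecutive B-or-C rows as possible.

```scala
// Hypothetical click record mirroring the columns the nPath query uses.
final case class Click(sessionId: String, ts: Long, pageId: Int, category: Int)

object NPathSketch {
  // SYMBOLS clause: one predicate per symbol.
  def isA(c: Click): Boolean = c.pageId == 50
  def isB(c: Click): Boolean = c.pageId == 80
  def isC(c: Click): Boolean = c.pageId != 80 && Set(9, 10).contains(c.category)

  // RESULT clause: LAST(pageid), COUNT(* OF B), COUNT(* OF ANY(A,B,C)).
  final case class MatchRow(lastPageId: Int, countPage80: Int, countAny: Int)

  // PATTERN 'A.(B|C)*' under the assumed OVERLAPPING reading: try a match at
  // every A row, then extend greedily through consecutive B-or-C rows.
  def matches(session: Vector[Click]): Vector[MatchRow] =
    session.indices.toVector
      .filter(i => isA(session(i)))
      .map { i =>
        val tail = session.drop(i + 1).takeWhile(c => isB(c) || isC(c))
        val path = session(i) +: tail
        MatchRow(path.last.pageId, path.count(isB), path.size)
      }

  // Returns (last_pageid, MAX(count_page80)) pairs in output order.
  def query(clicks: Seq[Click]): Vector[(Int, Int)] =
    clicks
      .filter(_.category >= 0)                          // ON (... WHERE category >= 0)
      .groupBy(_.sessionId)                             // PARTITION BY sessionid
      .values
      .flatMap(s => matches(s.sortBy(_.ts).toVector))   // ORDER BY ts, then match
      .filter(_.countAny >= 5)                          // WHERE count_any >= 5
      .groupBy(_.lastPageId)                            // GROUP BY last_pageid
      .map { case (last, rows) => (last, rows.map(_.countPage80).max) }
      .toVector
      .sortBy(_._2)                                     // ORDER BY MAX(count_page80)
}
```

Each comment maps a step back to the nPath clause it stands in for. The greedy reading of (B|C)* is the main assumption worth checking against the Aster documentation; a distributed version would presumably follow the same group-sort-scan shape.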