[ 
https://issues.apache.org/jira/browse/MRQL-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292588#comment-14292588
 ] 

Hudson commented on MRQL-63:
----------------------------

SUCCESS: Integrated in mrql-master-snapshot #11 (See 
[https://builds.apache.org/job/mrql-master-snapshot/11/])
MRQL-63: Add support for MRQL streaming in spark streaming mode (fegaras: 
https://git-wip-us.apache.org/repos/asf?p=incubator-mrql.git&a=commit&h=e15285c2c97c7b14c25b660b85dee9ad52abf429)
* conf/mrql-env.sh
* core/src/main/java/org/apache/mrql/Streaming.gen
* flink/src/main/java/org/apache/mrql/FlinkStreaming.gen
* spark/pom.xml
* core/src/main/java/org/apache/mrql/Evaluator.java
* core/src/main/java/org/apache/mrql/Config.java
* flink/pom.xml
* queries/streaming.mrql
* core/src/main/java/org/apache/mrql/TopLevel.gen
* pom.xml
* core/src/main/java/org/apache/mrql/PlanGeneration.gen
* core/src/main/java/org/apache/mrql/TypeInference.gen
* spark/src/main/java/org/apache/mrql/SparkFileInputStream.java
* spark/src/main/java/org/apache/mrql/SparkStreaming.gen
* flink/src/main/java/org/apache/mrql/FlinkEvaluator.gen
* spark/src/main/java/org/apache/mrql/SparkEvaluator.gen
* core/src/main/java/org/apache/mrql/DataSetFunction.java
* core/src/main/java/org/apache/mrql/Interpreter.gen


> Add support for MRQL streaming in spark streaming mode
> ------------------------------------------------------
>
>                 Key: MRQL-63
>                 URL: https://issues.apache.org/jira/browse/MRQL-63
>             Project: MRQL
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.9.4
>            Reporter: Leonidas Fegaras
>            Assignee: Leonidas Fegaras
>         Attachments: MRQL-63.patch
>
>
> This patch introduces a major extension to MRQL, called MRQL streaming.
> We can now run continuous MRQL queries on streams of data.
> Currently, it works on Spark Streaming only but we may add support for Flink 
> Streaming and/or Storm in the future.
> It has been tested in Spark local mode and in Spark distributed mode on a 
> Yarn cluster.
> MRQL now supports window-based streaming based on a sliding window during a 
> certain time interval. To support MRQL streaming, you need to add the 
> parameter "-stream t" to the mrql command, where t is the time interval in 
> milliseconds. Then MRQL will processes the new batch of data in the input 
> streams every t milliseconds.
> A stream source in MRQL takes the form stream(...), which has the same 
> parameters as the source(...) form. For example:
> {code:SQL}
> select (k,avg(p.Y))
> from p in stream(binary,"tmp/points.bin")
> group by k: p.X;
> {code}
> This query process all sequence files in the directory tmp/points.bin and 
> then checks this directory every t milliseconds for new files. When a new 
> file is inserted in the directory (or if the modification time of an existing 
> file changes), it processes the new files. One may work on multiple files and 
> the query may contain both stream and regular data sources. If there is at 
> least one stream source, the query becomes continuous (never stops). One may 
> dump the output stream to binary or CVS files using the existing MRQL syntax:
> {code:SQL}
> store "tmp/out" from e
> {code}
> This dumps the output of the continuous query e into tmp/out/f1, tmp/out/f2, 
> ... etc.
> Example for testing:
> First create data:
> {quote}
> mrql.spark -local queries/points.mrql 100
> {quote}
> Then run the continuous query:
> {quote}
> mrql.spark -local -stream 1000 queries/streaming.mrql
> {quote}
> On a separate terminal, you can type:
> {quote}
> touch tmp/points.bin/part-00000
> {quote}
> to process a new batch of data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to