[ 
https://issues.apache.org/jira/browse/MRQL-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714708#comment-13714708
 ] 

Leonidas Fegaras commented on MRQL-12:
--------------------------------------

I have only tested the code for correctness in local mode using the MRQL 
testbed (use 'make spark' to compile it and 'make validate_spark' to run the 
testbed queries). It still needs some fine-tuning. It will be interesting to 
compare it with Hama mode on my benchmark MRQL queries (k-means and PageRank) 
on a real cluster.
By the way, the Spark developers have already provided an implementation of 
Giraph on top of Spark, so I am sure that one can implement the Hama user 
interface using Spark too, although I am guessing it would not be as fast as 
the Hama implementation.

                
> Support query evaluation in Spark mode
> --------------------------------------
>
>                 Key: MRQL-12
>                 URL: https://issues.apache.org/jira/browse/MRQL-12
>             Project: MRQL
>          Issue Type: Improvement
>          Components: Run-Time Data
>    Affects Versions: 0.9.0
>         Environment: Apache Spark http://spark-project.org/
>            Reporter: Leonidas Fegaras
>            Assignee: Leonidas Fegaras
>         Attachments: Evaluator.gen, MRQL-12.patch
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Spark provides primitives for in-memory cluster computing 
> (http://spark-project.org/). It was developed at UC Berkeley and has 
> recently been accepted as an ASF incubating project. It has already attracted 
> many developers and I think it will play a major role in the Hadoop 
> ecosystem. So, I thought it would be nice to be able to evaluate MRQL queries 
> on a Spark cluster. Spark already supports Hive (the port is called Shark). 
> Like Hama, Spark can evaluate queries in memory but, unlike Hama, it supports 
> full fault tolerance. 
> I have already written all the code but I have only tested it in local mode 
> (on a single multi-core node). This task turned out to be easier than I 
> thought because MRQL plans are similar to Spark operations. The only 
> annoyance was that I had to make all data structures Serializable. I also had 
> to include the Gen source code (the Java preprocessor), with an ASF licence, 
> which will make the transition to Maven easier.
> I am attaching the patch below. The actual code that contains the Spark 
> evaluator is the file Evaluator.gen which is attached separately. 
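
For readers wondering about the Serializable requirement mentioned in the 
description: Spark ships data and closures between nodes, and by default it 
does so with standard Java serialization, so every value that travels must 
implement java.io.Serializable. The following is only a minimal sketch in 
plain Java with no Spark dependency. The class names (SerializableSketch, 
Pair, Opaque) are hypothetical and are not the actual MRQL data structures.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableSketch {

    // Hypothetical stand-in for a data value; NOT an actual MRQL class.
    public static final class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String key;
        public final int value;

        public Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical class that does not implement Serializable.
    public static final class Opaque {
        public final int value = 42;
    }

    // Write an object to a byte array using standard Java serialization.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Serialize and deserialize a value, roughly what happens when a
    // value is sent from one JVM to another.
    @SuppressWarnings("unchecked")
    public static <T> T roundTrip(T value) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(serialize(value)))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("value could not be shipped", e);
        }
    }

    // Report whether Java serialization accepts the given object.
    public static boolean canShip(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Pair copy = roundTrip(new Pair("a", 1));
        System.out.println(copy.key + " " + copy.value);
        System.out.println(canShip(new Opaque()));
    }
}
```

A class like Opaque is rejected with a NotSerializableException, which is why 
every data structure that crosses a node boundary has to be marked 
Serializable.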
