Re: MRQL on Flink

Robert Metzger Thu, 28 Aug 2014 05:33:59 -0700

Amazing.
In my opinion, we should cross-link our projects on the websites. Maybe we
should add a section on our website where we list projects we depend on and
projects depending on us.
A little blog post / news on our website (once a MRQL release with Flink
support is out) can also draw some attention to this great work!


I tried following your instructions and ran into one issue with Java 8 along
the way: https://issues.apache.org/jira/browse/MRQL-46
I think the classpath setup of the mrql scripts assumes that the user has a
Flink YARN uberjar file (one fat jar with everything). I first tried it
with a regular "hadoop2" build of Flink.
We should probably generalize the classpath setup there a bit (to include
all "flink-" prefixed jar files in the classpath).
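
A minimal sketch of what that generalization could look like, assuming the
Flink jars all sit in one directory such as ${FLINK_HOME}/lib. The function
name flink_classpath and the directory layout are illustrative assumptions,
not what the mrql scripts currently do:

```shell
# Hypothetical helper (not part of the mrql scripts): join every
# flink-*.jar found in the given directory into a colon-separated
# classpath string and print it.
flink_classpath() {
    dir="$1"
    cp=""
    for jar in "$dir"/flink-*.jar; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$jar" ] || continue
        # Prepend a ":" separator only when cp is already non-empty.
        cp="${cp:+$cp:}$jar"
    done
    printf '%s\n' "$cp"
}
```

It could then be called as FLINK_JARS=$(flink_classpath "${FLINK_HOME}/lib"),
which would work for both an uberjar and a regular build as long as the jar
names start with "flink-".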

After I sorted out these issues, MRQL was working.
Is the local mode actually using Flink's local execution?
The output said:
Apache MRQL version 0.9.4 (interpreted local Flink mode using 2 tasks)
Query type: ( int, int, int, int ) -> ( int, int )
Query type: !bag(( int, int ))
Physical plan:
MapReduce:
   input: Generator

In particular, the "MapReduce" in the physical plan confused me.
I hope to find some more time soon to take a closer look at the MRQL query
language.

Robert



On Thu, Aug 28, 2014 at 10:58 AM, Fabian Hueske <[email protected]> wrote:

> That's really cool!
>
> I'm also curious about your experience with Flink. Did you find major
> obstacles that you needed to overcome for the integration?
> Is there some write-up / report available somewhere (maybe in JIRA) that
> discusses the integration? Are you using Flink's full operator set or do
> you compile everything into Map and Reduce?
>
> Best, Fabian
>
>
> 2014-08-28 7:37 GMT+02:00 Aljoscha Krettek <[email protected]>:
>
> > Very nice indeed! How well is this tested? Can it already run all the
> > example queries you have? Can you say anything about the performance
> > of the different underlying execution engines?
> >
> > On Thu, Aug 28, 2014 at 12:58 AM, Stephan Ewen <[email protected]> wrote:
> > > Wow, that is impressive!
> > >
> > >
> > > On Thu, Aug 28, 2014 at 12:06 AM, Ufuk Celebi <[email protected]> wrote:
> > >
> > >> Awesome, indeed! Looking forward to trying it out. :)
> > >>
> > >>
> > >> On Wed, Aug 27, 2014 at 10:52 PM, Sebastian Schelter <[email protected]>
> > >> wrote:
> > >>
> > >> > Awesome!
> > >> >
> > >> >
> > >> > 2014-08-27 13:49 GMT-07:00 Leonidas Fegaras <[email protected]>:
> > >> >
> > >> > > Hello,
> > >> > > I would like to let you know that Apache MRQL can now run queries on Flink.
> > >> > > MRQL is a query processing and optimization system for large-scale,
> > >> > > distributed data analysis, built on top of Apache Hadoop/map-reduce,
> > >> > > Hama, Spark, and now Flink. MRQL queries are SQL-like but not SQL.
> > >> > > They can work on complex, user-defined data (such as JSON and XML) and
> > >> > > can express complex queries (such as pagerank and matrix factorization).
> > >> > >
> > >> > > MRQL on Flink has been tested on local mode and on a small Yarn cluster.
> > >> > >
> > >> > > Here are the directions on how to build the latest MRQL snapshot:
> > >> > >
> > >> > > git clone https://git-wip-us.apache.org/repos/asf/incubator-mrql.git mrql
> > >> > > cd mrql
> > >> > > mvn -Pyarn clean install
> > >> > >
> > >> > > To make it run on your cluster, edit conf/mrql-env.sh and set the
> > >> > > Java, the Hadoop, and the Flink installation directories.
> > >> > >
> > >> > > Here is how to run PageRank. First, you need to generate a random
> > >> > > graph and store it in a file using the MRQL query RMAT.mrql:
> > >> > >
> > >> > > bin/mrql.flink -local queries/RMAT.mrql 1000 10000
> > >> > >
> > >> > > This will create a graph with 1K nodes and 10K edges using the RMAT
> > >> > > algorithm, will remove duplicate edges, and will store the graph in
> > >> > > the binary file graph.bin. Then, run PageRank on Flink mode using:
> > >> > >
> > >> > > bin/mrql.flink -local queries/pagerank.mrql
> > >> > >
> > >> > > To run MRQL/Flink on a Yarn cluster, first start the Flink container
> > >> > > on Yarn by running the script yarn-session.sh, such as:
> > >> > >
> > >> > > ${FLINK_HOME}/bin/yarn-session.sh -n 8
> > >> > >
> > >> > > This will print the name of the Flink JobManager, which can be used in:
> > >> > >
> > >> > > export FLINK_MASTER=name-of-the-Flink-JobManager
> > >> > > bin/mrql.flink -dist -nodes 16 queries/RMAT.mrql 1000000 10000000
> > >> > >
> > >> > > This will create a graph with 1M nodes and 10M edges using RMAT on 16
> > >> > > nodes (slaves). You can adjust these numbers to fit your cluster.
> > >> > > Then, run PageRank using:
> > >> > >
> > >> > > bin/mrql.flink -dist -nodes 16 queries/pagerank.mrql
> > >> > >
> > >> > > The MRQL project page is at: http://mrql.incubator.apache.org/
> > >> > >
> > >> > > Let me know if you have any questions.
> > >> > > Leonidas Fegaras
> > >> > >
> > >> > >
> > >> >
> > >>
> >
>
