Hi Arun,

Once we release v0.2.0-incubating, our next milestone will be releasing the documentation. Hopefully both will be out soon. In order to run Amaterasu on YARN I suggest the following:
1. Download the binaries from https://dist.apache.org/repos/dist/dev/incubator/amaterasu/0.2.0rc2/
2. Extract the tarball on a node in your YARN cluster where Spark is installed
3. Configure the following in apache-amaterasu-0.2.0-incubating-rc2/amaterasu.properties:
   1. *zk:* should point to your zookeeper (comma delimited in case of an ensemble)
   2. *mode*=yarn (this is the default value)
   3. *spark.home*: should point to your spark2 home
   4. *yarn.hadoop.home.dir:* should point to your HADOOP_HOME
4. Submit your job using ama-start-yarn.sh

This should be all in this release, but let us know how it goes.

Cheers,
Yaniv

On Sun, Apr 22, 2018 at 1:45 AM, Arun Manivannan <a...@arunma.com> wrote:

> Hi Yaniv and Eyal,
>
> Sorry about the hiatus. Day job has been hectic the last couple of months.
>
> I am really glad that we now have full blown YARN support. Thanks a lot !!
>
> Is there a place where I could find a rough document around how to submit
> jobs on YARN. If you could respond to this thread, I am more than happy to
> contribute to the docs.
>
> I would like to do a POC of sorts for one of my projects at work. A really
> dumbed-down version of the application is at :
>
> https://github.com/arunma/ama_datapopulator
> https://github.com/arunma/ama_reconciler
>
> The first Spark job populates the data in a bunch of Hive tables
> The second Spark job runs pre-configured queries against these tables and
> compares them against another data in another Hive table (reconciliation
> table).
>
>
> For now, we can safely assume that there's no data shared between these
> dataframes.
>
> Greatly appreciate your response on the YARN job submission.
>
> Cheers,
> Arun
>

--
Yaniv Rodenski

+61 477 778 405
ya...@shinto.io
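
A note on step 3: a quick pre-flight check can catch a missing setting before running ama-start-yarn.sh. The sketch below is Python and is not part of Amaterasu. It assumes amaterasu.properties uses plain `key=value` or `key: value` lines, and the helper names, hostnames and paths in it are made up for illustration.

```python
# The four settings named in step 3 of this email.
REQUIRED_KEYS = ("zk", "mode", "spark.home", "yarn.hadoop.home.dir")


def parse_properties(text):
    """Parse simple 'key=value' or 'key: value' lines (assumed file format)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        # Use whichever separator appears first, so that values such as
        # 'host:2181' keep their own colons.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


def missing_keys(props):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Hypothetical file contents; replace with the text of your own file.
sample = """\
zk=zk1.example.com:2181,zk2.example.com:2181
mode=yarn
spark.home=/usr/lib/spark2
"""
print(missing_keys(parse_properties(sample)))  # ['yarn.hadoop.home.dir']
```

To check a real installation, pass the contents of apache-amaterasu-0.2.0-incubating-rc2/amaterasu.properties to `parse_properties` instead of `sample`. An empty list means all four settings are present. It does not verify that the values point at a working ZooKeeper, Spark or Hadoop.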