Hi Arun,

Once we release v0.2.0-incubating, our next milestone will be releasing the
documentation; hopefully both will be out soon.
In order to run Amaterasu on YARN, I suggest the following:

   1. Download the binaries from
   https://dist.apache.org/repos/dist/dev/incubator/amaterasu/0.2.0rc2/
   2. Extract the tarball on a node in your YARN cluster where Spark is
   installed
   3. Configure the following
   in apache-amaterasu-0.2.0-incubating-rc2/amaterasu.properties:
      1. *zk*: should point to your ZooKeeper (comma-delimited in case of
      an ensemble)
      2. *mode*: yarn (this is the default value)
      3. *spark.home*: should point to your Spark 2 home
      4. *yarn.hadoop.home.dir*: should point to your HADOOP_HOME
   4. Submit your job using ama-start-yarn.sh
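
As a rough sketch of step 3, the relevant lines in amaterasu.properties
might look something like the following. The four keys are the ones listed
above; every host name and path is a placeholder I made up for
illustration, and the exact value format for zk (for example, whether
ports are included) is an assumption, so adjust to match your cluster:

```properties
# ZooKeeper quorum, comma-delimited for an ensemble
# (placeholder host names; value format is an assumption)
zk=zk1.example.com,zk2.example.com,zk3.example.com

# Run on YARN (this is the default value)
mode=yarn

# Spark 2 installation on this node (placeholder path)
spark.home=/path/to/spark2

# Should match your HADOOP_HOME (placeholder path)
yarn.hadoop.home.dir=/path/to/hadoop
```

Comments are kept on their own lines because a trailing comment on a
key=value line would be read as part of the value.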

That should be everything you need for this release, but let us know how it goes.

Cheers,
Yaniv

On Sun, Apr 22, 2018 at 1:45 AM, Arun Manivannan <a...@arunma.com> wrote:

> Hi Yaniv and Eyal,
>
> Sorry about the hiatus.  Day job has been hectic the last couple of months.
>
> I am really glad that we now have full-blown YARN support. Thanks a lot!!
>
> Is there a place where I could find a rough document on how to submit
> jobs on YARN? If you could respond to this thread, I am more than happy to
> contribute to the docs.
>
> I would like to do a POC of sorts for one of my projects at work. A really
> dumbed-down version of the application is at:
>
> https://github.com/arunma/ama_datapopulator
> https://github.com/arunma/ama_reconciler
>
> The first Spark job populates the data in a bunch of Hive tables.
> The second Spark job runs pre-configured queries against these tables and
> compares the results against data in another Hive table (the
> reconciliation table).
>
>
> For now, we can safely assume that there's no data shared between these
> dataframes.
>
> Greatly appreciate your response on the YARN job submission.
>
> Cheers,
> Arun
>



-- 
Yaniv Rodenski

+61 477 778 405
ya...@shinto.io
