Hi Mingyu,

I'd be interested in hearing about anything else you find that might meet
your needs here.

One way this could perhaps be done is with Apache Ambari. Ambari comes
with a nice REST API which you can use to add additional nodes to a cluster:

https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md

Once the node has been built and the Ambari agent installed, you can call
back to the management node via the API and tell it what you want the new
node to be; Ambari will then connect to the node, configure it, and add it
to the cluster.
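
As a rough illustration, the flow might look something like the sketch
below (Python with the requests library; the server address, credentials,
cluster name, host name, and component list are all placeholders):

import requests

AMBARI = "http://ambari-server:8080/api/v1"  # placeholder server address
AUTH = ("admin", "admin")                    # placeholder credentials
HEADERS = {"X-Requested-By": "ambari"}       # header the Ambari API requires
CLUSTER = "mycluster"                        # placeholder cluster name
NEW_HOST = "worker-042.example.com"          # the node you just built

# 1. Register the freshly built, agent-running host with the cluster.
requests.post("%s/clusters/%s/hosts/%s" % (AMBARI, CLUSTER, NEW_HOST),
              auth=AUTH, headers=HEADERS)

# 2. Declare which components the new host should run.
for component in ("NODEMANAGER", "DATANODE"):
    requests.post("%s/clusters/%s/hosts/%s/host_components/%s"
                  % (AMBARI, CLUSTER, NEW_HOST, component),
                  auth=AUTH, headers=HEADERS)

# 3. Ask Ambari to install, then start, the host's components.
for state in ("INSTALLED", "STARTED"):
    requests.put("%s/clusters/%s/hosts/%s/host_components"
                 % (AMBARI, CLUSTER, NEW_HOST),
                 auth=AUTH, headers=HEADERS,
                 json={"HostRoles": {"state": state}})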

You could create a host group within the cluster blueprint containing the
minimal components needed for the host to operate as a YARN node.
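
For illustration, that host group entry might look something like this
(shown here as a Python dict; blueprints are submitted to Ambari as JSON,
and the group name and exact component list are assumptions on my part):

yarn_worker_group = {
    "name": "yarn_worker",  # made-up group name
    "cardinality": "1+",
    "components": [
        {"name": "NODEMANAGER"},  # the core component for a YARN worker
        {"name": "DATANODE"},     # only if it should also store HDFS blocks
        {"name": "HDFS_CLIENT"},
        {"name": "YARN_CLIENT"},
    ],
}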

As for the decision to scale, that is outside the remit of Ambari. You
could look into AWS Auto Scaling, or into a product called Scalr, which has
an open-source version. We are using Scalr to install an Ambari cluster,
with Chef configuring the nodes up to the point where it hands over to
Ambari.

Scalr allows you to write custom scaling metrics, which you could use to
query values like the number of applications queued and the resources
available, and add nodes when required.
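
The metric itself could be fed by something as simple as the sketch below,
which reads the YARN ResourceManager's REST metrics endpoint and prints a
single number to threshold on (the RM address is a placeholder, and how you
wire the value into Scalr will depend on your setup):

import requests

RM = "http://resourcemanager:8088"  # placeholder ResourceManager address

# /ws/v1/cluster/metrics is the ResourceManager's cluster metrics endpoint;
# the response includes fields such as appsPending, availableMB and
# availableVirtualCores.
metrics = requests.get("%s/ws/v1/cluster/metrics" % RM).json()["clusterMetrics"]

# Emit one number for the scaling policy to compare against a threshold.
print(metrics["appsPending"])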

Cheers!

On Mon, Dec 14, 2015 at 8:57 AM, Mingyu Kim <m...@palantir.com> wrote:

> Hi all,
>
> Has anyone tried out autoscaling Spark YARN cluster on a public cloud
> (e.g. EC2) based on workload? To be clear, I’m interested in scaling the
> cluster itself up and down by adding and removing YARN nodes based on the
> cluster resource utilization (e.g. # of applications queued, # of resources
> available), as opposed to scaling resources assigned to Spark applications,
> which is natively supported by Spark’s dynamic resource scheduling. I’ve
> found that Cloudbreak
> <http://sequenceiq.com/cloudbreak-docs/latest/periscope/#how-it-works> has
> a similar feature, but it’s in “technical preview”, and I didn’t find much
> else from my search.
>
> This might be a general YARN question, but wanted to check if there’s a
> solution popular in the Spark community. Any sharing of experience around
> autoscaling will be helpful!
>
> Thanks,
> Mingyu
>
