RE: [DISCUSS] Colo vs Cluster

Srikanth Sundarrajan Sat, 10 Jan 2015 08:24:07 -0800

I have some background on why this is implemented in this form. We 
originally envisaged (and my current organization, where Falcon is in use, 
actually has such a setup) using Falcon in a geographically distributed 
setup, with a Falcon server in each physical location and prism providing an 
over-arching global view. We also recognized that a single physical location 
could host multiple clusters. Since Falcon delegates much of its 
orchestration to Oozie, we tried to avoid mandating one Falcon instance per 
cluster (reducing instance overheads). This means that a single Falcon server 
could technically work with more than one cluster, or even all of them. So 
the requirement is largely focused on deployment and operational simplicity.


The fact that the CLI uses colo as an option should be orthogonal to this. It 
should be fairly easy to switch to a cluster-based option, should that make 
sense from an end-user perspective. This, however, has no bearing on how the 
Falcon server is deployed or on the level of consolidation (with respect to 
Hadoop clusters) in a single Falcon server instance.

So to answer @Ajay's original query: since the Falcon server makes no 
assumption about which cluster it is operating on, the notion of a "current 
cluster" doesn't exist.
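
As an illustration only, the deployment model described above can be sketched 
as follows. This is not Falcon code: the class names, method names, and the 
colo and cluster names are all hypothetical, chosen just to show one server 
per colo, several clusters per server, and prism as the global view.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FalconServer:
    """Hypothetical stand-in for one Falcon server, deployed in one colo."""
    colo: str
    clusters: Set[str] = field(default_factory=set)


@dataclass
class Prism:
    """Hypothetical stand-in for prism: a global view over per-colo servers."""
    servers: Dict[str, FalconServer] = field(default_factory=dict)

    def register(self, colo: str, clusters: List[str]) -> None:
        # One server per colo; that server may front several clusters.
        self.servers[colo] = FalconServer(colo, set(clusters))

    def clusters_in(self, colo: str) -> List[str]:
        # A colo maps to a set of clusters, not to a single "current" one.
        return sorted(self.servers[colo].clusters)

    def colo_of(self, cluster: str) -> str:
        # The reverse lookup is unambiguous: each cluster sits in one colo.
        for server in self.servers.values():
            if cluster in server.clusters:
                return server.colo
        raise KeyError(cluster)


prism = Prism()
prism.register("colo-east", ["cluster-a", "cluster-b"])
prism.register("colo-west", ["cluster-c"])
```

In this sketch, asking for the clusters behind "colo-east" yields two 
candidates, which is why a colo alone cannot identify a "current cluster", 
whereas a cluster name always resolves to exactly one colo.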

Regards
Srikanth Sundarrajan

> Subject: Re: [DISCUSS] Colo vs Cluster
> From: [email protected]
> Date: Sat, 10 Jan 2015 01:09:41 +0530
> To: [email protected]
> 
> +1 & -1.
> I believe CLI also uses this 'colo' thing.
> Before any updates, my thought would be that someone with a full 
> understanding of falcon should take time to analyze the impact.
> 
> > On 09-Jan-2015, at 11:12 pm, Seetharam Venkatesh <[email protected]> 
> > wrote:
> > 
> > Let's remove what is not necessary - cluster is what should be qualified
> > IMO. Colo is logical.
> > 
> >> On Fri, Jan 9, 2015 at 3:27 AM, Ajay Yadav <[email protected]> wrote:
> >> 
> >> Hi,
> >> 
> >>   1. Processes and feeds operate at a cluster level, but when we run an
> >>   instance command we have to give a colo.
> >>   2. In falcon some commands use clusters, e.g. entity-summary, and some
> >>   feed instance commands also accept a *sourceClusters* option (by the way,
> >>   this is not documented), while many instance commands use colo, e.g. the
> >>   schedule command.
> >>   3. I can find the current colo but I can't find the current cluster for
> >>   a process instance. So in a single-colo, multi-cluster setup, how do
> >>   I (user/developer) disambiguate a single process instance if there is no
> >>   cluster option?
> >> 
> >> 
> >> Do we need to be aware of colocation of the two clusters? If yes, then do
> >> the benefits outweigh the confusion and other overhead like extra code? I
> >> have heard about some legacy reasons to do so but I am not sure if there
> >> are any currently relevant reasons as well. Would like to hear everyone's
> >> thoughts on this.
> >> 
> >> 
> >> 
> >> Regards
> >> Ajay Yadava
> > 
> > 
> > 
> > -- 
> > Regards,
> > Venkatesh
> > 
> > “Perfection (in design) is achieved not when there is nothing more to add,
> > but rather when there is nothing more to take away.”
> > - Antoine de Saint-Exupéry
