Re: Keeping Compute Nodes separate from the region server node -- pros and cons

Ninad Raut Sun, 17 May 2009 00:09:04 -0700

Hi Andy,
I am using an EC2 cluster with large, server-grade instances. Hence,
availability cannot be determined, as the cluster nodes can change IP
addresses over time.
Yes, I am interested in replication. What would be an ideal design in this
case?


On Fri, May 15, 2009 at 9:33 PM, Andrew Purtell <[email protected]> wrote:

> Hi Ninad,
>
> I think scenario 1 is fine for your case, < 20 nodes up on EC2.
>
> Are you planning to deploy Hadoop+HBase clusters in more than one
> availability zone? Are you interested in, or implementing, replication
> between them?
>
> Best regards,
>
>    - Andy
>
>
>
>
> ________________________________
> From: Ninad Raut <[email protected]>
> To: [email protected]
> Cc: Ranjit Nair <[email protected]>
> Sent: Thursday, May 14, 2009 11:30:05 PM
> Subject: Re: Keeping Compute Nodes separate from the region server node --
> pros and cons
>
> Hi Andy,
> Thanks for the tip.
> I have an EC2 cluster with 6 nodes, each a server-grade large instance. I
> have the mapred and region servers running on all the nodes. Our deployment
> will not go beyond 20 nodes in the near future. What would you suggest I
> use: Scenario 1 or 2, as you mentioned?
>
> On Thu, May 14, 2009 at 10:44 PM, Andrew Purtell <[email protected]> wrote:
>
> > Hi Ninad,
> >
> > I think the answer depends on the anticipated scale of the deployment.
> >
> > For small clusters (up to a few racks, ~40 servers per rack) I don't think
> > there is any significant performance hit from separating storage and
> > computation. Presumably all servers will share the same large GigE switch
> > -- or maybe a redundant L2 pair via bonded interfaces for failover -- or a
> > few of them stacked with high-speed interconnects. Separation would relieve
> > the storage nodes of the RAM and CPU burden of the computational tasks, as
> > you are thinking, providing more headroom in exchange for a quite modest
> > performance penalty. (However, if your computation load is high and the
> > nodes are therefore overburdened and unstable, there is no alternative...)
> > In the future this consideration might change if DFS clients are given some
> > capability to find blocks on local disk via some optimized I/O path.
> >
> > In a large cluster there might well be a significant performance impact. In
> > a common deployment scenario, there are rack-local switched fabrics and
> > another switched fabric for uplinks from the racks. So, a rack would have a
> > switched GigE backplane or similar, but inter-rack connections might be
> > single GigE uplinks, a ~40-to-1 reduction in capacity worst case; or maybe
> > 10 GigE uplinks, a ~4-to-1 reduction. Therefore it would be desirable to
> > distribute the computation into the racks where the data is located. When a
> > region is deployed to a region server, the underlying blocks on DFS are not
> > migrated immediately, but after a compaction -- a rewrite -- the underlying
> > blocks will always be available on rack-local data nodes, according to my
> > understanding of how DFS places replicas upon write. So, after a split,
> > daughter regions will have their blocks appropriately located in a timely
> > manner. For the rest, I wonder if it would be beneficial to schedule major
> > compactions more frequently than the 24 hour default for datacenter-scale
> > deployments, something like every 8 hours, and you might also consider
> > triggering a major compaction on important tables after cluster (re)init.
> > Region deployment in a system in steady state should have relatively little
> > churn, so this will have the effect of optimizing block placement for
> > region store access.
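
A minimal sketch of the arithmetic behind the paragraph above. It assumes every server in a rack has one link of equal speed and that all inter-rack traffic shares one uplink; the function names are hypothetical and chosen only for illustration. The second helper converts a compaction interval in hours to milliseconds, on the assumption that the interval is configured in milliseconds through the `hbase.hregion.majorcompaction` property (worth verifying against your HBase version).

```python
def oversubscription(servers_per_rack: int,
                     server_link_gbps: float,
                     uplink_gbps: float) -> float:
    """Worst-case ratio of a rack's aggregate server bandwidth to its uplink.

    Assumes each server has a single link of the same speed and that all
    traffic leaving the rack shares one uplink (illustrative model only).
    """
    if servers_per_rack <= 0 or server_link_gbps <= 0 or uplink_gbps <= 0:
        raise ValueError("all arguments must be positive")
    return (servers_per_rack * server_link_gbps) / uplink_gbps


def hours_to_millis(hours: float) -> int:
    """Convert an interval in hours to whole milliseconds."""
    return int(hours * 60 * 60 * 1000)


if __name__ == "__main__":
    # ~40 GigE servers per rack behind a single GigE uplink.
    ratio = oversubscription(servers_per_rack=40,
                             server_link_gbps=1.0,
                             uplink_gbps=1.0)
    print(f"single GigE uplink: ~{ratio:.0f}-to-1")

    # The suggested 8 hour major compaction interval, in milliseconds.
    print(f"8 hour interval: {hours_to_millis(8)} ms")
```

Other uplink speeds can be tried by changing `uplink_gbps`; the higher the ratio, the stronger the case for keeping computation in the rack that holds the data.
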
> >
> > Submitted for your consideration,
> >
> >    - Andy
> >
> >
> >
> >
> >
> >
> > ________________________________
> > From: Ninad Raut <[email protected]>
> > To: hbase-user <[email protected]>
> > Cc: Ranjit Nair <[email protected]>
> > Sent: Thursday, May 14, 2009 2:56:04 AM
> > Subject: Keeping Compute Nodes separate from the region server node --
> > pros and cons
> >
> > Hi,
> > I want to get a design perspective on the advantages of separating region
> > servers from compute nodes (which run MapReduce tasks).
> > Will separating datanodes from compute nodes reduce the load on the servers
> > and avoid swapping problems?
> > Will this separation make MapReduce tasks less efficient, since we would be
> > giving up data locality?
> > Regards,
> > Ninad
> >
> >
> >
> >
> >
>
>
>
>
>
