RE: the question of hadoop

褚鵬兵 Tue, 07 Sep 2010 18:59:30 -0700
Hi stevel,
Thanks for your reply. I have not tried to debug InfiniBand; I only know of
it.
My Hadoop cluster currently runs HDFS + MapReduce, Hive, and a Derby server.
I want to add HBase to the cluster. How can I do that? Can you help me?
Thanks,
pengbing chu
> Date: Mon, 6 Sep 2010 11:14:10 +0100
> From: ste...@apache.org
> To: common-user@hadoop.apache.org
> Subject: Re: the question  of hadoop
> 
> On 06/09/10 09:32, 褚 鵬兵 wrote:
> >
> > Hi, my Hadoop friends. I have three questions about Hadoop:
> >
> > 1. The transfer speed between datanodes. With terabytes of data on one
> > datanode, data has to be transferred from one datanode to another. If
> > that transfer is slow, Hadoop will be slow, I think. I have heard of the
> > gNet architecture in Greenplum; what does Hadoop have? SAS storage +
> > Gigabit Ethernet is the best answer, isn't it?
> 
> If your code has locality, gigabit Ethernet is fine; it saves the hassle
> of getting faster stuff to work. Have you ever tried to debug InfiniBand
> cluster problems?
> 
> > 2. The GUI tool. There is a Hive web tool in Hadoop, but it is too simple
> > for our business work.
> > If Hadoop + Hive is built into a data warehouse (DWH), how should users
> > work with it? Through a CGI tool (command), or through a newly developed
> > web GUI tool?
> 
> The community welcomes new contributions. I'd look at Cascading,
> Datameer's stuff, and other things. Hive is designed for people who
> know SQL, like PHP developers.
> 
> > 3. A 5-computer Hadoop cluster versus 1 computer running SQL Server 2000.
> >
> >   5 computers, Hadoop:         Celeron 2.66 GHz, 1 GB memory, Ethernet,
> >                                namenode + secondarynamenode + 3 datanodes
> >   1 computer, SQL Server 2000: Celeron 2.66 GHz, 1 GB memory
> >
> > I then ran a select operation on the same 100 MB of data:
> >
> >   5 computers, Hadoop:         2 min 30 s
> >   1 computer, SQL Server 2000: 2 min 25 s
> >
> > The result is that the 5-computer Hadoop cluster does no better. Why?
> > Can anyone give me some advice?
> > Thanks in advance.
> 
> Indexes give RDBMSs speed, but limit their scale. If your dataset fits
> onto a single MS SQL or MySQL server and you can afford the index costs,
> stay with that data in a RAID array. Hadoop isn't trying to compete in
> that space, though things like CouchDB are trying to.
> 
> However, before you dismiss Hadoop, get in touch with your SQL Server or
> Oracle account team and say "we are planning on working with 15
> Petabytes of storage with data coming in at 1-2PB/month" and see what
> they say back and how big their quote is. The search terms "MapReduce a
> Major Step Backwards" show some of the debate going on.
>
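
A small illustration of the point about indexes in the reply above. This is
only a sketch: the data and the function names full_scan and index_lookup
are made up for this example and come from neither Hadoop nor SQL Server.
It contrasts reading every row with a binary search over sorted keys, which
is roughly the advantage a database index gives on a dataset that fits on
one machine.

```python
import bisect

# Hypothetical table of (id, value) rows, kept sorted by id.
rows = [(i, f"row-{i}") for i in range(0, 1_000_000, 2)]
# Sorted list of keys standing in for an index on the id column.
keys = [key for key, _ in rows]


def full_scan(target):
    """Look at every row until a match is found (no index available)."""
    for key, value in rows:
        if key == target:
            return value
    return None


def index_lookup(target):
    """Binary search over the sorted keys, then fetch the matching row."""
    pos = bisect.bisect_left(keys, target)
    if pos < len(keys) and keys[pos] == target:
        return rows[pos][1]
    return None


if __name__ == "__main__":
    # Both approaches return the same row; only the work done differs.
    print(full_scan(999_998))
    print(index_lookup(999_998))
```

Both calls return the same row, but the scan touches every row while the
lookup needs about 20 comparisons. Keeping such an index sorted as data
arrives is the cost the reply refers to.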