Thanks for the helpful information.

Naama

On Mon, Jun 16, 2008 at 12:17 AM, Jim Kellerman <[EMAIL PROTECTED]> wrote:

> Comments inline below.
>
> ---
> Jim Kellerman, Senior Engineer; Powerset
>
> > -----Original Message-----
> > From: Naama Kraus [mailto:[EMAIL PROTECTED]
> > Sent: Sunday, June 15, 2008 3:39 AM
> > To: [email protected]
> > Subject: HBase and locality issues
> >
> > Hi,
> >
> > I have some questions regarding HBase and locality issues -
> > I'd appreciate some explanations and clarifications.
> >
> > I understand HBase is built on top of HDFS.
> > Say an HRegionServer creates a HStoreFile where it puts some
> > column family content. Does HDFS split the file to multiple
> > HDFS blocks and distributes them around bunch of machines ?
>
> Yes. HStoreFile is currently implemented using org.apache.hadoop.io.MapFile
>
> > If that's the case, when the region server needs to actually
> > access the files, does HDFS underneath communicates remote
> > machines to read the various blocks ?
>
> Sometimes. If a requested block is local, HDFS will try to get that one.
>
> > Doesn't it hurt performance since there is no locality in data access
> > (region server actually works on remote blocks).
>
> Somewhat. We have other areas that we have identified as larger performance
> bottlenecks that need to be addressed first.
>
> > Or is the HStoreFile implemented in some other way which
> > writes it to the local disks of the region server node
> > machine that owns it ?
>
> No. Blocks are placed according to HDFS strategies.
>
> > If so, then how ? Does this code overrides the HDFS behavior ?
>
> It doesn't.
>
> > Another related question is about Map Reduce and HBase. When
> > a MapReduce job runs on top of HBase - i.e. gets a table as
> > an input. How does the MapReduce framework know how to
> > schedule map tasks near data ? Does it have any knowledge of
> > the actual location of the data pieces composing the table to
> > be processed ?
>
> No. It is on our list of things to do. See HBASE-57
>
> > I'd be also glad to get pointers to the related source code (classes).
> >
> > Thanks for any information,
> > Naama

--
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00
oo 00 oo 00 oo 00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales.
If you want them to be more intelligent, read them more fairy tales."
(Albert Einstein)
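
P.S. To make the block-locality point in the quoted reply concrete, here is a small toy model in Python. It is only a sketch under stated assumptions, not HDFS or HBase code: the host names, the block-to-replica map, and the helper names pick_replica and local_fraction are all made up for illustration. It models a reader that takes a local replica when one exists and otherwise reads remotely, and it measures how much of a store file a given region server host could read locally.

```python
from typing import Dict, List


def pick_replica(replica_hosts: List[str], reader_host: str) -> str:
    """Return the host to read a block from.

    Prefers a replica on the reader's own host; otherwise falls back
    to the first listed replica (a remote read).
    """
    if not replica_hosts:
        raise ValueError("block has no replicas")
    if reader_host in replica_hosts:
        return reader_host
    return replica_hosts[0]


def local_fraction(block_replicas: Dict[int, List[str]], reader_host: str) -> float:
    """Fraction of blocks with at least one replica on reader_host."""
    if not block_replicas:
        return 0.0
    local = sum(1 for hosts in block_replicas.values() if reader_host in hosts)
    return local / len(block_replicas)


# Hypothetical placement of a three-block store file (assumed data).
store_file_blocks = {
    0: ["node-a", "node-b", "node-c"],
    1: ["node-b", "node-d", "node-e"],
    2: ["node-a", "node-d", "node-f"],
}
```

In this made-up placement, a region server on node-a would find blocks 0 and 2 locally and would have to fetch block 1 from another machine, which is the "sometimes local" behavior described in the reply.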
