Hi Marcus,

I don't know if it's related to your problem, but your machine setup seems to imply that you have one regionserver and three datanodes on four different machines. If that's really the case, I recommend that you instead dedicate one machine to the Namenode and the HBase Master, and run the three other machines as combined Datanodes and RegionServers, so that each regionserver reads and writes its data on a local datanode.
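To make that layout concrete, here is a rough sketch. The hostnames (master1, node1-3) are invented for illustration, and this assumes the stock setup where Hadoop 0.16 reads its slave list from conf/slaves and HBase 0.1 reads conf/regionservers. In hadoop/conf/slaves, list the three DL385s:

  node1
  node2
  node3

hbase/conf/regionservers gets the same three hostnames, and hbase-site.xml on every machine points at the fourth box (if I have the property name right for 0.1.x):

  <property>
    <name>hbase.master</name>
    <value>master1:60000</value>
  </property>

Then bin/start-dfs.sh and bin/start-hbase.sh, run from master1, start the daemons in the right places, and every regionserver has a datanode next to it.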
J-D

On Thu, Jul 10, 2008 at 5:32 AM, Marcus Schlüter <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
> We would like to use HBase and Hadoop, but when we tried to load real
> data into our test setup, we saw a lot of crashes and could not insert
> the amount of data we need into an HBase table.
> Our goal is to have about 100 million rows in one table, with each row
> holding about 100 bytes of raw data.
> Our test setup consists of the following servers:
>
> 3 x HP DL385 with 4GB RAM, 2x 2.8GHz Opterons and a Smart Array RAID5
> with a capacity of 400GB (all used as datanodes, and one of them also
> as the namenode).
> 1 x HP DL380 with 3GB RAM, 2x 3.4GHz dual-core Xeons and a Smart Array
> RAID5 with a capacity of 320GB for HBase (master and regionserver).
>
> We used Hadoop 0.16.4 with a replication level of 2 and HBase 0.1.3.
> HBase is configured to use 2GB of heap space.
> The table was created with the following query:
>
> create table logdata (logtype MAX_VERSIONS=1 COMPRESSION=BLOCK,
>   banner_id MAX_VERSIONS=1, contentunit_id MAX_VERSIONS=1,
>   campaign_id MAX_VERSIONS=1, network MAX_VERSIONS=1,
>   geodata MAX_VERSIONS=1 COMPRESSION=BLOCK,
>   client_data MAX_VERSIONS=1 COMPRESSION=BLOCK,
>   profile_data MAX_VERSIONS=1 COMPRESSION=BLOCK,
>   keyword MAX_VERSIONS=1 COMPRESSION=BLOCK,
>   tstamp MAX_VERSIONS=1, time MAX_VERSIONS=1);
>
> The problem is that the regionserver runs out of heap space and throws
> the following exception after inserting a few million rows (not always
> the same number of rows, ranging from 3 to about 10 million):
>
> Exception in thread "[EMAIL PROTECTED]"
> java.lang.OutOfMemoryError: Java heap space
>     at java.io.DataInputStream.<init>(DataInputStream.java:42)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:186)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:578)
>     at org.apache.hadoop.ipc.Client.call(Client.java:501)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
>     at org.apache.hadoop.dfs.$Proxy1.renewLease(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at org.apache.hadoop.dfs.$Proxy1.renewLease(Unknown Source)
>     at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:596)
>     at java.lang.Thread.run(Thread.java:619)
> Exception in thread "ResponseProcessor for block blk_7988192980299756280"
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "IPC Server Responder" Exception in thread
> "org.apache.hadoop.io.ObjectWritable Connection Culler" Exception in
> thread "IPC Client connection to /192.168.1.117:54310"
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
>
> Any ideas why we always see these crashes, and whether HBase should be
> able to handle this amount of data in the setup we use?
>
> On a side note, we also observe that HBase seems to have a large
> storage overhead: when we insert about 1GB of raw data into HBase, it
> uses about 8GB of HDFS space (taking the replication into account).
> Is this large overhead expected?
>
> /Marcus
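P.S. On the storage overhead: yes, a multiple of the raw size is expected, because HBase stores every cell together with its full key (row key, column name, and an 8-byte timestamp). A back-of-envelope with your numbers, assuming for illustration a ~20-byte row key and ~13-byte column names:

  ~100 bytes over 11 columns              ->  ~9 bytes of value per cell
  ~20 (row) + ~13 (column) + 8 (timestamp) ->  ~41 bytes of key per cell
  (41 + 9) / 9                            ->  ~5.5x per stored copy
  x2 for replication                      ->  ~11x before compression

Block compression on the families that have it pulls that back down, so the ~8x you measured is in the expected ballpark. Shorter row keys and fewer, wider column families would reduce the overhead.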
