Hey Josh.

On 1) below: from the client's perspective, the region has disappeared; all of a sudden it starts getting the NotServingRegionException (FYI, a region will not close mid-update; in-flight updates are allowed to finish before the close goes into effect). The client needs to back up and figure out the new location of the region it's trying to update. Are you not using HTable? It manages the search for the new region location for you -- see, for example, the commit method around line 660 in HTable (look inside getRegionLocation) -- with pause and maximum-retries.
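For reference, the pause-and-retries pattern that HTable applies internally looks roughly like the sketch below. This is illustrative only -- the names and structure are hypothetical, not the actual HTable internals -- and it simulates the failing operation rather than touching a real cluster:

```java
// Sketch of the pause-and-retry pattern used when a region moves.
// All names here are illustrative, not the real HTable API.
import java.util.concurrent.Callable;

public class RetryExample {
    static final int MAX_RETRIES = 10; // analogous to a maximum-retries setting
    static final long PAUSE_MS = 30;   // analogous to the client pause (shortened for demo)

    // Run op, sleeping and retrying on failure up to MAX_RETRIES times.
    static <T> T withRetries(Callable<T> op) throws Exception {
        Exception last = null;
        for (int tries = 0; tries < MAX_RETRIES; tries++) {
            try {
                return op.call();
            } catch (Exception e) { // in real client code: NotServingRegionException
                last = e;
                // Back off; the real client would also re-locate the region here.
                Thread.sleep(PAUSE_MS);
            }
        }
        throw last; // out of retries: surface the last failure
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        // Simulated operation: fails twice (as if the region moved), then succeeds.
        String result = withRetries(() -> {
            if (attempts[0]++ < 2) throw new RuntimeException("NotServingRegion (simulated)");
            return "committed";
        });
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

The point being: the sleep-then-retry loop Josh describes below is essentially what the client library already does for you, with the region re-lookup folded in.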
On bulk uploading: somehow you need to run multiple concurrent clients working against different ranges in the keyspace. You could write a mapreduce job to do it (see under the mapred package in hbase for supporting code). But you also need more servers in the mix; in your current 'cozy' setup, one region server is hosting all clients. There is basic load balancing of regions in place at the moment, so with more servers the client uploads should be carried near-evenly by all participants.
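A minimal sketch of the "different ranges in the keyspace" idea: split a flat keyspace into near-even contiguous slices and hand each slice to its own upload client. The class and method names are made up for illustration; a mapreduce job would do the equivalent via input splits:

```java
// Illustrative only: partition [0, totalKeys) into nClients disjoint,
// contiguous [start, end) ranges for concurrent upload clients.
import java.util.ArrayList;
import java.util.List;

public class KeyRanges {
    static List<long[]> split(long totalKeys, int nClients) {
        List<long[]> ranges = new ArrayList<>();
        long base = totalKeys / nClients;  // minimum keys per client
        long rem = totalKeys % nClients;   // first `rem` clients take one extra
        long start = 0;
        for (int i = 0; i < nClients; i++) {
            long size = base + (i < rem ? 1 : 0);
            ranges.add(new long[] { start, start + size });
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each printed range would be worked by its own thread or map task.
        for (long[] r : split(1000, 3)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

With contiguous ranges like these, concurrent clients mostly hit different regions once the table has split a few times, which is what lets extra region servers share the load.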
HADOOP-2075 is an umbrella issue in which we're trying to work out general tools to add to hbase to help with the bulk upload and -- perhaps -- dump of data.
On 2), that looks like an exception in a recently added feature, done by Jim, that hashes the key portions of filenames. He'll be in soon and will take a look at it.
St.Ack

Josh Wills wrote:
This was a great thread -- it helped me a great deal in getting hbase up and running. Thanks very much to all of you. I upgraded to the 0.15.0 version of hadoop/hbase (as advised) and got much further than I did with the 0.14.2 release. I ran into a few things I wanted to ask you guys about:

1) I'm in the process of uploading some data (~60GB) to an HTable on a single server running hadoop/hbase (i.e., namenode and datanode are on the same machine, as are the HMaster and HRegionServer. It's a cozy setup.) in chunks of ~500MB. As the upload runs, the regions occasionally get split, at which point my client code gets handed back a NotServingRegionException on whatever region the table is splitting. Right now, my strategy is to put the thread to sleep for a few seconds and then retry the operations, a la the "recalibrate" function in MultiRegionTable.java in the unit tests. It looks like eventually the HRegionServer gets up to date and everything goes back to normal. Is this the best way for me to handle this? I would also appreciate any other tips you guys might have on optimizing this sort of bulk upload -- once I get this set up, I have a very, very large dataset that I would like to store in HBase.

2) I was running one of these batch-style uploads last night on an HTable that I configured w/BloomFilters on a couple of my column families.
During one of the compaction operations, I got the following exception:

FATAL org.apache.hadoop.hbase.HRegionServer: Set stop flag in regionserver/0:0:0:0:0:0:0:0:60020.splitOrCompactChecker
java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at sun.security.provider.DigestBase.engineUpdate(DigestBase.java:102)
    at sun.security.provider.SHA.implDigest(SHA.java:94)
    at sun.security.provider.DigestBase.engineDigest(DigestBase.java:161)
    at sun.security.provider.DigestBase.engineDigest(DigestBase.java:140)
    at java.security.MessageDigest$Delegate.engineDigest(MessageDigest.java:531)
    at java.security.MessageDigest.digest(MessageDigest.java:309)
    at org.onelab.filter.HashFunction.hash(HashFunction.java:125)
    at org.onelab.filter.BloomFilter.add(BloomFilter.java:99)
    at org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Writer.append(HStoreFile.java:895)
    at org.apache.hadoop.hbase.HStore.compact(HStore.java:899)
    at org.apache.hadoop.hbase.HStore.compact(HStore.java:728)
    at org.apache.hadoop.hbase.HStore.compactHelper(HStore.java:632)
    at org.apache.hadoop.hbase.HStore.compactHelper(HStore.java:564)
    at org.apache.hadoop.hbase.HStore.compact(HStore.java:559)
    at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:717)
    at org.apache.hadoop.hbase.HRegionServer$SplitOrCompactChecker.checkForSplitsOrCompactions(HRegionServer.java:198)
    at org.apache.hadoop.hbase.HRegionServer$SplitOrCompactChecker.chore(HRegionServer.java:188)
    at org.apache.hadoop.hbase.Chore.run(Chore.java:58)

Note that this wasn't the first compaction that was run (there were others before it that ran successfully) and that the region hadn't been split at this point. I defined BloomFilterType.BLOOMFILTER on a couple of the column families, w/the largest one having ~100000 distinct entries.
I don't know which of these caused the failure, but I noticed that 100000 is quite a bit larger than the # of entries used in the testcases, so I'm wondering if that might be the problem.

Thanks again, the 0.15.0 stuff looks very good-
Josh

On 10/19/07, edward yoon <[EMAIL PROTECTED]> wrote:

You're welcome. If you have any needs, questions, or comments on Hbase, please let us know!

Edward.
----
B. Regards,
Edward yoon (Assistant Manager/R&D Center/NHN, corp.)
+82-31-600-6183, +82-10-7149-7856

Date: Fri, 19 Oct 2007 14:33:45 +0800
From: [EMAIL PROTECTED]
To: [email protected]
Subject: Re: A basic question on HBase

Dear edward yoon & Michael Stack,

After using the hadoop branch-0.15, hbase runs correctly. Thank you very much!

Best wishes,
Bin YANG

On 10/19/07, Bin YANG wrote:

Thank you! I can download it now!

On 10/19/07, edward yoon wrote:

Run the following on the command-line:

$ svn co http://svn.apache.org/repos/asf/lucene/hadoop/trunk hadoop

See also for more information about Hbase and the Hbase Shell client program:
- http://wiki.apache.org/lucene-hadoop/Hbase
- http://wiki.apache.org/lucene-hadoop/Hbase/HbaseShell

Edward.
----
B. Regards,
Edward yoon (Assistant Manager/R&D Center/NHN, corp.)
+82-31-600-6183, +82-10-7149-7856

Date: Fri, 19 Oct 2007 13:46:51 +0800
From: [EMAIL PROTECTED]
To: [email protected]
Subject: Re: A basic question on HBase

Dear Michael Stack:

I am afraid that I cannot connect to the svn:

Error: PROPFIND request failed on '/viewvc/lucene/hadoop/trunk'
Error: PROPFIND of '/viewvc/lucene/hadoop/trunk': 302 Found (http://svn.apache.org)

and

Error: PROPFIND request failed on '/viewvc/lucene/hadoop/branches/branch-0.15'
Error: PROPFIND of '/viewvc/lucene/hadoop/branches/branch-0.15': 302 Found (http://svn.apache.org)

Would you please send me a 0.15 version of hadoop, or give some information on how to connect to the svn successfully?

Best wishes,
Bin YANG

On 10/19/07, Michael Stack wrote:

(Ignore my last message.
I had missed your back and forth with Edward.)

Regarding step 3 below: you are starting both mapreduce and dfs daemons. You only need the dfs daemons running for hbase, so you could do ./bin/start-dfs.sh instead.

Are you using hadoop 0.14.x? (It looks like it, going by the commands and log excerpt below.) If so, please use TRUNK or the 0.15.0 candidate (the branch is here: http://svn.apache.org/viewvc/lucene/hadoop/branches/branch-0.15/). There is a big difference between hbase in 0.14.0 and 0.15.0 (the 0.15.0 candidate contains the first hbase release). For example, leftover log files are properly split and distributed in later hbases, where before they threw the "Can not start region server because..." exception.

As Edward points out, the master does not seem to be getting the region server 'report-for-duty' message (which doesn't jibe with the region server log that says -ROOT- has been deployed, because the master assigns regions).

Regarding your not being able to reformat -- presuming there is no valuable data in your hdfs, that all is running on localhost, and that you are moving from hadoop 0.14.0 to 0.15.0 -- just remove the /tmp/hadoop-hadoop dir.

St.Ack

Bin YANG wrote:

Dear edward,

I will show you the steps of what I have done:

1. hadoop-site.xml:

   fs.default.name = localhost:9000 (Namenode)
   mapred.job.tracker = localhost:9001 (JobTracker)
   dfs.replication = 1

2. /hadoop-0.14.2$ bin/hadoop namenode -format

3. bin/start-all.sh

4. hbase-site.xml:

   hbase.master = localhost:60000 (The host and port that the HBase master runs at. TODO: Support 'local' (All running in single context).)
   hbase.regionserver = localhost:60010 (The host and port a HBase region server runs at.)

5. bin/hbase-start.sh

The logs:

1.
hbase-hadoop-regionserver-yangbin.log:

2007-10-18 15:40:58,588 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2007-10-18 15:40:58,592 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2007-10-18 15:40:58,690 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 60010: starting
2007-10-18 15:40:58,692 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 60010: starting
2007-10-18 15:40:58,694 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60010: starting
2007-10-18 15:40:58,692 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60010: starting
2007-10-18 15:40:58,691 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60010: starting
2007-10-18 15:40:58,696 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 60010: starting
2007-10-18 15:40:58,691 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60010: starting
2007-10-18 15:40:58,696 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60010: starting
2007-10-18 15:40:58,697 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60010: starting
2007-10-18 15:40:58,698 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60010: starting
2007-10-18 15:40:58,699 INFO org.apache.hadoop.hbase.HRegionServer: HRegionServer started at: 127.0.1.1:60010
2007-10-18 15:40:58,709 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60010: starting
2007-10-18 15:40:58,867 INFO org.apache.hadoop.hbase.HStore: HStore online for --ROOT--,,0/info
2007-10-18 15:40:58,872 INFO org.apache.hadoop.hbase.HRegion: region --ROOT--,,0 available
2007-10-18 18:21:55,558 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 1 time(s).
2007-10-18 18:21:56,577 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 2 time(s).
2007-10-18 18:21:57,585 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 3 time(s).
2007-10-18 18:21:58,593 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 4 time(s).
2007-10-18 18:22:05,874 ERROR org.apache.hadoop.hbase.HRegionServer: Can not start region server because org.apache.hadoop.hbase.RegionServerRunningException: region server already running at 127.0.1.1:60010 because logdir /tmp/hadoop-hadoop/hbase/log_yangbin_60010 exists
    at org.apache.hadoop.hbase.HRegionServer.(HRegionServer.java:482)
    at org.apache.hadoop.hbase.HRegionServer.(HRegionServer.java:407)
    at org.apache.hadoop.hbase.HRegionServer.main(HRegionServer.java:1357)
2007-10-18 19:57:40,243 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2007-10-18 19:57:40,274 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2007-10-18 19:57:40,364 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 60010: starting
2007-10-18 19:57:40,366 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60010: starting
2007-10-18 19:57:40,367 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60010: starting
2007-10-18 19:57:40,368 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60010: starting
2007-10-18 19:57:40,368 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 60010: starting
2007-10-18 19:57:40,369 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60010: starting
2007-10-18 19:57:40,370 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 60010: starting
2007-10-18 19:57:40,371 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60010: starting
2007-10-18 19:57:40,371 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60010: starting
2007-10-18 19:57:40,372 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60010: starting
2007-10-18 19:57:40,373 INFO org.apache.hadoop.hbase.HRegionServer: HRegionServer started at: 127.0.1.1:60010
2007-10-18 19:57:40,384 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60010: starting
2007-10-18 19:57:41,118 INFO org.apache.hadoop.hbase.HStore: HStore online for --ROOT--,,0/info
2007-10-18 19:57:41,125 INFO org.apache.hadoop.hbase.HRegion: region --ROOT--,,0 available

2. hbase-hadoop-master-yangbin.log

There is a lot of the below statement:

2007-10-18 15:52:52,885 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 1 time(s).
2007-10-18 15:52:53,892 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 2 time(s).
2007-10-18 15:52:54,900 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 3 time(s).
2007-10-18 15:52:55,904 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 4 time(s).
2007-10-18 15:52:56,912 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 5 time(s).
2007-10-18 15:52:57,924 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 6 time(s).
2007-10-18 15:52:58,928 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 7 time(s).
2007-10-18 15:52:59,932 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 8 time(s).
2007-10-18 15:53:00,936 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 9 time(s).
2007-10-18 15:53:01,939 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.1.1:60010. Already tried 10 time(s).
2007-10-18 15:53:02,943 INFO org.apache.hadoop.ipc.RPC: Server at /127.0.1.1:60010 not available yet, Zzzzz...

--
Bin YANG
Department of Computer Science and Engineering
Fudan University
Shanghai, P. R.
China
EMail: [EMAIL PROTECTED]
