Large data sets
I am part of a working group that is developing a Bigtable-like structured storage system for Hadoop HDFS (see http://wiki.apache.org/lucene-hadoop/Hbase). I am interested in learning about large HDFS installations:
- How many nodes do you have in a cluster?
- How much data do you store in HDFS?
- How many files do you have in HDFS?
- Have you run into any limitations that have prevented you from growing your application?
- Are there limitations on how many files you can put in a single directory? Google's GFS, for example, does not really implement directories per se, so it does not suffer from the performance problems that traditional file systems have when a directory holds too many files.
The largest system I know about has about 1.5M files and about 150GB of data. If anyone has a larger system in use, I'd really like to hear from you. Were there particular obstacles you ran into in growing your system to that size, etc.? Thanks in advance. -- Jim Kellerman, Senior Engineer; Powerset [EMAIL PROTECTED]
Re: Loading data into HDFS
This request isn't so much about loading data into HDFS, but we really need the ability to create a file that supports atomic appends for the HBase redo log. Since HDFS files currently don't exist until they are closed, the best we can do right now is close the current redo log and open a new one fairly frequently to minimize the number of updates that would otherwise be lost. I don't think we need the multi-appender model that GFS supports, just a single appender. -Jim On Tue, 2007-08-07 at 10:45 -0700, Eric Baldeschwieler wrote: I'll have our operations folks comment on our current techniques. We use map-reduce jobs so that all nodes in the cluster copy from the source, generally using either the HTTP(S) or HDFS protocol. We've seen write rates as high as 8.3 GBytes/sec on 900 nodes. This is network limited. We see roughly 20 MBytes/sec/node (double the other rate) on one-rack clusters, with everything connected with gigabit. We (the Yahoo grid team) are planning to put some more energy into making the system more useful for real-time log handling in the next few releases. For example, I would like to be able to tail -f a file as it is written, I would like to have a generic log aggregation system, and I would like to have the map-reduce framework log directly into HDFS using that system. I'd love to hear thoughts on other achievable improvements that would really help in this area. On Aug 3, 2007, at 1:42 AM, Jeff Hammerbacher wrote: We have a service which writes one copy of a logfile directly into HDFS (writes go to the namenode). As Dennis mentions, since HDFS does not support atomic appends, if a failure occurs before a file is closed, it never appears in the file system. Thus we have to rotate logfiles more frequently than we'd like in order to checkpoint the data into HDFS. The system certainly isn't perfect, but bulk-loading the data into HDFS was proving rather slow. I'd be curious to hear actual performance numbers and methodologies for bulk loads. I'll try to dig some up myself on Monday. On 8/2/07, Dennis Kubes [EMAIL PROTECTED] wrote: You can copy data from any node, so if you can do it from multiple nodes your performance will be better (although be sure not to overlap files). The master node is updated once the block has been copied its replication number of times. So if the default replication is 3, then the 3 replicas must be active before the master is updated and the data appears in the dfs. How long the updates take is a function of your server load, network speed, and file size. Generally it is fast. So the process is: the data is loaded into the dfs, replicas are created, and the master node is updated. In terms of consistency, if the data node crashes before the data is loaded then the data won't appear in the dfs. If the name node crashes before it is updated but all replicas are active, the data will appear once the name node has been fixed and updated through block reports. If a single node holding a replica crashes after the namenode has been updated, the data will be re-replicated from one of the other 2 replicas to another node, if one is available. Dennis Kubes Venkates .P.B. wrote: Am I missing something very fundamental? Can someone comment on these queries? Thanks, Venkates P B On 8/1/07, Venkates .P.B. [EMAIL PROTECTED] wrote: A few queries regarding the way data is loaded into HDFS. - Is it a common practice to load the data into HDFS only through the master node? We are able to copy only around 35 logs (64K each) per minute in a 2-slave configuration.
- We are concerned about the time it would take to update filenames and block maps in the master node when data is loaded from a few/all of the slave nodes. Can anyone let me know generally how long it takes for this update to happen? And one more question: what if a node crashes soon after the data is copied onto it? How is data consistency maintained here? Thanks in advance, Venkates P B -- Jim Kellerman, Senior Engineer; Powerset [EMAIL PROTECTED]
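As a concrete illustration of the workaround described at the top of this thread (close the current log so its contents become visible, then open a new one), here is a minimal sketch against the Hadoop FileSystem API. The class name, log directory, and rotation policy are illustrative rather than anything from the thread; anything written after the last roll is exactly the window of updates that can be lost.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: roll an HDFS log file periodically, since a file's contents only
// become visible in HDFS once the file has been closed.
public class RollingHdfsLog {
  private final FileSystem fs;
  private final Path logDir; // hypothetical log directory
  private FSDataOutputStream out;

  public RollingHdfsLog(Configuration conf, Path logDir) throws IOException {
    this.fs = FileSystem.get(conf);
    this.logDir = logDir;
    roll();
  }

  // Close the current log (making it visible) and start a new one.
  public synchronized void roll() throws IOException {
    if (out != null) {
      out.close();
    }
    out = fs.create(new Path(logDir, "log." + System.currentTimeMillis()));
  }

  public synchronized void append(byte[] record) throws IOException {
    out.write(record);
  }
}

A caller would invoke roll() every few seconds or after some number of writes; the more often it rolls, the smaller the window of updates lost on a crash.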
Re: hbase listTables() broken?
Looks like a bug. Please open a Jira if you have not already done so. -Jim On Tue, 2007-09-04 at 00:24 -0700, Andrew Hitchcock wrote: Hi, I've been playing around with HBase a little bit recently, writing some code to get an idea of how to use it. I noticed the listTables() method provided by an HConnection doesn't return what I expect. I don't know if this is a bug or if I'm just doing something wrong. I first created the table movieLog_table using the HBase shell. Then I tried programmatically creating and deleting tables. I create and enable a table (which I know works, because I can use it to store and retrieve values), and then run this code:

HConnection con = HConnectionManager.getConnection(conf);
HTableDescriptor[] tables = con.listTables();
System.out.println(tables.length);
for (HTableDescriptor tabledesc : tables) {
  System.out.println(tabledesc.getName().toString());
}

When I run the code (which programmatically creates two tables, in addition to the one table I created using the shell), I get this result: 3 movieLog_table movieLog_table movieLog_table. The number is correct, but the two new tables are reported as having the same name. Also, when I run 'show tables;' in the HBase shell, it shows the same result. If this is actually a bug and not an error on my end, I can create a JIRA task. Thanks, Andrew -- Jim Kellerman, Senior Engineer; Powerset [EMAIL PROTECTED]
RE: HBase performance
Performance always depends on the work load. However, having said that, you should read Michael Stonebraker's paper "The End of an Architectural Era (It's Time for a Complete Rewrite)", which was presented at the Very Large Database Conference. You can find a PDF copy of the paper here: http://www.vldb.org/conf/2007/papers/industrial/p1150-stonebraker.pdf In this paper he presents compelling evidence that column oriented databases (HBase is a column oriented database) can outperform traditional RDBMS systems (MySQL) by an order of magnitude or more for almost every kind of work load. Here's a brief summary of why this is so:
- writes: a row oriented database writes the whole row regardless of whether values are supplied for every field. Space is reserved for null fields, so the number of bytes written is the same for every row. In a column oriented database, only the columns for which values are supplied are written. Nulls are free. Also, row oriented databases must write a row descriptor so that when the row is read, the column values can be found.
- reads: unless every column is being returned on a read, a column oriented database is faster because it only reads the columns requested. The row oriented database must read the entire row, figure out where the requested columns are, and return only that portion of the data read.
- compression: works better in a column oriented database because similar data is stored together, which is not the case in a row oriented database.
- scans: suppose you have a 600GB database with 200 columns of equal length (the TPC-H OLTP benchmark has 212 columns) but while you are scanning the table you only want to return 5 of the columns. Each column takes up 3GB of the 600GB. A row oriented database will have to read the entire 600GB to extract the 15GB of data desired. Think about how long it takes to read 600GB vs 15GB. Furthermore, in a column oriented database, each column can be read in parallel, and the inner loop only executes once per column rather than once per row as in the row oriented database.
- bulk loads: row oriented databases have to construct their indexes as the load progresses, so even if the load goes from low value to high, btrees must be split and reorganized. For column oriented databases, this is not true.
- adding capacity: in a row oriented database, you generally have to dump the database, create a new partitioning scheme and then load the dumped data into a new database. With HBase, storage is only limited by the DFS. Need more storage? Add another data node.
We have done almost no tuning for HBase, but I'd be willing to bet that it would handily beat MySQL in a drag race. --- Jim Kellerman, Senior Engineer; Powerset [EMAIL PROTECTED] -Original Message- From: Rafael Turk [mailto:[EMAIL PROTECTED] Sent: Thursday, October 11, 2007 3:36 PM To: hadoop-user@lucene.apache.org Subject: HBase performance Hi All, Does anyone have comments about how HBase will perform in a 4 node cluster compared to an equivalent MySQL configuration? Thanks, Rafael
RE: HBase performance
Stonebraker has a new column oriented store called H-Store. It is also talked about in the paper. And now I'll shut up. I didn't intend to create such a firestorm. --- Jim Kellerman, Senior Engineer; Powerset [EMAIL PROTECTED] -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Friday, October 12, 2007 11:29 AM To: hadoop-user@lucene.apache.org Subject: Re: HBase performance Jonathan Hendler wrote: Since Vertica is also a distributed database, I think it may be interesting to the newbies like myself on the list. To keep the conversation topical - while it's true there's a major campaign of PR around Vertica, I'd be interested in hearing more about how HBase compares with other column stores or hybrids. Vertica is presumably based on C-Store. C-Store seems not optimized for immediate query of recently updated data, but rather for delayed queries over mostly read-only data warehouses. HBase (modeled after BigTable) is instead optimized for real-time access to read-write data. So I think it depends a bit on what your application needs. E.g., from the C-Store paper: we expect read-only queries to be run in historical mode. In this mode, the query selects a timestamp, T, less than the one of the most recently committed transactions [...] Doug
RE: HBase performance
One more comment and then I'll really shut up, I promise. On re-reading the paper, you are all absolutely correct about C-Store, H-Store and Vertica. What is not in the paper, and part of what he presented this week, was applying column oriented stores to the TPC-H benchmark. The TPC-H OLTP telco benchmark has a schema of 212 columns, contains ~600GB of data, and each transaction accesses only 6 or 7 of the columns. In a full table scan, a row oriented store must read all 600GB of data. It has no choice. A column oriented store need only read the 6-7 columns, which is approximately 20GB. I don't think anyone will dispute that you can read 20GB a whole lot faster than 600GB. Jeff Hammerbacher wrote: 4) your section on adding capacity has NOTHING at all to do with organizing your data on disk in a column-oriented fashion; it's a property of any reasonably well-designed horizontally partitioned data store. Hmm, well the column oriented-ness of BigTable and HBase does a pretty nice job of horizontal partitioning. Jonathan Hendler wrote: One of the valid points ... has to do with compression (and null values). For example - does HBase also offer tools, or a strategy for compression? Yes, see hbase.HColumnDescriptor.java; compression is controlled on a per column family basis. --- Jim Kellerman, Senior Engineer; Powerset [EMAIL PROTECTED]
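Since compression is set per column family, a worked sketch may help. The simple HColumnDescriptor("contents:") constructor below is the one that appears later in this digest; the commented-out richer form that would set versions and compression explicitly is only an assumption about its shape (the argument list varies by HBase release), not a verified signature, and the table name "docs" is hypothetical.

import org.apache.hadoop.hbase.HBaseAdmin;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

// Create a table with one column family; per-family options live on HColumnDescriptor.
public class CreateColumnFamilyExample {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTableDescriptor desc = new HTableDescriptor("docs");
    // Simplest form: defaults for max versions, compression, bloom filter.
    desc.addFamily(new HColumnDescriptor("contents:"));
    // Assumed richer form (argument order/types differ by release):
    // desc.addFamily(new HColumnDescriptor(new Text("contents:"), 3,
    //     HColumnDescriptor.CompressionType.BLOCK, false, Integer.MAX_VALUE, null));
    new HBaseAdmin(conf).createTable(desc);
  }
}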
RE: A basic question on HBase
Josh, Could you provide the parameters you used to configure the bloom filter? Thanks. --- Jim Kellerman, Senior Engineer; Powerset [EMAIL PROTECTED] -Original Message- From: Josh Wills [mailto:[EMAIL PROTECTED] Sent: Sunday, October 21, 2007 7:28 PM To: hadoop-user@lucene.apache.org Subject: Re: A basic question on HBase 2) I was running one of these batch-style uploads last night on an HTable that I configured w/BloomFilters on a couple of my column families. During one of the compaction operations, I got the following exception--

FATAL org.apache.hadoop.hbase.HRegionServer: Set stop flag in regionserver/0:0:0:0:0:0:0:0:60020.splitOrCompactChecker
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at sun.security.provider.DigestBase.engineUpdate(DigestBase.java:102)
at sun.security.provider.SHA.implDigest(SHA.java:94)
at sun.security.provider.DigestBase.engineDigest(DigestBase.java:161)
at sun.security.provider.DigestBase.engineDigest(DigestBase.java:140)
at java.security.MessageDigest$Delegate.engineDigest(MessageDigest.java:531)
at java.security.MessageDigest.digest(MessageDigest.java:309)
at org.onelab.filter.HashFunction.hash(HashFunction.java:125)
at org.onelab.filter.BloomFilter.add(BloomFilter.java:99)
at org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Writer.append(HStoreFile.java:895)
at org.apache.hadoop.hbase.HStore.compact(HStore.java:899)
at org.apache.hadoop.hbase.HStore.compact(HStore.java:728)
at org.apache.hadoop.hbase.HStore.compactHelper(HStore.java:632)
at org.apache.hadoop.hbase.HStore.compactHelper(HStore.java:564)
at org.apache.hadoop.hbase.HStore.compact(HStore.java:559)
at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:717)
at org.apache.hadoop.hbase.HRegionServer$SplitOrCompactChecker.checkForSplitsOrCompactions(HRegionServer.java:198)
at org.apache.hadoop.hbase.HRegionServer$SplitOrCompactChecker.chore(HRegionServer.java:188)
at org.apache.hadoop.hbase.Chore.run(Chore.java:58)

Note that this wasn't the first compaction that was run (there were others before it that ran successfully) and that the region hadn't been split at this point. I defined the BloomFilterType.BLOOMFILTER on a couple of the columnfamilies, w/the largest one having ~10 distinct entries. I don't know which of these caused the failure, but I noticed that 10 is quite a bit larger than the # of entries used in the testcases, so I'm wondering if that might be the problem. Thanks again, the 0.15.0 stuff looks very good- Josh On 10/19/07, edward yoon [EMAIL PROTECTED] wrote: You're welcome. If you have any needs, questions, or comments in Hbase, please let us know! Edward. B. Regards, Edward yoon (Assistant Manager/RD Center/NHN, corp.) +82-31-600-6183, +82-10-7149-7856 Date: Fri, 19 Oct 2007 14:33:45 +0800 From: [EMAIL PROTECTED] To: hadoop-user@lucene.apache.org Subject: Re: A basic question on HBase Dear edward yoon Michael Stack, After using the hadoop branch-0.15, hbase runs correctly. Thank you very much! Best wishes, Bin YANG On 10/19/07, Bin YANG wrote: Thank you! I can download it now! On 10/19/07, edward yoon wrote: Run the following on the command-line: $ svn co http://svn.apache.org/repos/asf/lucene/hadoop/trunk hadoop See also for more information about the Hbase and Hbase Shell client program: - http://wiki.apache.org/lucene-hadoop/Hbase - http://wiki.apache.org/lucene-hadoop/Hbase/HbaseShell Edward. B. Regards, Edward yoon (Assistant Manager/RD Center/NHN, corp.)
+82-31-600-6183, +82-10-7149-7856 Date: Fri, 19 Oct 2007 13:46:51 +0800 From: [EMAIL PROTECTED] To: hadoop-user@lucene.apache.org Subject: Re: A basic question on HBase Dear Michael Stack: I am afraid that I cannot connect to the svn: Error: PROPFIND request failed on '/viewvc/lucene/hadoop/trunk' Error: PROPFIND of '/viewvc/lucene/hadoop/trunk': 302 Found (http://svn.apache.org) and Error: PROPFIND request failed on '/viewvc/lucene/hadoop/branches/branch-0.15' Error: PROPFIND of '/viewvc/lucene/hadoop/branches/branch-0.15': 302 Found (http://svn.apache.org) Would you please send me a 0.15 version of hadoop, or give some information on how to connect to the svn successfully? Best wishes, Bin YANG On 10/19/07, Michael Stack wrote: (Ignore my last message. I had missed your back and forth with Edward.) Regarding step 3 below, you are starting both the mapreduce and dfs daemons. You only need the dfs daemons to run hbase, so you could do ./bin/start-dfs.sh instead. Are you using hadoop 0.14.x? (It looks
RE: HBase UnknownScannerException
UnknownScannerException is thrown by the region server if a scanner request (next or close) is made with either a bogus scanner id (unlikely, since the scanner id is hidden from the client application) or if the scanner's lease has expired. Every client request is given a lease by the region server. If the lease times out, the region server thinks that the client has gone away and it can clean up any resources being held for that request. Lease timeouts almost always occur on operations related to scanners. If a scanner's lease times out, UnknownScannerException will be thrown if the client issues a scanner request after the timeout, because the region server has already cleaned up the resources associated with the scanner. (The default lease timeout for client requests is 30 seconds, so it is very likely that this is the situation you are running into.) If you attempt to do an update to the same table that you have a scanner open on, the update will stall waiting for the scanner to complete. So a loop which does

while (scanner.next(key, val)) {
  if (key.equals(something) || val.equals(somethingElse)) {
    table.startUpdate(...)
    table.put(...)
    table.commit(...)
  }
}

will not work because the update will wait for the scanner to finish (which it won't), and both the update and the next scanner.next should fail. Hope that helps. --- Jim Kellerman, Senior Engineer; Powerset [EMAIL PROTECTED] -Original Message- From: Cedric Ho [mailto:[EMAIL PROTECTED] Sent: Thursday, October 25, 2007 7:58 PM To: hadoop-user@lucene.apache.org Subject: HBase UnknownScannerException Hi, I was trying HBase from the 0.15 branch. And was doing:

HScannerInterface s = table.obtainScanner(...)
while (s.next(key, val)) {
}

And I encounter the following exception whenever I spend too much time in the while loop (e.g. 30 ~ 120 seconds). I am also doing inserts in the same table in another process, but the records they are working on are definitely non-overlapping.

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.UnknownScannerException: Name: -7220522571873774180
at org.apache.hadoop.hbase.HRegionServer.next(HRegionServer.java:1075)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
at org.apache.hadoop.ipc.Client.call(Client.java:482)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
at $Proxy1.next(Unknown Source)
at org.apache.hadoop.hbase.HTable$ClientScanner.next(HTable.java:831)
...

There are 6 machines in total: one running namenode, secondarynamenode, hbase master; five running datanode, regionserver. Cheers, Cedric
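For illustration, here is a sketch of the pattern Jim describes: collect the rows of interest while scanning, close the scanner to release its lease, and only then apply the updates. It assumes the 0.15-era client API that appears in this thread (obtainScanner, HScannerInterface.next(key, values), startUpdate/put/commit); exact signatures differ between releases, so treat it as an outline rather than verified code.

import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;

// Sketch: do not issue updates against a table while holding an open scanner on it.
// Buffer the interesting row keys first, close the scanner, then update.
public class ScanThenUpdate {
  public static void rewrite(HTable table, Text[] columns, Text column, byte[] newValue)
      throws Exception {
    List<Text> rowsToTouch = new ArrayList<Text>();
    HScannerInterface scanner = table.obtainScanner(columns, new Text());
    try {
      HStoreKey key = new HStoreKey();
      SortedMap<Text, byte[]> values = new TreeMap<Text, byte[]>();
      while (scanner.next(key, values)) {
        // Decide here which rows need an update; do NOT update yet.
        rowsToTouch.add(new Text(key.getRow()));
        values.clear();
      }
    } finally {
      scanner.close(); // releases the region server lease
    }
    // With the scanner closed, the updates cannot stall behind it.
    for (Text row : rowsToTouch) {
      long lockid = table.startUpdate(row);
      table.put(lockid, column, newValue);
      table.commit(lockid);
    }
  }
}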
RE: How to Setup Hbase in 10 minutes
They are indeed quite similar but do have some significant differences. Some of the commands take different arguments and there are different commands run in each. If HBase were a part of hadoop proper instead of a contrib project (like record io, which was moved from contrib into the main hadoop tree), I would be more inclined to merge the scripts since they would have more in common. For now, I think keeping them separate is probably the right thing to do. --- Jim Kellerman -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 30, 2007 9:53 AM To: hadoop-user@lucene.apache.org Subject: Re: How to Setup Hbase in 10 minutes Holger Stenzhorn wrote: This fix is exactly the same as done for hadoop-daemon.sh (and introduced into the Subversion repository already). Which raises the question: could HBase use hadoop-daemon.sh directly? If not, could hadoop-daemon.sh be modified to support HBase? Maintaining two slightly different versions of something makes maintenance painful. Doug
RE: HBase question on HRegions server
-Original Message- From: Bin YANG [mailto:[EMAIL PROTECTED] Sent: Thursday, November 01, 2007 3:06 AM To: hadoop-user@lucene.apache.org Subject: HBase question on HRegions server Hi, I am confused about some things in HBase. 1. All data is stored in HDFS. Data is served to clients by HRegionServers. Is it allowed that the tablet T is on machine A, and served by an HRegionServer running on machine B? Yes, it is possible. Depending on how many replicas of the data there are in HDFS, it is possible that the data is on machines A, B, C and the region server is running on machine D. In the future, we will be investigating how to assign regions to a region server based on where the data is located. What information does the META table maintain? The map from T to the physical address on machine A, or the map from T to which machine serves it, for example, machine B? There are three pieces of data stored in the ROOT and META tables:
1. The HRegionInfo object that describes the region. It includes the startKey, endKey, regionId, regionName and the HTableDescriptor
2. The host:port of the region server currently serving the region
3. A sequence number so that we can tell if the host:port is a current region assignment or if it is a stale assignment
2. Similar to the Bigtable paper, what does the tablet location (section 5.1) stand for? Is it the map from the tablet id to a physical address, or the map from the tablet to which machine serves it? I don't know exactly what Google stores in their meta table. What HBase stores is the data above. From it we can contact a region server directly and the region server can locate the region's files in HDFS. thanks -- Bin YANG Department of Computer Science and Engineering Fudan University Shanghai, P. R. China EMail: [EMAIL PROTECTED]
RE: HBase question on HRegions server
-Original Message- From: Bin YANG [mailto:[EMAIL PROTECTED] Sent: Thursday, November 01, 2007 6:59 PM To: hadoop-user@lucene.apache.org Subject: Re: HBase question on HRegions server Thank you very much Michael and Jim! That means the master does not maintain the mapping from HRegion to HRegionServer. And the mapping from HRegion to HRegionServer is in the META and ROOT. Is it right? Not quite. The master does maintain a mapping of the ROOT and META regions to the region servers that are serving them, but not for regions which are part of a user table. There are a couple of reasons for this:
- Unlike Bigtable, which stores the ROOT location in Chubby, HBase keeps the ROOT location in the Master so that the clients know how to find the ROOT region.
- In order for the master to perform administration functions such as table creation, it must know where the META regions are so it can determine whether the table already exists. Further, if a region server dies, the master must scan the META regions to determine which regions the dead server was serving.
So if a client wants to read a tablet, it should first find the ROOT, find the corresponding META, and the client will know which tablet server is serving the tablet. Is it right? Correct. And that is exactly what HConnectionManager does for HTable (which is the client API). The master just maintains the active RegionServer list. And the master is responsible for assigning regions to region servers. If the client wants to create a new tablet, how does a client find an HRegionServer? Is any active HRegionServer in the master's list a candidate? A client should use an HBaseAdmin object, which provides the administrative API for the client. Does the HRegionServer write a new tablet row in the META directly, or ask the Master to write a new tablet row in the META? It depends on the circumstances. Sometimes the region server does, sometimes the master does. thanks. On 11/2/07, Jim Kellerman [EMAIL PROTECTED] wrote: -Original Message- From: Bin YANG [mailto:[EMAIL PROTECTED] Sent: Thursday, November 01, 2007 3:06 AM To: hadoop-user@lucene.apache.org Subject: HBase question on HRegions server Hi, I am confused about some things in HBase. 1. All data is stored in HDFS. Data is served to clients by HRegionServers. Is it allowed that the tablet T is on machine A, and served by an HRegionServer running on machine B? Yes, it is possible. Depending on how many replicas of the data there are in HDFS, it is possible that the data is on machines A, B, C and the region server is running on machine D. In the future, we will be investigating how to assign regions to a region server based on where the data is located. What information does the META table maintain? The map from T to the physical address on machine A, or the map from T to which machine serves it, for example, machine B? There are three pieces of data stored in the ROOT and META tables: 1. The HRegionInfo object that describes the region. It includes the startKey, endKey, regionId, regionName and the HTableDescriptor 2. The host:port of the region server currently serving the region 3. A sequence number so that we can tell if the host:port is a current region assignment or if it is a stale assignment 2. Similar to the Bigtable paper, what does the tablet location (section 5.1) stand for? Is it the map from the tablet id to a physical address, or the map from the tablet to which machine serves it? I don't know exactly what Google stores in their meta table. What HBase stores is the data above.
From it we can contact a region server directly and the region server can locate the region's files in HDFS. thanks -- Bin YANG Department of Computer Science and Engineering Fudan University Shanghai, P. R. China EMail: [EMAIL PROTECTED]
RE: NoSuchElementException when creating a table
C:\hadoop is my installation C:\workspace\hadoop-commit is my checked out SVN tree which is current with trunk.

/cygdrive/c$ diff hadoop/conf/hadoop-site.xml workspace/hadoop-commit/conf
7,17c7
<property>
  <name>hadoop.tmp.dir</name>
  <value>C:\hadoop\tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<!-- uncomment if running distributed
<property>
  <name>fs.default.name</name>
  <value>localhost:5</value>
</property>
-->
---

/cygdrive/c/hadoop/src/contrib/hbase$ diff conf/hbase-site.xml ../../../../workspace/hadoop-commit/src/contrib/hbase/conf/
/cygdrive/c/hadoop/src/contrib/hbase$
(i.e. no changes)

sshd must be configured and running
/cygdrive/c/hadoop/src/contrib/hbase$ /usr/sbin/sshd

start HBase...
/cygdrive/c/hadoop/src/contrib/hbase$ bin/start-hbase.sh
FileSystem is file:///
starting master, logging to /cygdrive/c/hadoop/src/contrib/hbase/bin/../../../..//logs/hbase-jim-master-JIM.out
[EMAIL PROTECTED]'s password:
localhost: starting regionserver, logging to /cygdrive/c/hadoop/src/contrib/hbase/bin/../../../..//logs/hbase-jim-regionserver-JIM.out

(create table with HBase shell)
/cygdrive/c/hadoop/src/contrib/hbase$ bin/hbase shell
Hbase Shell, 0.0.2 version.
Copyright (c) 2007 by udanax, licensed to Apache Software Foundation.
Type 'help;' for usage.
Hbase> create table test (contents);
Creating table... Please wait.
Table created successfully.

(query META region for information about the newly created table)
Hbase> select info:regioninfo from .META.;
+---------------------+------------------------------------------------------+
| Row                 | Cell                                                 |
+---------------------+------------------------------------------------------+
| test,,1193980167062 | regionname: test,,1193980167062, startKey: ,         |
|                     | tableDesc: {name: test, families: {contents:={name:  |
|                     | contents, max versions: 3, compression: NONE, in     |
|                     | memory: false, max length: 2147483647, bloom filter: |
|                     | none}}}                                              |
+---------------------+------------------------------------------------------+
1 row(s) in set (0.16 sec)
Hbase> exit;

(shut down HBase...)
/cygdrive/c/hadoop/src/contrib/hbase$ bin/stop-hbase.sh
stopping master
/cygdrive/c/hadoop/src/contrib/hbase$

--- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Holger Stenzhorn [mailto:[EMAIL PROTECTED] Sent: Thursday, November 01, 2007 2:48 PM To: hadoop-user@lucene.apache.org Subject: NoSuchElementException when creating a table Hi, I checked out Hadoop (including HBase) from its Subversion repository today, built it successfully (on Cygwin) and started HBase in local mode.
Then I took your little example program from the Wiki and it crashes at the last line:

HBaseConfiguration conf = new HBaseConfiguration();
HTableDescriptor desc = new HTableDescriptor("test");
desc.addFamily(new HColumnDescriptor("content:"));
HBaseAdmin admin = new HBaseAdmin(conf);
admin.createTable(desc);

...and I get the following stacktrace:

java.io.IOException: java.io.IOException: java.util.NoSuchElementException
at java.util.TreeMap.key(TreeMap.java:1206)
at java.util.TreeMap$NavigableSubMap.lastKey(TreeMap.java:1435)
at java.util.Collections$SynchronizedSortedMap.lastKey(Collections.java:2125)
at org.apache.hadoop.hbase.HMaster.createTable(HMaster.java:2460)
at org.apache.hadoop.hbase.HMaster.createTable(HMaster.java:2424)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
at org.apache.hadoop.hbase.HBaseAdmin.createTableAsync(HBaseAdmin.java:150)
at org.apache.hadoop.hbase.HBaseAdmin.createTable(HBaseAdmin.java:119)
at Test.main(Test.java:19)

After doing a little debugging I found the culprit line
RE: hbase feature question
Currently, HBase supports specifying a maximum number of versions to keep. Older versions are removed during compaction. However, we do not currently support a TTL for columns. Please enter a Jira using the component contrib/hbase and request a feature improvement. Thanks. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: news [mailto:[EMAIL PROTECTED] On Behalf Of Billy Sent: Saturday, November 17, 2007 8:18 AM To: hadoop-user@lucene.apache.org Subject: hbase feature question I was looking over the bigtable pdf again to make sure that's where I read this, but their setup allows Column Families to be removed from the database in garbage collection. Is this a feature that will be added to hbase? Basically it allows you to set a max ttl for a column. I can see where this would be useful for nutch and other apps in crawling. Example: storing links from page Z pointing to page X; if not updated by time Y, it gets removed from the dataset. This keeps you from having to scan the whole dataset to remove stale data. Billy
RE: Text and/or ImmutableBytesWritable issue?
Text objects typically contain more bytes than are actually in use. If you were to use the alternate constructor for ImmutableBytesWritable:

new ImmutableBytesWritable(input.getBytes(), 0, input.getLength());

the test will pass. One more note: Relying on the default encoding being the same for Strings may work on any single machine, but if one machine has a default encoding of EN_US and another's is UTF-8, passing an ImmutableBytesWritable from one machine to another will result in the String decoding failing. For this reason, we always specify an encoding for String.getBytes and in the String constructor:

new ImmutableBytesWritable("this is a string".getBytes("UTF-8"))

and

new String(ibw.getBytes(), "UTF-8")

--- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Jason Grey [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 21, 2007 8:27 AM To: hadoop-user@lucene.apache.org Subject: Text and/or ImmutableBytesWritable issue? Can anyone explain why testTextToBytes doesn't assert and testStringToBytes does?

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Text;
import junit.framework.TestCase;

public class TestImmutableBytesWritable extends TestCase {
  public void testTextToBytes() {
    Text input = new Text("this is a test.");
    ImmutableBytesWritable bytes = new ImmutableBytesWritable(input.getBytes());
    Text output = new Text(bytes.get());
    assertEquals(input, output);
  }
  public void testStringToBytes() {
    String input = "this is a test.";
    ImmutableBytesWritable bytes = new ImmutableBytesWritable(input.getBytes());
    String output = new String(bytes.get());
    assertEquals(input, output);
  }
}

If I inspect the objects during debugging at the point of the assert I see the following:
* input bytes = [116, 104, 105, 115, 32, 105, 115, 32, 97, 32, 116, 101, 115, 116, 46, 0] length = 15
* bytes = [116, 104, 105, 115, 32, 105, 115, 32, 97, 32, 116, 101, 115, 116, 46, 0]
* output bytes = [116, 104, 105, 115, 32, 105, 115, 32, 97, 32, 116, 101, 115, 116, 46, 0] length = 16
The length property appears to be off between the two Text objects, but all the data is correct... any help would be greatly appreciated. Thanks -jg-
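For reference, here is the failing test rewritten with the three-argument constructor suggested above; per Jim's note, copying only the bytes actually in use makes the round trip compare equal. This is just the test above with the fix applied, assuming the same 0.15-era classes.

import junit.framework.TestCase;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Text;

public class TestImmutableBytesWritableFixed extends TestCase {
  public void testTextToBytes() {
    Text input = new Text("this is a test.");
    // Copy only the bytes that are actually in use, not the whole backing buffer.
    ImmutableBytesWritable bytes =
        new ImmutableBytesWritable(input.getBytes(), 0, input.getLength());
    Text output = new Text(bytes.get());
    assertEquals(input, output);
  }
}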
RE: Text and/or ImmutableBytesWritable issue?
Well, it depends on how the Text object is initialized. If it is initialized with a String, it sets its internal length to the length of the string. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: stack [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 21, 2007 9:04 AM To: hadoop-user@lucene.apache.org Subject: Re: Text and/or ImmutableBytesWritable issue? What Jim just said, but it looks to me like Text is doing the wrong thing. When you ask it its length, it returns the byte buffer capacity rather than how many bytes are in use. It says length is 16 but there are only 15 characters in your test string, UTF-8'd or not. St.Ack Jason Grey wrote: Can anyone explain why testTextToBytes doesn't assert and testStringToBytes does?

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Text;
import junit.framework.TestCase;

public class TestImmutableBytesWritable extends TestCase {
  public void testTextToBytes() {
    Text input = new Text("this is a test.");
    ImmutableBytesWritable bytes = new ImmutableBytesWritable(input.getBytes());
    Text output = new Text(bytes.get());
    assertEquals(input, output);
  }
  public void testStringToBytes() {
    String input = "this is a test.";
    ImmutableBytesWritable bytes = new ImmutableBytesWritable(input.getBytes());
    String output = new String(bytes.get());
    assertEquals(input, output);
  }
}

If I inspect the objects during debugging at the point of the assert I see the following:
* input bytes = [116, 104, 105, 115, 32, 105, 115, 32, 97, 32, 116, 101, 115, 116, 46, 0] length = 15
* bytes = [116, 104, 105, 115, 32, 105, 115, 32, 97, 32, 116, 101, 115, 116, 46, 0]
* output bytes = [116, 104, 105, 115, 32, 105, 115, 32, 97, 32, 116, 101, 115, 116, 46, 0] length = 16
The length property appears to be off between the two Text objects, but all the data is correct... any help would be greatly appreciated. Thanks -jg-
RE: lists
I hope he was subscribed to this list so he got the message :) --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Eric Baldeschwieler [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 12:04 PM To: hadoop-user@lucene.apache.org Subject: Re: lists Hi Folks, Please ignore the last email I sent with this subject. I was just planning to pass some information to the guy in the next cube. Instead I shared it with the world. Whoops, E14
RE: api about hbase?
The only examples we have at this point are the unit tests. They are rather contrived, but they're what we have. You can find them in the source tree under src/contrib/hbase/src/test/org/apache/hadoop/hbase --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: ma qiang [mailto:[EMAIL PROTECTED] Sent: Friday, December 14, 2007 4:19 AM To: hadoop-user@lucene.apache.org Subject: api about hbase? Hi colleagues, After reading the api docs about hbase, I don't know how to manipulate hbase using the java api. Would you please send me some examples? Thank you! Ma Qiang Department of Computer Science and Engineering Fudan University Shanghai, P. R. China
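Until better examples land in the tree, here is a minimal, hedged sketch of the basic client calls that appear elsewhere in this digest (HBaseAdmin/HTableDescriptor/HColumnDescriptor for table creation, startUpdate/put/commit for writes, get for reads). It targets the 0.15-era API; the class names come from other messages above, but exact method signatures may differ in your checkout, so treat it as an outline rather than verified code.

import org.apache.hadoop.hbase.HBaseAdmin;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.io.Text;

public class HBaseHelloWorld {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();

    // 1. Create a table with one column family (family names end in ':').
    HTableDescriptor desc = new HTableDescriptor("test");
    desc.addFamily(new HColumnDescriptor("content:"));
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(desc);

    // 2. Write a single cell.
    HTable table = new HTable(conf, new Text("test"));
    long lockid = table.startUpdate(new Text("row1"));
    table.put(lockid, new Text("content:greeting"), "hello".getBytes("UTF-8"));
    table.commit(lockid);

    // 3. Read the cell back (a get(row, column) accessor is assumed here).
    byte[] value = table.get(new Text("row1"), new Text("content:greeting"));
    System.out.println(new String(value, "UTF-8"));
  }
}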
RE: point in time snapshot
Billy, Are you referring to snapshots of the entire DFS or of HBase? --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: news [mailto:[EMAIL PROTECTED] On Behalf Of Billy Sent: Tuesday, December 18, 2007 4:29 PM To: hadoop-user@lucene.apache.org Subject: point in time snapshot I've been looking around jira and cannot find an issue on snapshots. Is there a snapshot-for-backup option in the works? Say I want to do a backup of my data: I would run a snapshot and it would be stored in the dfs as a backup file(s), which I could restore later down the road if the current data got corrupted, deleted, lost, etc... Billy
RE: HashMap which can spill to disk for Hadoop?
Have you looked at hadoop.io.MapWritable? --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: C G [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 19, 2007 11:59 AM To: hadoop-user@lucene.apache.org Subject: HashMap which can spill to disk for Hadoop? Hi All: The aggregation classes in Hadoop use a HashMap to hold unique values in memory when computing unique counts, etc. I ran into a situation on a 32-node grid (4G memory/node) where a single node runs out of memory within the reduce phase trying to manage a very large HashMap. This was disappointing because the dataset is only 44M rows (4G) of data. This is a scenario where I am counting unique values associated with various events, where the total number of events is very small and the number of unique values is very high. Since the event IDs serve as keys and the number of distinct event IDs is small, there is consequently a small number of reducers running, where each reducer is expected to manage a very large HashMap of unique values. It looks like I need to build my own unique aggregator, so I am looking for an implementation of HashMap which can spill to disk as needed. I've considered using BDB as a backing store, and I've looked into Derby's BackingStoreHashtable as well. For the present time I can restructure my data in an attempt to get more reducers to run, but I can see in the very near future where even that will run out of memory. Any thoughts, comments, or flames? Thanks, C G
RE: hbase master heap space
Scanners time out on the region server side and resources get cleaned up, but that does not happen on the client side unless you later call the scanner again and the region server tells the client that that scanner has timed out. In short, any application that uses a scanner should close it. It might be a good idea to add a scanner watcher on the client that shuts them down. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Friday, December 21, 2007 5:51 PM To: hadoop-user@lucene.apache.org Subject: Re: hbase master heap space Are you closing the scanners when you're done? If not, those might be hanging around for a long time. I don't think we've built in the proper timeout logic to make that work by itself. -Bryan On Dec 21, 2007, at 5:10 PM, Billy wrote: I was thinking the same thing and have been running REST outside of the Master on each server for about 5 hours now, using the master as a backup if the local rest interface fails. You are right, I've seen a little faster processing time from doing this vs. using just the master. Seems the problem is not with the master itself; it looks like REST is using up more and more memory. Not sure, but I think it's to do with inserts; maybe not, but the memory usage is going up. I am running a scanner with 2 threads reading rows, processing the data, and inserting it into a separate table, building an inverted index. I will restart everything when this job is done and try to do just inserts and see if it's the scanner or the inserts. The master is holding at about 75mb and the rest interfaces are up to 400MB and slowly going up on the ones running the jobs. I am still testing; I will see what else I can come up with. Billy stack [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Hey Billy: Master itself should use little memory and though it is not out of the realm of possibilities, it should not have a leak. Are you running with the default heap size? You might want to give it more memory if you are (see http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#3 for how). If you are uploading all via the REST server running on the master, the problem, as you speculate, could be in the REST servlet itself (though it looks like it shouldn't be holding on to anything, having given it a cursory glance). You could try running the REST server independent of the master. Grep for 'Starting the REST Server' in this page, http://wiki.apache.org/lucene-hadoop/Hbase/HbaseRest, for how (if you are only running one REST instance, your upload might go faster if you run multiple). St.Ack Billy wrote: I forgot to say that once restarted the master only uses about 70mb of memory Billy Billy [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] I'm not sure of this, but why does the master server use up so much memory? I've been running a script that has been inserting data into a table for a little over 24 hours, and the master crashed because of java.lang.OutOfMemoryError: Java heap space. So my question is why does the master use up so much memory? At most it should store the -ROOT-,.META. tables in memory and the block-to-table mapping. Is it cache or a memory leak? I am using the rest interface so could that be the reason? I inserted, according to the highest edit ids on all the region servers, about 51,932,760 edits, and the master ran out of memory with a heap of about 1GB.
The other side to this is that the data I inserted is only taking up 886.61 MB, and that's with dfs.replication set to 2, so half that is only about 440MB of data compressed at the block level. From what I understand the master should have lower memory and cpu usage, and the namenode on hadoop should be the memory hog since it has to keep up with all the data about the blocks.
RE: REST scanner get error 500 0x1b is not valid
-Original Message- From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Monday, December 31, 2007 2:46 PM To: hadoop-user@lucene.apache.org Subject: Re: REST scanner get error 500 0x1b is not valid You're probably right - column names are not base64 encoded. Isn't the contract of row/column keys printable strings? No. Row keys do not need to be printable strings. In the future, they may be changed to an arbitrary WritableComparable, even less restrictive than Text. Column family names are restricted to the set: \w+: There is no restriction on the names of a column family member (the part of the column key after the initial ':'). Table names are restricted to the set of characters: [\w-.]+ except that table names starting with either '-' or '.' are reserved for HBase internal use. If so, then putting images in that field would appear to be a mismatch. Even if you need to use images as qualifiers, wouldn't it be more efficient to use an md5 of the image rather than the actual image? -Bryan On Dec 30, 2007, at 3:56 PM, Billy wrote: On one of my tables I get this when trying to get results from a scanner. I call for the scanner location and get it fine, but on the first call I get the error below: Error 500 The character 0x1b is not valid. Not sure if this has to do with the first row in my table, but that should not cause the scanner to return a 500 error. I can open scanners on other tables; this one is the only one giving this error. I tried to delete the first row but no luck, it will not delete using the shell or rest. I have tried several ways including urlencoding the row key and the col name, but no luck. First row in the table that would be called from the scanner, and below, the curl results with the 500 error.

hql > select * from anchors limit = 1;
+---------------------+---------------------------+------+
| Row                 | Column                    | Cell |
+---------------------+---------------------------+------+
| !vj!;c!$/$i$7!650i! | url:net.jcp-tokyo.www/ma  | 1    |
| 4d6-$n=             | in/seisaku/index.htm:htt  |      |
|                     | p                         |      |
+---------------------+---------------------------+------+
1 row(s) in set. (0.61 sec)

[EMAIL PROTECTED] ~]# curl --verbose --header 'Accept: text/xml' -T /tmp/diff.txt http://192.168.1.200:60010/api/anchors/scanner?column=url:
* About to connect() to 192.168.1.200 port 60010
* Trying 192.168.1.200...
* connected
* Connected to 192.168.1.200 (192.168.1.200) port 60010
PUT /api/anchors/scanner?column=url: HTTP/1.1
User-Agent: curl/7.12.1 (i686-redhat-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
Host: 192.168.1.200:60010
Pragma: no-cache
Accept: text/xml
Content-Length: 1
Expect: 100-continue
HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Date: Sun, 30 Dec 2007 23:23:46 GMT
Server: Jetty/5.1.4 (Linux/2.6.9-67.0.1.ELsmp i386 java/1.5.0_12
Location: /api/anchors/scanner/a99db67
Content-Length: 0
* Connection #0 to host 192.168.1.200 left intact
* Closing connection #0
[EMAIL PROTECTED] ~]# curl --verbose --header 'Accept: text/xml' -T /tmp/diff.txt http://192.168.1.200:60010/api/anchors/scanner/a99db67
* About to connect() to 192.168.1.200 port 60010
* Trying 192.168.1.200...
* connected
* Connected to 192.168.1.200 (192.168.1.200) port 60010
PUT /api/anchors/scanner/a99db67 HTTP/1.1
User-Agent: curl/7.12.1 (i686-redhat-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
Host: 192.168.1.200:60010
Pragma: no-cache
Accept: text/xml
Content-Length: 1
Expect: 100-continue
HTTP/1.1 100 Continue
HTTP/1.1 500 The+character+0x1b+is+not+valid%2E
Date: Sun, 30 Dec 2007 23:24:02 GMT
Server: Jetty/5.1.4 (Linux/2.6.9-67.0.1.ELsmp i386 java/1.5.0_12
Content-Type: text/html;charset=UTF-8
Content-Length: 1286
Connection: close
<html>
<head>
<title>Error 500 The character 0x1b is not valid.</title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<pre>The character 0x1b is not valid.</pre>
<p>RequestURI=/api/anchors/scanner/a99db67</p>
<p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p>
</body>
</html>
* Closing connection #0
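The naming rules quoted in Jim's reply above (column family names match \w+: and table names match [\w-.]+, with a leading '-' or '.' reserved for HBase) are easy to check on the client before choosing keys that the REST servlet cannot serialize. A small sketch follows; the two patterns are lifted from the reply, while the class name and example values are purely illustrative.

import java.util.regex.Pattern;

// Client-side sanity checks for the HBase naming rules quoted above.
public class HBaseNames {
  private static final Pattern FAMILY = Pattern.compile("\\w+:");
  private static final Pattern TABLE = Pattern.compile("[\\w\\-.]+");

  public static boolean isValidFamily(String family) {
    return FAMILY.matcher(family).matches();
  }

  public static boolean isValidTable(String table) {
    return TABLE.matcher(table).matches()
        && !table.startsWith("-") && !table.startsWith(".");
  }

  public static void main(String[] args) {
    System.out.println(isValidFamily("url:"));        // true
    System.out.println(isValidFamily("bad family:")); // false: space not allowed
    System.out.println(isValidTable(".META."));       // false: reserved for HBase
  }
}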
Question for HBase users
Do you have data stored in HBase that you cannot recreate? HADOOP-2478 will introduce an incompatible change in how HBase lays out files in HDFS so that, should the root or meta tables be corrupted, it will be possible to reconstruct them from information in the file system alone. The problem is in building a migration utility. Anything that we could build to migrate from the current file structure to the new file structure would require that the root and meta regions be absolutely correct. If they are not, the migration would fail, because there is not enough information on disk currently to rebuild the root and meta regions. Is it acceptable for this change to be made without the provision of an upgrade utility? If not, are you willing to accept the risk that the upgrade may fail if you have corruption in your root or meta regions? After HADOOP-2478, we will be able to build a fault tolerant upgrade utility, should HBase's file structure change again. Additionally, we will be able to provide the equivalent of fsck for HBase after HADOOP-2478. --- Jim Kellerman, Senior Engineer; Powerset