Alex,

So each row = 24 column families(?) * 300,000,000 entries/family * ~40 bytes/entry = about 270 GB/row? And that * 100,000 rows = about 27 petabytes of data? Is my math right here? :)
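A quick sanity check of that arithmetic (a throwaway Python snippet, just plugging in the figures above):

families = 24
entries_per_family = 300 * 1000 * 1000
bytes_per_entry = 40          # (timestamp, <32 byte string>) plus a little overhead
rows = 100 * 1000

row_bytes = families * entries_per_family * bytes_per_entry
total_bytes = row_bytes * rows

print(row_bytes / 2.0 ** 30, "GiB per row")    # ~268 GiB, i.e. "about 270 GB"
print(total_bytes / 2.0 ** 50, "PiB total")    # ~25.6 PiB, i.e. "about 27 PB"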
With a big enough cluster, you might be able to get that amount of data into Hadoop. I'm not sure anyone has had an HBase installation that big.

One thing that is definitely not going to work with HBase is having single rows that are many GBs. A row can never be split across regions, and the default region size is 256 MB (though configurable), so you'd be three orders of magnitude beyond the recommended maximum.

So to directly answer your questions, one limitation is the size of a single row. The other limitation is the number of regions that can be handled on each node. The upper limit is in the range of 400-500 regions per region server, though this can vary depending on your hardware and usage patterns. That's about 100 GB per HBase node, so to get this much data into HBase you'd need on the order of hundreds of thousands of servers. One thing you'd definitely need to do is rework your schema a bit, spreading things across more rows so you can have reasonably sized regions (see the sketch at the bottom of this message).

My short answer is that this is not currently possible in HBase unless you had a very, very large cluster and a bit of time to work out the bugs that I'm sure would pop up in an installation of this size. My question to you is: do you really need random access at this granularity to 27 petabytes of data?

Jonathan Gray

-----Original Message-----
From: Alex Newman [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 24, 2008 6:21 AM
To: hbase-user@hadoop.apache.org
Subject: Scalability of HBase

Where are the scalability limitations with HBase? The number of tablets? The size of the columns? I am thinking about 100k rows and 24 columns, but with on the order of 300M entries per column, each something like (timestamp, <32 byte string>). Would something like this scale?
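As a rough illustration of the schema rework suggested above (spreading one huge logical row across many physical rows), here is a hypothetical Python sketch; the key layout and names are made up for illustration, not taken from the thread or from any HBase API:

BUCKET_SECONDS = 3600   # pick a granularity that keeps each physical row small

def physical_row_key(entity_id, timestamp):
    """Fold the entry timestamp into the row key: one physical row per
    (entity, hour) instead of one ~270 GB row per entity. Rows for an
    entity sort contiguously, so a range scan recovers the full series."""
    bucket = timestamp // BUCKET_SECONDS
    return "%s:%012d" % (entity_id, bucket)

# e.g. an entry stamped 2008-09-24 06:21 UTC (ts = 1222237260):
print(physical_row_key("row42", 1222237260))   # -> "row42:000000339510"

With hourly buckets, each physical row only has to hold the entries that arrive in one hour, so rows stay far below the region size and the bucket width can be tuned to the actual write rate.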