HBase is column-oriented; every cell is stored together with its row key, column family, qualifier and timestamp, so every piece of data takes up more disk space than the raw value alone. Without knowing anything about your keys, I can't comment much more.
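
To put rough numbers on that per-cell overhead, here is a minimal sketch. It assumes the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type) and ignores compression, HFile block indexes and the write-ahead log. The function names and the example sizes are made up for illustration; they are not your actual schema.

```python
# Fixed bytes HBase writes for each cell, assuming the classic KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row_len, family_len, qualifier_len, value_len):
    """Approximate on-disk bytes for one uncompressed cell."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def blowup(row_len, family_len, qualifier_len, value_len, copies=1):
    """Ratio of stored bytes to raw value bytes.

    `copies` is a hypothetical multiplier for anything that duplicates the
    data, e.g. HDFS block replication (3 by default) if the reported usage
    counts replicas, which is an assumption here.
    """
    stored = cell_size(row_len, family_len, qualifier_len, value_len) * copies
    return stored / value_len


if __name__ == "__main__":
    # Hypothetical cell: 16-byte row key, 2-byte family name,
    # 10-byte qualifier, 8-byte value.
    print(cell_size(16, 2, 10, 8))
    print(blowup(16, 2, 10, 8))
    print(blowup(16, 2, 10, 8, copies=3))
```

The point is only that with short values and long keys or qualifiers, the key material can easily outweigh the data itself, so plug in your own lengths before drawing conclusions.
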
HDFS also keeps a trash, so every file that gets compacted away ends up there; if you just did the import, there will be a lot of these. Finally, if you imported the data more than once, HBase keeps 3 versions by default.

So in short, is it reasonable? Answer: it depends!

J-D

2010/3/31 <y_823...@tsmc.com>:
> Hi,
>
> We've dumped Oracle data to files, then put these files into different
> HBase tables.
> The size of these files is 35G; we saw the HDFS usage go up to 562G after
> putting them into HBase.
> Is that reasonable?
> Thanks
>
> Fleming Chiu(邱宏明)
> y_823...@tsmc.com