Also, depending on the compression type chosen, it might take less disk space.
------------------------------
On Fri 11 Jan, 2013 3:53 PM IST Mesika, Asaf wrote:

> 130 GB of raw data will take more space in HBase, since it adds the family
> name, qualifier and timestamp to each value, so it can even be 150 GB. You
> can check it exactly by loading only one row with one column and seeing how
> much it takes on the HDFS file system (run a compaction first).
>
> Next, multiply that by 5 since you have 5x replication, so 5 x 150 GB = 750 GB.
>
> On Jan 11, 2013, at 5:07 AM, Panshul Whisper wrote:
>
>> Hello,
>>
>> I have a 5 node Hadoop cluster and a fully distributed HBase setup on the
>> cluster with 130 GB of HDFS space available. HDFS replication is set to 5.
>>
>> I have a total of 115 GB of JSON files that need to be loaded into the
>> HBase database and then they have to be processed.
>>
>> So is the available HDFS space sufficient for these operations, considering
>> the replication and all other factors? Or should I increase the space, and
>> by how much?
>>
>> Thanking you,
>>
>> --
>> Regards,
>> Ouch Whisper
>> 010101010101
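
As a rough illustration of the arithmetic in the thread above, here is a minimal sketch. The per-cell overhead, row key / family / qualifier lengths, total cell count, and compression ratio are all assumptions for illustration, not measured values; treat the output as a ballpark and verify against a real table as suggested above.

    # Back-of-envelope estimate of the replicated HBase footprint.
    # All figures below are assumptions for illustration: the fixed per-cell
    # framing (~24 bytes), the row key / family / qualifier lengths, the total
    # cell count, and the compression ratio. Measure a real table (load it,
    # run a major compaction, then check HDFS usage) before relying on this.

    GB = 1024 ** 3

    def hbase_footprint_gb(raw_value_gb, num_cells,
                           rowkey_bytes=16, family_bytes=1, qualifier_bytes=8,
                           fixed_overhead_bytes=24, replication=5,
                           compression_ratio=1.0):
        """Rough replicated on-disk size in GB."""
        per_cell_overhead = (fixed_overhead_bytes + rowkey_bytes
                             + family_bytes + qualifier_bytes)
        overhead_gb = num_cells * per_cell_overhead / GB
        uncompressed_gb = raw_value_gb + overhead_gb
        return uncompressed_gb * compression_ratio * replication

    # ~115 GB of raw JSON values spread over a hypothetical 750 million cells,
    # replication factor 5, no compression: roughly 750 GB on HDFS.
    print(hbase_footprint_gb(115, 750_000_000))

    # The same data with an assumed 2:1 compression ratio on the column family
    # (e.g. Snappy or GZ) would land near half of that.
    print(hbase_footprint_gb(115, 750_000_000, compression_ratio=0.5))

In practice the dominant unknown is how many cells the JSON maps to, which is why the empirical check quoted above (load one row with one column, run a major compaction, then check usage with hdfs dfs -du) is the better guide.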
