I have 3 tables. All of them have the same column family name and an empty
column qualifier.

For the row id, let's say each table has something like the following ('|' is
a delimiter char in this context):

  Table1: A|B|C
  Table2: B|C|A
  Table3: C|A|B

As we can see above, all of them have pretty much the same content (and in
fact the same row id length), and they all have the same number of rows (I
have verified it): 2,181,193 rows.

However, when I check their table sizes I get different results:

  root@dev> du -h -t Table1
  17.70M [Table1]
  root@dev> du -h -t Table2
  27.58M [Table2]
  root@dev> du -h -t Table3
  32.48M [Table3]

I am a bit surprised to see different results, but I realize that Accumulo
applies compression to the data. Looking at those table sizes, am I right to
conclude that A|B|C somehow compresses better than B|C|A, which in turn
compresses better than C|A|B?

This makes it harder for me to give management an estimate of the disk space
we need to store our data in Accumulo. Earlier I was thinking that if we can
guesstimate how many rows we may have in the future, I could multiply that by
some factor x (and perhaps also by 3 for replication) and give that as the
estimate, but now I can't even figure out that 'x'.

Does anyone have experience with this, and could perhaps share?

Thanks,
Z

--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/table-size-questions-tp15079.html
Sent from the Developers mailing list archive at Nabble.com.
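
P.S. To make the 'x' concrete, here is a minimal sketch of the arithmetic I had in mind, using only the row count and the three du figures quoted above. It assumes that the 'M' in the du output means MiB (2**20 bytes) and that the replication factor is 3; the function names and the future row count are made up for illustration. It only turns the measurements into a per-row range and does not explain why the three layouts compress differently.

```python
# Row count and sizes are copied from the du output quoted in this message.
# Assumption: the 'M' suffix means MiB (2**20 bytes).
ROWS = 2_181_193
SIZES_MIB = {"Table1": 17.70, "Table2": 27.58, "Table3": 32.48}
REPLICATION = 3  # assumed replication factor, as mentioned in the message


def bytes_per_row(size_mib, rows=ROWS):
    """Observed on-disk bytes per row: the factor 'x' for one table."""
    return size_mib * 2**20 / rows


def estimate_bytes(future_rows, x, replication=REPLICATION):
    """Guesstimate of total disk usage: rows * x * replication."""
    return future_rows * x * replication


# One 'x' per table, so the spread between key layouts is visible.
factors = {name: bytes_per_row(size) for name, size in SIZES_MIB.items()}
for name, x in factors.items():
    print(f"{name}: {x:.2f} bytes/row")

# Report a range rather than a single number, since 'x' depends on the layout.
low, high = min(factors.values()), max(factors.values())
future_rows = 100_000_000  # hypothetical future row count
print(f"low:  {estimate_bytes(future_rows, low) / 2**30:.2f} GiB")
print(f"high: {estimate_bytes(future_rows, high) / 2**30:.2f} GiB")
```

Giving management the low and high figures as a range seems safer than a single 'x', since the measurements above show the factor nearly doubling between Table1 and Table3.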
