Hello,

As far as I understand, since the "hadoop fs -du" command uses Linux's "du" internally, the number of replicas (at the moment the command is run) affects the result. Is that correct?
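For reference, here's roughly how I plan to double-check this on my cluster (just a sketch; on my Hadoop version the summary flag is "-dus", newer releases spell it "-du -s"):

    # Total size of everything in HDFS as the shell reports it:
    hadoop fs -dus /

    # Raw space consumed across all datanodes, replicas included
    # (the "DFS Used" line in the report); with replication=2 this
    # should be roughly 2x the -dus number if -du does NOT count
    # replicas:
    hadoop dfsadmin -report

If the two numbers roughly match instead, then -du must be counting replicas.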
I have the following case. I have a small test HBase cluster (1 master + 5 slaves, each running a DN, TT & RS) with replication set to 2. The tables' data size is monitored with the "hadoop fs -du" command. One table is constantly written to: data is only ever added to it.

At some point I decided to reconfigure one of the slaves and shut it down. After the reconfiguration (HBase had already marked it as dead) I brought it up again. Things went smoothly. However, on the table-size graph (drawn from data fetched with the "hadoop fs -du" command) I noticed a small spike up in data size, which then went back down to the normal/expected values.

Could it be that at some point during the taking-out/reconfiguring/adding-back procedure the blocks were over-replicated? I'd expect them to be under-replicated for some time (while the DN is down), and so I'd expect to see the inverted spike: a small decrease in the data amount, and then a return to the "expected" rate (after all blocks got replicated again).

Any ideas?

Thank you,
Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
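P.S. Next time I take a node out I'll also watch the replication state directly; a sketch of what I have in mind (the /hbase path is just where my tables live):

    # fsck's summary includes "Over-replicated blocks" and
    # "Under-replicated blocks" counts, so a temporary
    # over-replication after the node rejoins should show up here:
    hadoop fsck /hbase | grep -i replicat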