Hey there,

So, I wiped my HDFS and reinstalled everything, and I'm running smaller loads... so far, so good. I've got 7 regionservers.
My job basically takes a lot of documents and metadata with unique binary keys (like "055E51294F9D9CA331D968D04B72A11C"), combines them all in a reducer, then writes the result to HBase. What I'm noticing is that it's writing mostly to one or two regions on one box at a time, even though I have 7 reducers running.

Monitoring everything with dstat -v, I notice that only 2 of my servers are doing much. Those boxes have very little idle CPU and high disk output (a few GB a minute). Everything else has a little bit of disk activity (maybe 500 MB/minute), but very idle CPUs.

Is this normal behavior? I guess as more data is loaded, more regions get split and spread across the regionservers, so over time more boxen will be taking writes?

Cheers,
Bradford
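P.S. To make the question concrete, here is a rough sketch of my mental model, in plain Python rather than anything from the HBase API. It assumes that a freshly created table starts out as a single region covering the whole key space, and that each region owns one contiguous, sorted range of row keys. All the names (region_for, write_distribution, the fake keys, the split points) are made up for illustration.

```python
import bisect
import random
from collections import Counter

def region_for(key, start_keys):
    """Return the index of the region whose key range contains `key`.

    `start_keys` is the sorted list of region start keys; the first
    region starts at the empty string, so every key lands somewhere.
    """
    return bisect.bisect_right(start_keys, key) - 1

def write_distribution(keys, start_keys):
    """Count how many writes each region would receive."""
    counts = Counter(region_for(k, start_keys) for k in keys)
    return [counts.get(i, 0) for i in range(len(start_keys))]

# Fake 32-character uppercase hex keys, similar in shape to mine.
rng = random.Random(0)
keys = [format(rng.getrandbits(128), "032X") for _ in range(7000)]

# Assumption: a new table is one region covering everything.
one_region = write_distribution(keys, [""])

# Hypothetical layout after splits: 7 regions with evenly spaced
# two-hex-digit start keys.
split_points = [""] + [format(i * 256 // 7, "02X") for i in range(1, 7)]
seven_regions = write_distribution(keys, split_points)

print("one region:   ", one_region)
print("seven regions:", seven_regions)
```

If that model is right, all 7 reducers hammer the same region (and so the same box) until enough splits have happened, no matter how evenly the keys themselves are distributed. Once the key space is cut into 7 ranges, the same keys spread out roughly evenly.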
