Thanks for the helpful note, Danny. Here are a few other things to add to your list.
+ Danny had a map that parsed Text input and then did the inserts into hbase using TableReduce. He was probably using TableReduce because we suggested it, but thinking on it, this is probably not the best MR setup for filling hbase. An MR job sorts and shuffles the map outputs, and this intermediate shuffle/sort step is expensive; hbase 'sorts' on insert anyway. Danny changed his job so the hbase inserts were done in the map task. The map made no emissions and his job had no reduce.
+ On the TaskTracker-to-RegionServer load imbalance at job start, one tactic we could have tried was to run a single TT at job start and then, after the split, add the second one mid-job.
+ Danny tried hbase and ran into problems. Some of his issues were hbase bugs; others were matters of network setup and hardware sizing. Rather than give up, he stuck with it and together we figured them out.
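On the first point, here is a rough sketch of why the shuffle/sort step buys nothing when the target store orders rows on insert. It is only an illustration: the TreeMap is a stand-in for a key-ordered store and is not the hbase client API, and the class name, method names, and row keys are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

public class SortOnInsertSketch {

    // Insert rows in whatever order they arrive, as a map-only job would.
    // The TreeMap (a stand-in for a key-ordered store) orders them itself.
    static List<String> loadDirect(List<String> rowKeys) {
        TreeMap<String, String> store = new TreeMap<>();
        for (String key : rowKeys) {
            store.put(key, "value-for-" + key);
        }
        return new ArrayList<>(store.keySet());
    }

    // Sort the rows first (the work a shuffle/sort step would do),
    // then insert them into the same kind of store.
    static List<String> loadAfterSort(List<String> rowKeys) {
        List<String> sorted = new ArrayList<>(rowKeys);
        Collections.sort(sorted);
        TreeMap<String, String> store = new TreeMap<>();
        for (String key : sorted) {
            store.put(key, "value-for-" + key);
        }
        return new ArrayList<>(store.keySet());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row3", "row1", "row2");
        // Both loads end up with the same key order in the store.
        System.out.println(loadDirect(rows));
        System.out.println(loadAfterSort(rows));
    }
}
```

Both paths leave the store in the same state, so the extra sort is pure overhead for this kind of load. In a real job the equivalent change is what Danny did: do the inserts in the map task and run with no reduce.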
St.Ack
