Does it make sense to add some kind of throttle capability on the ColumnFamilyRecordReader for Hadoop?
If 60 or so Map tasks run at the same time while the cluster is already heavily loaded with OLTP operations, online performance can degrade to a level that may not be acceptable. (I'm loading an 8-node cluster with 2000 TPS.) By default, my cluster of 8 nodes (which are also the Hadoop JobTracker nodes) runs 8 Map tasks per node making the get_range_slices call, based on the splits that ColumnFamilyInputFormat calculates from my token ranges. I can increase the input split size (ConfigHelper.setInputSplitSize()) so that there is only one Map task per node, and this helps quite a bit. But would it be reasonable to provide a configurable sleep that waits between smaller range queries? That would stretch out the Map time and leave OLTP processing less affected. --Michael
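
To make the sleep idea concrete, here is a rough sketch of what I have in mind. It is not taken from ColumnFamilyRecordReader: the ThrottledReader class, the BatchFetcher interface, and the batchSize and sleepMillis parameters are all hypothetical names, and the fetcher is only a stand-in for one small get_range_slices call. The real reader pages by start key and streams rows rather than collecting them, so treat this as an illustration of the pause between range queries and nothing more.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a sleep-based throttle; not part of Cassandra. */
public class ThrottledReader {

    /** Stand-in for one small range query (for example, one get_range_slices call). */
    public interface BatchFetcher<R> {
        // Returns up to batchSize rows starting at offset; fewer means the range is exhausted.
        List<R> fetch(int offset, int batchSize);
    }

    /** Reads the whole range in small batches, sleeping between full batches. */
    public static <R> List<R> readAll(BatchFetcher<R> fetcher, int batchSize, long sleepMillis) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<R> rows = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<R> batch = fetcher.fetch(offset, batchSize);
            rows.addAll(batch);
            offset += batch.size();
            if (batch.size() < batchSize) {
                break; // short (or empty) batch: nothing left to read
            }
            if (sleepMillis > 0) {
                try {
                    // The throttle: give the OLTP workload room before the next query.
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("interrupted while throttling", e);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        // In-memory fetcher standing in for the cluster.
        BatchFetcher<Integer> fetcher =
            (offset, n) -> data.subList(offset, Math.min(offset + n, data.size()));
        List<Integer> rows = readAll(fetcher, 4, 50L);
        System.out.println(rows.size());
    }
}
```

With sleepMillis set to 0 this behaves like an unthrottled read, so the setting could default to off and only be raised on clusters that share capacity with OLTP traffic.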
