Hello,

As far as I know, the number of map tasks for a "scan-based" MapReduce job is equal to (and never more than) the number of regions the scan covers, provided the cluster's maximum map task capacity is large enough.

I have a situation where the map-side processing is very heavy but uses only a small number of records from the HBase table. Those records may all belong to one region or just a few regions, which results in only a few map tasks being run. This can make processing very slow, because most of the cluster's resources sit unused :(.

Is there a way to set a minimum number (or just a specific number) of map tasks in this situation? Or is extending TableInputFormat myself the only way?
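
To make the TableInputFormat idea concrete, here is a rough sketch of the part I think I would need: cutting one region's row-key range into several contiguous sub-ranges, so that each sub-range could become its own input split (and therefore its own map task). This is only an assumption about how it could be done, not an existing HBase API. The class name KeyRangeSplitter and its method are hypothetical, it treats row keys as unsigned big-endian numbers, and it does not handle the empty start/end keys of the first and last regions. I believe HBase's Bytes utility class has a split helper that does something similar, but I have not checked its exact behaviour.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: subdivides a row-key range into smaller ranges. */
public class KeyRangeSplitter {

    /**
     * Splits [start, end) into `parts` contiguous sub-ranges and returns the
     * parts + 1 boundary keys. Sub-range i is [keys[i], keys[i + 1]).
     * If the range holds fewer distinct keys than `parts`, some boundaries
     * repeat and the corresponding sub-ranges are empty.
     */
    public static List<byte[]> boundaries(byte[] start, byte[] end, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Right-pad the shorter key with zero bytes so both keys have the
        // same width; this preserves lexicographic byte order.
        int width = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);

        List<byte[]> keys = new ArrayList<>();
        keys.add(start);
        for (int i = 1; i < parts; i++) {
            // Evenly spaced interior boundary: lo + span * i / parts.
            BigInteger point = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(parts)));
            keys.add(toFixedWidth(point, width));
        }
        keys.add(end);
        return keys;
    }

    /** Converts a non-negative value to exactly `width` big-endian bytes. */
    private static byte[] toFixedWidth(BigInteger value, int width) {
        // toByteArray() may carry a leading sign byte or be shorter than width.
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

The idea would be to call something like this from an overridden getSplits() in a TableInputFormat subclass, once per region-level split, and emit one split per sub-range. Evenly spaced boundaries only balance the work if the interesting rows are spread fairly evenly over the key range, which may not hold in my case.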
Thank you,
Alex Baranau
--- http://sematext.com
