Thanks Ray, that's what I've been doing also. I just created a JIRA issue to see whether we can do it automatically in the future (perhaps based on the number of regions, as we do for getSplits):
https://issues.apache.org/jira/browse/HIVE-1406

JVS

On Jun 14, 2010, at 9:53 AM, Ray Duong wrote:

Try setting the number of mappers based on your cluster size:

set mapred.map.tasks=XX;

Also, make sure to configure Hive to hit multiple ZooKeeper servers.

-ray

On Mon, Jun 14, 2010 at 9:00 AM, Martin Fiala <[email protected]> wrote:

Hello,

I am a newbie to Hive, but I'm already quite familiar with Hadoop/HBase. I really appreciate the whole project, and especially the new HBase integration, which is exactly what we need. :)

Back to the problem: I have Hive running with HBase, and it works nicely. It fetches data from HBase, computes something, and returns results. But even when I run it on a large table with hundreds of regions, it splits the input into only 2 maps, which means only 2 task trackers are working on a 10-node cluster. When I run a similar job written directly in Java+MapReduce, the input is split into hundreds of maps and the computation is nicely distributed. Is this a misconfiguration, or why does Hive's InputSplit give me only 2 maps?

Regards,
Martin Fiala
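As a rough sketch of Ray's suggestion, the two settings might look like this in a Hive session (the map-task count and the ZooKeeper hostnames below are placeholder values, not from the thread):

```sql
-- Ask the JobTracker for more map tasks on the next query
-- (this is the old MapReduce v1 property name; 40 is an arbitrary example).
set mapred.map.tasks=40;

-- Point the HBase storage handler at the full ZooKeeper quorum,
-- so reads are not funneled through a single ZK server
-- (zk1/zk2/zk3 are hypothetical hostnames).
set hbase.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com;
```

Note that `mapred.map.tasks` is only a hint to the framework; the actual number of maps is ultimately driven by the InputFormat's getSplits, which is what HIVE-1406 proposes to base on the region count.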
