Re: How does sqoop distribute its data evenly across HDFS?

Harsh J Wed, 16 Mar 2011 21:29:24 -0700

There's a balancer available to rebalance blocks across the DataNodes
(DNs) of an HDFS cluster in general. It is available in the
$HADOOP_HOME/bin/ directory as start-balancer.sh
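
To give a rough idea of what "balanced" means to the balancer, here is
a small Python sketch. It is not the balancer's code; it only
illustrates the documented rule that a DataNode counts as balanced when
its utilization is within a threshold (10 percent by default, as far as
I know) of the overall cluster utilization. The function name and the
sample node names and figures are made up for illustration.

```python
def unbalanced_nodes(used, capacity, threshold_pct=10.0):
    """Return (cluster utilization %, {node: utilization %}) for nodes
    whose utilization differs from the cluster's by more than the threshold.

    `used` and `capacity` map a node name to bytes (or any common unit).
    """
    cluster_util = 100.0 * sum(used.values()) / sum(capacity.values())
    outliers = {}
    for node in used:
        node_util = 100.0 * used[node] / capacity[node]
        # A node is out of balance if it is too full or too empty
        # relative to the cluster as a whole.
        if abs(node_util - cluster_util) > threshold_pct:
            outliers[node] = node_util
    return cluster_util, outliers


# Hypothetical three-node cluster, each node with 100 units of capacity.
used = {"dn1": 80, "dn2": 50, "dn3": 20}
capacity = {"dn1": 100, "dn2": 100, "dn3": 100}
cluster_util, outliers = unbalanced_nodes(used, capacity)
print(cluster_util, outliers)
```

In this made-up cluster dn1 is over-utilized and dn3 under-utilized,
so those are the nodes the balancer would want to move blocks between.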


But what I think Sqoop implies is that your data ends up balanced
because of the map tasks it runs for imports (the rows are divided
between the maps using the split column you provide). Each map writes
its own chunk of data out to HDFS, so the chunks should land on
different DataNodes.
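
As a rough illustration of the splitting idea, here is a minimal Python
sketch. It assumes an integer split column whose minimum and maximum
values are already known, and it is not Sqoop's actual implementation;
the function name and the WHERE clause format are hypothetical. Each
resulting range would be handled by one map task, which writes its own
output file to HDFS.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer key range [lo, hi] into at most
    `num_mappers` contiguous, near-equal chunks."""
    total = hi - lo + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than keys: skip empty chunks
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Keys 0..999 split between 4 map tasks.
for first, last in split_ranges(0, 999, 4):
    print(f"WHERE id >= {first} AND id <= {last}")
```

Note that equal key ranges only give equal amounts of data when the key
values are spread fairly uniformly; a skewed key would leave some maps
with far more rows than others.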

You could probably get more information on the Sqoop mailing list,
[email protected]:
https://groups.google.com/a/cloudera.org/group/sqoop-user/topics

On Thu, Mar 17, 2011 at 5:04 AM, BeThere <[email protected]> wrote:
> The sqoop documentation seems to imply that it uses the key information 
> provided to it on the command line to ensure that the SQL data is distributed 
> evenly across the DFS. However I cannot see any mechanism for achieving this 
> explicitly other than relying on the implicit distribution provided by 
> default by HDFS. Is this correct or are there methods on some API that allow 
> me to manage the distribution to ensure that it is balanced across all nodes 
> in my cluster?
>
> Thanks,
>
>         Andy D
>
>



-- 
Harsh J
http://harshj.com
