Bryan A. P. Pendleton wrote:
Would the "replication" parameter be sufficient for you? It lets you push the system to make a copy of each block in a file on a larger set of nodes, possibly equal to the number of nodes in your cluster. Of course, this saves no space over local copying, but it does mean that you won't have to do the copy manually, and local access should be faster.
Just use "hadoop dfs -setrep -R # /path/to/criticalfiles", where # is your cluster size. This assumes you're running a DataNode on each node that you want the copies made to (and, well, that the nodes doing lookups are the same nodes running DataNodes, or else you'll end up with extra copies).
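For example, on a 3-node cluster with the critical data under /user/nutch/segments (path and numbers purely illustrative), that would be:

  hadoop dfs -setrep -R 3 /user/nutch/segments

The -R flag applies the new replication factor recursively to every file under that path, and the NameNode then re-replicates the existing blocks in the background.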
No, I don't think this would help ... I don't want to replicate each segment to all nodes; I can't afford it, since this would quickly exhaust the total capacity of the cluster. If I set the replication factor lower than the size of the cluster, then again I have no guarantee that whole files are present locally.
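To put rough numbers on it (purely illustrative): with 10 segments of 100 GB each on a 10-node cluster, replication equal to the cluster size means 10 x 1 TB = 10 TB of raw disk for 1 TB of data, versus 3 TB at the default replication factor of 3.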
Let's say I have 3 segments, and I want to run 3 map tasks, each with
its own segment data. The idea is that I want to make sure that task1
executing on node1 will have all blocks from segment1 on the local disk
of node1; and the same for task2, task3 and so on.
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com