You might try setting the block size for these files to be "very large". This should guarantee that the entire file ends up on one node.
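
For instance (untested, and the path and sizes below are only placeholders), the five-argument FileSystem.create() call lets you pick a block size for a single file without touching the cluster-wide dfs.block.size:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch: write one index file with an oversized block size so that
// all of its data ends up in a single block on a single datanode.
public class BigBlockWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path dst = new Path("/indexes/segment1/index.tar");   // made-up path
    long blockSize = 2L * 1024 * 1024 * 1024;             // bigger than the file itself

    FSDataOutputStream out = fs.create(
        dst,
        true,                                              // overwrite
        conf.getInt("io.file.buffer.size", 4096),          // buffer size
        fs.getDefaultReplication(),                        // keep normal replication
        blockSize);
    // ... write the index (or tar) bytes here ...
    out.close();
  }
}

As long as the block size is larger than the file, the file is stored as a single block, so every replica of that block is a complete copy of the file sitting on one datanode.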

If an index is composed of many files, you could "tar" them together so each index is exactly one file.

Might work... Of course, as indexes get really large this approach has side effects: each index becomes a single huge block that must fit on one datanode's disk and gets copied around as one unit.


On Sep 25, 2006, at 2:32 PM, Andrzej Bialecki wrote:

Bryan A. P. Pendleton wrote:
Would the "replication" parameter be sufficient for you? This will allow you to push the system to make a copy of each block in a file on a higher set of nodes, possibly equal to the number of nodes in your cluster. Of course, this saves no space over local copying, but it does mean that you won't have
to do the copy manually, and local-access should be sped up.

Just use "hadoop dfs -setrep -R # /path/to/criticalfiles" where # = your cluster size. This assumes you're running a DataNode on each node that you want the copies made to (and, well, that the nodes doing lookups == the
nodes running datanodes, or else you'll end up with extra copies).
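
For example, on a hypothetical 20-node cluster with the critical files under /crawl/indexes (both made up), that would be:

hadoop dfs -setrep -R 20 /crawl/indexes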

No, I don't think this would help ... I don't want to replicate each segment to all nodes; I can't afford it, since this would quickly exhaust the total capacity of the cluster. If I set the replication factor lower than the size of the cluster, then again I have no guarantee that whole files are present locally.

Let's say I have 3 segments, and I want to run 3 map tasks, each with its own segment data. The idea is that I want to make sure that task1 executing on node1 will have all blocks from segment1 on the local disk of node1; and the same for task2, task3 and so on.
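
As a rough, untested sketch (the path is made up, and I'm assuming a FileSystem API that exposes per-block host locations, e.g. getFileBlockLocations), this is the property I'd like to be able to check for each task:

import java.net.InetAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Check whether every block of a segment file has a replica on this node.
public class LocalityCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    String localHost = InetAddress.getLocalHost().getHostName();

    Path segmentFile = new Path("/crawl/segments/segment1/index");  // made-up path
    FileStatus status = fs.getFileStatus(segmentFile);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    boolean allLocal = true;
    for (BlockLocation block : blocks) {
      boolean hasLocalReplica = false;
      for (String host : block.getHosts()) {
        if (host.equals(localHost)) {
          hasLocalReplica = true;
          break;
        }
      }
      allLocal &= hasLocalReplica;
    }
    System.out.println(segmentFile + (allLocal ? ": all blocks local" : ": some blocks remote"));
  }
}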

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


