Thanks, I'll try.

One more question: I've got a few more nodes which could be added to the cluster, but how do I do that?

If I understand it correctly (according to Hadoop's wiki pages):

1. On the master node, edit the slaves file and add the IP addresses of the new nodes (everything is clear here).
2. Log in to each newly added node and run (this is clear to me too):

$ hadoop-daemon.sh start datanode
$ hadoop-daemon.sh start tasktracker

3. Now I'm not sure: since I'm not using dfs.include/mapred.include, do I also have to run the commands:

$ hadoop dfsadmin -refreshNodes
$ hadoop mradmin -refreshNodes

If yes, must they be run on the master node or on the new slave nodes?
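To summarise, here is the whole procedure as I understand it, as a rough sketch (I'm assuming a Hadoop 1.x layout; $HADOOP_HOME/conf/slaves and the example IP address are just placeholders, and the last step only applies if dfs.hosts/mapred.hosts include files are actually configured):

# step 1 - on the master node: add the new node to the slaves file
$ echo "192.168.1.50" >> $HADOOP_HOME/conf/slaves

# step 2 - on each new slave node: start the DataNode and TaskTracker daemons
$ hadoop-daemon.sh start datanode
$ hadoop-daemon.sh start tasktracker

# step 3 - back on the master node: ask the NameNode/JobTracker to re-read
#          the include files (only if dfs.hosts/mapred.hosts are configured)
$ hadoop dfsadmin -refreshNodes
$ hadoop mradmin -refreshNodes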

Ondrej



On 06/14/2012 04:03 PM, Harsh J wrote:
Ondřej,

That isn't currently possible with local-storage FS. Your 1 TB NFS
mount point can help, but I suspect it may act as a slow-down point if all
nodes use it in parallel. Perhaps mount it on only 3-4 machines (or fewer),
instead of all of them, to avoid that?
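As a rough sketch of what I mean (the NFS server name, export path and mount point below are just placeholders):

# on only 3-4 of the slave machines, not on all of them
$ sudo mount -t nfs nfs-server:/export/shared /mnt/shared
# jobs/tools running on those machines can then use /mnt/shared,
# while the rest of the cluster stays on purely local disks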

On Thu, Jun 14, 2012 at 7:28 PM, Ondřej Klimpera <klimp...@fit.cvut.cz> wrote:
Hello,

you're right. That's exactly what I meant, and your answer is exactly what I
expected. I was just wondering whether Hadoop can distribute the data to other
nodes' local storage when a node's own local space is full.

Thanks


On 06/14/2012 03:38 PM, Harsh J wrote:
Ondřej,

If by processing you mean trying to write out (as map outputs) > 20 GB of
data per map task, that may not be possible, as the outputs need to be
materialized and disk space is the constraint there.

Or did I not understand you correctly (in thinking you are asking
about MapReduce)? Because otherwise you have ~50 GB of space available for
HDFS consumption (assuming replication = 3 for proper reliability).
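Rough arithmetic behind the ~50 GB figure (ignoring the space the OS, logs and map-output spills also need on the same disks):

# 8 nodes x 20 GB local disk      = 160 GB raw capacity
# 160 GB / replication factor 3  ~= 53 GB usable in HDFS
# the actual configured/remaining capacity is reported by:
$ hadoop dfsadmin -report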

On Thu, Jun 14, 2012 at 1:25 PM, Ondřej Klimpera <klimp...@fit.cvut.cz> wrote:
Hello,

we're testing an application on 8 nodes, where each node has 20 GB of local
storage available. What we are trying to achieve is to process more than
20 GB of data on this cluster.

Is there a way to distribute the data across the cluster?

There is also one shared NFS storage disk with 1 TB of available space,
which is currently unused.

Thanks for your reply.

Ondrej Klimpera




