I will be running a cluster with 100-200 nodes, most of which will be
shut down at night. For the sake of example, let's say that 4 'reliable
slaves' will remain turned on continuously, and let me call the rest
'unreliable slaves'.

Storage-wise, how would I go about this (using HDFS)? I figure that it
would be a bad idea to put persistent data on the unreliable slaves,
since turning ~100 computers off simultaneously might wreak havoc on
HDFS(?). So the idea would be to let persistent data only reside on
reliable slaves.

Would setting dfs.datanode.du.pct=0 on the unreliable slaves do the
trick?

Cheers,
Mikkel
