Ondřej,

If by processing you mean trying to write out (as map outputs) more than 20 GB of data per map task, that may not be possible, since the map outputs need to be materialized on the node's local disk and that disk's space is the constraint there.
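As a rough illustration (the property name below is the Hadoop 1.x-era one, and the path is purely hypothetical): intermediate map outputs and spill files go under the directories listed in mapred.local.dir, so a single task cannot materialize more output than is free on that mount. E.g. in mapred-site.xml:

  <property>
    <name>mapred.local.dir</name>
    <!-- hypothetical mount on the 20 GB local disk -->
    <value>/mnt/local/mapred</value>
  </property>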
Or did I not understand you correctly (in thinking you are asking about MapReduce)? Otherwise, you have ~50 GB of space available for HDFS consumption (assuming replication = 3 for proper reliability; rough math below).

On Thu, Jun 14, 2012 at 1:25 PM, Ondřej Klimpera <klimp...@fit.cvut.cz> wrote:
> Hello,
>
> we're testing an application on 8 nodes, where each node has 20 GB of local
> storage available. What we are trying to achieve is to get more than 20 GB
> of data processed on this cluster.
>
> Is there a way to distribute the data across the cluster?
>
> There is also one shared NFS storage disk with 1 TB of available space,
> which is currently unused.
>
> Thanks for your reply.
>
> Ondrej Klimpera

-- 
Harsh J
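For the ~50 GB figure above, a back-of-the-envelope calculation (ignoring non-DFS usage such as logs and intermediate map outputs, which would reduce it further):

  8 nodes x 20 GB            = 160 GB raw capacity
  160 GB / replication of 3 ~=  53 GB of usable HDFS space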