@Harsh --- I was wondering: although it may not make much sense in practice, suppose someone wants to store files only on HDFS (something like a backup) under the hardware scenario above, with no MR processing. In that case it should be possible to store a file larger than 20 GB even though each node has only a 20 GB hard disk, since the file's blocks (and their replicas) will be distributed evenly across the cluster, right?
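For reference, here's a minimal sketch of how one could check where the blocks of such a file actually land, using the standard FileSystem API (the /data/bigfile path is hypothetical, and this is untested against your setup):

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            // Assumes fs.defaultFS in the loaded config points at the NameNode.
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical path to a large file already stored on HDFS.
            Path file = new Path("/data/bigfile");
            FileStatus status = fs.getFileStatus(file);

            // One entry per block; the hosts are the datanodes holding replicas.
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
            }
            fs.close();
        }
    }

If the blocks show up spread over many hosts, that would confirm no single node needs to hold the whole file.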
Regards,
Praveenesh

On Thu, Jun 14, 2012 at 7:08 PM, Harsh J <ha...@cloudera.com> wrote:
> Ondřej,
>
> If by processing you mean trying to write out (map outputs) > 20 GB of
> data per map task, that may not be possible, as the outputs need to be
> materialized and the disk space is the constraint there.
>
> Or did I not understand you correctly (in thinking you are asking
> about MapReduce)? Because you otherwise have ~50 GB of space available
> for HDFS consumption (assuming replication = 3 for proper reliability).
>
> On Thu, Jun 14, 2012 at 1:25 PM, Ondřej Klimpera <klimp...@fit.cvut.cz> wrote:
> > Hello,
> >
> > we're testing an application on 8 nodes, where each node has 20 GB of
> > local storage available. What we are trying to achieve is to process
> > more than 20 GB on this cluster.
> >
> > Is there a way to distribute the data across the cluster?
> >
> > There is also one shared NFS storage disk with 1 TB of available
> > space, which is now unused.
> >
> > Thanks for your reply.
> >
> > Ondrej Klimpera
>
> --
> Harsh J
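P.S. If the cluster really is backup-only and capacity matters more than reliability, the replication factor can also be lowered per file. A rough, untested sketch (the /backup/archive.tar path is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LowerReplication {
        public static void main(String[] args) throws Exception {
            // Assumes fs.defaultFS in the loaded config points at the NameNode.
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical backup file already on HDFS. With 8 nodes x 20 GB
            // (~160 GB raw), replication 3 leaves roughly 53 GB of usable
            // space; dropping to 2 raises that to ~80 GB, trading away one
            // replica's worth of failure tolerance.
            Path backup = new Path("/backup/archive.tar");
            boolean scheduled = fs.setReplication(backup, (short) 2);
            System.out.println("Replication change scheduled: " + scheduled);
            fs.close();
        }
    }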