@Harsh ---

I was wondering: although it may not make much sense in practice, suppose
someone wants to store files on HDFS only (something like a backup), with
the hardware scenario above and no MR processing. In that case it should
be possible to store a file larger than 20 GB on nodes that each have only
a 20 GB hard disk, since the file's blocks and their replicas will be
distributed evenly across the cluster, right?
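
For concreteness, here is a back-of-the-envelope sketch in plain Java of
why a file larger than any single disk can still fit: HDFS splits the
file into blocks and spreads the blocks (and their replicas) across the
DataNodes, so no single node ever has to hold the whole file. The node
count and disk size come from this thread; the 25 GB file size and the
replication factor of 3 are assumptions.

    // Capacity check for the cluster discussed in this thread.
    public class HdfsCapacitySketch {
        public static void main(String[] args) {
            long nodes = 8;
            long diskPerNodeGb = 20;                    // local disk per node
            long rawCapacityGb = nodes * diskPerNodeGb; // 160 GB raw
            int replication = 3;                        // assumed dfs.replication
            long fileGb = 25;                           // hypothetical file, bigger than any one disk

            // Every byte written costs 'replication' bytes of raw disk,
            // but the copies land on different nodes, block by block.
            long usableGb = rawCapacityGb / replication; // ~53 GB of HDFS space
            long footprintGb = fileGb * replication;     // 75 GB of raw disk

            System.out.println("usable HDFS capacity ~" + usableGb + " GB");
            System.out.println("25 GB file fits: " + (footprintGb <= rawCapacityGb));
        }
    }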

Regards,
Praveenesh

On Thu, Jun 14, 2012 at 7:08 PM, Harsh J <ha...@cloudera.com> wrote:

> Ondřej,
>
> If by processing you mean trying to write out (map outputs) > 20 GB of
> data per map task, that may not be possible, as the outputs need to be
> materialized and the disk space is the constraint there.
>
> Or did I not understand you correctly (in thinking you are asking
> about MapReduce)? Otherwise you have ~50 GB of space available for
> HDFS consumption (8 nodes x 20 GB = 160 GB of raw disk, assuming
> replication = 3 for proper reliability).
>
> On Thu, Jun 14, 2012 at 1:25 PM, Ondřej Klimpera <klimp...@fit.cvut.cz>
> wrote:
> > Hello,
> >
> > we're testing an application on 8 nodes, where each node has 20 GB of
> > local storage available. We are trying to process more than 20 GB of
> > data on this cluster.
> >
> > Is there a way to distribute the data across the cluster?
> >
> > There is also a shared NFS storage disk with 1 TB of available space,
> > which is currently unused.
> >
> > Thanks for your reply.
> >
> > Ondrej Klimpera
>
>
>
> --
> Harsh J
>
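
Below is a minimal sketch, using the Hadoop FileSystem API, of what a
pre-write space check for the backup-only scenario above could look
like. It assumes a Hadoop release where FileSystem#getStatus() is
available, and the file size and replication factor in it are
hypothetical, not anything prescribed in this thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class BackupSpaceCheck {
        public static void main(String[] args) throws Exception {
            // Picks up fs.default.name etc. from the config files on the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            FsStatus status = fs.getStatus();          // whole-cluster view
            long remainingRaw = status.getRemaining(); // raw bytes still free

            short replication = 2;                     // fewer copies => more usable space
            long fileBytes = 25L * 1024 * 1024 * 1024; // hypothetical 25 GB backup

            // Each byte written consumes 'replication' bytes of raw disk.
            if (fileBytes * replication <= remainingRaw) {
                System.out.println("enough raw space; safe to run hadoop fs -put");
            } else {
                System.out.println("not enough space at replication " + replication);
            }
            fs.close();
        }
    }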
