Praveenesh,

Yes, you are absolutely right: you can indeed store >20 GB per file on
such a cluster (and have it replicated properly), because HDFS chunks
writes into smaller, fixed-size blocks that are spread across the nodes.
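For instance (just a rough sketch, not tested against your setup -- the
path, block size and replication factor below are assumptions), a large
file can be written through the FileSystem API with an explicit block
size, and each block is then placed and replicated independently across
the datanodes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BigFileWrite {
      public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path out = new Path("/backup/bigfile.dat"); // hypothetical path
        long blockSize = 128L * 1024 * 1024;        // 128 MB blocks (assumed)
        short replication = 3;                      // as discussed below
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        // Each block is allocated and replicated independently, so the
        // file as a whole is never limited by any single node's 20 GB disk.
        FSDataOutputStream stream =
            fs.create(out, true, bufferSize, replication, blockSize);
        // ... write the data, then:
        stream.close();
      }
    }

You can then check how the blocks actually landed with
"hadoop fsck /backup/bigfile.dat -files -blocks -locations".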

On Thu, Jun 14, 2012 at 7:23 PM, praveenesh kumar <praveen...@gmail.com> wrote:
> @Harsh ---
>
> I was wondering -- although it may not make much sense -- if a person
> wants to store files only on HDFS (something like a backup), given the
> above hardware scenario with no MR processing: in that case, it should
> be possible to store a file larger than 20 GB on nodes that each have
> only a 20 GB hard disk, as the blocks and their replicas will be evenly
> distributed across the cluster, right?
>
> Regards,
> Praveenesh
>
> On Thu, Jun 14, 2012 at 7:08 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Ondřej,
>>
>> If by processing you mean trying to write out (map outputs) > 20 GB of
>> data per map task, that may not be possible, as the outputs need to be
>> materialized and the disk space is the constraint there.
>>
>> Or did I not understand you correctly (in thinking you are asking
>> about MapReduce)? Otherwise you have roughly 50 GB of space available
>> for HDFS consumption (8 nodes x 20 GB, assuming replication = 3 for
>> proper reliability).
>>
>> On Thu, Jun 14, 2012 at 1:25 PM, Ondřej Klimpera <klimp...@fit.cvut.cz>
>> wrote:
>> > Hello,
>> >
>> > we're testing an application on 8 nodes, where each node has 20 GB
>> > of local storage available. What we are trying to achieve is to
>> > process more than 20 GB of data on this cluster.
>> >
>> > Is there a way how to distribute the data on the cluster?
>> >
>> > There is also one shared NFS storage disk with 1 TB of available
>> > space, which is now unused.
>> >
>> > Thanks for your reply.
>> >
>> > Ondrej Klimpera
>>
>>
>>
>> --
>> Harsh J
>>



-- 
Harsh J
