So I still don't understand how a 1KB file can take up a whole 64MB block.
Can anyone explain this to me?

On Fri, Jun 10, 2011 at 5:13 PM, Philip Zeyliger <phi...@cloudera.com> wrote:
> On Fri, Jun 10, 2011 at 9:08 AM, Pedro Costa <psdc1...@gmail.com> wrote:
>> Does this mean that when HDFS reads a 1KB file from disk, it puts
>> the data into blocks of 64MB?
>
> No.
>
>>
>> On Fri, Jun 10, 2011 at 5:00 PM, Philip Zeyliger <phi...@cloudera.com> wrote:
>>> On Fri, Jun 10, 2011 at 8:42 AM, Pedro Costa <psdc1...@gmail.com> wrote:
>>>> But how can I say that a 1KB file will only use 1KB of disk space if
>>>> the block size is configured as 64MB? In my view, if a 1KB file uses a
>>>> block of 64MB, the file will occupy 64MB on disk.
>>>
>>> A block of HDFS is the unit of distribution and replication, not the
>>> unit of storage.  HDFS uses the underlying file systems for physical
>>> storage.
>>>
>>> -- Philip
>>>
>>>>
>>>> How can you dissociate a 64MB HDFS data block from a disk block?
>>>>
>>>> On Fri, Jun 10, 2011 at 5:01 PM, Marcos Ortiz <mlor...@uci.cu> wrote:
>>>>> On 06/10/2011 10:35 AM, Pedro Costa wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> If I configure HDFS to use 64 MB blocks and store a 1KB file in HDFS,
>>>>> will this file occupy 64MB in HDFS?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> HDFS is not very efficient at storing small files, because each file is
>>>>> stored in at least one block (of up to 64 MB in your case), and the
>>>>> block metadata is held in memory by the NameNode (NN). But you should
>>>>> know that this 1KB file will only use 1KB of disk space.
>>>>>
>>>>> For small files, you can use Hadoop archives.
>>>>> Regards
>>>>>
>>>>> --
>>>>> Marcos Luís Ortíz Valmaseda
>>>>>  Software Engineer (UCI)
>>>>>  http://marcosluis2186.posterous.com
>>>>>  http://twitter.com/marcosluis2186
>>>>>
>>>>>
>>>>
>>>
>>
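
To make Philip's point concrete, here is a minimal sketch of the accounting,
not HDFS code. It assumes a simplified model in which a file is cut into
blocks of at most the configured size and the last block holds only the bytes
that remain. The function names and the replication parameter are
hypothetical, chosen for illustration; checksum files and local file system
overhead are ignored.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # block size configured in the thread


def block_lengths(file_size, block_size=BLOCK_SIZE):
    """Split a file of file_size bytes into HDFS-style block lengths.

    Every block except the last is full; the last block holds only the
    remaining bytes, so it is never padded up to block_size.
    """
    lengths = []
    remaining = file_size
    while remaining > 0:
        current = min(remaining, block_size)
        lengths.append(current)
        remaining -= current
    return lengths


def data_bytes_on_disk(file_size, replication=1, block_size=BLOCK_SIZE):
    """Approximate bytes of file data stored across all DataNodes.

    Simplified (assumption): ignores checksum files and local file
    system overhead.
    """
    return sum(block_lengths(file_size, block_size)) * replication


if __name__ == "__main__":
    print(block_lengths(1024))                      # [1024]
    print(len(block_lengths(130 * MB)))             # 3
    print(data_bytes_on_disk(1024, replication=3))  # 3072
```

In this model the 1KB file is one block of 1024 bytes, so the 64MB setting
is an upper bound on block length rather than an allocation size. The cost of
a small file is the block entry the NameNode keeps in memory, not disk space.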
