So, I still don't understand how a 1KB file can take up a whole 64MB block. Can anyone explain it to me?

On Fri, Jun 10, 2011 at 5:13 PM, Philip Zeyliger <phi...@cloudera.com> wrote:
> On Fri, Jun 10, 2011 at 9:08 AM, Pedro Costa <psdc1...@gmail.com> wrote:
>> Does this mean that, when HDFS reads a 1KB file from the disk, it will
>> put the data in blocks of 64MB?
>
> No.
>
>>
>> On Fri, Jun 10, 2011 at 5:00 PM, Philip Zeyliger <phi...@cloudera.com> wrote:
>>> On Fri, Jun 10, 2011 at 8:42 AM, Pedro Costa <psdc1...@gmail.com> wrote:
>>>> But how can I say that a 1KB file will only use 1KB of disc space, if
>>>> a block is configured as 64MB? In my view, if a 1KB file uses a block
>>>> of 64MB, the file will occupy 64MB on the disc.
>>>
>>> A block of HDFS is the unit of distribution and replication, not the
>>> unit of storage. HDFS uses the underlying file systems for physical
>>> storage.
>>>
>>> -- Philip
>>>
>>>>
>>>> How can you dissociate a 64MB HDFS data block from a disk block?
>>>>
>>>> On Fri, Jun 10, 2011 at 5:01 PM, Marcos Ortiz <mlor...@uci.cu> wrote:
>>>>> On 06/10/2011 10:35 AM, Pedro Costa wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> If I define HDFS to use blocks of 64 MB, and I store in HDFS a 1KB
>>>>> file, will this file occupy 64MB in the HDFS?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> HDFS is not very efficient at storing small files, because each file
>>>>> is stored in a block (of 64 MB in your case), and the block metadata
>>>>> is held in memory by the NN. But you should know that this 1KB file
>>>>> will only use 1KB of disc space.
>>>>>
>>>>> For small files, you can use Hadoop archives.
>>>>> Regards
>>>>>
>>>>> --
>>>>> Marcos Luís Ortíz Valmaseda
>>>>> Software Engineer (UCI)
>>>>> http://marcosluis2186.posterous.com
>>>>> http://twitter.com/marcosluis2186
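
To put numbers on the replies quoted in this thread (a block is the unit of distribution and replication, not of storage), here is a rough back-of-the-envelope sketch in Python. It is not HDFS code: the function name `hdfs_footprint` and the `replication` parameter are made up for illustration, and it ignores checksum files and any overhead of the local file system on the DataNodes.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size discussed in this thread


def hdfs_footprint(file_size, block_size=BLOCK_SIZE, replication=1):
    """Estimate (number of blocks, bytes of disk used) for one file.

    Hypothetical helper for illustration only. It assumes that a block
    is stored as an ordinary file on the underlying file system, so a
    partially filled block only takes as much space as the data in it.
    """
    if file_size == 0:
        return 0, 0
    # Integer ceiling division: the last block may be only partly filled.
    num_blocks = (file_size + block_size - 1) // block_size
    # Disk usage follows the data size, not num_blocks * block_size.
    disk_bytes = file_size * replication
    return num_blocks, disk_bytes


if __name__ == "__main__":
    one_kb = 1024
    blocks, used = hdfs_footprint(one_kb)
    print(f"1KB file: {blocks} block(s), {used} bytes on disk")
```

Under these assumptions a 1KB file gets one block entry, which is metadata the NameNode keeps in memory, but only about 1KB of disk per replica. That is also why many small files are a problem for the NameNode rather than for disk space.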