On Fri, Jun 10, 2011 at 9:08 AM, Pedro Costa <[email protected]> wrote:
> This means that, when HDFS reads a 1KB file from the disk, it will put
> the data in blocks of 64MB?
No.

> On Fri, Jun 10, 2011 at 5:00 PM, Philip Zeyliger <[email protected]> wrote:
>> On Fri, Jun 10, 2011 at 8:42 AM, Pedro Costa <[email protected]> wrote:
>>> But how can I say that a 1KB file will only use 1KB of disk space if
>>> a block is configured as 64MB? In my view, if a 1KB file uses a block of
>>> 64MB, the file will occupy 64MB on the disk.
>>
>> A block of HDFS is the unit of distribution and replication, not the
>> unit of storage. HDFS uses the underlying file systems for physical
>> storage.
>>
>> -- Philip
>>
>>>
>>> How can you dissociate a 64MB HDFS data block from a disk block?
>>>
>>> On Fri, Jun 10, 2011 at 5:01 PM, Marcos Ortiz <[email protected]> wrote:
>>>> On 06/10/2011 10:35 AM, Pedro Costa wrote:
>>>>
>>>> Hi,
>>>>
>>>> If I define HDFS to use blocks of 64 MB, and I store in HDFS a 1KB
>>>> file, will this file occupy 64MB in HDFS?
>>>>
>>>> Thanks,
>>>>
>>>> HDFS is not very efficient at storing small files, because each file is
>>>> stored in a block (of 64 MB in your case), and the block metadata
>>>> is held in memory by the NN. But you should know that this 1KB file
>>>> will only use 1KB of disk space.
>>>>
>>>> For small files, you can use Hadoop archives.
>>>> Regards
>>>>
>>>> --
>>>> Marcos Luís Ortíz Valmaseda
>>>> Software Engineer (UCI)
>>>> http://marcosluis2186.posterous.com
>>>> http://twitter.com/marcosluis2186
>
> --
> ---------------------------
> Pedro Sá da Costa
>
> @: [email protected]
> @: [email protected]
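
To make the distinction in this thread concrete, here is a rough sketch of the arithmetic in Python. It is only a model, not anything from the Hadoop code base: the function name `hdfs_footprint` and its parameters are made up for illustration, and it ignores the per-block checksum files and the allocation granularity of the underlying local file system. The point it tries to show is that the block size caps how much data one block may hold, while the bytes written to disk follow the actual file size.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size discussed in the thread


def hdfs_footprint(file_size, block_size=BLOCK_SIZE, replication=1):
    """Return (block_count, data_bytes_on_disk) for one file.

    Hypothetical helper for illustration only. It ignores checksum
    files and local file system overhead.
    """
    # Integer ceiling division: a non-empty file needs at least one block,
    # and one more each time it crosses a block_size boundary.
    block_count = (file_size + block_size - 1) // block_size
    # The last block holds only the bytes that remain, so the data
    # stored is the file size times the number of replicas, not
    # block_count * block_size.
    data_bytes_on_disk = file_size * replication
    return block_count, data_bytes_on_disk


if __name__ == "__main__":
    blocks, on_disk = hdfs_footprint(1024)
    print(f"1KB file: {blocks} block(s), {on_disk} bytes of data on disk")
```

Under this model a 1KB file still costs one block entry in the NameNode's memory, which is why many small files are a problem even though they waste almost no disk space.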
