Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "FAQ" page has been changed by SomeOtherAccount:
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=100&rev2=101

Block replica files can be found on a DataNode in the storage directories specified by the configuration parameter [[http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html#dfs.datanode.data.dir|dfs.datanode.data.dir]]. If the parameter is not set in the DataNode’s {{{hdfs-site.xml}}}, then the default location under {{{/tmp}}} will be used. This default is intended only for testing. In a production system it is an easy way to lose actual data, because the local OS may enforce recycling policies on {{{/tmp}}}. Thus the parameter must be overridden.<<BR>>
If [[http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html#dfs.datanode.data.dir|dfs.datanode.data.dir]] correctly specifies storage directories on all !DataNodes, then you might have real data loss, which can result from faulty hardware or software bugs. If the file(s) containing missing blocks represent transient data or can be recovered from an external source, then the easiest way is to remove (and potentially restore) them. Run [[http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#fsck|fsck]] to determine which files have missing blocks. If you would like (highly appreciated) to investigate the cause of the data loss further, you can dig into the NameNode and DataNode logs. From the logs one can track the entire life cycle of a particular block and its replicas.

+ == If a block size of 64MB is used and a file is written that uses less than 64MB, will 64MB of disk space be consumed? ==
+
+ Short answer: No.
+
+ Longer answer: Since HDFS does not do raw disk block storage, there are two block sizes in use when writing a file in HDFS: the HDFS block size and the underlying file system's block size. HDFS will create files no larger than the HDFS block size, as well as a meta file that contains CRC32 checksums for that block.
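
As a rough illustration of the two block sizes, the following Python sketch estimates the local disk space one replica of a file might occupy. The function name and all default values are assumptions made for the example (a 4096-byte local file system block, one 4-byte CRC32 per 512 bytes of data, and no meta file header); they are not taken from this page, and real usage depends on the cluster configuration and the local file system.

```python
def round_up(n_bytes, unit):
    """Round n_bytes up to the next multiple of unit (integer math only)."""
    return -(-n_bytes // unit) * unit


def estimate_replica_disk_usage(file_size,
                                hdfs_block_size=64 * 1024 * 1024,
                                fs_block_size=4096,        # assumed local FS block size
                                bytes_per_checksum=512,    # assumed checksum chunk size
                                checksum_size=4):          # CRC32 is 4 bytes
    """Estimate bytes used on local disk by ONE replica of a file.

    Hypothetical helper: each HDFS block becomes a data file plus a
    meta file of checksums, and the local file system stores each of
    them in whole multiples of its own block size.
    """
    total = 0
    remaining = file_size
    while remaining > 0:
        block_len = min(remaining, hdfs_block_size)
        chunks = -(-block_len // bytes_per_checksum)  # ceiling division
        meta_len = chunks * checksum_size
        total += round_up(block_len, fs_block_size)
        total += round_up(meta_len, fs_block_size)
        remaining -= block_len
    return total


if __name__ == "__main__":
    one_mib = 1024 * 1024
    # A 1 MiB file uses roughly 1 MiB plus a small meta file, not 64 MiB.
    print(estimate_replica_disk_usage(one_mib))  # 1056768
```

Under these assumptions a 1 MiB file costs about 1 MiB plus 8 KiB of checksums per replica, far below the 64MB HDFS block size.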
The underlying file system stores that file in increments of its own block size on the actual raw disk, just as it would any other file.

= Platform Specific =
== Mac OS X ==
