Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "FAQ" page has been changed by SomeOtherAccount:
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=100&rev2=101

  Block replica files can be found on a DataNode in the storage directories 
specified by the configuration parameter 
[[http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html#dfs.datanode.data.dir|dfs.datanode.data.dir]].
 If the parameter is not set in the DataNode’s {{{hdfs-site.xml}}}, then the 
default location {{{/tmp}}} will be used. This default is intended to be used 
only for testing. In a production system this is an easy way to lose actual 
data, as the local OS may enforce recycling policies on {{{/tmp}}}. Thus the 
parameter must be overridden.<<BR>>
  If 
[[http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html#dfs.datanode.data.dir|dfs.datanode.data.dir]]
 correctly specifies storage directories on all !DataNodes, then you might have 
real data loss, which can be the result of faulty hardware or software bugs. If 
the file(s) containing missing blocks represent transient data or can be 
recovered from an external source, then the easiest fix is to remove (and 
potentially restore) them. Run 
[[http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#fsck|fsck]] 
to determine which files have missing blocks. If you would like to investigate 
the cause of the data loss further (which is highly appreciated), then you 
can dig into the NameNode and DataNode logs. From the logs one can track the 
entire life cycle of a particular block and its replicas.
  
+ == If a block size of 64MB is used and a file is written that uses less than 
64MB, will 64MB of disk space be consumed? ==
+ 
+ Short answer: No.  
+ 
+ Longer answer:  Since HDFS does not do raw disk block storage, there are two 
block sizes in use when writing a file in HDFS: the HDFS block size and the 
underlying file system's block size.  HDFS will create files up to the size of 
the HDFS block size as well as a meta file that contains CRC32 checksums for 
that block.  The underlying file system stores that file in increments of its 
block size on the actual raw disk, just as it would any other file.
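
The arithmetic behind that answer can be sketched roughly as follows. This is 
only an illustrative estimate, not how HDFS itself accounts for space: the 
function name is hypothetical, and the 4096-byte local file system block, the 
512 data bytes per checksum, the 4-byte CRC32 value and the 7-byte meta file 
header are all assumptions that may differ on a given cluster.

```python
def round_up(size, unit):
    """Round size up to the next multiple of unit (integer arithmetic)."""
    return -(-size // unit) * unit


def estimate_disk_usage(file_size,
                        hdfs_block_size=64 * 1024 * 1024,  # from the question
                        fs_block_size=4096,        # assumed local FS block size
                        bytes_per_checksum=512,    # assumed checksum chunk size
                        checksum_size=4,           # CRC32 is 4 bytes
                        meta_header_size=7):       # assumed meta file header
    """Estimate the local disk bytes used by ONE replica of a file."""
    total = 0
    remaining = file_size
    while remaining > 0:
        # Each HDFS block becomes a local file no larger than the HDFS block size.
        block_len = min(remaining, hdfs_block_size)
        # The meta file holds one checksum per chunk of block data.
        chunks = -(-block_len // bytes_per_checksum)
        meta_len = meta_header_size + chunks * checksum_size
        # The local file system rounds each file up to its own block size.
        total += round_up(block_len, fs_block_size)
        total += round_up(meta_len, fs_block_size)
        remaining -= block_len
    return total


# A 1 MB file with a 64 MB HDFS block size, under the assumptions above.
print(estimate_disk_usage(1024 * 1024))  # prints 1060864
```

Under these assumptions a 1 MB file takes a little over 1 MB of local disk per 
replica, far below 64 MB. Multiply by the replication factor for the 
cluster-wide total.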
  
  = Platform Specific =
  == Mac OS X ==
