[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439451#comment-15439451 ]
Sean Mackrory commented on HDFS-10797:
--------------------------------------

It's worth pointing out that the df command also shows some potentially surprising increases during this procedure, but I believe the behavior is correct. Usage grows by more than you'd expect when appending data. I believe this is because the snapshotted block and the modified block now need to be different blocks. The extra increase doesn't happen when using block sizes that divide 100MB evenly, which supports this idea.

> Disk usage summary of snapshots causes renamed blocks to get counted twice
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10797
>                 URL: https://issues.apache.org/jira/browse/HDFS-10797
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sean Mackrory
>
> DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how
> much disk space is used by a snapshot by tallying up the files in the
> snapshot that have since been deleted. That way the tally won't overlap with
> regular files, whose disk usage is computed separately. However, the deleted
> files are determined from a diff that represents moved (to Trash or
> otherwise) or renamed files as a deletion plus a creation, and the blocks of
> the deleted entry may overlap with those of the file at its new location.
> Only the deletion operation is taken into consideration, so those blocks get
> counted twice in the disk usage tally.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
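The double counting described in the issue can be illustrated with a minimal sketch. This sketch does not use Hadoop's real INode, DirectoryDiff, or ContentSummary classes. SnapshotUsageSketch, Block, FileEntry, and the three usage methods are hypothetical, and replication is ignored. Because a rename shows up in the diff as a deletion whose blocks are still live, a naive tally over deleted entries counts the same 100MB twice. Skipping blocks that are still referenced by live files is one possible way to avoid this. It is not necessarily the fix adopted for HDFS-10797.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, NOT Hadoop's API: a file is a name plus the blocks it references.
public class SnapshotUsageSketch {
    record Block(long id, long bytes) {}
    record FileEntry(String name, List<Block> blocks) {}

    // Usage of the live (current) namespace, computed separately from snapshots.
    static long liveUsage(List<FileEntry> live) {
        long total = 0;
        for (FileEntry f : live) for (Block b : f.blocks()) total += b.bytes();
        return total;
    }

    // Naive snapshot tally: every block of every "deleted" diff entry is counted,
    // even when a rename means the block is still live under a new name.
    static long naiveSnapshotUsage(List<FileEntry> deleted) {
        long total = 0;
        for (FileEntry f : deleted) for (Block b : f.blocks()) total += b.bytes();
        return total;
    }

    // Deduplicated tally: skip blocks still referenced by a live file (and count
    // each remaining block once), so a rename adds nothing extra.
    static long dedupSnapshotUsage(List<FileEntry> deleted, List<FileEntry> live) {
        Set<Long> liveIds = new HashSet<>();
        for (FileEntry f : live) for (Block b : f.blocks()) liveIds.add(b.id());
        Set<Long> counted = new HashSet<>();
        long total = 0;
        for (FileEntry f : deleted) {
            for (Block b : f.blocks()) {
                if (!liveIds.contains(b.id()) && counted.add(b.id())) total += b.bytes();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Block b1 = new Block(1, 64 * mb);
        Block b2 = new Block(2, 36 * mb);
        // /a/data was renamed to /b/data after the snapshot was taken, so the diff
        // reports a deletion of /a/data and a creation of /b/data with the same blocks.
        List<FileEntry> live = List.of(new FileEntry("/b/data", List.of(b1, b2)));
        List<FileEntry> deleted = List.of(new FileEntry("/a/data", List.of(b1, b2)));

        long naive = liveUsage(live) + naiveSnapshotUsage(deleted);
        long dedup = liveUsage(live) + dedupSnapshotUsage(deleted, live);
        System.out.println("naive total MB: " + naive / mb); // 200: blocks counted twice
        System.out.println("dedup total MB: " + dedup / mb); // 100
    }
}
```

A file that was truly deleted after the snapshot, with blocks no live file references, is still counted in full by the deduplicated version. Only renamed or moved blocks stop contributing twice.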