[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534405#comment-15534405 ]
Xiao Chen commented on HDFS-10797:
----------------------------------

Thank you for the good discussions here [~mackrorysd] and [~jingzhao]! Sorry I missed the scenario Jing mentioned.

The semantics look great to me. Some nits:
- There's an unused var {{count}} in {{INodeDirectory#computeContentSummary}}
- The javadoc of {{ContentSummaryComputationContext#nodeIncluded}} has some typos: {{both either}}

And more importantly, I think the {{INodeDirectory#computeContentSummary}} logic needs an update: the snapshotCounts added by HDFS-8986 is supposed to count only contents under snapshots. It looks like this change breaks the unit test from that jira. I think the {{subtreeSummary}} is the problem here: we should only add the snapshot portion of the subtree into snapshotCounts. An example unit test is {{TestDFSShell#testDuSnapshots}}.

What do you guys think?

> Disk usage summary of snapshots causes renamed blocks to get counted twice
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10797
>                 URL: https://issues.apache.org/jira/browse/HDFS-10797
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch,
> HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch,
> HDFS-10797.006.patch
>
>
> DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how
> much disk usage is used by a snapshot by tallying up the files in the
> snapshot that have since been deleted (that way it won't overlap with regular
> files whose disk usage is computed separately). However that is determined
> from a diff that shows moved (to Trash or otherwise) or renamed files as a
> deletion and a creation operation that may overlap with the list of blocks.
> Only the deletion operation is taken into consideration, and this causes
> those blocks to get represented twice in the disk usage tallying.
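
To make the double counting in the description above concrete, here is a minimal Java sketch on a toy model. It is not the HDFS implementation: {{DuSketch}}, {{FileEntry}}, {{naiveUsage}} and {{dedupedUsage}} are hypothetical names invented for illustration, and the idea of remembering which inodes were already counted is only an assumption about what a fix could look like, loosely suggested by the {{nodeIncluded}} name mentioned in the comment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: these types are hypothetical and are NOT the HDFS classes. */
final class DuSketch {

    /** A file identified by its inode id, with the bytes its blocks occupy. */
    static final class FileEntry {
        final long inodeId;
        final String path;
        final long bytes;

        FileEntry(long inodeId, String path, long bytes) {
            this.inodeId = inodeId;
            this.path = path;
            this.bytes = bytes;
        }
    }

    /**
     * Naive tally: live files plus every file the snapshot diff lists as
     * deleted. A renamed file shows up in both lists, so it is counted twice.
     */
    static long naiveUsage(List<FileEntry> live, List<FileEntry> deletedInDiff) {
        long total = 0;
        for (FileEntry f : live) {
            total += f.bytes;
        }
        for (FileEntry f : deletedInDiff) {
            total += f.bytes;
        }
        return total;
    }

    /**
     * Deduplicated tally (an assumed approach): remember which inode ids were
     * already counted and skip any inode seen before.
     */
    static long dedupedUsage(List<FileEntry> live, List<FileEntry> deletedInDiff) {
        Set<Long> included = new HashSet<>();
        long total = 0;
        for (FileEntry f : live) {
            if (included.add(f.inodeId)) {
                total += f.bytes;
            }
        }
        for (FileEntry f : deletedInDiff) {
            if (included.add(f.inodeId)) {
                total += f.bytes;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Inode 1 was /dir/a when the snapshot was taken and was later renamed to /dir/b.
        List<FileEntry> live = Arrays.asList(
                new FileEntry(1, "/dir/b", 100),
                new FileEntry(2, "/dir/c", 50));
        // The diff records the rename as a deletion of /dir/a; inode 3 was really deleted.
        List<FileEntry> deletedInDiff = Arrays.asList(
                new FileEntry(1, "/dir/a", 100),
                new FileEntry(3, "/dir/d", 25));

        System.out.println(naiveUsage(live, deletedInDiff));   // prints 275
        System.out.println(dedupedUsage(live, deletedInDiff)); // prints 175
    }
}
```

In this toy scenario the renamed file (inode 1) contributes its 100 bytes twice in the naive tally, while the truly deleted file (inode 3) is correctly counted once in both.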