[
https://issues.apache.org/jira/browse/HDFS-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238670#comment-16238670
]
Tsz Wo Nicholas Sze commented on HDFS-12594:
--------------------------------------------
Some other comments on the patch.
- Since there is already a
"dfs.namenode.snapshotdiff.allow.snap-root-descendant", rename
"dfs.snapshotdiff-report.limit" to "dfs.namenode.snapshotdiff.listing.limit"
and move it next to DFS_NAMENODE_SNAPSHOT_DIFF_ALLOW_SNAP_ROOT_DESCENDANT.
- Use int for index and snapshotDiffReportLimit instead of Integer. Use long
instead of Long, boolean instead of Boolean, etc.
- SnapshotDiffReportGenerator should be moved to the
org.apache.hadoop.hdfs.client.impl package.
- Use byte[][] in SnapshotDiffReportListing for sourcePath and targetPath.
-* bytes2String and string2Bytes are expensive; please avoid calling them.
{code}
public byte[] getParent() {
  if (sourcePath == null || DFSUtilClient.bytes2String(sourcePath)
      .isEmpty()) {
    return null;
  } else {
    Path path = new Path(DFSUtilClient.bytes2String(sourcePath));
    return DFSUtilClient.string2Bytes(path.getParent().toString());
  }
}
{code}
- In DistributedFileSystem.getSnapshotDiffReportInternal,
-* deltetedList should be deletedList
-* remove snapDiffReport, just return snapshotDiffReport.generateReport();
I have not finished reviewing the entire patch yet. Will continue.
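To illustrate the point about avoiding bytes2String and string2Bytes, here is a minimal sketch of computing a parent path directly on the UTF-8 bytes. It is not code from the patch: the class name PathBytes is hypothetical, and the sketch assumes an already-normalized path with no trailing separator. Scanning for '/' at the byte level is safe because the byte 0x2F never occurs inside a multi-byte UTF-8 sequence.

```java
import java.util.Arrays;

/** Hypothetical helper, for illustration only; not part of the patch. */
final class PathBytes {
  private static final byte SEPARATOR = (byte) '/';

  private PathBytes() {}

  /**
   * Returns the parent of a UTF-8 encoded path without decoding it,
   * or null if the path is null, empty, the root, or has no separator.
   * Assumes a normalized path with no trailing separator.
   */
  static byte[] getParent(byte[] path) {
    if (path == null || path.length == 0) {
      return null;
    }
    // Find the last separator by scanning backwards over the raw bytes.
    int idx = path.length - 1;
    while (idx >= 0 && path[idx] != SEPARATOR) {
      idx--;
    }
    if (idx < 0) {
      return null;                       // no separator at all
    }
    if (idx == 0) {
      // "/" itself has no parent; "/name" has the root as its parent.
      return path.length == 1 ? null : new byte[] {SEPARATOR};
    }
    return Arrays.copyOfRange(path, 0, idx);
  }
}
```

For example, calling PathBytes.getParent on the UTF-8 bytes of "/dir/sub/file" yields the bytes of "/dir/sub", with no String or Path objects allocated along the way.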
> SnapshotDiff - snapshotDiff fails if the snapshotDiff report exceeds the RPC
> response limit
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-12594
> URL: https://issues.apache.org/jira/browse/HDFS-12594
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-12594.001.patch, HDFS-12594.002.patch,
> HDFS-12594.003.patch, HDFS-12594.004.patch, SnapshotDiff_Improvemnets .pdf
>
>
> The snapshotDiff command fails if the snapshotDiff report size is larger than
> the configured value of ipc.maximum.response.length, which is 128 MB by
> default.
> In the worst case, where every operation in the snapshots is a rename whose
> source and target names are both MAX_PATH_LEN (8k characters) long, the
> limit would be reached at about 8192 renames.
>
> SnapshotDiff is currently used by distcp to optimize copy operations. If
> the diff report exceeds the limit, it fails with the exception below:
> Test set:
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 112.095 sec
> <<< FAILURE! - in
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> testDiffReportWithMillionFiles(org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport)
> Time elapsed: 111.906 sec <<< ERROR!
> java.io.IOException: Failed on local exception:
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length;
> Host Details : local host is: "hw15685.local/10.200.5.230"; destination host
> is: "localhost":59808;
> Attached is the proposal for the changes required.
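
The worst-case figure in the issue description above can be checked with a little arithmetic. The sketch below is only an estimate: the class and method names are hypothetical, "8k" is taken to mean 8192, each character is assumed to occupy one byte, and any per-entry serialization overhead in the RPC response is ignored.

```java
/** Hypothetical back-of-the-envelope check; not part of the patch. */
final class WorstCaseRenames {
  // Default response limit quoted in the issue description: 128 MB.
  static final long MAX_RESPONSE_BYTES = 128L * 1024 * 1024;
  // Assumption: "8k characters" means 8192, at one byte per character.
  static final long MAX_PATH_LEN = 8L * 1024;

  private WorstCaseRenames() {}

  /** Number of rename entries that fit when each carries two full-length paths. */
  static long maxRenames(long responseLimitBytes, long maxPathLen) {
    long bytesPerRename = 2 * maxPathLen;   // source path + target path
    return responseLimitBytes / bytesPerRename;
  }
}
```

Under these assumptions, maxRenames(MAX_RESPONSE_BYTES, MAX_PATH_LEN) is 134217728 / 16384 = 8192, which matches the number given in the description.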