[
https://issues.apache.org/jira/browse/HDFS-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216518#comment-16216518
]
Ewan Higgs commented on HDFS-12594:
-----------------------------------
Some minor things on a first pass:
{code}
+ if (getLastIndex() != -1) {
+ setLastIndex(-1);
+ }
{code}
Why not just set it?
I think the basic design is a good approach, but it would be nicer to
restructure it by acknowledging that we're making a cursor/iterator here. So
the report request/response, which currently looks as follows:
{code}
message GetSnapshotDiffReportListingRequestProto {
required string snapshotRoot = 1;
required string fromSnapshot = 2;
required string toSnapshot = 3;
required string startPath = 4;
required int32 index = 5 [default = -1];
}
// ...
message SnapshotDiffReportListingProto {
// full path of the directory where snapshots were taken
repeated SnapshotDiffReportListingEntryProto modifiedEntries = 1;
repeated SnapshotDiffReportListingEntryProto createdEntries = 2;
repeated SnapshotDiffReportListingEntryProto deletedEntries = 3;
required bytes startPath = 4;
required int32 index = 5 [default = -1];
required bool isFromEarlier = 6;
}
{code}
... could be:
{code}
message SnapshotDiffReportCursorProto {
required string startPath = 4;
required int32 index = 5 [default = -1];
}
message GetSnapshotDiffReportListingRequestProto {
required string snapshotRoot = 1;
required string fromSnapshot = 2;
required string toSnapshot = 3;
optional SnapshotDiffReportCursorProto cursor = 4;
}
// ...
message SnapshotDiffReportListingProto {
// full path of the directory where snapshots were taken
repeated SnapshotDiffReportListingEntryProto modifiedEntries = 1;
repeated SnapshotDiffReportListingEntryProto createdEntries = 2;
repeated SnapshotDiffReportListingEntryProto deletedEntries = 3;
required bool isFromEarlier = 4;
optional SnapshotDiffReportCursorProto cursor = 5;
}
{code}
Making a request with no cursor starts at the beginning.
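To make the cursor idea concrete, here is a rough client-side sketch of the loop, not the actual patch. Everything in it is hypothetical: `Cursor` and `Page` are plain-Java stand-ins for the proposed `SnapshotDiffReportCursorProto` and `SnapshotDiffReportListingProto`, `getListing` is a fake in-memory server, and the cursor's `index` is simplified to an offset into one flat list of entries (the real semantics of `startPath` plus `index` would be richer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SnapshotDiffCursorSketch {

    /** Hypothetical stand-in for the proposed SnapshotDiffReportCursorProto. */
    static final class Cursor {
        final String startPath;
        final int index;

        Cursor(String startPath, int index) {
            this.startPath = startPath;
            this.index = index;
        }
    }

    /** Hypothetical stand-in for one listing response. */
    static final class Page {
        final List<String> entries;
        final Cursor next; // null plays the role of an absent optional cursor

        Page(List<String> entries, Cursor next) {
            this.entries = entries;
            this.next = next;
        }
    }

    /** Fake server: a null cursor starts at the beginning (assumed semantics). */
    static Page getListing(List<String> allEntries, Cursor cursor, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        int from = (cursor == null) ? 0 : cursor.index;
        int to = Math.min(from + pageSize, allEntries.size());
        List<String> slice = new ArrayList<>(allEntries.subList(from, to));
        // Only hand back a cursor when there is more to fetch.
        Cursor next = (to < allEntries.size())
            ? new Cursor(allEntries.get(to), to)
            : null;
        return new Page(slice, next);
    }

    /** Client loop: keep passing the returned cursor back until none comes back. */
    static List<String> fetchAll(List<String> allEntries, int pageSize) {
        List<String> result = new ArrayList<>();
        Cursor cursor = null; // no cursor on the first request
        do {
            Page page = getListing(allEntries, cursor, pageSize);
            result.addAll(page.entries);
            cursor = page.next;
        } while (cursor != null);
        return result;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("/a", "/b", "/c", "/d", "/e");
        System.out.println(fetchAll(entries, 2));
    }
}
```

The point of the shape is that the client never interprets `startPath` or `index` itself. It only echoes back whatever cursor the previous response carried, so the sentinel value `-1` is no longer needed.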
> SnapshotDiff - snapshotDiff fails if the snapshotDiff report exceeds the RPC
> response limit
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-12594
> URL: https://issues.apache.org/jira/browse/HDFS-12594
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Attachments: HDFS-12594.001.patch, HDFS-12594.002.patch,
> HDFS-12594.003.patch, SnapshotDiff_Improvemnets .pdf
>
>
> The snapshotDiff command fails if the snapshotDiff report size is larger than
> the configuration value of ipc.maximum.response.length which is by default
> 128 MB.
> Worst case, with all Rename ops in snapshots, each with source and target
> name equal to MAX_PATH_LEN, which is 8k characters, the limit would be
> reached at about 8192 renames.
>
> SnapshotDiff is currently used by distcp to optimize copy operations, and in
> case of the diff report exceeding the limit, it fails with the below
> exception:
> Test set:
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 112.095 sec
> <<< FAILURE! - in
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> testDiffReportWithMillionFiles(org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport)
> Time elapsed: 111.906 sec <<< ERROR!
> java.io.IOException: Failed on local exception:
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length;
> Host Details : local host is: "hw15685.local/10.200.5.230"; destination host
> is: "localhost":59808;
> Attached is the proposal for the changes required.