[ 
https://issues.apache.org/jira/browse/HDFS-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245081#comment-16245081
 ] 

Tsz Wo Nicholas Sze commented on HDFS-12594:
--------------------------------------------

/SnapshotDiffReportListing.j
- In DiffReportListingEntry, getSourcePath(), getTargetPath() and getParent() 
should return byte[][].
It is inefficient to convert between byte[] and byte[][].  E.g. in INODE_COMPARATOR, 
getParent() converts the path to byte[] and then the DiffReportListingEntry 
constructor converts it back to byte[][].
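
A minimal sketch of the suggestion, assuming the entry keeps its path as byte[][] components. PathEntrySketch and its components() helper are hypothetical names for illustration and are not taken from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for DiffReportListingEntry; not the actual patch code.
final class PathEntrySketch {
  private final byte[][] sourcePath;

  PathEntrySketch(byte[][] sourcePath) {
    this.sourcePath = sourcePath;
  }

  // Hands back the stored components as-is: no join into byte[], no re-split.
  byte[][] getSourcePath() {
    return sourcePath;
  }

  // Parent path = all components except the last one (shallow copy of references).
  byte[][] getParent() {
    return Arrays.copyOf(sourcePath, Math.max(0, sourcePath.length - 1));
  }

  // Hypothetical helper that builds components from plain names.
  static byte[][] components(String... names) {
    byte[][] result = new byte[names.length][];
    for (int i = 0; i < names.length; i++) {
      result[i] = names[i].getBytes(StandardCharsets.UTF_8);
    }
    return result;
  }

  public static void main(String[] args) {
    PathEntrySketch entry = new PathEntrySketch(components("dir", "sub", "file"));
    System.out.println(entry.getParent().length);
  }
}
```

With this shape, a comparator can read getParent() and pass the result straight to another entry without any intermediate byte[] form.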

- In ChildrenDiff, the constructor ensures that createdList is never null, so we 
should not check it for null.
{code}
+    public List<DiffReportListingEntry> getCreatedList() {
+      if (createdList == null) {
+        return Collections.<DiffReportListingEntry>emptyList();
+      } else {
+        return createdList;
+      }
+    }
{code}
-* Similarly for getDeletedList().
-* addCreatedList and addDeletedList are not used.  Please remove them.
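
A sketch of the simplified getters, under the assumption that the constructor always assigns non-null lists. ChildrenDiffSketch is a hypothetical reduction of ChildrenDiff, and String stands in for DiffReportListingEntry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical reduction of ChildrenDiff; String stands in for DiffReportListingEntry.
final class ChildrenDiffSketch {
  private final List<String> createdList;
  private final List<String> deletedList;

  // Assumption: both lists are always set here, so they can never be null later.
  ChildrenDiffSketch(List<String> createdList, List<String> deletedList) {
    this.createdList = Objects.requireNonNull(createdList);
    this.deletedList = Objects.requireNonNull(deletedList);
  }

  // No null check needed: the final field is guaranteed non-null by the constructor.
  List<String> getCreatedList() {
    return createdList;
  }

  List<String> getDeletedList() {
    return deletedList;
  }

  public static void main(String[] args) {
    List<String> created = new ArrayList<>();
    created.add("newFile");
    ChildrenDiffSketch diff = new ChildrenDiffSketch(created, new ArrayList<>());
    System.out.println(diff.getCreatedList().size() + " " + diff.getDeletedList().size());
  }
}
```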

- {{Collections.<DiffReportListingEntry> emptyList()}} can drop the type 
parameter, i.e. {{Collections.emptyList()}}.
-* Similarly, change {{new HashMap<Long, ChildrenDiff>()}}  to {{new 
HashMap<>()}}
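
For illustration, both simplifications rely on the compiler inferring the type arguments from the target type (the diamond operator requires Java 7 or later). The class and method names below are hypothetical, and String stands in for the real entry types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example; String stands in for DiffReportListingEntry / ChildrenDiff.
final class TypeInferenceSketch {
  // The element type is inferred from the declared return type.
  static List<String> noEntries() {
    return Collections.emptyList();
  }

  // The diamond operator infers <Long, String> from the declared return type.
  static Map<Long, String> newDiffMap() {
    return new HashMap<>();
  }

  public static void main(String[] args) {
    System.out.println(noEntries().isEmpty() && newDiffMap().isEmpty());
  }
}
```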

- In SnapshotDiffReportListing, getTotalEntries(), getModifyListSize(), 
getCreateListSize() and getDeleteListSize() are not used. Please remove them.

- Do not call clone().  It is expensive.  E.g. why first clone the startPath 
and then convert it to String?
{code}
      startPath = DFSUtilClient.bytes2String(report.getStartPath());
{code}
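
To illustrate why the clone is redundant here: decoding a byte[] into a String already copies the data, so cloning first only adds a second allocation. This sketch uses the plain JDK String constructor rather than DFSUtilClient.bytes2String, and StartPathSketch is a hypothetical class, not the patch code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder for a start path; not the actual report class.
final class StartPathSketch {
  private final byte[] startPath;

  StartPathSketch(byte[] startPath) {
    this.startPath = startPath;
  }

  // Defensive copy: allocates a new array and copies every byte.
  byte[] getStartPathCopy() {
    return startPath.clone();
  }

  // Decodes directly from the internal array. The resulting String is
  // immutable and independent of the array, so no clone is needed first.
  String getStartPathAsString() {
    return new String(startPath, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    StartPathSketch report =
        new StartPathSketch("/dir/sub".getBytes(StandardCharsets.UTF_8));
    System.out.println(report.getStartPathAsString());
  }
}
```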

- Wrong javadoc?
{code}
+  /**
+   * store the starting path to process across RPC's for snapshot diff.
+   */
+  private final boolean isFromEarlier;
{code}

- Use long instead of Long, boolean instead of Boolean, etc.
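
A short sketch of the difference: primitive fields cannot be null and need no unboxing, whereas a boxed value can be null and then fails on auto-unboxing. The class and the snapshotId field name are hypothetical.

```java
// Hypothetical example; field names are for illustration only.
final class PrimitiveFieldsSketch {
  private final long snapshotId;        // primitive: never null, no boxing
  private final boolean isFromEarlier;  // primitive: never null, no boxing

  PrimitiveFieldsSketch(long snapshotId, boolean isFromEarlier) {
    this.snapshotId = snapshotId;
    this.isFromEarlier = isFromEarlier;
  }

  long getSnapshotId() {
    return snapshotId;
  }

  boolean isFromEarlier() {
    return isFromEarlier;
  }

  // Shows the risk with the boxed type: unboxing a null Boolean throws.
  static boolean unboxingThrows(Boolean boxed) {
    try {
      boolean value = boxed;  // auto-unboxing calls boxed.booleanValue()
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(unboxingThrows(null));
  }
}
```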


> SnapshotDiff - snapshotDiff fails if the snapshotDiff report exceeds the RPC 
> response limit
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12594
>                 URL: https://issues.apache.org/jira/browse/HDFS-12594
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>         Attachments: HDFS-12594.001.patch, HDFS-12594.002.patch, 
> HDFS-12594.003.patch, HDFS-12594.004.patch, HDFS-12594.005.patch, 
> SnapshotDiff_Improvemnets .pdf
>
>
> The snapshotDiff command fails if the snapshotDiff report size is larger than 
> the configuration value of ipc.maximum.response.length which is by default 
> 128 MB. 
> Worst case, with all Renames ops in snapshots each with source and target 
> name equal to MAX_PATH_LEN which is 8k characters, this would result in a 
> failure at 8192 renames.
>  
> SnapshotDiff is currently used by distcp to optimize copy operations and, 
> if the diff report exceeds the limit, it fails with the exception 
> below:
> Test set: 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 112.095 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> testDiffReportWithMillionFiles(org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport)
>   Time elapsed: 111.906 sec  <<< ERROR!
> java.io.IOException: Failed on local exception: 
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; 
> Host Details : local host is: "hw15685.local/10.200.5.230"; destination host 
> is: "localhost":59808;
> Attached is the proposal for the changes required.
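
The 8192 figure in the description above can be checked with a quick calculation, assuming one byte per path character and ignoring any per-entry RPC overhead (both are simplifying assumptions, and RpcLimitSketch is a hypothetical name):

```java
// Hypothetical back-of-the-envelope check; ignores protobuf/RPC overhead.
final class RpcLimitSketch {
  // Each rename entry carries a source and a target path of maxPathLen bytes.
  static long maxRenames(long maxResponseBytes, long maxPathLen) {
    return maxResponseBytes / (2 * maxPathLen);
  }

  public static void main(String[] args) {
    long limit = 128L * 1024 * 1024;  // default ipc.maximum.response.length, 128 MB
    long maxPathLen = 8 * 1024;       // MAX_PATH_LEN, 8k characters
    System.out.println(maxRenames(limit, maxPathLen));  // prints 8192
  }
}
```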



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

