[GitHub] [hbase] BukrosSzabolcs commented on a change in pull request #4066: HBASE-26707: Reduce number of renames during bulkload

GitBox Fri, 04 Feb 2022 08:03:22 -0800


BukrosSzabolcs commented on a change in pull request #4066:
URL: https://github.com/apache/hbase/pull/4066#discussion_r799601756




##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SecureBulkLoadManager.java
##########
@@ -412,35 +427,37 @@ private void closeSrcFs() throws IOException {
     }
 
     @Override
-    public void failedBulkLoad(final byte[] family, final String srcPath) 
throws IOException {
+    public void failedBulkLoad(final byte[] family, final String stagedPath) 
throws IOException {
       try {
-        Path p = new Path(srcPath);
-        if (srcFs == null) {
-          srcFs = FileSystem.newInstance(p.toUri(), conf);
-        }
-        if (!FSUtils.isSameHdfs(conf, srcFs, fs)) {
-          // files are copied so no need to move them back
+        String src = origlSources.get(stagedPath);
+        if(StringUtils.isEmpty(src)){
+          LOG.debug(stagedPath + " was not moved to staging. No need to move 
back");
           return;
         }
-        Path stageP = new Path(stagingDir, new Path(Bytes.toString(family), 
p.getName()));
 
-        // In case of Replication for bulk load files, hfiles are not renamed 
by end point during
-        // prepare stage, so no need of rename here again
-        if (p.equals(stageP)) {

Review comment:
       I do not see how could this happen. The staging folder is a temporary 
folder created and then cleaned up by the bulk load process. As far as I can 
tell it can not be the source folder for the process.
   It might have been an intentional way of preventing moving back files to 
their original place during the second `failedBulkLoad` call, but I do not see 
why would any want to do that either.
   Also now we have a list of files that would only contain the path if it was 
moved, so checking if it is still at it's original place would be redundant.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hbase] BukrosSzabolcs commented on a change in pull request #4066: HBASE-26707: Reduce number of renames during bulkload

Reply via email to