ayushtkn commented on code in PR #4753:
URL: https://github.com/apache/hive/pull/4753#discussion_r1341646420


##########
common/src/java/org/apache/hadoop/hive/common/FileUtils.java:
##########
@@ -1387,6 +1388,47 @@ public static RemoteIterator<LocatedFileStatus> 
listFiles(FileSystem fs, Path pa
         status -> filter.accept(status.getPath()));
   }
 
+  /**
+   * Resolves a symlink on a local filesystem. In case of any exceptions or 
scheme other than "file"
+   * it simply returns the original path. Refer to DEBUG level logs for 
further details.
+   * @param path
+   * @param conf
+   * @return
+   * @throws IOException
+   */
+  public static Path resolveSymlinks(Path path, Configuration conf) throws 
IOException {
+    if (path == null) {
+      return null;
+    }
+
+    FileSystem srcFs;
+    String scheme = path.toUri().getScheme();
+    if (scheme != null) {
+      srcFs = path.getFileSystem(conf);

Review Comment:
   if scheme isn't null & if it is not file, do we need to get the FS, we can 
return without getting the FS, right?



##########
common/src/test/org/apache/hadoop/hive/common/TestFileUtils.java:
##########
@@ -338,4 +338,35 @@ private int assertExpectedFilePaths(RemoteIterator<? 
extends FileStatus> lfs, Li
     }
     return count;
   }
+
+  @Test
+  public void testResolveSymlinks() throws IOException {
+    HiveConf conf = new HiveConf();
+
+    java.nio.file.Path original = java.nio.file.Files.createTempFile("", "");

Review Comment:
   can we have an additional case where there are nested links.
   like 
   symlink path -> symlink path -> actual path
   
   means the provided symlink path resolves to a symlink rather than the actual 
path, we should get the final path ideally in this situation



##########
common/src/java/org/apache/hadoop/hive/common/FileUtils.java:
##########
@@ -1387,6 +1388,47 @@ public static RemoteIterator<LocatedFileStatus> 
listFiles(FileSystem fs, Path pa
         status -> filter.accept(status.getPath()));
   }
 
+  /**
+   * Resolves a symlink on a local filesystem. In case of any exceptions or 
scheme other than "file"
+   * it simply returns the original path. Refer to DEBUG level logs for 
further details.
+   * @param path
+   * @param conf
+   * @return
+   * @throws IOException
+   */
+  public static Path resolveSymlinks(Path path, Configuration conf) throws 
IOException {
+    if (path == null) {
+      return null;
+    }
+
+    FileSystem srcFs;
+    String scheme = path.toUri().getScheme();
+    if (scheme != null) {
+      srcFs = path.getFileSystem(conf);
+    } else {
+      srcFs = FileSystem.getLocal(conf);
+    }
+    LOG.debug("resolveSymlink path: {}, srcFs class: {}, scheme: {}", path, 
srcFs.getClass().getName(), scheme);
+
+    /**

Review Comment:
   you want a javadoc here? or normal /*



##########
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java:
##########
@@ -1221,9 +1221,9 @@ private Map<String, LocalResource> 
addTempResources(Configuration conf, String h
         LOG.info("Skipping vertex resource " + file + " that already exists in 
the session");
         continue;
       }
-      Path hdfsFilePath = new Path(hdfsDirPathStr, getResourceBaseName(new 
Path(file)));
-      LocalResource localResource = localizeResource(new Path(file),
-          hdfsFilePath, type, conf);
+      Path path = FileUtils.resolveSymlinks(new Path(file), conf);

Review Comment:
   We do a ``new Path(file)`` above at L1220 as well, can we refactor it into a 
variable?
   Secondly, we do have a ``null`` check for path inside FileUtils, not sure if 
it is of any use for us, since if ``file`` is ``null`` we won't pass ``null`` 
path but the code will throw an ``IAE``



##########
common/src/java/org/apache/hadoop/hive/common/FileUtils.java:
##########
@@ -1387,6 +1388,47 @@ public static RemoteIterator<LocatedFileStatus> 
listFiles(FileSystem fs, Path pa
         status -> filter.accept(status.getPath()));
   }
 
+  /**
+   * Resolves a symlink on a local filesystem. In case of any exceptions or 
scheme other than "file"
+   * it simply returns the original path. Refer to DEBUG level logs for 
further details.
+   * @param path
+   * @param conf
+   * @return
+   * @throws IOException

Review Comment:
   can you fill these values? returns resolved path and something like that



##########
ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionState.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec.tez;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.StandardOpenOption;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.yarn.api.records.LocalResource;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestTezSessionState {
+
+  private class TestTezSessionPoolManager extends TezSessionPoolManager {
+    public TestTezSessionPoolManager() {
+      super();
+    }
+
+    @Override
+    public void setupPool(HiveConf conf) throws Exception {
+      super.setupPool(conf);
+    }
+
+    @Override
+    public TezSessionPoolSession createSession(String sessionId, HiveConf 
conf) {
+      return new SampleTezSessionState(sessionId, this, conf);
+    }
+  }
+
+  @Test
+  public void testSymlinkedLocalFilesAreLocalizedOnce() throws Exception {
+    Path jarPath = Files.createTempFile("jar", "");
+    Path symlinkPath = Paths.get(jarPath.toString() + ".symlink");
+    Files.createSymbolicLink(symlinkPath, jarPath);
+
+    // write some data into the fake jar, it's not a 0 length file in real life
+    Files.write(jarPath, "testSymlinkedLocalFilesToBeLocalized".getBytes(), 
StandardOpenOption.APPEND);

Review Comment:
   who cleans up these files post test execution?



##########
ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionState.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec.tez;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.StandardOpenOption;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.yarn.api.records.LocalResource;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestTezSessionState {
+
+  private class TestTezSessionPoolManager extends TezSessionPoolManager {
+    public TestTezSessionPoolManager() {
+      super();
+    }
+
+    @Override
+    public void setupPool(HiveConf conf) throws Exception {
+      super.setupPool(conf);
+    }
+
+    @Override
+    public TezSessionPoolSession createSession(String sessionId, HiveConf 
conf) {
+      return new SampleTezSessionState(sessionId, this, conf);
+    }
+  }
+
+  @Test
+  public void testSymlinkedLocalFilesAreLocalizedOnce() throws Exception {
+    Path jarPath = Files.createTempFile("jar", "");
+    Path symlinkPath = Paths.get(jarPath.toString() + ".symlink");
+    Files.createSymbolicLink(symlinkPath, jarPath);
+
+    // write some data into the fake jar, it's not a 0 length file in real life
+    Files.write(jarPath, "testSymlinkedLocalFilesToBeLocalized".getBytes(), 
StandardOpenOption.APPEND);
+
+    Assert.assertTrue(Files.isSymbolicLink(symlinkPath));
+
+    HiveConf hiveConf = new HiveConf();
+    hiveConf.set(HiveConf.ConfVars.HIVE_JAR_DIRECTORY.varname, "/tmp");
+    TezSessionPoolManager poolManager = new TestTezSessionPoolManager();
+
+    TezSessionState sessionState = poolManager.getSession(null, hiveConf, 
true, false);
+
+    LocalResource l1 = 
sessionState.createJarLocalResource(jarPath.toUri().toString());
+    LocalResource l2 = 
sessionState.createJarLocalResource(symlinkPath.toUri().toString());
+
+    // local resources point to the same original resource
+    Assert.assertEquals(l1.getResource().toPath().toUri(), 
l2.getResource().toPath().toUri());

Review Comment:
   does just validation path works?
   ```
       Assert.assertEquals(l1.getResource().toPath(), 
l2.getResource().toPath());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to