[GitHub] [tez] steveloughran commented on a diff in pull request #274: TEZ-4479: Eagerly Init/Load FileSystem In Tez Task Containers

via GitHub Mon, 27 Mar 2023 07:28:05 -0700


steveloughran commented on code in PR #274:
URL: https://github.com/apache/tez/pull/274#discussion_r1149332283



##########
tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezChild.java:
##########
@@ -503,6 +506,33 @@ public static TezChild newTezChild(Configuration conf, String host, int port, St
         hadoopShim);
   }
 
+  private static void eagerInitFileSystemPaths(Configuration conf) {
+    Collection<String> eagerInitPaths = conf.getTrimmedStringCollection(
+        TezConfiguration.TEZ_TASK_EAGER_INIT_FS_PATHS);
+    if (eagerInitFsPool == null && !eagerInitPaths.isEmpty()) {
+      eagerInitFsPool = Executors.newCachedThreadPool(new ThreadFactoryBuilder()
+          .setDaemon(true)
+          .setNameFormat("Eager-Init-Fs-Thread-%d")
+          .build());
+    }
+    for (String path : eagerInitPaths) {
+      eagerInitFsPool.execute(new Runnable() {

Review Comment:
   Before rushing to create lots of FileSystem instances in parallel, look at HADOOP-17313 and why we added semaphores there: to stop apps like Tez from creating too many instances at the same time. This code may cause overload problems, or the FS semaphore will hold you back for safety.
   
   Best to look at why it's taking so long. If S3A bucket existence checks aren't involved, then it'll be whatever auth mechanism is plugged in. Same for ABFS.
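
   To make the concern concrete, here is a minimal, stdlib-only sketch of capping how many initializations run at once with a semaphore. It illustrates the general idea only and is not the HADOOP-17313 implementation. `BoundedEagerInit`, `initAll`, `sleepQuietly`, the `maxParallel` value and the example paths are all hypothetical, and the `initializer` callback stands in for whatever actually instantiates the filesystem. In Hadoop itself the limit is, as far as I recall, controlled by `fs.creation.parallel.count`; verify that against your Hadoop version.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: bound the number of concurrent "filesystem" initializations.
class BoundedEagerInit {

  /**
   * Runs the initializer once per path with at most maxParallel running at a time.
   * Returns the highest number of initializations observed running concurrently.
   */
  static int initAll(List<String> paths, int maxParallel, Consumer<String> initializer) {
    if (paths.isEmpty()) {
      return 0;
    }
    Semaphore permits = new Semaphore(maxParallel);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    ExecutorService pool = Executors.newCachedThreadPool(r -> {
      Thread t = new Thread(r, "eager-init-sketch");
      t.setDaemon(true); // do not keep the JVM alive for best-effort work
      return t;
    });
    for (String path : paths) {
      pool.execute(() -> {
        try {
          permits.acquire(); // block here once maxParallel tasks are in flight
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        try {
          int now = running.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          initializer.accept(path);
        } catch (RuntimeException e) {
          // Eager init is best-effort in this sketch; a real caller would log this.
        } finally {
          running.decrementAndGet();
          permits.release();
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return peak.get();
  }

  // Stand-in for slow client setup (auth handshake, bucket probe, ...).
  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    List<String> paths = List.of("s3a://bucket-a/", "s3a://bucket-b/", "hdfs://nn/");
    int peak = initAll(paths, 2, p -> sleepQuietly(50));
    System.out.println("peak concurrent initializations: " + peak);
  }
}
```

   Note that a cached thread pool still starts one thread per path; the semaphore only limits how many are doing the expensive work at once. A fixed-size pool would bound both, at the cost of mirroring the semaphore approach less directly.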



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

