iemejia commented on code in PR #12400:
URL: https://github.com/apache/gluten/pull/12400#discussion_r3499925670


##########
backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala:
##########
@@ -527,10 +527,32 @@ object VeloxConfig extends ConfigRegistry {
   val COLUMNAR_VELOX_FILE_HANDLE_CACHE_ENABLED =
     
buildStaticConf("spark.gluten.sql.columnar.backend.velox.fileHandleCacheEnabled")
       .doc(
-        "Disables caching if false. File handle cache should be disabled " +
-          "if files are mutable, i.e. file content may change while file path 
stays the same.")
+        "Enables caching of file handles to avoid repeated open/close overhead 
on remote " +
+          "filesystems. Should be disabled if files are mutable, i.e. file 
content may " +
+          "change while file path stays the same.")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
+
+  val COLUMNAR_VELOX_NUM_CACHE_FILE_HANDLES =
+    
buildStaticConf("spark.gluten.sql.columnar.backend.velox.numCacheFileHandles")
+      .doc(
+        "Maximum number of entries in the file handle cache. Each entry holds 
an open " +
+          "file descriptor (local FS) or connection state (remote FS). Note 
that on " +
+          "local filesystems, high values may approach the OS file descriptor 
limit " +
+          "(ulimit -n). On remote object stores (S3, ABFS, GCS) entries are 
HTTP " +
+          "connections, not OS file descriptors.")
+      .intConf
+      .createWithDefault(10000)

Review Comment:
   Fixed. Added `.checkValue(_ > 0, "must be a positive number")` following the 
same pattern used by other configs in this file (e.g., `ssdCacheIOThreads`).



##########
backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala:
##########
@@ -527,10 +527,32 @@ object VeloxConfig extends ConfigRegistry {
   val COLUMNAR_VELOX_FILE_HANDLE_CACHE_ENABLED =
     
buildStaticConf("spark.gluten.sql.columnar.backend.velox.fileHandleCacheEnabled")
       .doc(
-        "Disables caching if false. File handle cache should be disabled " +
-          "if files are mutable, i.e. file content may change while file path 
stays the same.")
+        "Enables caching of file handles to avoid repeated open/close overhead 
on remote " +
+          "filesystems. Should be disabled if files are mutable, i.e. file 
content may " +
+          "change while file path stays the same.")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
+
+  val COLUMNAR_VELOX_NUM_CACHE_FILE_HANDLES =
+    
buildStaticConf("spark.gluten.sql.columnar.backend.velox.numCacheFileHandles")
+      .doc(
+        "Maximum number of entries in the file handle cache. Each entry holds 
an open " +
+          "file descriptor (local FS) or connection state (remote FS). Note 
that on " +
+          "local filesystems, high values may approach the OS file descriptor 
limit " +
+          "(ulimit -n). On remote object stores (S3, ABFS, GCS) entries are 
HTTP " +
+          "connections, not OS file descriptors.")
+      .intConf
+      .createWithDefault(10000)
+
+  val COLUMNAR_VELOX_FILE_HANDLE_EXPIRATION_DURATION_MS =
+    
buildStaticConf("spark.gluten.sql.columnar.backend.velox.fileHandleExpirationDurationMs")
+      .doc(
+        "Expiration time in milliseconds for cached file handles. Handles not 
accessed " +
+          "within this duration are evicted from the cache. This prevents 
stale handles " +
+          "from accumulating (e.g., expired HDFS leases, closed remote 
connections). " +
+          "A value of 0 disables TTL-based eviction.")
+      .longConf
+      .createWithDefault(600000L) // 10 minutes

Review Comment:
   Fixed. Added `.checkValue(_ >= 0, "must be a non-negative number (0 disables 
TTL-based eviction)")` — rejects negative values while preserving the 
documented 0-to-disable behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to