zml1206 commented on code in PR #11329:
URL: https://github.com/apache/incubator-gluten/pull/11329#discussion_r2671302381


##########
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxTransformerApi.scala:
##########
@@ -70,11 +70,28 @@ class VeloxTransformerApi extends TransformerApi with Logging {
   override def postProcessNativeConfig(
       nativeConfMap: JMap[String, String],
       backendPrefix: String): Unit = {
-    // 'spark.hadoop.fs.s3a.connection.timeout' by velox requires time unit, hadoop-aws versions
-    // before 3.4 do not have time unit.
-    val s3sConnectionTimeout = nativeConfMap.get("spark.hadoop.fs.s3a.connection.timeout")
-    if (NumberUtils.isCreatable(s3sConnectionTimeout)) {
-      nativeConfMap.put("spark.hadoop.fs.s3a.connection.timeout", s"${s3sConnectionTimeout}ms")
+    // S3A configurations that require time units for Velox.
+    // Hadoop-aws versions before 3.4 do not include time units by default.
+    // Reference: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html
+    //
+    // Map of config key to its default time unit:
+    // - Most S3A time configs default to milliseconds
+    // - fs.s3a.threads.keepalivetime defaults to seconds (special case)
+    val s3aTimeConfigs = Map(
+      "spark.hadoop.fs.s3a.connection.timeout" -> "ms",
+      "spark.hadoop.fs.s3a.connection.establish.timeout" -> "ms",
+      "spark.hadoop.fs.s3a.threads.keepalivetime" -> "s", // Note: defaults to seconds
+      "spark.hadoop.fs.s3a.connection.ttl" -> "ms",
+      "spark.hadoop.fs.s3a.multipart.purge.age" -> "ms"

Review Comment:
   Before Hadoop 3.4, the `fs.s3a.connection.ttl` configuration does not exist, 
and the unit for `fs.s3a.multipart.purge.age` is seconds, not milliseconds.
   
https://github.com/apache/hadoop/blob/1be78238728da9266a4f88195058f08fd012bf9c/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml#L1553-L1559
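
   A minimal sketch of how the mapping might look for Hadoop versions before 3.4, 
based only on the two points above: `fs.s3a.connection.ttl` is left out and 
`fs.s3a.multipart.purge.age` uses seconds. The object name, the method name, and 
the digit-only check (standing in for `NumberUtils.isCreatable`, so the snippet 
has no external dependency) are assumptions for illustration, not the PR's code.

```scala
import java.util.{Map => JMap}

// Hypothetical helper, not part of Gluten.
object S3ATimeUnitSketch {
  // Assumed default units for Hadoop versions before 3.4.
  // `fs.s3a.connection.ttl` is omitted because it does not exist there.
  val preHadoop34TimeConfigs: Map[String, String] = Map(
    "spark.hadoop.fs.s3a.connection.timeout" -> "ms",
    "spark.hadoop.fs.s3a.connection.establish.timeout" -> "ms",
    "spark.hadoop.fs.s3a.threads.keepalivetime" -> "s",
    "spark.hadoop.fs.s3a.multipart.purge.age" -> "s" // seconds, not milliseconds
  )

  // Append the default unit only when the value is a bare non-negative integer.
  // Values that already carry a unit (for example "200s") are left untouched.
  def appendDefaultUnits(conf: JMap[String, String]): Unit =
    preHadoop34TimeConfigs.foreach { case (key, unit) =>
      val value = conf.get(key)
      if (value != null) {
        val trimmed = value.trim
        if (trimmed.nonEmpty && trimmed.forall(_.isDigit)) {
          conf.put(key, s"$trimmed$unit")
        }
      }
    }
}
```

   With this sketch, a bare `86400` for `spark.hadoop.fs.s3a.multipart.purge.age` 
would be rewritten to `86400s`, while a value that already has a unit is kept as is.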



##########
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxTransformerApi.scala:
##########
@@ -70,11 +70,28 @@ class VeloxTransformerApi extends TransformerApi with Logging {
   override def postProcessNativeConfig(
       nativeConfMap: JMap[String, String],
       backendPrefix: String): Unit = {
-    // 'spark.hadoop.fs.s3a.connection.timeout' by velox requires time unit, hadoop-aws versions
-    // before 3.4 do not have time unit.
-    val s3sConnectionTimeout = nativeConfMap.get("spark.hadoop.fs.s3a.connection.timeout")
-    if (NumberUtils.isCreatable(s3sConnectionTimeout)) {
-      nativeConfMap.put("spark.hadoop.fs.s3a.connection.timeout", s"${s3sConnectionTimeout}ms")
+    // S3A configurations that require time units for Velox.
+    // Hadoop-aws versions before 3.4 do not include time units by default.
+    // Reference: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html

Review Comment:
   The reference points to the hadoop-aws 3.4 documentation. It should be 
replaced with the documentation for an earlier version, such as 3.3.6.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

