(spark) branch branch-4.0 updated: [SPARK-50897][ML][CONNECT] Avoiding instance creation in ServiceLoader

mgrund Mon, 20 Jan 2025 20:19:33 -0800

This is an automated email from the ASF dual-hosted git repository.

mgrund pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new b27966dc3c04 [SPARK-50897][ML][CONNECT] Avoiding instance creation in ServiceLoader
b27966dc3c04 is described below

commit b27966dc3c0453c990ff6a11a1a9b6bb9fb1000b
Author: Martin Grund <[email protected]>
AuthorDate: Tue Jan 21 05:17:24 2025 +0100

    [SPARK-50897][ML][CONNECT] Avoiding instance creation in ServiceLoader
    
    ### What changes were proposed in this pull request?
    When the iterator of the ServiceLoader call is converted to Scala, every class
    that the service loader provides is explicitly instantiated. Since we do not
    need the instances, this PR uses the `stream()` method of the ServiceLoader to
    iterate over the providers and extracts only their classes.
    
    ### Why are the changes needed?
    Performance / Stability
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Existing tests
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #49577 from grundprinzip/spark_ml_service_loader.
    
    Authored-by: Martin Grund <[email protected]>
    Signed-off-by: Martin Grund <[email protected]>
    (cherry picked from commit 4c663168f4c6097ec1b22db65558a7fd8bb68ac2)
    Signed-off-by: Martin Grund <[email protected]>
---
 .../scala/org/apache/spark/sql/connect/ml/MLUtils.scala   | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala b/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala
index 86dd013b9d98..4e93aec47ef0 100644
--- a/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala
+++ b/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.connect.ml
 
 import java.util.{Optional, ServiceLoader}
+import java.util.stream.Collectors
 
 import scala.collection.immutable.HashSet
 import scala.jdk.CollectionConverters._
@@ -50,8 +51,18 @@ private[ml] object MLUtils {
   private def loadOperators(mlCls: Class[_]): Map[String, Class[_]] = {
     val loader = Utils.getContextOrSparkClassLoader
     val serviceLoader = ServiceLoader.load(mlCls, loader)
-    val providers = serviceLoader.asScala.toList
-    providers.map(est => est.getClass.getName -> est.getClass).toMap
+    // Instead of using the iterator, we use the "stream()" method that allows
+    // to iterate over a collection of providers that do not instantiate the class
+    // directly. Since there is no good way to convert a Java stream to a Scala stream,
+    // we collect the Java stream to a Java map and then convert it to a Scala map.
+    serviceLoader
+      .stream()
+      .collect(
+        Collectors.toMap(
+          (est: ServiceLoader.Provider[_]) => est.`type`().getName,
+          (est: ServiceLoader.Provider[_]) => est.`type`()))
+      .asScala
+      .toMap
   }
 
   private def parseInts(ints: proto.Ints): Array[Int] = {

