ever4Kenny opened a new issue, #3579:
URL: https://github.com/apache/celeborn/issues/3579

   ### What is the bug(with logs or screenshots)?
   26/01/05 16:30:15 ERROR ExecutorClassLoader: Failed to check existence of 
class org.apache.spark.shuffle.celeborn.ColumnarHashBasedShuffleWriter on REPL 
class server at 
spark://dc05-prod-lan-hadoop-host-168159.host.idcvdian.com:24503/classes
   java.lang.InterruptedException: 
AbstractBootstrap$PendingRegistrationPromise@225ee7c4(incomplete)
        at 
io.netty.util.concurrent.DefaultPromise.await0(DefaultPromise.java:684)
        at 
io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:300)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:289)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:214)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:226)
        at 
org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
        at 
org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
        at 
org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
        at 
org.apache.spark.executor.ExecutorClassLoader.getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:135)
        at 
org.apache.spark.executor.ExecutorClassLoader.$anonfun$fetchFn$1(ExecutorClassLoader.scala:66)
        at 
org.apache.spark.executor.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:176)
        at 
org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:113)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at 
org.apache.celeborn.reflect.DynConstructors$Builder.impl(DynConstructors.java:158)
        at 
org.apache.spark.shuffle.celeborn.SparkUtils.<clinit>(SparkUtils.java:191)
        at 
org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO.<init>(CelebornShuffleDataIO.java:41)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2780)
        at scala.collection.immutable.List.flatMap(List.scala:366)
        at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2772)
        at 
org.apache.spark.shuffle.ShuffleDataIOUtils$.loadShuffleDataIO(ShuffleDataIOUtils.scala:35)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager$.org$apache$spark$shuffle$sort$SortShuffleManager$$loadShuffleExecutorComponents(SortShuffleManager.scala:253)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.shuffleExecutorComponents$lzycompute(SortShuffleManager.scala:88)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.shuffleExecutorComponents(SortShuffleManager.scala:88)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.getWriter(SortShuffleManager.scala:170)
        at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   26/01/05 16:30:15 ERROR Executor: Exception in task 26.1 in stage 1.0 (TID 
147)
   java.lang.ExceptionInInitializerError
        at 
org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO.<init>(CelebornShuffleDataIO.java:41)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2780)
        at scala.collection.immutable.List.flatMap(List.scala:366)
        at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2772)
        at 
org.apache.spark.shuffle.ShuffleDataIOUtils$.loadShuffleDataIO(ShuffleDataIOUtils.scala:35)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager$.org$apache$spark$shuffle$sort$SortShuffleManager$$loadShuffleExecutorComponents(SortShuffleManager.scala:253)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.shuffleExecutorComponents$lzycompute(SortShuffleManager.scala:88)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.shuffleExecutorComponents(SortShuffleManager.scala:88)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.getWriter(SortShuffleManager.scala:170)
        at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   Caused by: java.lang.InterruptedException: 
AbstractBootstrap$PendingRegistrationPromise@225ee7c4(incomplete)
        at 
io.netty.util.concurrent.DefaultPromise.await0(DefaultPromise.java:684)
        at 
io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:300)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:289)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:214)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:226)
        at 
org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
        at 
org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
        at 
org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
        at 
org.apache.spark.executor.ExecutorClassLoader.getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:135)
        at 
org.apache.spark.executor.ExecutorClassLoader.$anonfun$fetchFn$1(ExecutorClassLoader.scala:66)
        at 
org.apache.spark.executor.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:176)
        at 
org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:113)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at 
org.apache.celeborn.reflect.DynConstructors$Builder.impl(DynConstructors.java:158)
        at 
org.apache.spark.shuffle.celeborn.SparkUtils.<clinit>(SparkUtils.java:191)
        ... 26 more
   26/01/05 16:30:16 INFO CelebornShuffleDataIO: Loading CelebornShuffleDataIO
   26/01/05 16:30:16 ERROR Executor: Exception in task 26.0 in stage 2.0 (TID 
237)
   java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.shuffle.celeborn.SparkUtils
        at 
org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO.<init>(CelebornShuffleDataIO.java:41)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2780)
        at scala.collection.immutable.List.flatMap(List.scala:366)
        at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2772)
        at 
org.apache.spark.shuffle.ShuffleDataIOUtils$.loadShuffleDataIO(ShuffleDataIOUtils.scala:35)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager$.org$apache$spark$shuffle$sort$SortShuffleManager$$loadShuffleExecutorComponents(SortShuffleManager.scala:253)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.shuffleExecutorComponents$lzycompute(SortShuffleManager.scala:88)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.shuffleExecutorComponents(SortShuffleManager.scala:88)
        at 
org.apache.spark.shuffle.sort.SortShuffleManager.getWriter(SortShuffleManager.scala:170)
        at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   
   ### How to reproduce the bug?
   On Spark 3.5, when we enable Celeborn with 
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO,
 executors fail to start because SparkUtils fails to initialize, even though we 
do not use columnar shuffle at all.
   The root cause is that SparkUtils statically loads 
org.apache.spark.shuffle.celeborn.ColumnarHashBasedShuffleWriter, which is not 
included in the shaded Spark client jar.
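   The two errors in the logs above follow directly from how the JVM handles a 
failed static initializer: the first use of the class throws 
ExceptionInInitializerError, the class is then marked erroneous, and every 
subsequent use throws NoClassDefFoundError ("Could not initialize class ..."). 
A minimal, self-contained sketch (not Celeborn code; the class names here are 
stand-ins) reproducing that pattern:

   ```java
   // Stand-in for SparkUtils: its static initializer eagerly resolves an
   // optional class (like ColumnarHashBasedShuffleWriter) that is absent
   // from the classpath.
   public class StaticInitDemo {
       static class EagerUtils {
           static final Class<?> COLUMNAR_WRITER;
           static {
               try {
                   // Hypothetical missing class, mirroring the absent columnar jar.
                   COLUMNAR_WRITER = Class.forName("com.example.MissingColumnarWriter");
               } catch (ClassNotFoundException e) {
                   // Propagates out of <clinit> as ExceptionInInitializerError.
                   throw new RuntimeException(e);
               }
           }
           static void use() {}
       }

       // Returns the error type seen by a caller, like a shuffle task would.
       static String tryUse() {
           try {
               EagerUtils.use();
               return "ok";
           } catch (Throwable t) {
               return t.getClass().getSimpleName();
           }
       }

       public static void main(String[] args) {
           // First task: static init runs and fails.
           System.out.println("first use: " + tryUse());
           // Later tasks: the JVM marks the class erroneous; init is not retried.
           System.out.println("later use: " + tryUse());
       }
   }
   ```

   This matches the logs: task 26.1 in stage 1.0 gets the 
ExceptionInInitializerError, while task 26.0 in stage 2.0 gets the follow-on 
NoClassDefFoundError.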
   
   To work around this:
   1. Manually build the 
[spark-3.5-columnar-shuffle](https://github.com/apache/celeborn/tree/main/client-spark/spark-3.5-columnar-shuffle)
 module with Maven, and place the resulting jar on the Spark classpath.
   2. Apply 
assets/spark-patch/Celeborn-Optimize-Skew-Partitions-spark3_5_6.patch to Spark.
   
   However, this is unreasonable, because we do not intend to use columnar 
shuffle and leave celeborn.columnarShuffle.enabled=false. Moreover, no 
documentation mentions this requirement, so any Spark 3.5 user new to this 
project will stumble into this issue and have their Spark jobs fail.
   
   Ideally, the columnar writer class should be loaded on demand, only when 
columnar shuffle is enabled, and this behavior should be well documented.
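   As a rough illustration of the proposed fix (a hedged sketch, not actual 
Celeborn code; the class and method names here are hypothetical), the 
reflective lookup can be moved out of the static initializer into a lazily 
initialized accessor, so the optional jar is only required when columnar 
shuffle is actually enabled:

   ```java
   import java.lang.reflect.Constructor;

   // Sketch: resolve the optional columnar writer class lazily instead of in
   // a static initializer, so its absence cannot break every executor.
   public class OnDemandLoader {
       // The real class lives in the optional spark-3.5-columnar-shuffle module.
       private static final String COLUMNAR_WRITER_CLASS =
           "org.apache.spark.shuffle.celeborn.ColumnarHashBasedShuffleWriter";

       private static volatile Constructor<?> columnarCtor; // resolved lazily

       // Only called on the columnar code path; the plain hash/sort writer
       // paths never trigger the lookup, so the optional jar may be absent.
       static Constructor<?> columnarWriterCtor() {
           Constructor<?> ctor = columnarCtor;
           if (ctor == null) {
               synchronized (OnDemandLoader.class) {
                   if (columnarCtor == null) {
                       try {
                           columnarCtor = Class.forName(COLUMNAR_WRITER_CLASS)
                               .getDeclaredConstructors()[0];
                       } catch (ClassNotFoundException e) {
                           // Fail with an actionable message only when columnar
                           // shuffle was explicitly requested.
                           throw new IllegalStateException(
                               "columnar shuffle is enabled but the columnar "
                               + "shuffle jar is not on the classpath", e);
                       }
                   }
                   ctor = columnarCtor;
               }
           }
           return ctor;
       }
   }
   ```

   With this shape, a missing columnar jar surfaces as a clear configuration 
error on the columnar path only, instead of an ExceptionInInitializerError 
that poisons SparkUtils for every task.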

