ehurheap opened a new issue, #9480:
URL: https://github.com/apache/hudi/issues/9480

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
[email protected].
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   Creating a savepoint fails with `java.lang.OutOfMemoryError: Java heap space`.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Hudi MOR table with 23,000 partitions
   2. The metadata table is NOT enabled
   3. In the Hudi CLI:
   ```
   hudi:events->set --conf spark.yarn.max.executor.failures=8
   hudi:events->set --conf spark.yarn.am.memory=40g
   hudi:events->set --conf spark.yarn.am.cores=2
   hudi:events->set --conf spark.executor.instances=10
   ```
   4. `savepoint create --commit 20230809192242950 --sparkMaster yarn 
--sparkMemory 40G`
   5. After several minutes, the command fails with:
   ```
   java.lang.OutOfMemoryError: Java heap space
   Failed: Could not create savepoint "20230809192242950".
   ```
   
   **Expected behavior**
   The savepoint should be created successfully.
   
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   
   * Spark version : 3.3.0
   
   * Hive version : n/a
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Executor logs show `Issue communicating with driver in heartbeater` and `TimeoutException: Cannot receive any reply from <driver.ip:port> in 10000 milliseconds`.
   
   The `--conf spark.executor.instances=10` setting was ignored; I still only got 2 executors.
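
   The heartbeat timeouts suggest the driver was too busy or GC-bound (plausible when listing 23,000 partitions without the metadata table) to answer executor RPCs in time. As a workaround sketch, raising the standard Spark heartbeat and network timeouts before retrying might avoid the executor-side warnings; whether it helps with the underlying OOM here is only an assumption:

   ```
   hudi:events->set --conf spark.executor.heartbeatInterval=60s
   hudi:events->set --conf spark.network.timeout=600s
   ```

   Note that `spark.executor.heartbeatInterval` must stay well below `spark.network.timeout`.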
   
   Would it help to try this with a hudi writeClient instead of hudi-cli?
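
   For reference, going through the write client programmatically might look roughly like the sketch below. This is a hedged sketch against the Hudi 0.13.x Java API, not a verified reproduction: `SparkRDDWriteClient`, `HoodieSparkEngineContext`, and `HoodieWriteConfig` are real classes, but the base path, user, and comment are placeholders, and the job would need the Hudi Spark bundle on the classpath:

   ```java
   import org.apache.hudi.client.SparkRDDWriteClient;
   import org.apache.hudi.client.common.HoodieSparkEngineContext;
   import org.apache.hudi.config.HoodieWriteConfig;
   import org.apache.spark.api.java.JavaSparkContext;
   import org.apache.spark.sql.SparkSession;

   public class CreateSavepoint {
     public static void main(String[] args) {
       // Unlike hudi-cli's --sparkMemory, driver/executor memory can be
       // controlled directly via normal spark-submit settings here.
       SparkSession spark = SparkSession.builder()
           .appName("create-savepoint")
           .getOrCreate();
       JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

       // Placeholder base path -- substitute the actual table path on S3.
       HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
           .withPath("s3://bucket/path/to/events")
           .build();

       SparkRDDWriteClient client =
           new SparkRDDWriteClient(new HoodieSparkEngineContext(jsc), config);
       try {
         // savepoint(instantTime, user, comment) marks the commit so that
         // cleaner and archival will not remove its files.
         client.savepoint("20230809192242950", "ops", "manual savepoint");
       } finally {
         client.close();
         spark.stop();
       }
     }
   }
   ```

   This would at least sidestep hudi-cli's Spark-launching quirks (such as the ignored `spark.executor.instances` above), though the driver-side partition listing cost would remain the same.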
   
   
   **Stacktrace**
   ```
   23/08/18 17:47:59 WARN Executor: Issue communicating with driver in 
heartbeater
   org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10000 
milliseconds]. This timeout is controlled by spark.executor.heartbeatInterval
           at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
 ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
 ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
 ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) 
~[scala-library-2.12.15.jar:?]
           at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) 
~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at 
org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103) 
~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at 
org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1053) 
~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at 
org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:238) 
~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
~[scala-library-2.12.15.jar:?]
           at 
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2078) 
~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) 
~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_382]
           at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
~[?:1.8.0_382]
           at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 ~[?:1.8.0_382]
           at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 ~[?:1.8.0_382]
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_382]
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_382]
           at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
   Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
[10000 milliseconds]
           at 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) 
~[scala-library-2.12.15.jar:?]
           at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) 
~[scala-library-2.12.15.jar:?]
           at 
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:293) 
~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 
~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
           ... 13 more
   23/08/18 17:48:09 WARN Executor: Issue communicating with driver in 
heartbeater
   ```

