gudladona opened a new issue, #8199: URL: https://github.com/apache/hudi/issues/8199
**OOM during a Sync or Async clean operation**

ENV:

* Hudi version: 0.11.1
* Java version: 1.8
* Spark version: 3.1.2
* EMR version: 6.4
* Clean policy: KEEP_LATEST_BY_HOURS, 24 hours (default)
* Clean parallelism: 200 (default)
* Metadata: disabled

We have been experiencing consistent OOM errors when running a Hudi delta-streamer job in continuous mode. The OOM occurs during the "Generating list of file slices to be cleaned" phase. The image below shows the heap growth during the clean operation. Most of the growth happens during two API calls that the executors make against the file system view: `getReplacedFileGroupsBefore` and `getAllFileGroups`.

<img width="1363" alt="image (7)" src="https://user-images.githubusercontent.com/7864088/225490459-566f1b27-0240-4149-b9a1-0e0cca68347f.png">

We also have JFR files that contain memory profiles from a failed clean operation, but GitHub does not allow us to attach them to the issue.

We also tried setting `hoodie.embed.timeline.server.async: true`, which seems to have reduced the heap usage. The reduction appears to come from the single-threaded nature of the async executor. With this setting we see the following heap usage:

<img width="1233" alt="image" src="https://user-images.githubusercontent.com/7864088/225494337-f6bf3c78-8b75-4612-9f50-3ce43183a7df.png">

Flame graph for the async clean with the async timeline server:

<img width="1638" alt="image" src="https://user-images.githubusercontent.com/7864088/225494603-f96fc3b6-08b8-4f7a-ad7b-941cb0b992c1.png">

Flame graph for the async clean with the sync timeline server:

<img width="1641" alt="image" src="https://user-images.githubusercontent.com/7864088/225494795-0eb42127-70e8-485e-8967-675e2c3e0abc.png">

**To Reproduce**

Description of the table's S3 partition structure: `<s3-prefix>/tenant=[0-9]/date=YYYY-MM-DD`
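For reference, the clean settings listed under ENV, together with the async timeline server flag we tried, correspond roughly to the properties below. This is only a sketch: apart from `hoodie.embed.timeline.server.async`, the key names are not quoted in this report and are assumed from the Hudi configuration reference, so they should be verified against 0.11.1 before use.

```properties
# Clean policy and retention as described under ENV
# (key names assumed from the Hudi configuration reference, not taken from this report)
hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS
hoodie.cleaner.hours.retained=24
hoodie.cleaner.parallelism=200

# Metadata table disabled, as in our environment (assumed key name)
hoodie.metadata.enable=false

# The setting we tried that appeared to reduce heap usage (quoted in this report)
hoodie.embed.timeline.server.async=true
```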
**Environment Description**

* Hudi version : 0.11.1
* Spark version : 3.2.1
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Stacktrace**

```
qtp1729765409-406
    at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
    at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
    at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
    at io.javalin.Context.result(Ljava/lang/String;)Lio/javalin/Context; (Context.kt:364)
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsStringSync(Lio/javalin/Context;Ljava/lang/Object;)V (RequestHandler.java:210)
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsString(Lio/javalin/Context;Ljava/lang/Object;)V (RequestHandler.java:176)
    at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$18(Lio/javalin/Context;)V (RequestHandler.java:384)
    at org.apache.hudi.timeline.service.RequestHandler$$Lambda$2356.handle(Lio/javalin/Context;)V (Unknown Source)
    at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(Lio/javalin/Context;)V (RequestHandler.java:501)
    at io.javalin.security.SecurityUtil.noopAccessManager(Lio/javalin/Handler;Lio/javalin/Context;Ljava/util/Set;)V (SecurityUtil.kt:22)
    at io.javalin.Javalin$$Lambda$2336.manage(Lio/javalin/Handler;Lio/javalin/Context;Ljava/util/Set;)V (Unknown Source)
    at io.javalin.Javalin.lambda$addHandler$0(Lio/javalin/Handler;Ljava/util/Set;Lio/javalin/Context;)V (Javalin.java:606)
    at io.javalin.Javalin$$Lambda$2340.handle(Lio/javalin/Context;)V (Unknown Source)
    at io.javalin.core.JavalinServlet$service$2$1.invoke()V (JavalinServlet.kt:46)
    at io.javalin.core.JavalinServlet$service$2$1.invoke()Ljava/lang/Object; (JavalinServlet.kt:17)
    at io.javalin.core.JavalinServlet$service$1.invoke(Lkotlin/jvm/functions/Function0;)V (JavalinServlet.kt:143)
    at io.javalin.core.JavalinServlet$service$2.invoke()V (JavalinServlet.kt:41)
    at io.javalin.core.JavalinServlet.service(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (JavalinServlet.kt:107)
    at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (JettyServerUtil.kt:72)
    at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ScopedHandler.java:203)
    at org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ServletHandler.java:480)
    at org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (SessionHandler.java:1668)
    at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ScopedHandler.java:201)
    at org.apache.hudi.org.eclipse.jetty.server.handler.ContextHandler.doScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ContextHandler.java:1247)
    at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.handle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ScopedHandler.java:144)
    at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerList.handle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (HandlerList.java:61)
    at org.apache.hudi.org.eclipse.jetty.server.handler.StatisticsHandler.handle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (StatisticsHandler.java:174)
    at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerWrapper.handle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (HandlerWrapper.java:132)
    at org.apache.hudi.org.eclipse.jetty.server.Server.handle(Lorg/apache/hudi/org/eclipse/jetty/server/HttpChannel;)V (Server.java:502)
    at org.apache.hudi.org.eclipse.jetty.server.HttpChannel.handle()Z (HttpChannel.java:370)
    at org.apache.hudi.org.eclipse.jetty.server.HttpConnection.onFillable()V (HttpConnection.java:267)
    at org.apache.hudi.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded()V (AbstractConnection.java:305)
    at org.apache.hudi.org.eclipse.jetty.io.FillInterest.fillable()Z (FillInterest.java:103)
    at org.apache.hudi.org.eclipse.jetty.io.ChannelEndPoint$2.run()V (ChannelEndPoint.java:117)
    at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(Ljava/lang/Runnable;)V (EatWhatYouKill.java:333)
    at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(Z)Z (EatWhatYouKill.java:310)
    at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(Z)V (EatWhatYouKill.java:168)
    at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run()V (EatWhatYouKill.java:126)
    at org.apache.hudi.org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run()V (ReservedThreadExecutor.java:366)
    at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Ljava/lang/Runnable;)V (QueuedThreadPool.java:765)
    at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool$2.run()V (QueuedThreadPool.java:683)
    at java.lang.Thread.run()V (Thread.java:750)
```
