gudladona opened a new issue, #8199:
URL: https://github.com/apache/hudi/issues/8199

   
   
   **OOM during a Sync or Async clean operation**
   
   ENV:
   
   Hudi version: 0.11.1
   Java Version 1.8
   Spark Version: 3.1.2
   EMR version: 6.4
   Clean Policy: KEEP_LATEST_BY_HOURS -- 24 hours (default)
   Clean Parallelism: 200 (default)
   Metadata: disabled
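
For reference, the settings listed above can also be written as Hudi write properties. The sketch below assumes the Hudi 0.11.x key names (`hoodie.cleaner.policy`, `hoodie.cleaner.hours.retained`, `hoodie.cleaner.parallelism`, `hoodie.metadata.enable`, and `hoodie.embed.timeline.server.async`, which is the setting tried further down). Check these names against the docs for the version in use. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

public class CleanConfigSketch {

    // Hypothetical helper: builds the clean-related write properties from the ENV list.
    // Key names assume Hudi 0.11.x; verify them against the docs for your version.
    static Properties reportedCleanConfig(boolean asyncTimelineServer) {
        Properties props = new Properties();
        props.setProperty("hoodie.cleaner.policy", "KEEP_LATEST_BY_HOURS");
        props.setProperty("hoodie.cleaner.hours.retained", "24");  // default retention window
        props.setProperty("hoodie.cleaner.parallelism", "200");    // default parallelism
        props.setProperty("hoodie.metadata.enable", "false");      // metadata table disabled
        // The workaround that was tried: process embedded timeline server requests asynchronously.
        props.setProperty("hoodie.embed.timeline.server.async", String.valueOf(asyncTimelineServer));
        return props;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // Properties.store writes comment lines followed by key=value lines.
        reportedCleanConfig(true).store(out, "clean settings from issue 8199");
        System.out.print(out);
    }
}
```

A file like this could be passed to the delta-streamer with `--props`. Alternatively, individual keys can be supplied with `--hoodie-conf key=value`.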
   
   We have been experiencing consistent OOM errors when running the Hudi delta-streamer job in continuous mode. The OOM occurs during the "Generating list of file slices to be cleaned" phase. The image below shows the heap growth during the clean operation.
   The heap growth happens specifically during two API calls that the executors make on the file system view: `getReplacedFileGroupsBefore` and `getAllFileGroups`.
   
   <img width="1363" alt="image (7)" src="https://user-images.githubusercontent.com/7864088/225490459-566f1b27-0240-4149-b9a1-0e0cca68347f.png">
   
   We also have JFR files containing memory profiles captured during a failed clean operation, but GitHub does not allow us to attach them to the issue.
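
The stack trace at the end of this issue shows where the OOM is thrown. It is raised in `String.getBytes`, which is called from `io.javalin.Context.result(String)` via `RequestHandler.writeValueAsStringSync` in the timeline server. That call chain suggests the whole file-slice response is first built as one `String` and then copied again into a `byte[]`, so both full-size copies are live at the same time. The sketch below contrasts that buffered pattern with a streaming one, using only the JDK. It illustrates that assumption and is not Hudi's code. The class and method names, and the plain list of file ids standing in for the real response objects, are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ResponseEncodingSketch {

    // Buffered: build the complete JSON text, then encode it in one go.
    // The String and the resulting byte[] are both full-size and live together.
    // (No JSON escaping: ids are assumed to contain no quotes or backslashes.)
    static byte[] bufferedJson(List<String> fileIds) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < fileIds.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(fileIds.get(i)).append('"');
        }
        sb.append(']');
        String json = sb.toString();
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Streaming: encode each element as it is produced and push it to the sink.
    // Apart from the sink itself, only the writer's small internal buffer is held.
    static void streamJson(List<String> fileIds, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write('[');
        for (int i = 0; i < fileIds.size(); i++) {
            if (i > 0) w.write(',');
            w.write('"');
            w.write(fileIds.get(i));
            w.write('"');
        }
        w.write(']');
        w.flush(); // the caller owns `out`, so flush but do not close it
    }

    public static void main(String[] args) throws IOException {
        List<String> ids = Arrays.asList("fg-0", "fg-1", "fg-2");
        // In a server the sink would be the response stream; a byte array is used for the demo.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        streamJson(ids, sink);
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
        System.out.println(Arrays.equals(sink.toByteArray(), bufferedJson(ids)));
    }
}
```

Both methods produce the same bytes. They differ only in how much of the response must be held on the heap at once.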
   
   
   We also tried setting `hoodie.embed.timeline.server.async: true`, which seems to have reduced heap usage. This appears to be due to the single-threaded nature of the async executor. With this setting, we see the following heap usage:
   
   <img width="1233" alt="image" src="https://user-images.githubusercontent.com/7864088/225494337-f6bf3c78-8b75-4612-9f50-3ce43183a7df.png">
   
   Flame graph for the async clean with the async timeline server:
   
   <img width="1638" alt="image" src="https://user-images.githubusercontent.com/7864088/225494603-f96fc3b6-08b8-4f7a-ad7b-941cb0b992c1.png">
   
   Flame graph for the async clean with the sync timeline server:
   
   <img width="1641" alt="image" src="https://user-images.githubusercontent.com/7864088/225494795-0eb42127-70e8-485e-8967-675e2c3e0abc.png">
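
The observation that the async timeline server setting lowers heap usage is consistent with a simple model. If every in-flight request holds one fully materialized response, peak heap grows with the number of requests served concurrently. The toy simulation below illustrates only that model. The per-request buffer, the sleep standing in for serialization, and all names are assumptions. It does not reproduce how Hudi's embedded timeline server or its Jetty thread pool actually schedule work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrencyPeakSketch {

    // Toy model, not Hudi code: `requests` handlers each hold a response buffer of
    // `bytesPerResponse` while "serializing". Returns the peak number of buffer bytes
    // that were live at the same time.
    static long peakLiveBytes(int threads, int requests, int bytesPerResponse)
            throws InterruptedException {
        AtomicLong live = new AtomicLong();
        AtomicLong peak = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                byte[] response = new byte[bytesPerResponse]; // materialized response
                peak.accumulateAndGet(live.addAndGet(response.length), Math::max);
                try {
                    Thread.sleep(5); // stands in for serialization and the network write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                live.addAndGet(-response.length); // buffer released once the response is sent
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int mib = 1 << 20;
        System.out.println("1 thread : peak " + peakLiveBytes(1, 40, mib) / mib + " MiB");
        System.out.println("8 threads: peak " + peakLiveBytes(8, 40, mib) / mib + " MiB");
    }
}
```

With one worker thread, the peak equals exactly one response. With eight threads it can reach up to eight responses. In other words, serializing request handling trades throughput for a lower heap ceiling.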
   
   
   **To Reproduce**
   
   The table's S3 partition structure is:
   
   `<s3-prefix>/tenant=[0-9]/date=YYYY-MM-DD`
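
The sketch below shows how the partition count scales with this layout. It makes two assumptions: `tenant=[0-9]` means single-digit tenant ids 0 through 9, and there is one `date=` partition per tenant per day. The `getAllFileGroups` and `getReplacedFileGroupsBefore` calls mentioned above are made per partition path, so in the worst case the number of such lookups during clean planning grows with this count. The prefix and class name are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class PartitionLayoutSketch {

    // Enumerates <prefix>/tenant=<id>/date=YYYY-MM-DD paths for every day in
    // [start, endExclusive). Assumes tenant=[0-9] means tenant ids 0..9.
    static List<String> partitionPaths(String prefix, LocalDate start, LocalDate endExclusive) {
        List<String> paths = new ArrayList<>();
        for (int tenant = 0; tenant <= 9; tenant++) {
            for (LocalDate d = start; d.isBefore(endExclusive); d = d.plusDays(1)) {
                // LocalDate.toString() yields ISO-8601 (YYYY-MM-DD) for four-digit years
                paths.add(prefix + "/tenant=" + tenant + "/date=" + d);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical prefix; one calendar year of daily partitions.
        List<String> paths = partitionPaths("s3://bucket/table",
                LocalDate.of(2022, 1, 1), LocalDate.of(2023, 1, 1));
        // prints: 3650 partitions, e.g. s3://bucket/table/tenant=0/date=2022-01-01
        System.out.println(paths.size() + " partitions, e.g. " + paths.get(0));
    }
}
```

The count grows linearly with both the number of tenants and the number of days of data kept on storage.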
   
   
   
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : 3.1.2
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   
   **Stacktrace**
   
   ```
   qtp1729765409-406
     at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
     at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
     at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
     at io.javalin.Context.result(Ljava/lang/String;)Lio/javalin/Context; (Context.kt:364)
     at org.apache.hudi.timeline.service.RequestHandler.writeValueAsStringSync(Lio/javalin/Context;Ljava/lang/Object;)V (RequestHandler.java:210)
     at org.apache.hudi.timeline.service.RequestHandler.writeValueAsString(Lio/javalin/Context;Ljava/lang/Object;)V (RequestHandler.java:176)
     at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$18(Lio/javalin/Context;)V (RequestHandler.java:384)
     at org.apache.hudi.timeline.service.RequestHandler$$Lambda$2356.handle(Lio/javalin/Context;)V (Unknown Source)
     at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(Lio/javalin/Context;)V (RequestHandler.java:501)
     at io.javalin.security.SecurityUtil.noopAccessManager(Lio/javalin/Handler;Lio/javalin/Context;Ljava/util/Set;)V (SecurityUtil.kt:22)
     at io.javalin.Javalin$$Lambda$2336.manage(Lio/javalin/Handler;Lio/javalin/Context;Ljava/util/Set;)V (Unknown Source)
     at io.javalin.Javalin.lambda$addHandler$0(Lio/javalin/Handler;Ljava/util/Set;Lio/javalin/Context;)V (Javalin.java:606)
     at io.javalin.Javalin$$Lambda$2340.handle(Lio/javalin/Context;)V (Unknown Source)
     at io.javalin.core.JavalinServlet$service$2$1.invoke()V (JavalinServlet.kt:46)
     at io.javalin.core.JavalinServlet$service$2$1.invoke()Ljava/lang/Object; (JavalinServlet.kt:17)
     at io.javalin.core.JavalinServlet$service$1.invoke(Lkotlin/jvm/functions/Function0;)V (JavalinServlet.kt:143)
     at io.javalin.core.JavalinServlet$service$2.invoke()V (JavalinServlet.kt:41)
     at io.javalin.core.JavalinServlet.service(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (JavalinServlet.kt:107)
     at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (JettyServerUtil.kt:72)
     at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ScopedHandler.java:203)
     at org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ServletHandler.java:480)
     at org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (SessionHandler.java:1668)
     at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ScopedHandler.java:201)
     at org.apache.hudi.org.eclipse.jetty.server.handler.ContextHandler.doScope(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ContextHandler.java:1247)
     at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.handle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (ScopedHandler.java:144)
     at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerList.handle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (HandlerList.java:61)
     at org.apache.hudi.org.eclipse.jetty.server.handler.StatisticsHandler.handle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (StatisticsHandler.java:174)
     at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerWrapper.handle(Ljava/lang/String;Lorg/apache/hudi/org/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (HandlerWrapper.java:132)
     at org.apache.hudi.org.eclipse.jetty.server.Server.handle(Lorg/apache/hudi/org/eclipse/jetty/server/HttpChannel;)V (Server.java:502)
     at org.apache.hudi.org.eclipse.jetty.server.HttpChannel.handle()Z (HttpChannel.java:370)
     at org.apache.hudi.org.eclipse.jetty.server.HttpConnection.onFillable()V (HttpConnection.java:267)
     at org.apache.hudi.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded()V (AbstractConnection.java:305)
     at org.apache.hudi.org.eclipse.jetty.io.FillInterest.fillable()Z (FillInterest.java:103)
     at org.apache.hudi.org.eclipse.jetty.io.ChannelEndPoint$2.run()V (ChannelEndPoint.java:117)
     at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(Ljava/lang/Runnable;)V (EatWhatYouKill.java:333)
     at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(Z)Z (EatWhatYouKill.java:310)
     at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(Z)V (EatWhatYouKill.java:168)
     at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run()V (EatWhatYouKill.java:126)
     at org.apache.hudi.org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run()V (ReservedThreadExecutor.java:366)
     at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Ljava/lang/Runnable;)V (QueuedThreadPool.java:765)
     at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool$2.run()V (QueuedThreadPool.java:683)
     at java.lang.Thread.run()V (Thread.java:750)
   ```
   
   

