ehurheap commented on issue #8209:
URL: https://github.com/apache/hudi/issues/8209#issuecomment-1478788893

   I have attempted to run the cleaner as a separate step from the ingestion. 
The ingestion is now configured with
   ```
   hoodie.clean.automatic -> false
   hoodie.archive.automatic -> false
   ```
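   For context, these flags can also be passed straight through to a `spark-submit`-based ingestion job via `--hoodie-conf`. A minimal sketch, assuming a DeltaStreamer-style job (the class, jar path, and table settings below are placeholders, not the actual ingestion wiring):
   ```
   # Sketch only: class, jar path, and table settings are placeholders.
   spark-submit \
   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   /path/to/hudi-utilities-bundle.jar \
   --target-base-path s3://path-to-table/users_changes-v1 \
   --hoodie-conf hoodie.clean.automatic=false \
   --hoodie-conf hoodie.archive.automatic=false
   ```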
   
   Using the hudi-cli I submitted this clean command:
   
   `cleans run --sparkMaster local[8] --sparkMemory 60G --hoodieConfigs 
"hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS hoodie.cleaner.hours.retained=1920 
hoodie.cleaner.parallelism=400"`
   
   After an hour and a half, it failed with 
   ```
   ...INFO S3NativeFileSystem: Opening 
's3://path-to-table/users_changes-v1/.hoodie/20221208164706388.savepoint' for 
reading
   hudi:users_changes->
   # java.lang.OutOfMemoryError: Java heap space
   # -XX:OnOutOfMemoryError="kill -9 %p"
   #   Executing /bin/sh -c "kill -9 1614"...
   Failed to clean hoodie dataset
   ```
   I tried tweaking the hudi-cli.sh script to specify `-Xmx=90G` in the java 
command, but that did not help.
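   One detail worth flagging for anyone retrying this: the HotSpot max-heap flag takes no equals sign, so `-Xmx=90G` is rejected at JVM startup with "Invalid maximum heap size", whereas `-Xmx90g` is accepted. A sketch of the edited invocation (the classpath and main class are placeholders for whatever hudi-cli.sh actually runs):
   ```
   # Valid: no '=' between -Xmx and the value; the size suffix may be upper or lower case.
   java -Xmx90g -cp "/path/to/hudi-cli-classpath/*" org.apache.hudi.cli.Main
   # Invalid: java -Xmx=90G ...  (fails with "Invalid maximum heap size")
   ```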
   
   
   As an alternative, I attempted the `spark-submit` version of the cleans command, like so:
   ```
   spark-submit --deploy-mode cluster \
   --conf spark.executor.instances=30 \
   --conf spark.executor.cores=2 \
   --conf spark.executor.memory=20G \
   --conf spark.driver.memory=40G \
   --conf spark.app.name=HoodieCleaner_users_changes-v1 \
   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
   --class org.apache.hudi.utilities.HoodieCleaner 
/usr/lib/hudi/hudi-utilities-bundle_2.12-0.11.1-amzn-0.jar \
   --target-base-path s3://path-to-table/users_changes-v1 \
   --hoodie-conf hoodie.metadata.enable=false \
   --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS \
   --hoodie-conf hoodie.cleaner.hours.retained=2000 \
   --hoodie-conf hoodie.cleaner.parallelism=400 \
   --hoodie-conf hoodie.clean.allow.multiple=false \
   --hoodie-conf hoodie.archive.async=false \
   --hoodie-conf hoodie.archive.automatic=false
   ```
   After about an hour, the application attempt died and the driver logs showed this:
   ```
   23/03/22 00:43:50 ERROR Javalin: Exception occurred while servicing 
http-request
   java.lang.OutOfMemoryError: null
        at 
java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161) 
~[?:1.8.0_362]
        at 
java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155) 
~[?:1.8.0_362]
        at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
 ~[?:1.8.0_362]
        at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596) 
~[?:1.8.0_362]
        at java.lang.StringBuilder.append(StringBuilder.java:195) ~[?:1.8.0_362]
        at 
com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:460)
 ~[jackson-core-2.13.3.jar:2.13.3]
        at 
com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:85)
 ~[jackson-core-2.13.3.jar:2.13.3]
        at 
com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3827)
 ~[jackson-databind-2.13.3.jar:2.13.3]
        at 
org.apache.hudi.timeline.service.RequestHandler.jsonifyResult(RequestHandler.java:198)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
org.apache.hudi.timeline.service.RequestHandler.writeValueAsStringSync(RequestHandler.java:209)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
org.apache.hudi.timeline.service.RequestHandler.writeValueAsString(RequestHandler.java:176)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$18(RequestHandler.java:384)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:23) 
~[__app__.jar:0.11.1-amzn-0]
        at 
io.javalin.http.JavalinServlet$addHandler$protectedHandler$1.handle(JavalinServlet.kt:128)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:45) 
~[__app__.jar:0.11.1-amzn-0]
        at 
io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:24) 
~[__app__.jar:0.11.1-amzn-0]
        at 
io.javalin.http.JavalinServlet$service$1.invoke(JavalinServlet.kt:136) 
~[__app__.jar:0.11.1-amzn-0]
        at 
io.javalin.http.JavalinServlet$service$2.invoke(JavalinServlet.kt:40) 
~[__app__.jar:0.11.1-amzn-0]
        at io.javalin.http.JavalinServlet.service(JavalinServlet.kt:81) 
~[__app__.jar:0.11.1-amzn-0]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
~[javax.servlet-api-3.1.0.jar:3.1.0]
        at 
io.javalin.websocket.JavalinWsServlet.service(JavalinWsServlet.kt:51) 
~[__app__.jar:0.11.1-amzn-0]
   ```
   I made several attempts, tweaking these parameters between runs:
   ```
   --hoodie-conf hoodie.cleaner.hours.retained
   --conf spark.executor.instances
   --conf spark.executor.cores
   --conf spark.executor.memory
   --conf spark.driver.memory
   ```
   Each attempt failed with an error; for example, the following two errors occurred in different runs:
   
   ```
   # java.lang.OutOfMemoryError: Requested array size exceeds VM limit
   ```
   and
   ```
   23/03/21 20:39:21 ERROR RequestHandler: Got runtime exception servicing 
request 
partition=env_id%3D2907378054%2Fweek%3D20221121&basepath=s3%3A%2F%2Fheap-datalake-storage%2Fdata%2Ftables%2Fusers_changes-v1&lastinstantts=20230321200037228&timelinehash=c015e055fa5d5d3f14376d8c4aee8b41e5be8cd928f0c72068646cb95f9365c5
   java.lang.NegativeArraySizeException: null
        at 
java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68) 
~[?:1.8.0_362]
        at java.lang.StringBuilder.<init>(StringBuilder.java:106) ~[?:1.8.0_362]
        at 
com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:455)
 ~[jackson-core-2.13.3.jar:2.13.3]
        at 
com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:85)
 ~[jackson-core-2.13.3.jar:2.13.3]
        at 
com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3827)
 ~[jackson-databind-2.13.3.jar:2.13.3]
        at 
org.apache.hudi.timeline.service.RequestHandler.jsonifyResult(RequestHandler.java:198)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
org.apache.hudi.timeline.service.RequestHandler.writeValueAsStringSync(RequestHandler.java:209)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
org.apache.hudi.timeline.service.RequestHandler.writeValueAsString(RequestHandler.java:176)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$18(RequestHandler.java:384)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501)
 ~[__app__.jar:0.11.1-amzn-0]
        at 
io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:23) 
~[__app__.jar:0.11.1-amzn-0]
   ```
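   Both stack traces above die inside the embedded timeline server's `RequestHandler` while it serializes a file-slice view to a JSON string. One variation I have not yet tried (a sketch only, not a verified fix) is to disable the embedded timeline server with `hoodie.embed.timeline.server=false`, so that, as I understand it, each executor builds its file-system view directly instead of fetching it from the driver over HTTP:
   ```
   # Sketch: same cleaner invocation as above, with the embedded timeline
   # server disabled (unverified whether this avoids the OOM).
   spark-submit --deploy-mode cluster \
   --conf spark.driver.memory=40G \
   --class org.apache.hudi.utilities.HoodieCleaner /usr/lib/hudi/hudi-utilities-bundle_2.12-0.11.1-amzn-0.jar \
   --target-base-path s3://path-to-table/users_changes-v1 \
   --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS \
   --hoodie-conf hoodie.cleaner.hours.retained=2000 \
   --hoodie-conf hoodie.embed.timeline.server=false
   ```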
   
   Is there some way to reduce the size of the clean operation so that we don't 
run into these errors?
   

