ehurheap commented on issue #8209:
URL: https://github.com/apache/hudi/issues/8209#issuecomment-1478788893
I have attempted to run the cleaner as a separate step from the ingestion.
The ingestion is now configured with
```
hoodie.clean.automatic -> false
hoodie.archive.automatic -> false
```
Using the hudi-cli I submitted this clean command:
`cleans run --sparkMaster local[8] --sparkMemory 60G --hoodieConfigs
"hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS hoodie.cleaner.hours.retained=1920
hoodie.cleaner.parallelism=400"`
After an hour and a half, it failed with
```
...INFO S3NativeFileSystem: Opening 's3://path-to-table/users_changes-v1/.hoodie/20221208164706388.savepoint' for reading
#udi:users_changes->
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 1614"...
Failed to clean hoodie dataset
```
I tried tweaking the hudi-cli.sh script to specify `-Xmx90G` in the java
command, but that did not help.
As an alternative I attempted the `spark-submit` version of the cleans
command like so:
```
spark-submit --deploy-mode cluster \
--conf spark.executor.instances=30 \
--conf spark.executor.cores=2 \
--conf spark.executor.memory=20G \
--conf spark.driver.memory=40G \
--conf spark.app.name=HoodieCleaner_users_changes-v1 \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--class org.apache.hudi.utilities.HoodieCleaner \
/usr/lib/hudi/hudi-utilities-bundle_2.12-0.11.1-amzn-0.jar \
--target-base-path s3://path-to-table/users_changes-v1 \
--hoodie-conf hoodie.metadata.enable=false \
--hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS \
--hoodie-conf hoodie.cleaner.hours.retained=2000 \
--hoodie-conf hoodie.cleaner.parallelism=400 \
--hoodie-conf hoodie.clean.allow.multiple=false \
--hoodie-conf hoodie.archive.async=false \
--hoodie-conf hoodie.archive.automatic=false
```
After about an hour the application attempt dies and the driver logs show
this:
```
23/03/22 00:43:50 ERROR Javalin: Exception occurred while servicing http-request
java.lang.OutOfMemoryError: null
    at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161) ~[?:1.8.0_362]
    at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155) ~[?:1.8.0_362]
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125) ~[?:1.8.0_362]
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596) ~[?:1.8.0_362]
    at java.lang.StringBuilder.append(StringBuilder.java:195) ~[?:1.8.0_362]
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:460) ~[jackson-core-2.13.3.jar:2.13.3]
    at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:85) ~[jackson-core-2.13.3.jar:2.13.3]
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3827) ~[jackson-databind-2.13.3.jar:2.13.3]
    at org.apache.hudi.timeline.service.RequestHandler.jsonifyResult(RequestHandler.java:198) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsStringSync(RequestHandler.java:209) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsString(RequestHandler.java:176) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$18(RequestHandler.java:384) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:23) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$addHandler$protectedHandler$1.handle(JavalinServlet.kt:128) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:45) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:24) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$service$1.invoke(JavalinServlet.kt:136) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$service$2.invoke(JavalinServlet.kt:40) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet.service(JavalinServlet.kt:81) ~[__app__.jar:0.11.1-amzn-0]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at io.javalin.websocket.JavalinWsServlet.service(JavalinWsServlet.kt:51) ~[__app__.jar:0.11.1-amzn-0]
```
I made several attempts, tweaking these parameters for each run:
```
--hoodie-conf hoodie.cleaner.hours.retained
--conf spark.executor.instances
--conf spark.executor.cores
--conf spark.executor.memory
--conf spark.driver.memory
```
Every run failed with an error. For example, the following two errors
occurred in different runs:
```
# java.lang.OutOfMemoryError: Requested array size exceeds VM limit
```
and
```
23/03/21 20:39:21 ERROR RequestHandler: Got runtime exception servicing request partition=env_id%3D2907378054%2Fweek%3D20221121&basepath=s3%3A%2F%2Fheap-datalake-storage%2Fdata%2Ftables%2Fusers_changes-v1&lastinstantts=20230321200037228&timelinehash=c015e055fa5d5d3f14376d8c4aee8b41e5be8cd928f0c72068646cb95f9365c5
java.lang.NegativeArraySizeException: null
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68) ~[?:1.8.0_362]
    at java.lang.StringBuilder.<init>(StringBuilder.java:106) ~[?:1.8.0_362]
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:455) ~[jackson-core-2.13.3.jar:2.13.3]
    at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:85) ~[jackson-core-2.13.3.jar:2.13.3]
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3827) ~[jackson-databind-2.13.3.jar:2.13.3]
    at org.apache.hudi.timeline.service.RequestHandler.jsonifyResult(RequestHandler.java:198) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsStringSync(RequestHandler.java:209) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsString(RequestHandler.java:176) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$18(RequestHandler.java:384) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:23) ~[__app__.jar:0.11.1-amzn-0]
```
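One possible reading of the two traces above (an assumption, not a confirmed diagnosis): both fail inside `StringBuilder` while the timeline service turns a file-slices response into a single JSON string. A Java `String` is backed by an array indexed with `int`, so it cannot hold more than `Integer.MAX_VALUE` characters no matter how much heap the driver has. That would be consistent with the errors persisting across different memory settings. The sketch below is not Hudi or Jackson code; the class and helper names are made up for illustration. It only shows how a length counter kept in an `int` wraps negative past 2^31 - 1 and how `StringBuilder` then rejects that capacity.

```java
// Illustration only: hypothetical names, not taken from Hudi or Jackson.
public class StringSizeLimitDemo {

    // Sums the lengths of `segments` buffers of `segmentLength` chars each,
    // using an int accumulator the way a simple size counter might.
    static int totalLengthAsInt(int segments, int segmentLength) {
        int total = 0;
        for (int i = 0; i < segments; i++) {
            total += segmentLength; // silently wraps past Integer.MAX_VALUE
        }
        return total;
    }

    // The same sum in a long, which shows the size that was actually needed.
    static long totalLengthAsLong(int segments, int segmentLength) {
        return (long) segments * segmentLength;
    }

    // Returns true when StringBuilder rejects the capacity as a negative array size.
    static boolean failsWithNegativeArraySize(int capacity) {
        try {
            new StringBuilder(capacity);
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int segments = 3;            // assumed number of buffered chunks
        int segmentLength = 1 << 30; // assumed chunk size: 2^30 chars each

        long needed = totalLengthAsLong(segments, segmentLength);
        int wrapped = totalLengthAsInt(segments, segmentLength);

        System.out.println("chars needed:        " + needed);
        System.out.println("int length counter:  " + wrapped);
        System.out.println("exceeds String max:  " + (needed > Integer.MAX_VALUE));
        System.out.println("negative array size: " + failsWithNegativeArraySize(wrapped));
    }
}
```

If this reading is right, the limit is on the size of one serialized response for one partition, so the relevant quantity would be the number of file slices in that partition rather than the executor or driver memory.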
Is there some way to reduce the size of the clean operation so that we don't
run into these errors?