leobiscassi commented on issue #7533:
URL: https://github.com/apache/hudi/issues/7533#issuecomment-1377707695
I'm experiencing a similar situation. I upgraded my tables to EMR 6.9
with Hudi 0.12, my pipelines broke, so I downgraded the tables back to EMR 6.5
and Hudi 0.9. Since then, even with the metadata table config enabled, the
metadata tables never show up on S3. I've tried the following:
- Started an EMR 6.5 cluster
- Launched the Hudi CLI with `sudo /usr/lib/hudi/cli/bin/hudi-cli.sh`
- Connected to my table using `connect --path <S3-PATH>`
- Ran `metadata create`
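Putting the whole attempt together, the session looks like this (keeping `<S3-PATH>` as a placeholder for my table's base path):
```shell
# on the EMR 6.5 master node; <S3-PATH> is a placeholder
sudo /usr/lib/hudi/cli/bin/hudi-cli.sh

# then, inside the Hudi CLI prompt:
connect --path <S3-PATH>
metadata create
```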
The command fails, though it seems to create an empty metadata table. This is
the stack trace:
```shell
2023-01-10 18:47:36,061 INFO scheduler.DAGScheduler: ResultStage 0 (collect at HoodieSparkEngineContext.java:73) failed in 0.607 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-172-31-2-164.us-west-2.compute.internal executor 1): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
2023-01-10 18:47:36,064 INFO scheduler.DAGScheduler: Job 0 failed: collect at HoodieSparkEngineContext.java:73, took 0.652691 s
2023-01-10 18:47:36,065 ERROR core.SimpleExecutionStrategy: Command failed java.lang.reflect.UndeclaredThrowableException
2023-01-10 18:47:36,066 WARN JLineShellComponent.exceptions: java.lang.reflect.UndeclaredThrowableException
    at org.springframework.util.ReflectionUtils.rethrowRuntimeException(ReflectionUtils.java:315)
    at org.springframework.util.ReflectionUtils.handleInvocationTargetException(ReflectionUtils.java:295)
    at org.springframework.util.ReflectionUtils.handleReflectionException(ReflectionUtils.java:279)
    at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:219)
    at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
    at org.springframework.shell.core.SimpleExecutionStrategy.execute(SimpleExecutionStrategy.java:59)
    at org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:134)
    at org.springframework.shell.core.JLineShell.promptLoop(JLineShell.java:533)
    at org.springframework.shell.core.JLineShell.run(JLineShell.java:179)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-172-31-2-164.us-west-2.compute.internal executor 1): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2470)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2419)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2418)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2418)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1125)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1125)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1125)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2684)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2626)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2615)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:914)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2241)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2262)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2281)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2306)
    at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
    at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
    at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
    at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
    at org.apache.hudi.client.common.HoodieSparkEngineContext.map(HoodieSparkEngineContext.java:73)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.getPartitionsToFilesMapping(HoodieBackedTableMetadataWriter.java:365)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.bootstrapFromFilesystem(HoodieBackedTableMetadataWriter.java:313)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.bootstrapIfNeeded(HoodieBackedTableMetadataWriter.java:272)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.initialize(SparkHoodieBackedTableMetadataWriter.java:91)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:114)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:62)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:58)
    at org.apache.hudi.cli.commands.MetadataCommand.create(MetadataCommand.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
    ... 6 more
Caused by: java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
```
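For what it's worth, after the failed run the metadata table path does get created on S3 but stays essentially empty. I'm checking it like this (bucket and prefix are placeholders; as far as I know the metadata table lives under `.hoodie/metadata` in the table base path):
```shell
# placeholder bucket/prefix; the metadata table sits under the table's .hoodie dir
aws s3 ls --recursive s3://<bucket>/<table-prefix>/.hoodie/metadata/
```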
Does anyone have a clue why this is happening?
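One thing I still plan to try, on the assumption that this is a Spark/Hudi jar mismatch between the CLI driver and the YARN executors (which is what `unread block data` during task deserialization usually suggests): forcing the CLI's embedded Spark context into local mode so driver and executors share one JVM. I haven't verified that the CLI honors the `SPARK_MASTER` environment variable in this Hudi version, so treat this as a sketch:
```shell
# Assumption: hudi-cli picks up the Spark master from the SPARK_MASTER env var.
# local[2] keeps driver and executors in the same JVM, sidestepping any
# serialization mismatch with the cluster's executors (if that's the cause).
export SPARK_MASTER="local[2]"
sudo -E /usr/lib/hudi/cli/bin/hudi-cli.sh
```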