leobiscassi commented on issue #7533:
URL: https://github.com/apache/hudi/issues/7533#issuecomment-1377707695

   I'm experiencing a similar situation. I upgraded my tables to EMR 6.9 with Hudi 0.12, my pipelines broke, so I downgraded the tables back to EMR 6.5 and Hudi 0.9. Since then, even with the metadata table config enabled, I can't see the metadata files on S3. I've tried the following:
   
   - Started EMR 6.5 cluster
   - Executed hudi cli with the command `sudo /usr/lib/hudi/cli/bin/hudi-cli.sh`
   - Connected to my table using `connect --path <S3-PATH>`
   - Executed the command `metadata create`
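   In case it helps with reproduction, the session can be sketched roughly as below (the S3 path is a placeholder; substitute your table's base path):
   
   ```shell
   # Hudi CLI as shipped on EMR 6.5
   sudo /usr/lib/hudi/cli/bin/hudi-cli.sh
   
   # then, at the hudi-> prompt:
   connect --path s3://<bucket>/<table-base-path>   # placeholder for the real table path
   metadata create
   ```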
   
   The command fails, but it seems to create an empty metadata table. Here is the stack trace:
   
   ```shell
   2023-01-10 18:47:36,061 INFO scheduler.DAGScheduler: ResultStage 0 (collect at HoodieSparkEngineContext.java:73) failed in 0.607 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-172-31-2-164.us-west-2.compute.internal executor 1): java.lang.IllegalStateException: unread block data
           at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
           at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
           at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   
   Driver stacktrace:
   2023-01-10 18:47:36,064 INFO scheduler.DAGScheduler: Job 0 failed: collect at HoodieSparkEngineContext.java:73, took 0.652691 s
   2023-01-10 18:47:36,065 ERROR core.SimpleExecutionStrategy: Command failed java.lang.reflect.UndeclaredThrowableException
   2023-01-10 18:47:36,066 WARN JLineShellComponent.exceptions:
   java.lang.reflect.UndeclaredThrowableException
           at org.springframework.util.ReflectionUtils.rethrowRuntimeException(ReflectionUtils.java:315)
           at org.springframework.util.ReflectionUtils.handleInvocationTargetException(ReflectionUtils.java:295)
           at org.springframework.util.ReflectionUtils.handleReflectionException(ReflectionUtils.java:279)
           at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:219)
           at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
           at org.springframework.shell.core.SimpleExecutionStrategy.execute(SimpleExecutionStrategy.java:59)
           at org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:134)
           at org.springframework.shell.core.JLineShell.promptLoop(JLineShell.java:533)
           at org.springframework.shell.core.JLineShell.run(JLineShell.java:179)
           at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-172-31-2-164.us-west-2.compute.internal executor 1): java.lang.IllegalStateException: unread block data
           at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
           at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
           at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   
   Driver stacktrace:
           at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2470)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2419)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2418)
           at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
           at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2418)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1125)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1125)
           at scala.Option.foreach(Option.scala:407)
           at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1125)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2684)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2626)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2615)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:914)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2241)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2262)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2281)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2306)
           at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
           at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
           at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
           at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
           at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
           at org.apache.hudi.client.common.HoodieSparkEngineContext.map(HoodieSparkEngineContext.java:73)
           at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.getPartitionsToFilesMapping(HoodieBackedTableMetadataWriter.java:365)
           at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.bootstrapFromFilesystem(HoodieBackedTableMetadataWriter.java:313)
           at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.bootstrapIfNeeded(HoodieBackedTableMetadataWriter.java:272)
           at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.initialize(SparkHoodieBackedTableMetadataWriter.java:91)
           at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:114)
           at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:62)
           at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:58)
           at org.apache.hudi.cli.commands.MetadataCommand.create(MetadataCommand.java:104)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
           ... 6 more
   Caused by: java.lang.IllegalStateException: unread block data
           at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
           at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
           at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           ... 1 more
   ```
   
   Does anyone have a clue why this is happening?

