szingerpeter opened a new issue, #7533:
URL: https://github.com/apache/hudi/issues/7533
Hi,
I'm using Hudi version 1.0 with Spark version 3.2.1-amzn-0 and Hive version
3.1.3-amzn-0.
After rolling back a table, I ran into the issue described in #4747:
```
Caused by: java.io.FileNotFoundException: No such file or directory 's3://...
```
Following the recommendation in #4747, I then manually deleted the metadata
folder under `s3://<table_path>/.hoodie/metadata`, which solved the problem.
After upserting into the table, the metadata folder
`s3://<table_path>/.hoodie/metadata` gets recreated. However, querying the
data via Spark and Beeline then only returns the entries upserted in the last
operation (~40M rows), not any of the previous data (~2B rows). If I delete
`s3://<table_path>/.hoodie/metadata` again, both Spark and Beeline return all
the historical data together with the newly inserted data.
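As a workaround check, I'm considering disabling the metadata table on read instead of deleting the folder. A minimal sketch (the Spark session and table path are placeholders, and `hoodie.metadata.enable` is the standard Hudi config key) of what such a read would look like:

```python
# Sketch: Hudi read options that disable metadata-table-based file listing,
# which should be equivalent to deleting the .hoodie/metadata folder.
# The table path passed in by the caller is a placeholder.
HUDI_READ_OPTIONS = {
    "hoodie.metadata.enable": "false",  # fall back to direct S3 file listing
}

def read_hudi_table(spark, table_path, options=HUDI_READ_OPTIONS):
    """Return a DataFrame for the Hudi table at `table_path` (sketch)."""
    reader = spark.read.format("hudi")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(table_path)
```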
I tried the Hudi CLI's `metadata create` command, but it fails with:
```
29990 [Spring Shell] ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterMetricsRequestProto cannot be cast to com.google.protobuf.Message
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
	at com.sun.proxy.$Proxy46.getClusterMetrics(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:271)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy47.getClusterMetrics(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:631)
	at org.apache.spark.deploy.yarn.Client.$anonfun$submitApplication$1(Client.scala:181)
	at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
	at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
	at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:65)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:181)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:582)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at org.apache.hudi.cli.utils.SparkUtil.initJavaSparkConf(SparkUtil.java:117)
	at org.apache.hudi.cli.commands.MetadataCommand.initJavaSparkContext(MetadataCommand.java:367)
	at org.apache.hudi.cli.commands.MetadataCommand.create(MetadataCommand.java:128)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
	at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
	at org.springframework.shell.core.SimpleExecutionStrategy.execute(SimpleExecutionStrategy.java:59)
	at org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:134)
	at org.springframework.shell.core.JLineShell.promptLoop(JLineShell.java:533)
	at org.springframework.shell.core.JLineShell.run(JLineShell.java:179)
	at java.lang.Thread.run(Thread.java:750)
```
Is there a way to recreate the metadata table of an existing Hudi table so
that it references the historical data as well?
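For context, a sketch of the upsert configuration I have in mind (the table name and record key field are placeholders, and whether a missing metadata table is fully re-bootstrapped from the file system on the next write with `hoodie.metadata.enable=true` is exactly what I'm unsure about):

```python
# Sketch of Hudi write options for the upsert. "hoodie.metadata.enable"
# is the standard config key; table name and record key are placeholders.
HUDI_WRITE_OPTIONS = {
    "hoodie.table.name": "my_table",                  # placeholder
    "hoodie.datasource.write.recordkey.field": "id",  # placeholder
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.metadata.enable": "true",                 # metadata table on write
}

def upsert_hudi_table(df, table_path, options=HUDI_WRITE_OPTIONS):
    """Append-mode upsert into the Hudi table at `table_path` (sketch)."""
    (df.write.format("hudi")
       .options(**options)
       .mode("append")
       .save(table_path))
```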
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]