SabyasachiDasTR opened a new issue, #8580:
URL: https://github.com/apache/hudi/issues/8580
**Describe the problem you faced**
We have a Spark Streaming job that only does Hudi upserts to load data into
partitions. We have thousands of collections/partitions where data is upserted
at a high-frequency interval. Let's call this the main application; it was
running on Hudi 0.11.1 (emr-6.7.0).
There are 100+ downstream applications that read the data written by this
main app on daily/weekly scheduled runs. These downstream apps also use the
Hudi API to read the source data; they range from Hudi 0.9 to 0.11.
All of these downstream apps were able to read the Hudi 0.11.1 (table version
4) data. But when we upgraded our main app to Hudi 0.12.2 (emr-6.10.0), they
all started failing with the attached stacktrace. In short, an EMR job running
an older Hudi version (table version 4) cannot read data created by Hudi 0.12.2
(table version 5). It is not backward compatible. This seems like a basic
requirement, but Hudi does not provide it.
We do want to upgrade to the latest Hudi 0.12.2 to fix some long-running
issues, such as the duplicate GUID issue.
But to do that, we cannot ask all of our downstream apps to upgrade as well.
There are logistical challenges at different levels, and it does not seem
feasible.
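If upgrading every downstream reader is not an option, one possible workaround is to roll the table itself back to the older format. This is a sketch assuming the Hudi CLI bundled with 0.12.x (the exact command set may vary by release), and should be tried on a non-production copy of the table first:

```
hudi-> connect --path s3://<bucket>/<table>
hudi-> downgrade table --toVersion 4
```

If it works as documented, this rewrites `hoodie.properties` back to table version 4 so that 0.11.x readers can open the table again, at the cost of features introduced with version 5.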
**To Reproduce**
Steps to reproduce the behavior:
1. Start an EMR 6.10 cluster with Hudi 0.12.2 and write to a table using
upsert with the attached config properties [Hudiconfigs_and_upsertMethod.txt].
Using the hudi.config file and the upsert method (both in the same
hudi.config file) for the TNI 6.10 EMR, upsert data to an S3 bucket. Run this
on EMR as a Spark job.
2. Spin up a Jupyter notebook on EMR 6.7 (Hudi 0.11), replicating a
downstream app that reads the table.
3. Then read the table:
```scala
val basePath = "s3://<bucket>/<table>/<partition>"
val ViewDF = spark.read.format("org.apache.hudi").load(basePath)
ViewDF.createOrReplaceTempView("hudi_doc_table_branch_dest")
```
4. The read fails with an "Unknown versionCode:5" error [attached stacktrace].
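A downstream app could detect this incompatibility up front by checking the table version recorded in `.hoodie/hoodie.properties` (the same file the stacktrace shows Hudi opening) before attempting the read. A minimal, self-contained sketch; `isReadable` and the inline properties string are illustrative, not a Hudi API, and `maxSupported = 4` corresponds to a Hudi 0.11.x reader:

```scala
import java.io.StringReader
import java.util.Properties

object TableVersionCheck {
  // True when a reader that supports table versions up to `maxSupported`
  // can open a table whose hoodie.properties carries the given version.
  def isReadable(props: Properties, maxSupported: Int): Boolean = {
    val version = Option(props.getProperty("hoodie.table.version"))
      .map(_.trim.toInt)
      .getOrElse(1) // very old tables may omit the key
    version <= maxSupported
  }

  def main(args: Array[String]): Unit = {
    // Simulated contents of .hoodie/hoodie.properties written by Hudi 0.12.2
    val v5 = new Properties()
    v5.load(new StringReader("hoodie.table.version=5"))
    println(isReadable(v5, maxSupported = 4)) // prints "false": a v5 table on a v4 client
  }
}
```

In practice the properties file would be fetched from S3 (for example via the Hadoop FileSystem API) rather than from a `StringReader`.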
**Expected behavior**
A downstream job running on Hudi 0.11 should be able to read the data
generated by the main app running on Hudi 0.12.2.
**Environment Description**
* Hudi version : 0.12.2
* Spark version : 3.3.1
* Hive version : Hive not installed on the EMR cluster.
* Hadoop version : 3.3.3
* Storage (HDFS/S3/GCS..) : S3 (Parquet)
* Running on Docker? (yes/no) : no
**Additional context**
Aws ticket on the same : Case ID 12463972981
**Stacktrace**
``` val fs: FileSystem = FileSystem.get(new Configuration())
val maxCommitTime = HoodieDataSourceHelpers.latestCommit(fs, inputPath)
2023-04-07T09:15:37.718+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177]
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem] [S3NativeFileSystem]:
Opening 's3://a206760-novusdoc-s3-qa-use1/novusdoc/.hoodie/hoodie.properties'
for reading
2023-04-07T09:15:37.896+0000 [ERROR] [eb423741-4f44-4fea-b76f-912a6b3f0177]
[org.apache.spark.deploy.yarn.ApplicationMaster] [ApplicationMaster]: User
class threw exception: org.apache.hudi.exception.HoodieException: Unknown
versionCode:5
org.apache.hudi.exception.HoodieException: Unknown versionCode:5
at
org.apache.hudi.common.table.HoodieTableVersion.lambda$versionFromCode$1(HoodieTableVersion.java:58)
at java.util.Optional.orElseThrow(Optional.java:290)
at
org.apache.hudi.common.table.HoodieTableVersion.versionFromCode(HoodieTableVersion.java:58)
at
org.apache.hudi.common.table.HoodieTableConfig.getTableVersion(HoodieTableConfig.java:472)
at
org.apache.hudi.common.table.HoodieTableConfig.fetchConfigs(HoodieTableConfig.java:305)
at
org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:244)
at
org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:125)
at
org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:78)
at
org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:668)
at
org.apache.hudi.HoodieDataSourceHelpers.allCompletedCommitsCompactions(HoodieDataSourceHelpers.java:77)
at
org.apache.hudi.HoodieDataSourceHelpers.latestCommit(HoodieDataSourceHelpers.java:67)
at
com.<pkg>.spark.parser.BaseStreamingParser.batchParser(BaseStreamingParser.scala:210)
at
com.<pkg>.spark.parser.BaseStreamingParser.batchParser$(BaseStreamingParser.scala:146)
at
com.<pkg>.spark.parser.<pkg>$.batchParser(AnzStreamingParser.scala:11)
at
com.<pkg>.spark.parser.BaseStreamingParser.main(BaseStreamingParser.scala:129)
at
com.<pkg>.spark.parser.BaseStreamingParser.main$(BaseStreamingParser.scala:96)
at com.<pkg>.spark.parser.<pkg>$.main(AnzStreamingParser.scala:11)
at com.<pkg>.spark.parser.<pkg>.main(AnzStreamingParser.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:740)
2023-04-07T09:15:37.899+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177]
[org.apache.spark.deploy.yarn.ApplicationMaster] [ApplicationMaster]: Final app
status: FAILED, exitCode: 15, (reason: User class threw exception:
org.apache.hudi.exception.HoodieException: Unknown versionCode:5
at
org.apache.hudi.common.table.HoodieTableVersion.lambda$versionFromCode$1(HoodieTableVersion.java:58)
at java.util.Optional.orElseThrow(Optional.java:290)
at
org.apache.hudi.common.table.HoodieTableVersion.versionFromCode(HoodieTableVersion.java:58)
at
org.apache.hudi.common.table.HoodieTableConfig.getTableVersion(HoodieTableConfig.java:472)
at
org.apache.hudi.common.table.HoodieTableConfig.fetchConfigs(HoodieTableConfig.java:305)
at
org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:244)
at
org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:125)
at
org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:78)
at
org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:668)
at
org.apache.hudi.HoodieDataSourceHelpers.allCompletedCommitsCompactions(HoodieDataSourceHelpers.java:77)
at
org.apache.hudi.HoodieDataSourceHelpers.latestCommit(HoodieDataSourceHelpers.java:67)
at
com.<pkg>.spark.parser.BaseStreamingParser.batchParser(BaseStreamingParser.scala:210)
at
com.<pkg>.spark.parser.BaseStreamingParser.batchParser$(BaseStreamingParser.scala:146)
at
com.<pkg>.spark.parser.<pkg>$.batchParser(AnzStreamingParser.scala:11)
at
com.<pkg>.spark.parser.BaseStreamingParser.main(BaseStreamingParser.scala:129)
at
com.<pkg>.spark.parser.BaseStreamingParser.main$(BaseStreamingParser.scala:96)
at com.<pkg>.spark.parser.<pkg>$.main(AnzStreamingParser.scala:11)
at com.<pkg>.spark.parser.<pkg>.main(AnzStreamingParser.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:740)
)
2023-04-07T09:15:37.917+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177]
[org.apache.spark.SparkContext] [SparkContext]: Invoking stop() from shutdown
hook
2023-04-07T09:15:37.925+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177]
[org.sparkproject.jetty.server.AbstractConnector] [AbstractConnector]: Stopped
Spark@659dce1b{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
2023-04-07T09:15:37.927+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177]
[org.apache.spark.ui.SparkUI] [SparkUI]: Stopped Spark web UI at
http://ip-100-66-72-129.3175.aws-int.thomsonreuters.com:36067
2023-04-07T09:15:37.973+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177]
[org.apache.spark.MapOutputTrackerMasterEndpoint]
[MapOutputTrackerMasterEndpoint]: MapOutputTrackerMasterEndpoint stopped!
2023-04-07T09:15:38.044+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177]
[org.apache.spark.SparkContext] [SparkContext]: Successfully stopped
SparkContext
```