SabyasachiDasTR opened a new issue, #8581:
URL: https://github.com/apache/hudi/issues/8581

   **Describe the problem you faced**
   
   We have a Spark streaming job that only performs Hudi upserts to load data into the partitions. We have thousands of collections/partitions where data is upserted at a high-frequency interval. Let's call it the main application; it was running on Hudi 0.11.1 [emr-6.7.0].
   There are 100+ downstream applications that read the data written by this main app on daily/weekly scheduled runs.
   These downstream apps use the Hudi API to read the source data from the main app as well. They range from Hudi 0.9 to 0.11.
   
   All these downstream apps were able to read the Hudi 0.11.1 (table version 4) data. But when we upgraded our main app to Hudi 0.12.2 [emr-6.10.0], they all started failing with the attached stacktrace. In short, an EMR job running on an older Hudi version (table version 4) is not able to read data created by Hudi 0.12.2 (table version 5). It is not backward compatible. This seems like a basic requirement, but Hudi does not provide it.
   
   We do want to upgrade to the latest Hudi 0.12.2 to fix some long-running issues, such as the duplicate GUID issue.
   But to do that we cannot ask all our downstream apps to upgrade as well: there are logistical challenges at different levels, and it does not seem feasible.
   
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Start an EMR 6.10 cluster (Hudi 0.12.2) and upsert data to a table in an S3 bucket, using the Hudi configs and the upsert method attached in [Hudiconfigs_and_upsertMethod.txt] (both are in the same hudi.config file). Run this on EMR as a Spark job.
   2. Spin up a Jupyter notebook on an EMR 6.7 cluster (Hudi 0.11) to replicate a downstream app that reads the table.
   3. Then call:
   ```scala
   val basePath = "s3://<bucket>/<table>/<partition>"
   val viewDF = spark.read.format("org.apache.hudi").load(basePath)
   viewDF.createOrReplaceTempView("hudi_doc_table_branch_dest")
   ```
   4. The read fails with an unknown table version 5 error [attached stacktrace]
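
   Before reading, a downstream app can detect the mismatch by inspecting `hoodie.table.version` in `<basePath>/.hoodie/hoodie.properties` (the file the log shows being opened). Below is a minimal sketch, plain JVM with no Spark; the object and constant names are illustrative, and in practice the properties file would be opened through the Hadoop FileSystem API rather than parsed from a string:

   ```scala
   import java.io.StringReader
   import java.util.Properties

   object TableVersionCheck {
     // Highest table version a Hudi 0.11.x reader understands (version 4);
     // this constant is our assumption for illustration, not a Hudi API value.
     val MaxReadableVersion = 4

     // Parse hoodie.properties content and extract hoodie.table.version.
     def tableVersion(propsText: String): Int = {
       val props = new Properties()
       props.load(new StringReader(propsText))
       props.getProperty("hoodie.table.version", "0").toInt
     }

     // True if a 0.11.x reader can open a table with these properties.
     def isReadable(propsText: String): Boolean =
       tableVersion(propsText) <= MaxReadableVersion

     def main(args: Array[String]): Unit = {
       val v4Props = "hoodie.table.version=4\nhoodie.table.name=novusdoc"
       val v5Props = "hoodie.table.version=5\nhoodie.table.name=novusdoc"
       println(s"v4 readable: ${isReadable(v4Props)}") // true
       println(s"v5 readable: ${isReadable(v5Props)}") // false
     }
   }
   ```

   A guard like this would let a downstream job fail fast with a clear message instead of the opaque `Unknown versionCode:5` exception.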
   
   **Expected behavior**
   
   A downstream job running on Hudi 0.11 should be able to read the data generated by the main app running on Hudi 0.12.2.
   
   **Environment Description**
   
   * Hudi version : 0.12.2
   
   * Spark version :  3.3.1
   
   * Hive version : Hive not installed on the EMR cluster.
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3 (Parquet)
   
   * Running on Docker? (yes/no) :no
   
   
   **Additional context**
   
   Aws ticket on the same : Case ID 12463972981
   
   **Stacktrace**
   
   ```
       // Calling code that triggers the failure:
       val fs: FileSystem = FileSystem.get(new Configuration())
       val maxCommitTime = HoodieDataSourceHelpers.latestCommit(fs, inputPath)
   
   2023-04-07T09:15:37.718+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177] 
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem] [S3NativeFileSystem]: 
Opening 's3://a206760-novusdoc-s3-qa-use1/novusdoc/.hoodie/hoodie.properties' 
for reading
   2023-04-07T09:15:37.896+0000 [ERROR] [eb423741-4f44-4fea-b76f-912a6b3f0177] 
[org.apache.spark.deploy.yarn.ApplicationMaster] [ApplicationMaster]: User 
class threw exception: org.apache.hudi.exception.HoodieException: Unknown 
versionCode:5
   org.apache.hudi.exception.HoodieException: Unknown versionCode:5
        at 
org.apache.hudi.common.table.HoodieTableVersion.lambda$versionFromCode$1(HoodieTableVersion.java:58)
        at java.util.Optional.orElseThrow(Optional.java:290)
        at 
org.apache.hudi.common.table.HoodieTableVersion.versionFromCode(HoodieTableVersion.java:58)
        at 
org.apache.hudi.common.table.HoodieTableConfig.getTableVersion(HoodieTableConfig.java:472)
        at 
org.apache.hudi.common.table.HoodieTableConfig.fetchConfigs(HoodieTableConfig.java:305)
        at 
org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:244)
        at 
org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:125)
        at 
org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:78)
        at 
org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:668)
        at 
org.apache.hudi.HoodieDataSourceHelpers.allCompletedCommitsCompactions(HoodieDataSourceHelpers.java:77)
        at 
org.apache.hudi.HoodieDataSourceHelpers.latestCommit(HoodieDataSourceHelpers.java:67)
        at 
com.<pkg>.spark.parser.BaseStreamingParser.batchParser(BaseStreamingParser.scala:210)
        at 
com.<pkg>.spark.parser.BaseStreamingParser.batchParser$(BaseStreamingParser.scala:146)
        at 
com.<pkg>.spark.parser.<pkg>$.batchParser(AnzStreamingParser.scala:11)
        at 
com.<pkg>.spark.parser.BaseStreamingParser.main(BaseStreamingParser.scala:129)
        at 
com.<pkg>.spark.parser.BaseStreamingParser.main$(BaseStreamingParser.scala:96)
        at com.<pkg>.spark.parser.<pkg>$.main(AnzStreamingParser.scala:11)
        at com.<pkg>.spark.parser.<pkg>.main(AnzStreamingParser.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:740)
   2023-04-07T09:15:37.899+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177] 
[org.apache.spark.deploy.yarn.ApplicationMaster] [ApplicationMaster]: Final app 
status: FAILED, exitCode: 15, (reason: User class threw exception: 
org.apache.hudi.exception.HoodieException: Unknown versionCode:5
        [stack trace identical to the one above]
   )
   2023-04-07T09:15:37.917+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177] 
[org.apache.spark.SparkContext] [SparkContext]: Invoking stop() from shutdown 
hook
   2023-04-07T09:15:37.925+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177] 
[org.sparkproject.jetty.server.AbstractConnector] [AbstractConnector]: Stopped 
Spark@659dce1b{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
   2023-04-07T09:15:37.927+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177] 
[org.apache.spark.ui.SparkUI] [SparkUI]: Stopped Spark web UI at 
http://ip-100-66-72-129.3175.aws-int.thomsonreuters.com:36067
   2023-04-07T09:15:37.973+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177] 
[org.apache.spark.MapOutputTrackerMasterEndpoint] 
[MapOutputTrackerMasterEndpoint]: MapOutputTrackerMasterEndpoint stopped!
   2023-04-07T09:15:38.044+0000 [INFO] [eb423741-4f44-4fea-b76f-912a6b3f0177] 
[org.apache.spark.SparkContext] [SparkContext]: Successfully stopped 
SparkContext
   ```
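   
   The failure mechanism in the trace above can be sketched as follows: `HoodieTableVersion.versionFromCode` resolves the code read from `hoodie.properties` against the set of table versions compiled into the reader, and a 0.11.x reader only knows codes 0 through 4, so code 5 hits the `orElseThrow` path. This is an illustrative model under that assumption, not Hudi's actual enum:

   ```scala
   object VersionLookupSketch {
     // Table versions known to a 0.11.x reader (codes 0 through 4).
     // The map and its labels are illustrative, not Hudi's definitions.
     val knownVersions: Map[Int, String] = Map(
       0 -> "ZERO", 1 -> "ONE", 2 -> "TWO", 3 -> "THREE", 4 -> "FOUR"
     )

     // Mirrors the lookup-or-throw behavior seen in the stack trace:
     // an unknown code produces an "Unknown versionCode" error.
     def versionFromCode(code: Int): String =
       knownVersions.getOrElse(
         code,
         throw new IllegalArgumentException(s"Unknown versionCode:$code")
       )

     def main(args: Array[String]): Unit = {
       println(versionFromCode(4)) // a 0.11.1 (v4) table resolves fine
       try versionFromCode(5)      // a 0.12.2 (v5) table does not
       catch { case e: IllegalArgumentException => println(e.getMessage) }
     }
   }
   ```

   Because the lookup happens while the `HoodieTableMetaClient` is being built, every code path that touches table metadata (including `HoodieDataSourceHelpers.latestCommit` as in the trace) fails before any data is read.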
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
