[GitHub] [hudi] zyclove opened a new issue, #8903: [SUPPORT] aws spark3.2.1 & hudi 0.13.1 with java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile

via GitHub Wed, 07 Jun 2023 23:44:37 -0700


zyclove opened a new issue, #8903:
URL: https://github.com/apache/hudi/issues/8903


   
   **Describe the problem you faced**
   run :
   spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.13.1 \
   > --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
   > --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
   > --conf 
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
   
   query:
   select * from bi_ods_real.ods_api_test_task_log_rt   limit 10;
   
   with error:
   
   Unable to read data from MOR table using spark. ERROR: 
org.apache.spark.sql.execution.datasources.PartitionedFile
   
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.13.1
   
   * Spark version : aws emr 3.2.1
   
   * Hive version :2.3.9
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) :s3
   
   * Running on Docker? (yes/no) :no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   spark-sql> select * from bi_ods_real.ods_api_test_task_log_rt   limit 10;
   23/06/08 06:29:56 ERROR SparkSQLDriver: Failed in [select * from 
bi_ods_real.ods_api_test_task_log_rt   limit 10]
   java.lang.NoSuchMethodError: 
org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.$anonfun$buildSplits$2(MergeOnReadSnapshotRelation.scala:237)
           at scala.Option.map(Option.scala:230)
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.$anonfun$buildSplits$1(MergeOnReadSnapshotRelation.scala:235)
           at scala.collection.immutable.List.map(List.scala:293)
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.buildSplits(MergeOnReadSnapshotRelation.scala:231)
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:223)
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:64)
           at 
org.apache.hudi.HoodieBaseRelation.buildScan(HoodieBaseRelation.scala:353)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.$anonfun$apply$4(DataSourceStrategy.scala:359)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.$anonfun$pruneFilterProject$1(DataSourceStrategy.scala:393)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:449)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:392)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:359)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
           at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
           at 
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:71)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
           at 
scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
           at 
scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
           at scala.collection.Iterator.foreach(Iterator.scala:943)
           at scala.collection.Iterator.foreach$(Iterator.scala:943)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
           at 
scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
           at 
scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192)
           at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75)
           at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
           at 
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:71)
           at 
org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:504)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$2(QueryExecution.scala:165)
           at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at 
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:165)
           at 
org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:75)
           at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:158)
           at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:158)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$2(QueryExecution.scala:178)
           at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at 
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:175)
           at 
org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:75)
           at 
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:171)
           at 
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:171)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$5(QueryExecution.scala:308)
           at 
org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:606)
           at 
org.apache.spark.sql.execution.QueryExecution.writePlans(QueryExecution.scala:308)
           at 
org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:323)
           at 
org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:277)
           at 
org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:256)
           at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:102)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
           at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
           at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
           at scala.collection.Iterator.foreach(Iterator.scala:943)
           at scala.collection.Iterator.foreach$(Iterator.scala:943)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
           at scala.collection.IterableLike.foreach(IterableLike.scala:74)
           at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
           at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   java.lang.NoSuchMethodError: 
org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.$anonfun$buildSplits$2(MergeOnReadSnapshotRelation.scala:237)
           at scala.Option.map(Option.scala:230)
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.$anonfun$buildSplits$1(MergeOnReadSnapshotRelation.scala:235)
           at scala.collection.immutable.List.map(List.scala:293)
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.buildSplits(MergeOnReadSnapshotRelation.scala:231)
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:223)
           at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:64)
           at 
org.apache.hudi.HoodieBaseRelation.buildScan(HoodieBaseRelation.scala:353)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.$anonfun$apply$4(DataSourceStrategy.scala:359)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.$anonfun$pruneFilterProject$1(DataSourceStrategy.scala:393)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:449)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:392)
           at 
org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:359)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
           at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
           at 
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:71)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
           at 
scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
           at 
scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
           at scala.collection.Iterator.foreach(Iterator.scala:943)
           at scala.collection.Iterator.foreach$(Iterator.scala:943)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
           at 
scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
           at 
scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192)
           at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75)
           at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
           at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
           at 
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:71)
           at 
org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:504)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$2(QueryExecution.scala:165)
           at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at 
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:165)
           at 
org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:75)
           at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:158)
           at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:158)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$2(QueryExecution.scala:178)
           at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at 
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:175)
           at 
org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:75)
           at 
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:171)
           at 
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:171)
           at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$5(QueryExecution.scala:308)
           at 
org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:606)
           at 
org.apache.spark.sql.execution.QueryExecution.writePlans(QueryExecution.scala:308)
           at 
org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:323)
           at 
org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:277)
           at 
org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:256)
           at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:102)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
           at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
           at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
           at scala.collection.Iterator.foreach(Iterator.scala:943)
           at scala.collection.Iterator.foreach$(Iterator.scala:943)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
           at scala.collection.IterableLike.foreach(IterableLike.scala:74)
           at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
           at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287)
           at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] zyclove opened a new issue, #8903: [SUPPORT] aws spark3.2.1 & hudi 0.13.1 with java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile

Reply via email to