[GitHub] [hudi] parisni opened a new issue, #9022: [SUPPORT] Cannot count rows in MDT since 0.12

via GitHub Tue, 20 Jun 2023 08:09:38 -0700


parisni opened a new issue, #9022:
URL: https://github.com/apache/hudi/issues/9022


   I am trying to read a metadata table (MDT) created with Hudi 0.13.1 / Spark 3.2.3.
   
   Hudi 0.11.1 / Spark 3.2.3 (works):
   
   ```
   scala> spark.read.format("hudi").load("/tmp/metadata").select("key").count
   res0: Long = 857151
   ```
   
   
   Hudi 0.12.3 / 0.13.1 / Spark 3.2.3 (fails):
   ```
   scala> spark.read.format("hudi").load("/tmp/metadata").select("key").count
   scala.MatchError: HFILE (of class org.apache.hudi.common.model.HoodieFileFormat)
     at org.apache.hudi.HoodieBaseRelation.x$3$lzycompute(HoodieBaseRelation.scala:223)
     at org.apache.hudi.HoodieBaseRelation.x$3(HoodieBaseRelation.scala:222)
     at org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:222)
     at org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:222)
     at org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:303)
     at org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
     at org.apache.spark.sql.execution.datasources.Spark32NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark32NestedSchemaPruning.scala:56)
     at org.apache.spark.sql.execution.datasources.Spark32NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark32NestedSchemaPruning.scala:50)
     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
     at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
     at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1122)
     at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1121)
     at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:948)
     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
     at org.apache.spark.sql.execution.datasources.Spark32NestedSchemaPruning.apply0(Spark32NestedSchemaPruning.scala:50)
     at org.apache.spark.sql.execution.datasources.Spark32NestedSchemaPruning.apply(Spark32NestedSchemaPruning.scala:44)
     at org.apache.spark.sql.execution.datasources.Spark32NestedSchemaPruning.apply(Spark32NestedSchemaPruning.scala:39)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
     at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
     at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
     at scala.collection.immutable.List.foldLeft(List.scala:91)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
     at scala.collection.immutable.List.foreach(List.scala:431)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
     at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:138)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
     at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196)
     at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:134)
     at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
     at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
     at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
     at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
     at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
     at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:214)
     at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:259)
     at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:228)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704)
     at org.apache.spark.sql.Dataset.count(Dataset.scala:3011)
     ... 47 elided
   
   ```
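
For context, `scala.MatchError` is thrown when a `match` expression receives a value that none of its `case` branches cover. The trace above fails inside `HoodieBaseRelation.fileFormat` with the value `HFILE`, which suggests (this is an assumption, the Hudi source is not shown here) that a match on the file format has no branch for `HFILE`. A minimal sketch of that failure mode follows; `FileFormat`, `readerFor`, and the reader names are hypothetical stand-ins and are not Hudi's classes or API.

```scala
import scala.util.Try

object MatchErrorSketch {
  // Hypothetical stand-in for a file-format enum; NOT Hudi's HoodieFileFormat.
  object FileFormat extends Enumeration {
    val PARQUET, ORC, HFILE = Value
  }

  // Hypothetical resolver that handles only two of the three formats.
  // Any other value falls through the match and raises scala.MatchError.
  def readerFor(format: FileFormat.Value): String = format match {
    case FileFormat.PARQUET => "parquet-reader"
    case FileFormat.ORC     => "orc-reader"
    // deliberately no case for HFILE
  }

  def main(args: Array[String]): Unit = {
    // Covered value: the match succeeds.
    println(Try(readerFor(FileFormat.PARQUET)))

    // Uncovered value: capture the exception instead of crashing.
    val failure = Try(readerFor(FileFormat.HFILE)).failed.get
    println(failure.getClass.getName)
  }
}
```

Pasted into `spark-shell` or a plain Scala REPL, `MatchErrorSketch.main(Array.empty)` prints a `Success` for the covered format and the exception class name `scala.MatchError` for the uncovered one, which is the same exception type reported in the trace.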
   


