yihua commented on code in PR #12772: URL: https://github.com/apache/hudi/pull/12772#discussion_r2076179687
########## hudi-spark-datasource/hudi-spark4.0.x/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark40Analysis.scala: ########## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hudi.analysis + +import org.apache.hudi.{DefaultSource, EmptyRelation, HoodieBaseRelation} +import org.apache.hudi.SparkAdapterSupport.sparkAdapter + +import org.apache.spark.sql.{AnalysisException, SparkSession, SQLContext} +import org.apache.spark.sql.catalyst.analysis.{ResolveInsertionBase, TableOutputResolver} +import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation} +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.errors.DataTypeErrors.toSQLId +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation, PreprocessTableInsertion} +import org.apache.spark.sql.execution.datasources.LogicalRelation +import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation +import org.apache.spark.sql.hudi.ProvidesHoodieConfig +import org.apache.spark.sql.hudi.catalog.HoodieInternalV2Table +import 
org.apache.spark.sql.sources.InsertableRelation +import org.apache.spark.sql.types.StructType +import org.apache.spark.sql.util.PartitioningUtils.normalizePartitionSpec + +/** + * NOTE: PLEASE READ CAREFULLY + * + * Since Hudi relations don't currently implement DS V2 Read API, we have to fallback to V1 here. + * Such fallback will have considerable performance impact, therefore it's only performed in cases + * where V2 API have to be used. Currently only such use-case is using of Schema Evolution feature + * + * Check out HUDI-4178 for more details + */ +case class HoodieSpark40DataSourceV2ToV1Fallback(sparkSession: SparkSession) extends Rule[LogicalPlan] Review Comment: Similarly, I see `HoodieSpark40Analysis` is the same as `HoodieSpark35Analysis` except the class names. Could we extract these rules as a common rule class? OK to extract the existing logic in a separate PR so it's easier to reuse in this PR. ########## hudi-spark-datasource/hudi-spark4-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark4Analysis.scala: ########## @@ -0,0 +1,363 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hudi.analysis + +import org.apache.hudi.{DataSourceReadOptions, DefaultSource, SparkAdapterSupport} +import org.apache.hudi.storage.StoragePath + +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.HoodieSpark4CatalystPlanUtils.MatchResolvedTable +import org.apache.spark.sql.catalyst.analysis.{EliminateSubqueryAliases, NamedRelation, ResolvedFieldName, UnresolvedAttribute, UnresolvedFieldName, UnresolvedPartitionSpec} +import org.apache.spark.sql.catalyst.analysis.SimpleAnalyzer.resolveExpressionByPlanChildren +import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils} +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.catalyst.trees.Origin +import org.apache.spark.sql.connector.catalog.{Table, V1Table} +import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.IdentifierHelper +import org.apache.spark.sql.execution.command.DescribeTableCommand +import org.apache.spark.sql.execution.datasources.{DataSource, LogicalRelation} +import org.apache.spark.sql.hudi.HoodieSqlCommonUtils.isMetaField +import org.apache.spark.sql.hudi.ProvidesHoodieConfig +import org.apache.spark.sql.hudi.analysis.HoodieSpark4Analysis.{HoodieV1OrV2Table, ResolvesToHudiTable} +import org.apache.spark.sql.hudi.catalog.HoodieInternalV2Table +import org.apache.spark.sql.hudi.command.{AlterHoodieTableDropPartitionCommand, ShowHoodieTablePartitionsCommand, TruncateHoodieTableCommand} +import org.apache.spark.sql.hudi.command.exception.HoodieAnalysisException +import org.apache.spark.sql.hudi.logical.TimeTravelRelation + +/** + * NOTE: PLEASE READ CAREFULLY + * + * Since Hudi relations don't currently implement DS V2 Read API, we have to fallback to V1 here. 
+ * Such fallback will have considerable performance impact, therefore it's only performed in cases + * where V2 API have to be used. Currently only such use-case is using of Schema Evolution feature + * + * Check out HUDI-4178 for more details + */ + +/** + * Rule for resolve hoodie's extended syntax or rewrite some logical plan. + */ +case class HoodieSpark4ResolveReferences(spark: SparkSession) extends Rule[LogicalPlan] Review Comment: `HoodieSpark4Analysis` is exactly the same as `HoodieSpark3Analysis` except for class names. Let's avoid the code duplication, relocate, rename and reuse the class. ########## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala: ########## @@ -114,7 +126,14 @@ object HoodieAnalysis extends SparkAdapterSupport { session => HoodiePostAnalysisRule(session) ) - if (HoodieSparkUtils.isSpark3) { + if (HoodieSparkUtils.isSpark4) { + val spark4PostHocResolutionClass = "org.apache.spark.sql.hudi.analysis.HoodieSpark4PostAnalysisRule" Review Comment: Same here, `HoodieSpark4PostAnalysisRule` and `HoodieSpark3PostAnalysisRule` have the same logic. 
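The deduplication asked for in the review comments above follows a standard shape: move the shared rule logic into one common class, and keep the per-version classes as empty shims whose only job is to carry the version-specific name that the reflective lookup needs. A minimal sketch of that shape (plain Java, all names invented; the real classes would extend Spark's `Rule[LogicalPlan]` and take a `SparkSession`):

```java
// Sketch only: "CommonResolveReferences" and the shim names are hypothetical.
// The shared analysis logic lives once in the base class; the Spark 3 / Spark 4
// classes differ only by name, which is all the reflective lookup requires.
abstract class CommonResolveReferences {
    // Stand-in for the plan-rewrite logic that is currently duplicated
    // between HoodieSpark3Analysis and HoodieSpark4Analysis.
    String apply(String plan) {
        return "resolved:" + plan;
    }
}

// Thin per-version shims: no logic of their own.
class Spark3ResolveReferencesShim extends CommonResolveReferences {}
class Spark4ResolveReferencesShim extends CommonResolveReferences {}

public class DedupSketch {
    public static void main(String[] args) {
        CommonResolveReferences r3 = new Spark3ResolveReferencesShim();
        CommonResolveReferences r4 = new Spark4ResolveReferencesShim();
        // Both versions now run the identical shared logic.
        System.out.println(r3.apply("plan").equals(r4.apply("plan")));
    }
}
```

The same shape would apply to the other pairs flagged in this review (`HoodieSpark3PostAnalysisRule`/`HoodieSpark4PostAnalysisRule`, `ResolveHudiAlterTableCommand`): only the class name registered per Spark version needs to stay version-specific.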
########## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala: ########## @@ -114,7 +126,14 @@ object HoodieAnalysis extends SparkAdapterSupport { session => HoodiePostAnalysisRule(session) ) - if (HoodieSparkUtils.isSpark3) { + if (HoodieSparkUtils.isSpark4) { + val spark4PostHocResolutionClass = "org.apache.spark.sql.hudi.analysis.HoodieSpark4PostAnalysisRule" + val spark4PostHocResolution: RuleBuilder = + session => instantiateKlass(spark4PostHocResolutionClass, session) + + rules += spark4PostHocResolution + } else { + // spark 3c Review Comment: ```suggestion // spark 3 ``` ########## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala: ########## @@ -66,24 +69,33 @@ object HoodieAnalysis extends SparkAdapterSupport { val dataSourceV2ToV1Fallback: RuleBuilder = session => instantiateKlass(dataSourceV2ToV1FallbackClass, session) - val spark3ResolveReferencesClass = "org.apache.spark.sql.hudi.analysis.HoodieSpark3ResolveReferences" - val spark3ResolveReferences: RuleBuilder = - session => instantiateKlass(spark3ResolveReferencesClass, session) + val spark3PlusResolveReferencesClass = if (HoodieSparkUtils.isSpark4) + "org.apache.spark.sql.hudi.analysis.HoodieSpark4ResolveReferences" + else + "org.apache.spark.sql.hudi.analysis.HoodieSpark3ResolveReferences" + val spark3PlusResolveReferences: RuleBuilder = + session => instantiateKlass(spark3PlusResolveReferencesClass, session) // NOTE: PLEASE READ CAREFULLY BEFORE CHANGING // // It's critical for this rules to follow in this order; re-ordering this rules might lead to changes in // behavior of Spark's analysis phase (for ex, DataSource V2 to V1 fallback might not kick in before other rules, // leading to all relations resolving as V2 instead of current expectation of them being resolved as V1) - rules ++= Seq(dataSourceV2ToV1Fallback, spark3ResolveReferences) + rules ++= Seq(dataSourceV2ToV1Fallback, 
spark3PlusResolveReferences) - if (HoodieSparkUtils.gteqSpark3_5) { + if (HoodieSparkUtils.isSpark3_5) { rules += (_ => instantiateKlass( "org.apache.spark.sql.hudi.analysis.HoodieSpark35ResolveColumnsForInsertInto")) } + if (HoodieSparkUtils.isSpark4_0) { + rules += (_ => instantiateKlass( + "org.apache.spark.sql.hudi.analysis.HoodieSpark40ResolveColumnsForInsertInto")) + } val resolveAlterTableCommandsClass = - if (HoodieSparkUtils.gteqSpark3_5) { + if (HoodieSparkUtils.gteqSpark4_0) { + "org.apache.spark.sql.hudi.Spark40ResolveHudiAlterTableCommand" Review Comment: Similar here on `ResolveHudiAlterTableCommand` for reuse. Let's file JIRA tickets if we could not address them in this PR. ########## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala: ########## @@ -130,8 +149,9 @@ object HoodieAnalysis extends SparkAdapterSupport { // Default rules ) - val nestedSchemaPruningClass = - if (HoodieSparkUtils.gteqSpark3_5) { + val nestedSchemaPruningClass = if (HoodieSparkUtils.gteqSpark4_0) { + "org.apache.spark.sql.execution.datasources.Spark40NestedSchemaPruning" Review Comment: Same here on code reuse on `Spark40NestedSchemaPruning` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
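For context on why only the class *names* need to stay version-specific: `HoodieAnalysis` loads each rule class reflectively by name (`instantiateKlass`), so a Spark-4-only class is never classloaded on a Spark 3 classpath. A rough sketch of that mechanism in plain Java (`Session` and `DemoRule` are invented stand-ins for `SparkSession` and a version-specific rule class):

```java
import java.lang.reflect.Constructor;

// Invented stand-in for SparkSession.
class Session {}

// Invented stand-in for a rule class with a single-argument (session) constructor.
class DemoRule {
    final Session session;
    public DemoRule(Session session) { this.session = session; }
}

public class InstantiateKlassSketch {
    // Mirrors the instantiateKlass helper: resolve the class by name at runtime
    // and invoke its (session) constructor via reflection.
    static Object instantiateKlass(String klass, Session session) throws Exception {
        Constructor<?> ctor = Class.forName(klass).getConstructor(Session.class);
        return ctor.newInstance(session);
    }

    public static void main(String[] args) throws Exception {
        // In HoodieAnalysis the class name is chosen by a version check
        // (e.g. HoodieSparkUtils.isSpark4) before this call ever runs, so the
        // other version's class is never loaded at all.
        Object rule = instantiateKlass("DemoRule", new Session());
        System.out.println(rule.getClass().getSimpleName());
    }
}
```

This deferred loading is why extracting the shared logic into a common base (as suggested above) is safe: the version-gated name selection stays exactly where it is, and only the duplicated bodies collapse.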
