boneanxs commented on code in PR #9895:
URL: https://github.com/apache/hudi/pull/9895#discussion_r1372638346
##########
hudi-spark-datasource/hudi-spark/src/main/antlr4/org/apache/hudi/spark/sql/parser/HoodieSqlCommon.g4:
##########
@@ -135,51 +120,13 @@
nonReserved
: CALL
| COMPACTION
- | CREATE
- | DROP
- | EXISTS
- | FROM
- | IN
- | INDEX
- | INDEXES
- | IF
Review Comment:
No, I checked, and I only removed the unneeded ones.
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieSqlCommonAstBuilder.scala:
##########
@@ -149,144 +149,4 @@ class HoodieSqlCommonAstBuilder(session: SparkSession,
delegate: ParserInterface
private def typedVisit[T](ctx: ParseTree): T = {
ctx.accept(this).asInstanceOf[T]
}
-
- /**
Review Comment:
This code was introduced in https://github.com/apache/hudi/pull/5761;
it may only be needed to support the Index commands.
##########
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala:
##########
@@ -3327,6 +3327,145 @@ class HoodieSpark3_3ExtendedSqlAstBuilder(conf:
SQLConf, delegate: ParserInterfa
position = Option(ctx.colPosition).map(pos =>
UnresolvedFieldPosition(typedVisit[ColumnPosition](pos))))
}
+
+ /**
Review Comment:
Yes, it should be the same.
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/IndexCommands.scala:
##########
@@ -32,23 +31,21 @@ import
org.apache.spark.sql.hudi.HoodieSqlCommonUtils.getTableLocation
import org.apache.spark.sql.{Row, SparkSession}
import java.util
-
import scala.collection.JavaConverters.{collectionAsScalaIterableConverter,
mapAsJavaMapConverter}
case class CreateIndexCommand(table: CatalogTable,
indexName: String,
indexType: String,
ignoreIfExists: Boolean,
- columns: Seq[(Attribute, Map[String, String])],
- options: Map[String, String],
- override val output: Seq[Attribute]) extends
IndexBaseCommand {
+ columns: Seq[(Seq[String], Map[String, String])],
+ options: Map[String, String]) extends
IndexBaseCommand {
override def run(sparkSession: SparkSession): Seq[Row] = {
val tableId = table.identifier
val metaClient = createHoodieTableMetaClient(tableId, sparkSession)
val columnsMap: java.util.LinkedHashMap[String, java.util.Map[String,
String]] =
new util.LinkedHashMap[String, java.util.Map[String, String]]()
- columns.map(c => columnsMap.put(c._1.name, c._2.asJava))
+ columns.map(c => columnsMap.put(c._1.mkString("."), c._2.asJava))
Review Comment:
The column name is now a `Seq[String]` instead of an
`Attribute` (an `UnresolvedAttribute`, to be more specific). Since this PR
resolves the column names here:
https://github.com/apache/hudi/blob/6eb38c5b78e4f53a7d21c15f5bb63cd528bc5132/hudi-spark-datasource/hudi-spark3.2plus-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark32PlusAnalysis.scala#L208
`UnresolvedAttribute` is no longer needed, so I use
`Seq[String]` directly to represent the column name.
##########
hudi-spark-datasource/hudi-spark3.3.x/src/main/antlr4/org/apache/hudi/spark/sql/parser/HoodieSqlBase.g4:
##########
@@ -29,5 +29,12 @@ statement
| createTableHeader ('(' colTypeList ')')? tableProvider?
createTableClauses
(AS? query)?
#createTable
+ | CREATE INDEX (IF NOT EXISTS)? identifier ON TABLE?
Review Comment:
We cannot put this grammar in `HoodieSqlCommon.g4`, because we have to parse it
in `HoodieSqlCommonAstBuilder` to build `CreateIndex`, and one of its parameters,
`FieldName`, does not exist in Spark versions before 3.x.
We already do the same thing for `time travel`, and the grammar is rarely
modified, so is changing it this way acceptable?
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestIndexSyntax.scala:
##########
@@ -28,59 +29,61 @@ import
org.apache.spark.sql.hudi.command.{CreateIndexCommand, DropIndexCommand,
class TestIndexSyntax extends HoodieSparkSqlTestBase {
test("Test Create/Drop/Show/Refresh Index") {
- withTempDir { tmp =>
- Seq("cow", "mor").foreach { tableType =>
- val databaseName = "default"
- val tableName = generateTableName
- val basePath = s"${tmp.getCanonicalPath}/$tableName"
- spark.sql(
- s"""
- |create table $tableName (
- | id int,
- | name string,
- | price double,
- | ts long
- |) using hudi
- | options (
- | primaryKey ='id',
- | type = '$tableType',
- | preCombineField = 'ts'
- | )
- | partitioned by(ts)
- | location '$basePath'
+ if (HoodieSparkUtils.gteqSpark3_2) {
Review Comment:
Yes, I will fix it as well.
##########
hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_2ExtendedSqlAstBuilder.scala:
##########
@@ -3317,6 +3317,145 @@ class HoodieSpark3_2ExtendedSqlAstBuilder(conf:
SQLConf, delegate: ParserInterfa
position = Option(ctx.colPosition).map(pos =>
UnresolvedFieldPosition(typedVisit[ColumnPosition](pos))))
}
+
+ /**
Review Comment:
Yes
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]