yihua commented on code in PR #9895:
URL: https://github.com/apache/hudi/pull/9895#discussion_r1369644417
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestIndexSyntax.scala:
##########
@@ -28,59 +29,61 @@ import
org.apache.spark.sql.hudi.command.{CreateIndexCommand, DropIndexCommand,
class TestIndexSyntax extends HoodieSparkSqlTestBase {
test("Test Create/Drop/Show/Refresh Index") {
- withTempDir { tmp =>
- Seq("cow", "mor").foreach { tableType =>
- val databaseName = "default"
- val tableName = generateTableName
- val basePath = s"${tmp.getCanonicalPath}/$tableName"
- spark.sql(
- s"""
- |create table $tableName (
- | id int,
- | name string,
- | price double,
- | ts long
- |) using hudi
- | options (
- | primaryKey ='id',
- | type = '$tableType',
- | preCombineField = 'ts'
- | )
- | partitioned by(ts)
- | location '$basePath'
+ if (HoodieSparkUtils.gteqSpark3_2) {
Review Comment:
Looks like `TestSecondaryIndex` should also have a precondition on the Spark
version.
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/IndexCommands.scala:
##########
@@ -32,23 +31,21 @@ import
org.apache.spark.sql.hudi.HoodieSqlCommonUtils.getTableLocation
import org.apache.spark.sql.{Row, SparkSession}
import java.util
-
import scala.collection.JavaConverters.{collectionAsScalaIterableConverter,
mapAsJavaMapConverter}
case class CreateIndexCommand(table: CatalogTable,
indexName: String,
indexType: String,
ignoreIfExists: Boolean,
- columns: Seq[(Attribute, Map[String, String])],
- options: Map[String, String],
- override val output: Seq[Attribute]) extends
IndexBaseCommand {
+ columns: Seq[(Seq[String], Map[String, String])],
+ options: Map[String, String]) extends
IndexBaseCommand {
override def run(sparkSession: SparkSession): Seq[Row] = {
val tableId = table.identifier
val metaClient = createHoodieTableMetaClient(tableId, sparkSession)
val columnsMap: java.util.LinkedHashMap[String, java.util.Map[String,
String]] =
new util.LinkedHashMap[String, java.util.Map[String, String]]()
- columns.map(c => columnsMap.put(c._1.name, c._2.asJava))
+ columns.map(c => columnsMap.put(c._1.mkString("."), c._2.asJava))
Review Comment:
Why change this? For nested fields?
##########
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala:
##########
@@ -3327,6 +3327,145 @@ class HoodieSpark3_3ExtendedSqlAstBuilder(conf:
SQLConf, delegate: ParserInterfa
position = Option(ctx.colPosition).map(pos =>
UnresolvedFieldPosition(typedVisit[ColumnPosition](pos))))
}
+
+ /**
Review Comment:
I assume the SQL parsing of the INDEX SQL statement should not be different
across Spark versions.
##########
hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_2ExtendedSqlAstBuilder.scala:
##########
@@ -3317,6 +3317,145 @@ class HoodieSpark3_2ExtendedSqlAstBuilder(conf:
SQLConf, delegate: ParserInterfa
position = Option(ctx.colPosition).map(pos =>
UnresolvedFieldPosition(typedVisit[ColumnPosition](pos))))
}
+
+ /**
Review Comment:
Got it. So at least CreateIndex is still supported in Spark 3.2.
##########
hudi-spark-datasource/hudi-spark/src/main/antlr4/org/apache/hudi/spark/sql/parser/HoodieSqlCommon.g4:
##########
@@ -135,51 +120,13 @@
nonReserved
: CALL
| COMPACTION
- | CREATE
- | DROP
- | EXISTS
- | FROM
- | IN
- | INDEX
- | INDEXES
- | IF
Review Comment:
Do we still need some of these tokens for other SQL statements?
##########
hudi-spark-datasource/hudi-spark3.3.x/src/main/antlr4/org/apache/hudi/spark/sql/parser/HoodieSqlBase.g4:
##########
@@ -29,5 +29,12 @@ statement
| createTableHeader ('(' colTypeList ')')? tableProvider?
createTableClauses
(AS? query)?
#createTable
+ | CREATE INDEX (IF NOT EXISTS)? identifier ON TABLE?
Review Comment:
Could we still maintain the grammar in a single place for all Spark
versions, but fail the logical plan of the INDEX SQL statement in Spark 3.1
and below, so that the grammar can be easily maintained?
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieSqlCommonAstBuilder.scala:
##########
@@ -149,144 +149,4 @@ class HoodieSqlCommonAstBuilder(session: SparkSession,
delegate: ParserInterface
private def typedVisit[T](ctx: ParseTree): T = {
ctx.accept(this).asInstanceOf[T]
}
-
- /**
Review Comment:
Is all of this code copied from Spark 3.3+? Wondering if the original PR
introducing this intended to support CreateIndex for Spark 3.2 and below. cc
@huberylee
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]