codope commented on code in PR #10491:
URL: https://github.com/apache/hudi/pull/10491#discussion_r1449892857


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieTableValuedFunction.scala:
##########
@@ -558,4 +558,50 @@ class TestHoodieTableValuedFunction extends 
HoodieSparkSqlTestBase {
       }
     }
   }
+
+  test(s"Test hudi_metadata Table-Valued Function") {
+    if (HoodieSparkUtils.gteqSpark3_2) {
+      withTempDir { tmp =>
+        Seq("cow").foreach { tableType =>

Review Comment:
   Let's also add a few tests using a MOR table, covering compaction/clustering 
after 2 commits.



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieTableValuedFunction.scala:
##########
@@ -558,4 +558,50 @@ class TestHoodieTableValuedFunction extends 
HoodieSparkSqlTestBase {
       }
     }
   }
+
+  test(s"Test hudi_metadata Table-Valued Function") {
+    if (HoodieSparkUtils.gteqSpark3_2) {
+      withTempDir { tmp =>
+        Seq("cow").foreach { tableType =>
+          val tableName = generateTableName
+          val identifier = tableName
+          spark.sql("set " + SPARK_SQL_INSERT_INTO_OPERATION.key + "=upsert")
+          spark.sql(
+            s"""
+               |create table $tableName (
+               |  id int,
+               |  name string,
+               |  ts long,
+               |  price int
+               |) using hudi
+               |partitioned by (price)
+               |tblproperties (
+               |  type = '$tableType',
+               |  primaryKey = 'id',
+               |  preCombineField = 'ts',
+               |  hoodie.datasource.write.recordkey.field = 'id',
+               |  hoodie.metadata.record.index.enable = 'true',
+               |  hoodie.metadata.index.column.stats.enable = 'true',
+               |  hoodie.metadata.index.column.stats.column.list = 'price'
+               |)
+               |location '${tmp.getCanonicalPath}/$tableName'
+               |""".stripMargin
+          )
+
+          spark.sql(
+            s"""
+               | insert into $tableName
+               | values (1, 'a1', 1000, 10), (2, 'a2', 2000, 20), (3, 'a3', 
3000, 30)
+               | """.stripMargin
+          )
+
+          val result1DF = spark.sql(
+            s"select * from hudi_metadata('$identifier')"
+          )
+          result1DF.show(false)

Review Comment:
   Can filters be applied in the query? Also, it would make more sense to show 
the actual metadata partition type instead of the ordinal.



##########
hudi-spark-datasource/hudi-spark3.2plus-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logcal/HoodieMetadataTableValuedFunction.scala:
##########
@@ -0,0 +1,30 @@
+package org.apache.spark.sql.catalyst.plans.logcal
+
+import org.apache.hudi.common.util.ValidationUtils.checkState
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
+import org.apache.spark.sql.catalyst.plans.logical.LeafNode
+
+object HoodieMetadataTableValuedFunction {
+
+  val FUNC_NAME = "hudi_metadata";
+
+  def parseOptions(exprs: Seq[Expression], funcName: String): (String, 
Map[String, String]) = {
+    val args = exprs.map(_.eval().toString)
+    if (args.size != 1) {
+      throw new AnalysisException(s"Expect arguments (table_name or 
table_path) for function `$funcName`")
+    }
+
+    val identifier = args.head
+
+    (identifier, Map("hoodie.datasource.query.type" -> "snapshot"))

Review Comment:
   The query type should be snapshot by default, and it needs to be set to 
incremental when users pass `as.of.instant` in the query.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to