codope commented on code in PR #10153:
URL: https://github.com/apache/hudi/pull/10153#discussion_r1419864232


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala:
##########
@@ -140,6 +147,75 @@ class TestFunctionalIndex extends HoodieSparkSqlTestBase {
     }
   }
 
+  test("Test Functional Index With Hive Sync Non Partitioned Table") {
+    // There is a big difference between Java class loader architecture of 
versions 1.8 and 17.
+    // Hive 2.3.7 is compiled with Java 1.8, and the class loader used there 
throws error when Hive APIs are run on Java 17.
+    // So we special case this test only for Java 8.
+    if (HoodieSparkUtils.gteqSpark3_2 && HoodieTestUtils.getJavaVersion == 8) {
+      withTempDir { tmp =>
+        Seq("mor").foreach { tableType =>
+          val databaseName = "default"
+          val tableName = generateTableName
+          val basePath = s"${tmp.getCanonicalPath}/$tableName"
+          spark.sql(
+            s"""
+               |create table $tableName (
+               |  id int,
+               |  name string,
+               |  price double,
+               |  ts long
+               |) using hudi
+               | options (
+               |  primaryKey ='id',
+               |  type = '$tableType',
+               |  preCombineField = 'ts'
+               | )
+               | partitioned by(ts)
+               | location '$basePath'
+       """.stripMargin)
+          // ts=1000 and from_unixtime(ts, 'yyyy-MM-dd') = '1970-01-01'
+          spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+          // ts=100000 and from_unixtime(ts, 'yyyy-MM-dd') = '1970-01-02'
+          spark.sql(s"insert into $tableName values(2, 'a2', 10, 100000)")
+          // ts=10000000 and from_unixtime(ts, 'yyyy-MM-dd') = '1970-04-26'
+          spark.sql(s"insert into $tableName values(3, 'a3', 10, 10000000)")
+
+          val createIndexSql = s"create index idx_datestr on $tableName using 
column_stats(ts) options(func='from_unixtime', format='yyyy-MM-dd')"
+          spark.sql(createIndexSql)
+          val metaClient = HoodieTableMetaClient.builder()
+            .setBasePath(basePath)
+            .setConf(spark.sessionState.newHadoopConf())
+            .build()
+          assertTrue(metaClient.getFunctionalIndexMetadata.isPresent)
+          val functionalIndexMetadata = 
metaClient.getFunctionalIndexMetadata.get()
+          assertEquals(1, functionalIndexMetadata.getIndexDefinitions.size())
+          assertEquals("func_index_idx_datestr", 
functionalIndexMetadata.getIndexDefinitions.get("func_index_idx_datestr").getIndexName)
+
+          // sync to hive without partition metadata
+          val hiveSyncProps = new TypedProperties()

Review Comment:
   I am adding a new table config for logical partitioning in 
https://github.com/apache/hudi/pull/10274/files#diff-53ae78ff1f1bd5d8b0f87cb69853299e5228b44f30b770e27f60c0c3c27d4185R301
   If logical partitioning is enabled for a table, then the table will neither 
be physically partitioned, nor will the partition metadata be synced to the 
catalog. Do you think we should add a separate meta sync config — i.e., the user 
can still have a physically partitioned table but does not need to sync partition 
metadata to the catalog?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to