hudi-agent commented on code in PR #18583:
URL: https://github.com/apache/hudi/pull/18583#discussion_r3200754819
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/schema/TestVariantDataType.scala:
##########
@@ -128,6 +130,119 @@ class TestVariantDataType extends HoodieSparkSqlTestBase {
}
}
+ test("Test Query Log Only MOR Table With VARIANT column triggers compaction") {
+ assume(HoodieSparkUtils.gteqSpark4_0, "Variant type requires Spark 4.0 or higher")
+
+ withRecordType()(withTempDir { tmp =>
Review Comment:
🤖 nit: `tmp.getCanonicalPath` is evaluated four times inline in this test
body, while the sibling BLOB tests in this PR extract it to `val tablePath`.
Could you add `val tablePath = tmp.getCanonicalPath` alongside `val tableName`
here to keep the pattern consistent?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
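
A minimal sketch of the suggested extraction, in plain Scala so it runs without Spark or Hudi. The `withTempDir` helper here is a hypothetical stand-in for the one in the test base class, and `TablePathSketch`, `statements`, and the table name are illustrative names, not taken from the PR:

```scala
import java.io.File
import java.nio.file.Files

object TablePathSketch {
  // Hypothetical stand-in for the test base's withTempDir helper.
  def withTempDir[A](body: File => A): A = {
    val dir = Files.createTempDirectory("hudi_sketch").toFile
    try body(dir) finally dir.delete()
  }

  // Builds two statements that both need the table location.
  def statements(): Seq[String] = withTempDir { tmp =>
    val tableName = "variant_mor_sketch"
    // Resolve the canonical path once, next to tableName, and reuse it below.
    val tablePath = tmp.getCanonicalPath
    Seq(
      s"create table $tableName using hudi location '$tablePath'",
      s"select * from hudi_table_changes('$tablePath')"
    )
  }
}
```

Every later use then reads `$tablePath` instead of repeating `tmp.getCanonicalPath`, which matches the pattern the sibling BLOB tests are said to follow.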
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestVectorDataSource.scala:
##########
@@ -674,6 +675,129 @@ class TestVectorDataSource extends HoodieSparkClientTestBase {
assertTrue(r7.getSeq[Double](1).forall(_ == 1.0), "key_7 should have original value 1.0")
}
+ @Test
+ def testMorLogOnlyCompactionPreservesVectorMetadata(): Unit = {
+ val path = basePath + "/mor_log_only_vec"
+ val tableName = "mor_log_only_vec_test"
+ try {
+ spark.sql(
+ s"""
+ |create table $tableName (
+ | id int,
+ | embedding VECTOR(3),
+ | ts long
+ |) using hudi
+ | location '$path'
+ | tblproperties (
+ | primaryKey = 'id',
+ | type = 'mor',
+ | preCombineField = 'ts',
+ | hoodie.index.type = 'INMEMORY',
+ | hoodie.compact.inline = 'true',
+ | hoodie.clean.commits.retained = '1'
+ | )
+ """.stripMargin)
+
+ def readOrdered(): Seq[Row] =
+ spark.sql(s"select id, embedding, ts from $tableName order by id").collect().toSeq
+
+ def embeddingOf(id: Int, rows: Seq[Row]): Seq[Float] =
+ rows.find(_.getInt(0) == id).get.getSeq[Float](1)
Review Comment:
🤖 nit: `rows.find(_.getInt(0) == id).get` will throw a bare
`NoSuchElementException: None.get` if the id isn't present, which gives no hint
about which id was missing. Could you use
`.getOrElse(fail(s"No row with id=$id"))` so a future test failure is
immediately self-explanatory?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
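
A minimal sketch of the suggested lookup, written against a hypothetical `FakeRow` case class instead of Spark's `Row` so that it runs standalone. It throws an `AssertionError` directly; in the actual test, the test framework's `fail(...)` (assumed to be in scope there) would take its place:

```scala
object EmbeddingLookupSketch {
  // Hypothetical stand-in for a Spark Row holding (id, embedding).
  final case class FakeRow(id: Int, embedding: Seq[Float])

  // Returns the embedding for `id`, or fails with a message naming the missing id
  // instead of the bare "None.get" that Option.get would produce.
  def embeddingOf(id: Int, rows: Seq[FakeRow]): Seq[Float] =
    rows.find(_.id == id)
      .map(_.embedding)
      .getOrElse(throw new AssertionError(s"No row with id=$id"))
}
```

With this shape, a lookup for an absent id reports which id was missing, so the failing assertion can be traced without rerunning the test under a debugger.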