n3nash commented on a change in pull request #2927:
URL: https://github.com/apache/hudi/pull/2927#discussion_r629685145
##########
File path:
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##########
@@ -483,6 +483,17 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
// ensure 2nd batch of updates matches.
assert(df3.intersect(trimmedDf3).except(df3).count() == 0)
+ // ingest new batch with old schema.
+ records = DataSourceTestUtils.generateRandomRows(10)
+ recordsSeq = convertRowListToSeq(records)
+ val df4 = spark.createDataFrame(sc.parallelize(recordsSeq), structType)
+ // write to Hudi
+ HoodieSparkSqlWriter.write(sqlContext, SaveMode.Append, fooTableParams, df4)
+
+ val snapshotDF4 = spark.read.format("org.apache.hudi")
+   .load(path.toAbsolutePath.toString + "/*/*/*/*")
+ assertEquals(25, snapshotDF4.count())
Review comment:
Can we also validate that the schema of the newly written files matches the
latest table schema, and not the older schema from the records? That would also
help test the avro -> df -> avro conversion flow; alternatively, maybe add a
specific test for that in HoodieSparkUtils too?
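The suggested assertion can be sketched roughly as below. This is a hypothetical, self-contained simplification: schemas are modeled as plain field name/type pairs instead of Spark's `StructType` (which in the real test would come from `snapshotDF4.schema` and the latest table schema), and the field names are made up for illustration:

```scala
// Hypothetical stand-in for comparing the schema read back from the newly
// written files against the latest table schema. In the actual test this
// would compare Spark StructType instances; here schemas are just
// sequences of (name, type) pairs so the sketch runs without a Spark session.
case class Field(name: String, dataType: String)

def writtenSchemaMatchesLatest(written: Seq[Field], latest: Seq[Field]): Boolean =
  written == latest

// Latest (evolved) table schema, including a column the old batch lacked.
// Field names below are illustrative, not the suite's real columns.
val latestSchema = Seq(
  Field("key", "string"),
  Field("ts", "long"),
  Field("new_col", "string"))

// Schema read back from files written by the old-schema batch: on write,
// records should have been upgraded to carry the latest schema.
val writtenSchema = Seq(
  Field("key", "string"),
  Field("ts", "long"),
  Field("new_col", "string"))

assert(writtenSchemaMatchesLatest(writtenSchema, latestSchema),
  "newly written files should carry the latest schema, not the old record schema")
```

In the real suite the comparison would be between `snapshotDF4.schema` and the expected evolved `StructType`, which also exercises the avro -> df -> avro round trip the comment mentions.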
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]