n3nash commented on a change in pull request #2927:
URL: https://github.com/apache/hudi/pull/2927#discussion_r629685145
##########
File path:
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##########
@@ -483,6 +483,17 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
// ensure 2nd batch of updates matches.
assert(df3.intersect(trimmedDf3).except(df3).count() == 0)
+ // ingest new batch with old schema.
+ records = DataSourceTestUtils.generateRandomRows(10)
+ recordsSeq = convertRowListToSeq(records)
+ val df4 = spark.createDataFrame(sc.parallelize(recordsSeq), structType)
+ // write to Hudi
+ HoodieSparkSqlWriter.write(sqlContext, SaveMode.Append, fooTableParams, df4)
+
+ val snapshotDF4 = spark.read.format("org.apache.hudi")
+   .load(path.toAbsolutePath.toString + "/*/*/*/*")
+ assertEquals(25, snapshotDF4.count())
Review comment:
Can we also validate that the schema of the newly written files matches the
latest table schema, and not the older schema from the records? That would also
help test the avro -> df -> avro conversion flow; alternatively, maybe add a
specific test for that in HoodieSparkUtils too?
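The suggested assertion can be sketched roughly as below. This is a hypothetical, self-contained simplification: schemas are modeled as plain field name/type pairs instead of Spark's `StructType` (which in the real test would come from `snapshotDF4.schema` and the latest table schema), and the field names are made up for illustration:

```scala
// Hypothetical stand-in for comparing the schema read back from the newly
// written files against the latest table schema. In the actual test this
// would compare Spark StructType instances; here schemas are just
// sequences of (name, type) pairs so the sketch runs without a Spark session.
case class Field(name: String, dataType: String)

def writtenSchemaMatchesLatest(written: Seq[Field], latest: Seq[Field]): Boolean =
  written == latest

// Latest (evolved) table schema, including a column the old batch lacked.
// Field names below are illustrative, not the suite's real columns.
val latestSchema = Seq(
  Field("key", "string"),
  Field("ts", "long"),
  Field("new_col", "string"))

// Schema read back from files written by the old-schema batch: on write,
// records should have been upgraded to carry the latest schema.
val writtenSchema = Seq(
  Field("key", "string"),
  Field("ts", "long"),
  Field("new_col", "string"))

assert(writtenSchemaMatchesLatest(writtenSchema, latestSchema),
  "newly written files should carry the latest schema, not the old record schema")
```

In the real suite the comparison would be between `snapshotDF4.schema` and the expected evolved `StructType`, which also exercises the avro -> df -> avro round trip the comment mentions.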
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]