vinothchandar commented on issue #2515:
URL: https://github.com/apache/hudi/issues/2515#issuecomment-777592168
Thanks @rubenssoto for narrowing it down.
```scala
@Test
def testArchivalIssue(): Unit = {
  println(s"Basepath : ${basePath}")
  for (i <- 1 to 20) {
    val records = recordsToStrings(dataGen.generateInserts("%05d".format(i), 100)).toList
    val inputDF = spark.read.json(spark.sparkContext.parallelize(records, 2))
    inputDF.write.format("hudi")
      .options(commonOpts)
      .option("hoodie.keep.min.commits", "2")
      .option("hoodie.keep.max.commits", "3")
      .option("hoodie.cleaner.commits.retained", "1")
      .option("hoodie.datasource.write.row.writer.enable", "true")
      .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
      // overwrite on the first round, append afterwards
      // (the loop starts at 1, so the original `i == 0` check never matched)
      .mode(if (i == 1) SaveMode.Overwrite else SaveMode.Append)
      .save(basePath)
    println(s"Round ${i} of inserts.")
  }
  Thread.sleep(Int.MaxValue) // keep the JVM alive so the table can be inspected on disk
}
```
reproduces this. The issue is essentially that the row writer passes `Option.empty` as the extra commit metadata, while the write-client path passes an empty hashmap (which is why the fix I cited above worked on that path).
For now, please limit the row writer to the initial bulk_insert (it only works with bulk_insert at the moment anyway). I'll have a PR out soon.
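To illustrate the divergence, here is a minimal sketch (the names `writeClientMetadata`, `rowWriterMetadata`, and `fixApplies` are illustrative, not Hudi's actual API): code that only acts when extra metadata is present treats "empty map" and "no value at all" differently, so the two write paths take different branches.

```scala
// Hypothetical sketch of the two code paths described above.
object MetadataSketch {
  // Write-client path: always supplies a (possibly empty) metadata map.
  def writeClientMetadata: Option[Map[String, String]] = Some(Map.empty)

  // Row-writer path: supplies no metadata at all.
  def rowWriterMetadata: Option[Map[String, String]] = Option.empty

  // A fix gated on metadata being present covers one path but not the other.
  def fixApplies(extra: Option[Map[String, String]]): Boolean = extra.isDefined

  def main(args: Array[String]): Unit = {
    println(s"write-client path covered: ${fixApplies(writeClientMetadata)}") // true
    println(s"row-writer path covered:   ${fixApplies(rowWriterMetadata)}")   // false
  }
}
```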
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]