vinothchandar commented on issue #2515:
URL: https://github.com/apache/hudi/issues/2515#issuecomment-777592168


   Thanks @rubenssoto for narrowing it down. 
   
   ```scala
     @Test
     def testArchivalIssue(): Unit = {
       println(s"Basepath : ${basePath}")
       for (i <- 1 to 20) {
         val records = recordsToStrings(dataGen.generateInserts("%05d".format(i), 100)).toList
         val inputDF = spark.read.json(spark.sparkContext.parallelize(records, 2))
         inputDF.write.format("hudi")
           .options(commonOpts)
           .option("hoodie.keep.min.commits", "2")
           .option("hoodie.keep.max.commits", "3")
           .option("hoodie.cleaner.commits.retained", "1")
           .option("hoodie.datasource.write.row.writer.enable", "true")
           .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
           // The loop starts at 1, so overwrite only on the first round.
           .mode(if (i == 1) SaveMode.Overwrite else SaveMode.Append)
           .save(basePath)
         println(s"Round ${i} of inserts.")
       }
       // Keep the JVM alive so the timeline under basePath can be inspected.
       Thread.sleep(Int.MaxValue)
     }
   ```
   
   reproduces this. The root cause is that the row-writer path passed in `Option.empty` for the extra metadata, while the write-client path passed in an empty hashmap (which is why the fix I cited above worked on that path).
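
   To make that asymmetry concrete, here is a minimal, self-contained sketch (the object and method names are illustrative, not Hudi internals) of how a consumer that branches on presence treats `Option.empty` and an empty-but-present map differently:

   ```scala
   // Illustrative only: mirrors the shape of the bug, not Hudi's actual code.
   import java.util.{HashMap => JHashMap, Map => JMap}

   object ExtraMetadataSketch {
     // A consumer that only acts when metadata is present: Option.empty
     // skips the branch entirely, while an empty map still flows through.
     def resolve(extra: Option[JMap[String, String]]): JMap[String, String] =
       extra.orNull

     def main(args: Array[String]): Unit = {
       // Row-writer path: no metadata object reaches downstream code at all.
       assert(resolve(None) == null)
       // Write-client path: an empty-but-present map survives.
       assert(resolve(Some(new JHashMap[String, String]())) != null)
     }
   }
   ```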
   
   For now, please limit the row writer to the initial bulk_insert (it only works with bulk_insert at the moment anyway). I'll have a PR out soon.
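
   As a sketch of that workaround (a fragment continuing the repro above, reusing its `inputDF`, `commonOpts`, and `basePath`; the first-write check is illustrative and assumes a local filesystem), the row writer can be keyed off whether the table already exists:

   ```scala
   // Not self-contained: continues the repro test body above.
   // Enable the row writer only for the initial bulk_insert write.
   val isFirstWrite = !new java.io.File(basePath).exists() // illustrative check
   inputDF.write.format("hudi")
     .options(commonOpts)
     .option("hoodie.datasource.write.row.writer.enable", isFirstWrite.toString)
     .option(DataSourceWriteOptions.OPERATION_OPT_KEY,
       DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
     .mode(if (isFirstWrite) SaveMode.Overwrite else SaveMode.Append)
     .save(basePath)
   ```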

