bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574785486
 
 
   > @bhasudha : I have changed the way we want to generate deletes. Basically, I
   > pass in the insert records for which delete records will be generated. With
   > the previous approach of generating random deletes, I couldn't verify whether
   > the deletes actually removed any records, so I have taken this approach.
   > 
   > The steps I plan to add to the Quick Start are as follows:
   > 
   > * Generate a new batch of inserts.
   > * Fetch all records from this new batch (adjust the rider value below, since
   >   each batch has a unique rider value).
   >   val ds = spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'")
   > * Generate delete records
   >   val deletes = dataGen.generateDeletes(ds.collectAsList())
   > * Issue deletes
   >   val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2))
   >   df.write.format("org.apache.hudi").
   >   options(getQuickstartWriteConfigs).
   >   option(OPERATION_OPT_KEY,"delete").
   >   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
   >   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
   >   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
   >   option(TABLE_NAME, tableName).
   >   mode(Append).
   >   save(basePath)
   > * The same select query as above should then return 0 records, since all of
   >   them have been deleted.
   >   spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'").count()
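The insert, fetch, delete, verify flow above can be simulated without Spark or Hudi. The following is a minimal, hypothetical sketch in plain Scala: `Trip`, `DeleteRecord`, `DeleteFlowSketch`, `selectByRider`, and `applyDeletes` are illustrative names, not Hudi APIs, and the JSON shape of a delete record (`ts`, `uuid`, `partitionpath`) is an assumption rather than what Hudi's `generateDeletes` actually emits. It shows why deriving deletes from a known batch makes the result checkable: a delete matches only when both the record key and the partition path agree, which is presumably why the plan selects `uuid` and `partitionPath` together.

```scala
// Hypothetical sketch: none of these names are Hudi APIs, and the JSON layout of a
// delete record (ts, uuid, partitionpath) is an assumption for illustration only.
final case class Trip(uuid: String, partitionPath: String, rider: String)

// A delete identifies a stored record by record key plus partition path.
final case class DeleteRecord(uuid: String, partitionPath: String) {
  def toJson: String =
    s"""{"ts": 0.0, "uuid": "$uuid", "partitionpath": "$partitionPath"}"""
}

object DeleteFlowSketch {
  // Step 2: fetch the rows of the new batch, like the rider = '...' query.
  def selectByRider(table: Seq[Trip], rider: String): Seq[Trip] =
    table.filter(_.rider == rider)

  // Step 3: hypothetical stand-in for generateDeletes, one delete per fetched row.
  def generateDeletes(rows: Seq[Trip]): Seq[DeleteRecord] =
    rows.map(t => DeleteRecord(t.uuid, t.partitionPath))

  // Step 4: apply deletes; a row is removed only if both key fields match.
  def applyDeletes(table: Seq[Trip], deletes: Seq[DeleteRecord]): Seq[Trip] = {
    val keys = deletes.map(d => (d.uuid, d.partitionPath)).toSet
    table.filterNot(t => keys.contains((t.uuid, t.partitionPath)))
  }

  def main(args: Array[String]): Unit = {
    val existing = Seq(Trip("u1", "2020/01/01", "rider-100"), Trip("u2", "2020/01/02", "rider-100"))
    val newBatch = Seq(Trip("u3", "2020/01/01", "rider-213"), Trip("u4", "2020/01/03", "rider-213"))
    val table = existing ++ newBatch

    val deletes = generateDeletes(selectByRider(table, "rider-213"))
    deletes.foreach(d => println(d.toJson))

    val after = applyDeletes(table, deletes)
    // Step 5: the same query now returns 0 rows, and the older batch is untouched.
    println(s"rider-213 rows after delete: ${selectByRider(after, "rider-213").size}")
    println(s"total rows after delete: ${after.size}")
  }
}
```

In an actual spark-shell session, the `hudi_ro_table` view was registered from an earlier read, so it likely needs to be re-registered from a fresh `spark.read.format("org.apache.hudi").load(...)` before the final count reflects the delete; the in-memory sketch sidesteps that step.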
   
   Plan sounds good. I think there are some checkstyle issues in the build.
Once you fix them, I will be able to approve and merge.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services