rjmblc opened a new issue, #6341:
URL: https://github.com/apache/hudi/issues/6341

   **Describe the problem you faced**
   I am trying to delete records from a Hudi table using the Spark APIs, but I observe no exceptions and the records are not getting deleted. A deltacommit is generated successfully for the delete request under the `.hoodie/` folder.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   1. Launch the pyspark shell from EMR
```shell
pyspark --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.hive.convertMetastoreParquet=false" \
  --jars /usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/external/lib/spark-avro.jar
```
   2. Read the hudi table as a dataframe
```python
df = spark.read.format('hudi').load('s3://${bucket_name}/customer_event_3')
df.show()
```

```
+-------------------+--------------------+------------------+----------------------+--------------------+-----------------+---------+------------+--------+-------------------+-------------------+------------+-------------------+------------+---------------------+----------+----------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|customer_event_id|client_id| customer_id|event_id|         event_date|       created_date|updated_date|source_reference_id|total_amount|customer_event_status|created_by|updated_by|
+-------------------+--------------------+------------------+----------------------+--------------------+-----------------+---------+------------+--------+-------------------+-------------------+------------+-------------------+------------+---------------------+----------+----------+
|  20220809121931258|20220809121931258...|            523312|                523312|00f5d98d-13b4-4e4...|              266|   523312|618026952022|      37|2022-06-14 08:10:00|2022-06-14 08:00:08|        null|               null|       30.00|                 null|      null|      null|
|  20220809122023456|20220809122023456...|            523313|                523313|2660d6d9-999a-4b1...|              267|   523313|618026952023|      38|2022-06-14 08:10:00|2022-06-14 08:00:08|        null|               null|       30.00|                 null|      null|      null|
+-------------------+--------------------+------------------+----------------------+--------------------+-----------------+---------+------------+--------+-------------------+-------------------+------------+-------------------+------------+---------------------+----------+----------+
```
   3. Filter the records to be deleted and save them as a new dataframe.

```python
df1 = df.filter('client_id = 523312')
```
   4. Set the delete configs & submit the delete request.

```python
hudi_delete_options = {
    'hoodie.table.name': 'customer_event_3',
    'hoodie.datasource.write.table.name': 'customer_event_3',
    'hoodie.datasource.write.recordkey.field': 'client_id',
    'hoodie.datasource.write.partitionpath.field': 'client_id',
    'hoodie.datasource.write.operation': 'delete',
    'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.EmptyHoodieRecordPayload'
}

df1.write.format('hudi') \
    .options(**hudi_delete_options) \
    .mode('append') \
    .save('s3://offline-store-qa/customer_event_3')
```
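A minimal sketch of the delete configuration, factored into a helper so the table name and key fields are set in one place (the option keys are the standard Hudi datasource write options; the helper name and its use of `client_id` for both record key and partition path mirror the repro above and are otherwise an assumption):

```python
# Sketch: build the Hudi delete options once, including the payload class,
# so the writer call stays minimal. Keys are standard Hudi datasource options.
def build_delete_options(table, key_field, partition_field):
    return {
        'hoodie.table.name': table,
        'hoodie.datasource.write.table.name': table,
        'hoodie.datasource.write.recordkey.field': key_field,
        'hoodie.datasource.write.partitionpath.field': partition_field,
        'hoodie.datasource.write.operation': 'delete',
        'hoodie.datasource.write.payload.class':
            'org.apache.hudi.common.model.EmptyHoodieRecordPayload',
    }

opts = build_delete_options('customer_event_3', 'client_id', 'client_id')
# df1.write.format('hudi').options(**opts).mode('append').save(table_path)
```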
   5. Validate by reading the table back as a dataframe and displaying it.

```python
df = spark.read.format('hudi').load('s3://${bucket_name}/customer_event_3')
df.show()
```
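Instead of eyeballing `df.show()`, the validation in step 5 can be made an explicit check. A hedged sketch (the helper is hypothetical; the Spark calls in the comment assume the same path placeholder as above):

```python
# Hypothetical check, independent of Spark: given the client_ids read back
# after the delete, the deleted key should be absent.
def delete_applied(client_ids, deleted_key):
    return deleted_key not in set(client_ids)

# e.g. ids = [r.client_id for r in spark.read.format('hudi')
#             .load('s3://${bucket_name}/customer_event_3')
#             .select('client_id').distinct().collect()]
# delete_applied(ids, 523312)  # should be True after a successful delete
```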
   
   **Expected behavior**
   The record with client_id 523312 should be removed from the table.
   
   **Environment Description**
   
   * Hudi version : 0.10.1-amzn-0
   
   * Spark version : 3.2.0
   
   * Hive version : 3.1.2
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   `.hoodie/hoodie.properties` file:

```properties
#Properties saved on Tue Aug 09 12:19:27 IST 2022
#Tue Aug 09 12:19:27 IST 2022
hoodie.table.partition.fields=client_id
hoodie.table.type=MERGE_ON_READ
hoodie.archivelog.folder=archived
hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
hoodie.timeline.layout.version=1
hoodie.table.version=3
hoodie.table.recordkey.fields=client_id
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.keygenerator.class=
hoodie.table.name=customer_event_3
hoodie.datasource.write.hive_style_partitioning=false
```
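One detail worth noting from the properties: `hoodie.table.type=MERGE_ON_READ`. On a MOR table, deletes land in log files first, so a snapshot query (which merges base and log files) should reflect them, while a read-optimized query serves only the base files until compaction runs. A hedged sketch for checking both views (`hoodie.datasource.query.type` is the standard Hudi read option; the path placeholder is the one from the repro):

```python
# Sketch: read the MOR table with an explicit query type. A snapshot query
# merges base files with log files (reflecting pending deletes); a
# read-optimized query does not until compaction has run.
snapshot_opts = {'hoodie.datasource.query.type': 'snapshot'}
read_optimized_opts = {'hoodie.datasource.query.type': 'read_optimized'}

# df_snap = (spark.read.format('hudi')
#            .options(**snapshot_opts)
#            .load('s3://${bucket_name}/customer_event_3'))
# df_ro = (spark.read.format('hudi')
#          .options(**read_optimized_opts)
#          .load('s3://${bucket_name}/customer_event_3'))
```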
   
   **Stacktrace**
   
   It does not throw any errors, and no error logs are produced.
   
   

