ankur334 opened a new issue, #9217:
URL: https://github.com/apache/hudi/issues/9217

   
   **Partial update with a partition column/key is not working as expected.**
   
   Let's suppose the table currently contains the following event/message.
   
   ```
   {
     "id": 1,
     "language": "python",
     "created": "2023-07-12",
     "updated": "2023-07-12"
   }
   ```
   
   
   **primaryKey** = id
   **deDupKey/preCombine** = updated
   **partition** = created
   
   I am using UPSERT as the write operation type.
   
   Now I want to apply a partial update when a record arrives from my source system/producer.
   
   The new incoming event is as follows.
   
   ```
   {
     "id": 1,
     "language": "scala",
     "updated": "2023-07-13"
   }
   ```
   
   After the partial update, only the `language` and `updated` columns should change. However, after applying the partial update, we are getting null in the `created` column.
   
   The expected result after the merge/partial update is:
   
   ```
   {
     "id": 1,
     "language": "scala",
     "created": "2023-07-12",
     "updated": "2023-07-13"
   }
   ```
   
   But the actual result is:
   
   ```
   {
     "id": 1,
     "language": "scala",
     "created": null,
     "updated": "2023-07-13"
   }
   ```
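   To illustrate, the merge semantics I expect from a partial update can be sketched as below. This is a minimal standalone Scala sketch, not Hudi code: records are modeled as maps of column name to optional value, and `mergeRecords` is a hypothetical helper that takes each incoming non-null value and falls back to the stored value otherwise.
   
   ```scala
   // Sketch of expected partial-update semantics: for each column, take the
   // incoming value if non-null, otherwise keep the existing stored value.
   object PartialUpdateSketch {
     type Record = Map[String, Option[String]]
   
     def mergeRecords(existing: Record, incoming: Record): Record =
       (existing.keySet ++ incoming.keySet).map { col =>
         col -> incoming.getOrElse(col, None).orElse(existing.getOrElse(col, None))
       }.toMap
   
     def main(args: Array[String]): Unit = {
       val existing: Record = Map(
         "id" -> Some("1"), "language" -> Some("python"),
         "created" -> Some("2023-07-12"), "updated" -> Some("2023-07-12"))
       val incoming: Record = Map(
         "id" -> Some("1"), "language" -> Some("scala"),
         "created" -> None, "updated" -> Some("2023-07-13"))
       // created should remain "2023-07-12" after the merge
       println(mergeRecords(existing, incoming))
     }
   }
   ```
   
   Under these semantics the null `created` in the incoming record should never overwrite the stored `"2023-07-12"`.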
   
   This looks wrong. Could you please help us here? Are we doing something wrong?
   
   **Environment Description**
   
   Hudi version : 0.13.1
   Spark version: 3.1
   Hive version: 3.1
   Storage (HDFS/S3/GCS..) : GCS
   Running on Docker? (yes/no) : No, running on Dataproc
   
   **Hudi Configs**
   
   ```
   val hudiConfigs: Map[String, String] = Map(
     "hoodie.datasource.write.hive_style_partitioning" -> "true",
     "hoodie.datasource.write.drop.partition.columns" -> "true",
     "hoodie.partition.metafile.use.base.format" -> "true",
     "hoodie.metadata.enable" -> "true",
     "hoodie.datasource.write.reconcile.schema" -> "true",
     "hoodie.schema.on.read.enable" -> "true",
     "hoodie.upsert.shuffle.parallelism" -> "1000",
     "hoodie.bloom.index.parallelism" -> "1000",
     "hoodie.index.type" -> "GLOBAL_BLOOM",
     "hoodie.datasource.write.payload.class" -> "org.apache.hudi.common.model.PartialUpdateAvroPayload"
    )
   ```
   
   
   **To Reproduce**
   
   Steps to reproduce the behaviour:
   
   1. Initialize the Spark session and pass the Hudi configs mentioned above.
   2. Choose `id` as the primary key, `created` as the partition column, and `updated` as the deDup/preCombine field.
   3. First insert the record by supplying all the columns.
   4. For the partial update, don't pass the `created` column; the incoming schema makes the `created` column null.
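   
   The steps above can be sketched roughly as follows. This is a hypothetical repro outline for a Spark shell on Dataproc, assuming the `hudiConfigs` map from above is in scope; the table name, GCS path, and record values are placeholders, not the exact ones we use.
   
   ```scala
   import org.apache.spark.sql.SaveMode
   import spark.implicits._  // assumes a spark-shell / Dataproc session
   
   val basePath = "gs://my-bucket/hudi/partial_update_test"  // placeholder path
   
   val writeOpts = hudiConfigs ++ Map(
     "hoodie.table.name" -> "partial_update_test",
     "hoodie.datasource.write.operation" -> "upsert",
     "hoodie.datasource.write.recordkey.field" -> "id",
     "hoodie.datasource.write.precombine.field" -> "updated",
     "hoodie.datasource.write.partitionpath.field" -> "created"
   )
   
   // Step 3: first insert with all columns present.
   val first = spark.read.json(Seq(
     """{"id": 1, "language": "python", "created": "2023-07-12", "updated": "2023-07-12"}"""
   ).toDS())
   first.write.format("hudi").options(writeOpts).mode(SaveMode.Overwrite).save(basePath)
   
   // Step 4: partial update without the created column; the incoming schema
   // makes created null, and after the upsert it comes back null on read.
   val second = spark.read.json(Seq(
     """{"id": 1, "language": "scala", "updated": "2023-07-13"}"""
   ).toDS())
   second.write.format("hudi").options(writeOpts).mode(SaveMode.Append).save(basePath)
   
   spark.read.format("hudi").load(basePath)
     .select("id", "language", "created", "updated").show()
   ```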
   
   
   
   
   

