a0x commented on issue #5792:
URL: https://github.com/apache/hudi/issues/5792#issuecomment-1149507752

   Here is my analysis.
   
   The key exception is **`java.lang.RuntimeException: Null-value for required field: note`**, which means the `note` field is treated as required (not nullable). But I wrote a `null` value into that field in the first place, so the error doesn't make sense.
   
   After digging into the log and the parquet file, I found something 
interesting.
   
   1. After the last update was triggered, some data was written to storage. (The last update was triggered at **June 08 2022, 12:48:35 PM**, which is not shown in the previous picture.)
       <img width="1333" alt="image" src="https://user-images.githubusercontent.com/3829546/172542368-3ae79dda-4d2a-46bc-a205-f6de90febe8c.png">
       So I checked those files and found that they were exactly as described in the previous paragraph:
       <img width="1627" alt="image" src="https://user-images.githubusercontent.com/3829546/172543858-5c800cdb-a785-4053-a715-5e617908b37f.png">
   2. The stack trace in the log includes the full Avro schema produced by the conversion:
       ```json
       {
         "type" : "record",
         "name" : "update_null_test_cow_record",
         "namespace" : "hoodie.update_null_test_cow",
         "fields" : [ {
           "name" : "_hoodie_commit_time",
           "type" : [ "null", "string" ],
           "doc" : "",
           "default" : null
         }, {
           "name" : "_hoodie_commit_seqno",
           "type" : [ "null", "string" ],
           "doc" : "",
           "default" : null
         }, {
           "name" : "_hoodie_record_key",
           "type" : [ "null", "string" ],
           "doc" : "",
           "default" : null
         }, {
           "name" : "_hoodie_partition_path",
           "type" : [ "null", "string" ],
           "doc" : "",
           "default" : null
         }, {
           "name" : "_hoodie_file_name",
           "type" : [ "null", "string" ],
           "doc" : "",
           "default" : null
         }, {
           "name" : "id",
           "type" : [ "null", "long" ],
           "default" : null
         }, {
           "name" : "name",
           "type" : [ "null", "string" ],
           "default" : null
         }, {
           // 'note' column is not union type, and has no default `null` value
           "name" : "note",
           "type" : "string"
         }, {
           "name" : "ts",
           "type" : [ "null", "long" ],
           "default" : null
         }, {
           "name" : "dt",
           "type" : [ "null", "string" ],
           "default" : null
         } ]
       }
       ```
   
   I think this auto-generated schema is the direct cause of the failure.
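
To make the difference concrete, here is a minimal sketch that scans an Avro schema for fields that cannot hold `null`. It uses only Python's standard `json` module rather than an Avro library. The helper names `is_nullable` and `required_fields` are hypothetical, and the schema is an abbreviated copy of the one above, keeping only three fields. The rule it applies comes from the Avro specification: a field accepts `null` only if its type is `"null"` or a union that includes `"null"`.

```python
import json

# Abbreviated copy of the schema from the stack trace (only three fields kept).
SCHEMA_JSON = """
{
  "type": "record",
  "name": "update_null_test_cow_record",
  "namespace": "hoodie.update_null_test_cow",
  "fields": [
    {"name": "id", "type": ["null", "long"], "default": null},
    {"name": "note", "type": "string"},
    {"name": "ts", "type": ["null", "long"], "default": null}
  ]
}
"""


def is_nullable(avro_type):
    """Return True if the Avro type is "null" or a union that includes "null"."""
    if isinstance(avro_type, list):  # a JSON array denotes an Avro union
        return "null" in avro_type
    return avro_type == "null"


def required_fields(schema):
    """List the names of record fields that cannot hold null."""
    return [f["name"] for f in schema["fields"] if not is_nullable(f["type"])]


schema = json.loads(SCHEMA_JSON)
print(required_fields(schema))  # prints ['note']
```

Only `note` is reported, which matches the exception: every other field is a union with `"null"`, so a `null` value in `note` is the only one the writer has to reject.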
   
   So how can I fix it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
