deepakpanda93 commented on issue #14306:
URL: https://github.com/apache/hudi/issues/14306#issuecomment-3664648803

   @ROOBALJINDAL 
I used the same sample code with your updated schema/input file. Below is the outcome.
   
   ```
   === Loaded schema ===
   
{"fields":[{"metadata":{},"name":"Services","nullable":true,"type":{"fields":[{"metadata":{},"name":"BasisCalculation","nullable":true,"type":{"fields":[{"metadata":{},"name":"Pay","nullable":true,"type":"double"}],"type":"struct"}}],"type":"struct"}},{"metadata":{},"name":"TotalPay","nullable":true,"type":{"fields":[{"metadata":{},"name":"Pay","nullable":true,"type":{"fields":[{"metadata":{},"name":"Basis","nullable":true,"type":{"fields":[{"metadata":{},"name":"_value","nullable":true,"type":"string"},{"metadata":{},"name":"defvalue","nullable":true,"type":"string"}],"type":"struct"}},{"metadata":{},"name":"Rate","nullable":true,"type":{"fields":[{"metadata":{},"name":"_value","nullable":true,"type":"double"},{"metadata":{},"name":"defvalue","nullable":true,"type":"string"}],"type":"struct"}}],"type":"struct"}}],"type":"struct"}}],"type":"struct"}
   
   25/12/17 10:06:02 INFO FileInputFormat: Total input files to process : 1
   === FIRST HUDI WRITE (should succeed) ===
   === SECOND HUDI WRITE (expected to FAIL) ===
   root
    |-- _hoodie_commit_time: string (nullable = true)
    |-- _hoodie_commit_seqno: string (nullable = true)
    |-- _hoodie_record_key: string (nullable = true)
    |-- _hoodie_partition_path: string (nullable = true)
    |-- _hoodie_file_name: string (nullable = true)
    |-- Services: struct (nullable = true)
    |    |-- BasisCalculation: struct (nullable = true)
    |    |    |-- Pay: double (nullable = true)
    |-- TotalPay: struct (nullable = true)
    |    |-- Pay: struct (nullable = true)
    |    |    |-- Basis: struct (nullable = true)
    |    |    |    |-- _value: string (nullable = true)
    |    |    |    |-- defvalue: string (nullable = true)
    |    |    |-- Rate: struct (nullable = true)
    |    |    |    |-- _value: double (nullable = true)
    |    |    |    |-- defvalue: string (nullable = true)
    |-- record_key: string (nullable = false)
   
   
   +-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+----------+----------------------------------+----------+
   |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                       |Services  |TotalPay                          |record_key|
   +-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+----------+----------------------------------+----------+
   |20251217100606010  |20251217100606010_0_0|1                 |                      |b9d5ba29-0478-49c3-b8b4-a4595e4bf455-0_0-52-76_20251217100606010.parquet|{{1.0E15}}|{{{twenty, null}, {1.0E15, null}}}|1         |
   +-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+----------+----------------------------------+----------+
   ```
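The row shown above can be tied to the second commit from the metadata columns alone: `_hoodie_commit_seqno` embeds the commit instant time in Hudi's `yyyyMMddHHmmssSSS` format. A minimal sketch of decoding it (pure Python, no Spark required; the sample value is copied from the output above, and the `<instantTime>_<partitionId>_<recordIndex>` layout is my reading of the seqno format):

```python
from datetime import datetime

# _hoodie_commit_seqno is "<instantTime>_<partitionId>_<recordIndexInPartition>"
seqno = "20251217100606010_0_0"
instant, partition_id, record_index = seqno.split("_")

# Hudi instant times use the pattern yyyyMMddHHmmssSSS (millisecond precision)
commit_ts = datetime.strptime(instant, "%Y%m%d%H%M%S%f")
print(commit_ts)  # 2025-12-17 10:06:06.010000
```

This confirms the displayed row was written by the 10:06:06.010 commit, i.e. the second Hudi write.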
   
   I have checked the basepath as well. It has 2 parquet files for 2 commits.
   ```
   root@6ba4293d5eb5:/opt# ls -l /tmp/hudi_xml_bug_table
   total 856
   -rw-r--r-- 1 root root 436514 Dec 17 10:06 b9d5ba29-0478-49c3-b8b4-a4595e4bf455-0_0-24-37_20251217100602427.parquet
   -rw-r--r-- 1 root root 436509 Dec 17 10:06 b9d5ba29-0478-49c3-b8b4-a4595e4bf455-0_0-52-76_20251217100606010.parquet
   ```
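Each base file in the listing above also encodes which commit produced it: Hudi base file names follow `<fileId>_<writeToken>_<instantTime>.<ext>`, so both files belonging to the same file group (same file id) but different commits can be read straight off the directory listing. A quick sketch (the write-token breakdown as `taskPartitionId-stageId-taskAttemptId` is my understanding of Hudi's naming, not something from the log above):

```python
# Hudi base (parquet) file names follow <fileId>_<writeToken>_<instantTime>.<ext>
name = "b9d5ba29-0478-49c3-b8b4-a4595e4bf455-0_0-52-76_20251217100606010.parquet"

file_id, write_token, rest = name.split("_")
instant_time = rest.split(".")[0]

print(file_id)       # b9d5ba29-0478-49c3-b8b4-a4595e4bf455-0
print(write_token)   # 0-52-76  (taskPartitionId-stageId-taskAttemptId)
print(instant_time)  # 20251217100606010 -> matches _hoodie_commit_time above
```

Both files share the file id, which is consistent with the second write updating the same file group rather than creating a new one.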
   
Please review my sample code (available in the previous comment) and let me know if any modification is needed. I am using Spark 3.4 with Hudi 0.14.
   ```
   spark-submit \
     --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0,com.databricks:spark-xml_2.12:0.17.0 \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
     --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
     --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
     --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
     repro_hudi_bug_xml.py
   ```

