Leo zhang created HUDI-4459:
-------------------------------

             Summary: Corrupt parquet file created when syncing huge table with 
4000+ fields, using Hudi COW table with bulk_insert type
                 Key: HUDI-4459
                 URL: https://issues.apache.org/jira/browse/HUDI-4459
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Leo zhang
         Attachments: statements.sql, table.ddl

I am trying to sync a huge table with 4000+ fields into Hudi, using a COW table 
with the bulk_insert operation type.

The job finishes without any exception, but when I try to read data from the 
table, I get an empty result. The parquet file is corrupted and can't be read 
correctly.

I traced the problem and found it was caused by the SortOperator. After the 
record is serialized in the sorter, all the fields get disordered and are 
deserialized into a single field. Finally the wrong record is written into the 
parquet file, making the file unreadable.


Here are a few steps to reproduce the bug in the Flink sql-client:

1. Execute the table DDL (provided in the table.ddl file in the attachments).

2. Execute the insert statement (provided in the statements.sql file in the 
attachments).

3. Execute a select statement to query the Hudi table (provided in the 
statements.sql file in the attachments).
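The real schema lives in the attached table.ddl and statements.sql. As a rough 
sketch of the shape of the repro (table name, path, and column names here are 
hypothetical placeholders, not the actual attachment contents):

```sql
-- Hypothetical minimal shape; the real table has 4000+ fields (see table.ddl).
CREATE TABLE hudi_wide_table (
  id STRING PRIMARY KEY NOT ENFORCED,
  f1 STRING,
  f2 STRING
  -- ... thousands more fields in the real DDL
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_wide_table',  -- placeholder path
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'bulk_insert'
);

-- Step 2: bulk_insert write (actual statement is in statements.sql)
INSERT INTO hudi_wide_table SELECT ...;

-- Step 3: the query comes back empty because the parquet file is corrupt
SELECT * FROM hudi_wide_table;
```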



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
