Leo zhang created HUDI-4459:
-------------------------------
Summary: Corrupt Parquet file created when syncing a huge table with
4000+ fields, using a Hudi COW table with bulk_insert type
Key: HUDI-4459
URL: https://issues.apache.org/jira/browse/HUDI-4459
Project: Apache Hudi
Issue Type: Bug
Reporter: Leo zhang
Attachments: statements.sql, table.ddl
I am trying to sync a huge table with 4000+ fields into Hudi, using a COW table
with the bulk_insert operation type.
The job finishes without any exception, but when I try to read data from the
table, I get an empty result. The Parquet file is corrupted and can't be read
correctly.
I tried to trace the problem and found it was caused by SortOperator.
After the record is serialized in the sorter, all the fields get disordered and
are deserialized into one field. Finally the wrong record is written into the
Parquet file, making the file unreadable.
Here are a few steps to reproduce the bug in the Flink SQL client:
1. Execute the table DDL (provided in the table.ddl file in the attachments)
2. Execute the insert statement (provided in the statements.sql file in the
attachments)
3. Execute a select statement to query the Hudi table (provided in the
statements.sql file in the attachments)
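For readers without the attachments, a minimal sketch of the kind of DDL involved might look like the following. This is illustrative only: the table name, path, and field names are hypothetical stand-ins for the real 4000+-field DDL in table.ddl, while the 'table.type' and 'write.operation' options shown are the standard Hudi Flink settings for a COW table with bulk_insert.

```sql
-- Hypothetical sketch; the real DDL (table.ddl) declares 4000+ fields.
CREATE TABLE hudi_wide_table (
  id STRING PRIMARY KEY NOT ENFORCED,
  f0001 STRING,
  f0002 STRING
  -- ... thousands more fields in the real table
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_wide_table',   -- placeholder path
  'table.type' = 'COPY_ON_WRITE',           -- COW table
  'write.operation' = 'bulk_insert'         -- triggers the SortOperator path
);
```

With bulk_insert, the Flink writer sorts records before writing, which is where the serialization/deserialization of such wide rows appears to go wrong.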
--
This message was sent by Atlassian Jira
(v8.20.10#820010)