[ 
https://issues.apache.org/jira/browse/HUDI-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-4459:
--------------------------------

    Assignee: Rajesh Mahindra  (was: Danny Chen)

> Corrupt parquet file created when syncing huge table with 4000+ fields,using 
> hudi cow table with bulk_insert type
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4459
>                 URL: https://issues.apache.org/jira/browse/HUDI-4459
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Leo zhang
>            Assignee: Rajesh Mahindra
>            Priority: Major
>         Attachments: statements.sql, table.ddl
>
>
> I am trying to sync a huge table with 4000+ fields into Hudi, using a COW table 
> with the bulk_insert operation type.
> The job finishes without any exception, but when I try to read data from the 
> table, I get an empty result. The parquet file is corrupted and can't be read 
> correctly.
> I traced the problem and found it was caused by SortOperator. After a record is 
> serialized in the sorter, the fields get out of order and are deserialized into 
> a single field. The malformed record is then written to the parquet file, 
> making the file unreadable.
> Here are a few steps to reproduce the bug in the Flink sql-client:
> 1. Execute the table DDL (provided in the table.ddl file in the attachments).
> 2. Execute the insert statement (provided in the statements.sql file in the 
> attachments).
> 3. Execute a select statement to query the Hudi table (provided in the 
> statements.sql file in the attachments).
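A minimal sketch of what the reproduction setup might look like in the Flink SQL client (illustrative only: the table name, path, and columns here are assumptions for brevity; the actual DDL with 4000+ fields is in the attached table.ddl):

```sql
-- Hypothetical narrow stand-in for the real 4000+-field table in table.ddl.
-- 'table.type' = 'COPY_ON_WRITE' and 'write.operation' = 'bulk_insert'
-- select the COW table type and bulk_insert write path described above.
CREATE TABLE hudi_wide_table (
  id BIGINT,
  col1 STRING,
  col2 STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_wide_table',
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'bulk_insert'
);

-- Insert some rows, then read them back. With the reported bug, the
-- SELECT returns an empty result because the parquet files are corrupt.
INSERT INTO hudi_wide_table VALUES (1, 'a', 'b');
SELECT * FROM hudi_wide_table;
```

With very wide schemas, the bulk_insert path routes records through the sorter, which is where the report locates the serialization problem.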



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
