[
https://issues.apache.org/jira/browse/HUDI-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583439#comment-17583439
]
Danny Chen commented on HUDI-4459:
----------------------------------
Thanks for taking up this issue, assigned to you [~rmahindra] :)
> Corrupt parquet file created when syncing a huge table with 4000+ fields, using
> a hudi cow table with bulk_insert type
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-4459
> URL: https://issues.apache.org/jira/browse/HUDI-4459
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Leo zhang
> Assignee: Rajesh Mahindra
> Priority: Major
> Attachments: statements.sql, table.ddl
>
>
> I am trying to sync a huge table with 4000+ fields into Hudi, using a COW table
> with the bulk_insert operation type.
> The job finishes without any exception, but when I try to read data from the
> table, I get an empty result. The parquet file is corrupted and can't be read
> correctly.
> I traced the problem and found it was caused by SortOperator. After the record
> is serialized in the sorter, the fields get out of order and are deserialized
> into a single field. The corrupted record is then written to the parquet file,
> making the file unreadable.
> Here's a few steps to reproduce the bug in the flink sql-client:
> 1. Execute the table DDL (provided in the table.ddl file in the attachments).
> 2. Execute the insert statement (provided in the statements.sql file in the
> attachments).
> 3. Execute a select statement to query the Hudi table (provided in the
> statements.sql file in the attachments).
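The reproduction steps above can be sketched roughly as follows. This is a hypothetical minimal illustration only: the real schema (4000+ fields) and statements are in the attached table.ddl and statements.sql files, and the table name, column names, and path below are invented for the sketch.

```sql
-- Hypothetical wide COW table written via bulk_insert (Flink SQL).
-- The real table in the report has 4000+ columns.
CREATE TABLE hudi_wide_table (
  id BIGINT,
  col_0001 STRING,
  col_0002 STRING,
  -- ... thousands more columns in the real table ...
  col_4000 STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_wide_table',
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'bulk_insert'
);

-- Step 2: write a row (the real insert is in statements.sql).
INSERT INTO hudi_wide_table
SELECT 1, 'v1', 'v2', /* ... */ 'v4000';

-- Step 3: query the table; under the bug this returns an empty result
-- because the underlying parquet file is corrupt.
SELECT * FROM hudi_wide_table;
```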
--
This message was sent by Atlassian Jira
(v8.20.10#820010)