[ 
https://issues.apache.org/jira/browse/HUDI-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391078#comment-17391078
 ] 

ASF GitHub Bot commented on HUDI-2247:
--------------------------------------

yuzhaojing commented on a change in pull request #3363:
URL: https://github.com/apache/hudi/pull/3363#discussion_r680428506



##########
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java
##########
@@ -131,7 +133,7 @@ public static void clean(String path) {
         })
         // filter out crushed files
         .filter(Objects::nonNull)
-        .filter(fileStatus -> fileStatus.getLen() > 0)
+        .filter(fileStatus -> fileStatus.getLen() > MAGIC.length)
         .collect(Collectors.toList());

Review comment:
       This only filters out parquet files whose footer has not been written. Log files are 
still filtered by fileSize > 0, because we can't detect problems in them from file size alone.
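
To make the distinction above concrete, here is a minimal, self-contained sketch of the two thresholds. It is not the code from the pull request: `FileInfo` is a hypothetical stand-in for Hadoop's `FileStatus`, the `.parquet` suffix check and the helper names `isUsable` and `usableNames` are assumptions for illustration, and `MAGIC` is defined locally as the 4-byte Parquet marker `PAR1` rather than imported from the Parquet library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class MagicLengthFilterSketch {
  // Parquet files begin and end with the 4-byte marker "PAR1".
  // Defined locally here; the real code references the library's MAGIC constant.
  static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

  // Hypothetical stand-in for Hadoop's FileStatus: only a name and a length.
  static final class FileInfo {
    final String name;
    final long len;

    FileInfo(String name, long len) {
      this.name = name;
      this.len = len;
    }
  }

  // Assumption: parquet files are recognised by suffix. A parquet file must be
  // longer than MAGIC; any other file (e.g. a log file) only needs to be non-empty.
  static boolean isUsable(FileInfo f) {
    if (f.name.endsWith(".parquet")) {
      return f.len > MAGIC.length;
    }
    return f.len > 0;
  }

  // Drops null entries and files that fail the length check, keeping input order.
  static List<String> usableNames(List<FileInfo> files) {
    return files.stream()
        .filter(Objects::nonNull)
        .filter(MagicLengthFilterSketch::isUsable)
        .map(f -> f.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<FileInfo> files = Arrays.asList(
        new FileInfo("a.parquet", 0L),     // empty: dropped
        new FileInfo("b.parquet", 4L),     // exactly MAGIC.length: dropped
        new FileInfo("c.parquet", 1024L),  // kept
        null,                              // missing status: dropped
        new FileInfo(".f1.log.1", 0L),     // empty log: dropped
        new FileInfo(".f1.log.2", 7L));    // non-empty log: kept
    System.out.println(usableNames(files)); // prints [c.parquet, .f1.log.2]
  }
}
```

Note that a length check of this kind only catches parquet files that are too short to hold anything beyond the leading marker; it does not validate the footer itself.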




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Filter file where length less than parquet MAGIC length
> -------------------------------------------------------
>
>                 Key: HUDI-2247
>                 URL: https://issues.apache.org/jira/browse/HUDI-2247
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Flink Integration
>            Reporter: yuzhaojing
>            Assignee: yuzhaojing
>            Priority: Major
>              Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
