[ https://issues.apache.org/jira/browse/HUDI-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-4893:
--------------------------------------
Sprint: (was: 2022/09/19)
> More than 1 splits are created for a single log file for MOR table
> ------------------------------------------------------------------
>
> Key: HUDI-4893
> URL: https://issues.apache.org/jira/browse/HUDI-4893
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Blocker
> Fix For: 0.12.1
>
>
> While debugging a flaky test, I realized that we generate more than one
> split for a single log file. The root cause is isSplitable(), which returns
> true for HoodieRealtimePath.
>
> [https://github.com/apache/hudi/blob/6dbe2960f2eaf0408dc0ef544991cad0190050a9/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java#L91]
>
> I made a quick fix locally and verified that only one split is generated per
> log file.
>
> {code:java}
> git diff hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java
> diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java
> index bba44d5c66..d09dfdf753 100644
> --- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java
> +++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java
> @@ -89,7 +89,7 @@ public class HoodieRealtimePath extends Path {
>    }
> 
>    public boolean isSplitable() {
> -    return !toString().isEmpty() && !includeBootstrapFilePath();
> +    return !toString().contains(".log") && !includeBootstrapFilePath();
>    }
> 
>    public PathWithBootstrapFileStatus getPathWithBootstrapFileStatus() {
> {code}
>
>
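
For illustration only, here is a minimal standalone sketch of the predicate change in the diff above. It does not depend on Hudi. The class name SplitabilitySketch, the two method names, the boolean parameter standing in for includeBootstrapFilePath(), and the sample paths are all hypothetical and are not part of the Hudi code base.

```java
// Hypothetical sketch, not Hudi code: mirrors the two boolean expressions
// from the diff so the behavioral difference is easy to see.
class SplitabilitySketch {

    // Expression before the change: any non-empty path without a bootstrap
    // file is treated as splitable, which includes log file paths.
    static boolean isSplitableBefore(String path, boolean includesBootstrapFile) {
        return !path.isEmpty() && !includesBootstrapFile;
    }

    // Expression after the change: a path containing ".log" is never splitable.
    static boolean isSplitableAfter(String path, boolean includesBootstrapFile) {
        return !path.contains(".log") && !includesBootstrapFile;
    }

    public static void main(String[] args) {
        // Sample paths are made up for this sketch.
        String logPath = "/tmp/tbl/2022/09/19/.f1_20220919.log.1_0-1-0";
        String basePath = "/tmp/tbl/2022/09/19/f1_0-1-0_20220919.parquet";

        System.out.println("log,  before: " + isSplitableBefore(logPath, false));
        System.out.println("log,  after:  " + isSplitableAfter(logPath, false));
        System.out.println("base, before: " + isSplitableBefore(basePath, false));
        System.out.println("base, after:  " + isSplitableAfter(basePath, false));
    }
}
```

With the old expression the log path is reported as splitable, so an input format that honors this flag may cut it into several splits. With the new expression only the log path changes its answer; the base file path is still splitable.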
--
This message was sent by Atlassian Jira
(v8.20.10#820010)