[
https://issues.apache.org/jira/browse/HUDI-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425904#comment-17425904
]
Vinoth Chandar commented on HUDI-2363:
--------------------------------------
I think these have long been fixed in recent releases (0.7.0, IIRC). Are you
able to try out a newer version?
> COW : Listing leaf files and directories twice
> ----------------------------------------------
>
> Key: HUDI-2363
> URL: https://issues.apache.org/jira/browse/HUDI-2363
> Project: Apache Hudi
> Issue Type: Bug
> Components: Writer Core
> Reporter: selvaraj
> Priority: Major
> Attachments: Screen Shot 2021-08-25 at 5.36.52 PM.png
>
>
> Team,
> In our organization we are still using Hudi 0.5.0. We plan to upgrade to the
> latest version in a couple of quarters.
> Problem scenario:
> Many use cases in our project use COW with hive sync disabled. One of
> the Hudi tables contains two years' worth of data, partitioned by date.
> For every write on this table, I notice that the "Listing leaf files and
> directories" job is triggered twice. Normally it is triggered only once. I
> have attached a screenshot.
>
> Once the first listing of leaf files and directories is done, the logs for
> another listing of leaf files and directories roll by.
> I spent some time investigating the source code but couldn't trace where
> exactly it is being invoked.
>
> How can it be avoided here? Unfortunately, this is adding more latency
> to our flow.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)