[ https://issues.apache.org/jira/browse/BEAM-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Udi Meiri updated BEAM-4062:
----------------------------
    Fix Version/s: 2.5.0

> Performance regression in FileBasedSink
> ---------------------------------------
>
>                 Key: BEAM-4062
>                 URL: https://issues.apache.org/jira/browse/BEAM-4062
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Udi Meiri
>            Assignee: Udi Meiri
>            Priority: Blocker
>             Fix For: 2.5.0
>
>
> [https://github.com/apache/beam/pull/4648] has added:
> * 3 or more stat() calls per output file (in pre_finalize and
> finalize_writes)
> * serial unbatched delete()s (in pre_finalize)
> Solution will be to list files in a batch operation (match()), and to
> delete() in batch mode, or use multiple threads if that's not possible.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
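
As a rough illustration of the fix proposed in the issue, and not Beam's actual FileBasedSink code, the sketch below uses only the Python standard library against a local directory. The helper names list_existing and delete_all are hypothetical. The first replaces one existence check per output file with a single directory listing (standing in for a batched match()), and the second fans deletes out over a thread pool (the fallback the issue mentions when a batch delete is not available).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_existing(directory, prefix):
    """Return the set of file paths in `directory` whose names start with `prefix`.

    Hypothetical stand-in for a batched match(): one directory scan
    instead of a separate existence check for every expected file.
    """
    found = set()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.startswith(prefix):
                found.add(entry.path)
    return found


def delete_all(paths, max_workers=8):
    """Delete `paths` concurrently and return a list of (path, error) failures.

    Hypothetical stand-in for the multi-threaded fallback: each delete
    runs on a worker thread rather than serially.
    """
    def _delete(path):
        try:
            os.remove(path)
            return None
        except OSError as exc:
            return (path, exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(_delete, paths))
    return [r for r in results if r is not None]
```

A caller would list once, intersect the result with the files it expects to clean up, and pass that set to delete_all. Collecting failures instead of raising on the first one lets the caller decide whether a missing file is an error or simply already deleted.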