Sam Whittle created BEAM-12818:
----------------------------------
Summary: When writing to GCS, spread prefix of temporary files and
reuse autoscaling of the temporary directory
Key: BEAM-12818
URL: https://issues.apache.org/jira/browse/BEAM-12818
Project: Beam
Issue Type: Bug
Components: io-java-gcp
Reporter: Sam Whittle
Assignee: Sam Whittle
When writing files using FileIO, a subdirectory is created inside the given
temporary directory for each FileBasedSink. This is useful for non-windowed
output, where the subdirectory can be matched to delete leftover temporary
files that were lost during processing.
However, for windowed writes such subdirectories are unnecessary, and they
cause all temporary files to share a common prefix. Additionally, this common
prefix varies per job, so the GCS autoscaling already applied to the previous
job's prefix is no longer effective; see
https://cloud.google.com/storage/docs/request-rate#randomness_after_sequential_prefixes_is_not_as_effective
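The difference between the two layouts can be illustrated with a small sketch. The class `TempFileNaming`, its methods, the `.temp-` subdirectory prefix and the bucket and sink names are all hypothetical and chosen for illustration only; this is not Beam's actual FileBasedSink naming code. The sketch assumes that "spreading" means placing a short random component directly under the stable temporary directory. Object names then diverge right after a prefix that stays the same across jobs, instead of after a new per-job subdirectory.

```java
import java.util.UUID;

/** Hypothetical helper contrasting two temp-file naming schemes (not Beam's API). */
final class TempFileNaming {

  // Per-sink subdirectory layout (sketch): every temp file of one sink shares
  // the long prefix "<tempDir>/.temp-<sinkId>/", which is new for every job.
  static String subdirectoryName(String tempDir, String sinkId, String fileId) {
    return tempDir + "/.temp-" + sinkId + "/" + fileId;
  }

  // Spread layout (sketch): an 8-character random hex component follows the
  // stable temp directory directly, so names diverge right after a prefix that
  // is reused across jobs.
  static String spreadName(String tempDir, String fileId) {
    String random = UUID.randomUUID().toString().substring(0, 8);
    return tempDir + "/" + random + "-" + fileId;
  }

  // Length of the longest common prefix of two strings.
  static int commonPrefixLength(String a, String b) {
    int n = Math.min(a.length(), b.length());
    int i = 0;
    while (i < n && a.charAt(i) == b.charAt(i)) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    String dir = "gs://my-bucket/tmp"; // hypothetical temporary directory
    String a = subdirectoryName(dir, "sink-1234", "file-a");
    String b = subdirectoryName(dir, "sink-1234", "file-b");
    String c = spreadName(dir, "file-a");
    String d = spreadName(dir, "file-b");
    System.out.println(a + "\n" + b);
    System.out.println(c + "\n" + d);
    System.out.println("shared prefix, subdirectory layout: " + commonPrefixLength(a, b));
    System.out.println("shared prefix, spread layout: " + commonPrefixLength(c, d));
  }
}
```

In the subdirectory layout, every name shares the per-job `.temp-sink-1234/` segment. In the spread layout, the shared part is usually just the temporary directory itself. Deleting leftovers for windowed writes would then need some other mechanism than matching one subdirectory. That is consistent with the point above that such subdirectories are unnecessary for windowed writes.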
--
This message was sent by Atlassian Jira
(v8.3.4#803005)