Sam Whittle created BEAM-12818:
----------------------------------

             Summary: When writing to GCS, spread prefix of temporary files and 
reuse autoscaling of the temporary directory
                 Key: BEAM-12818
                 URL: https://issues.apache.org/jira/browse/BEAM-12818
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp
            Reporter: Sam Whittle
            Assignee: Sam Whittle


When writing files using FileIO, a subdirectory is created in the given 
temporary directory for each FileBasedSink.  This is useful for 
non-windowed output, where the temporary directory can be matched to delete 
leftover files that were orphaned during processing.

However, for windowed writes such subdirectories are unnecessary and cause 
all temporary files to share a common prefix. Additionally, this common 
prefix varies per job, so the autoscaling that GCS built up for a previous 
job's prefix is no longer effective; see
https://cloud.google.com/storage/docs/request-rate#randomness_after_sequential_prefixes_is_not_as_effective
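To illustrate the difference in naming, here is a minimal sketch in plain 
Java. It is not Beam's actual implementation: the class, the method names, 
the "sink-" directory name and the bucket are all hypothetical and only show 
where the random part of the object name ends up in each layout.

```java
import java.util.UUID;

public class TempFileNames {
    // Per-sink layout: every temp file of one job shares
    // "<tempDir>/<sinkDir>/", and sinkDir changes from job to job.
    static String withSinkSubdirectory(String tempDir, String sinkDir, String fileId) {
        return stripTrailingSlash(tempDir) + "/" + sinkDir + "/" + fileId;
    }

    // Spread layout: the random file id directly follows the stable tempDir,
    // so the only shared prefix is the one that is reused across jobs.
    static String withSpreadPrefix(String tempDir, String fileId) {
        return stripTrailingSlash(tempDir) + "/" + fileId;
    }

    static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        String tempDir = "gs://example-bucket/tmp";   // hypothetical bucket
        String sinkDir = "sink-" + UUID.randomUUID(); // hypothetical, differs per job
        for (int i = 0; i < 3; i++) {
            String fileId = UUID.randomUUID().toString();
            System.out.println(withSinkSubdirectory(tempDir, sinkDir, fileId));
            System.out.println(withSpreadPrefix(tempDir, fileId));
        }
    }
}
```

In the first layout the random file id only appears after a per-job prefix; 
in the second it follows the temporary directory directly. The second layout 
is only meant for windowed writes, since non-windowed output still relies on 
the subdirectory for cleanup.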



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
