I think a single, default, auto-created temporary bucket per project
(used when running on Dataflow, or when running elsewhere but using GCS,
as in this BQ file loads example) is the best user experience on GCP,
even if it is not ideal. If we don't want to create such resources for
users automatically by default, another option would be a single flag
that opts in to such auto-creation (and which could cover other
resources in the future).
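
To make the opt-in idea concrete, here is a rough sketch of how the
bucket lookup might be ordered, combining the fallback order Pablo
describes in the quoted message with a single opt-in flag. Every name
here (resolve_load_bucket, auto_create_resources, the "beam-temp-"
naming scheme) is made up for illustration and is not an existing Beam
API; the sketch only picks a location and does not actually create a
bucket.

```python
from typing import Optional


def default_bucket_name(project: str) -> str:
    # Hypothetical naming scheme, assumed for illustration only.
    return f"gs://beam-temp-{project}"


def resolve_load_bucket(
    user_bucket: Optional[str],
    runner: str,
    temp_location: Optional[str],
    project: str,
    auto_create_resources: bool = False,  # hypothetical opt-in flag
) -> str:
    """Pick the GCS location used for BigQuery load files."""
    # 1. A bucket passed explicitly to the transform always wins.
    if user_bucket:
        return user_bucket
    # 2. On Dataflow, borrow the pipeline's temp_location if it is set.
    if runner == "DataflowRunner" and temp_location:
        return temp_location
    # 3. Only when the user opted in, fall back to a per-project default.
    if auto_create_resources:
        return default_bucket_name(project)
    # 4. Otherwise fail loudly instead of creating anything silently.
    raise ValueError(
        "No GCS bucket provided; pass one explicitly or opt in to "
        "auto-creation."
    )
```

With the flag off, other runners still fail as they do today; with it
on, they would get the per-project default instead.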

On Tue, Jul 23, 2019 at 1:08 AM Pablo Estrada <pabl...@google.com> wrote:
>
> Hello all,
> I recently worked on a transform to load data into BigQuery by writing files 
> to GCS, and issuing Load File jobs to BQ. I did this for the Python SDK[1].
>
> This option requires the user to provide a GCS bucket to write the files:
>
> - If the user provides a bucket to the transform, the SDK will use that bucket.
> - If the user does not provide a bucket:
>
>   - When running in Dataflow, the SDK will borrow the temp_location of the
>     pipeline.
>   - When running in other runners, the pipeline will fail.
>
> The Java SDK has had functionality for File Loads into BQ for a long time.
> In particular, when users do not provide a bucket, it attempts to create a
> default bucket[2]; this bucket is used as temp_location (which is then
> used by the BQ File Loads transform).
>
> I do not really like creating GCS buckets on behalf of users. In Java, the 
> outcome is that users will not have to pass a --tempLocation parameter when 
> submitting jobs to Dataflow - which is a nice convenience, but I'm not sure 
> that this is in line with users' expectations.
>
> Currently, the options are:
>
> 1. Adding support for bucket autocreation for Python SDK
> 2. Deprecating support for bucket autocreation in Java SDK, and printing a
>    warning.
>
> I am personally inclined toward #1. But what do others think?
>
> Best
> -P.
>
> [1] https://github.com/apache/beam/pull/7892
> [2] https://github.com/apache/beam/blob/5b3807be717277e3e6880a760b036fecec3bc95d/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L294-L343
