clairemcginty opened a new pull request, #33368: URL: https://github.com/apache/beam/pull/33368
Rationale: I would like to use [gcs-connector](https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs) 3.x, which supports the new Parquet VectorIO feature. However, gcs-connector 3.x drops Java 8 support and targets Java 11, which blocks us from upgrading it directly in Beam, since Beam still targets Java 8 (see https://github.com/apache/beam/pull/31898#issuecomment-2229819968).

Additionally, as a Beam user, I can't simply upgrade gcs-connector on my end, because of a breaking change in how `GoogleCloudStorageImpl` is instantiated: in 2.x it has [public constructors](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.26/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageImpl.java#L302-L394), but 3.x [drops the public constructors and enforces a Builder pattern](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v3.0.4/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageImpl.java#L2496-L2517). As a result, when running on gcs-connector 3.x, my pipeline throws a `NoSuchMethodError` from `GcsUtil` when it tries to invoke the 2.x constructor: https://github.com/apache/beam/blob/v2.61.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L727

This PR adds a pipeline option for a GoogleCloudStorage Provider, so that users who want to run on gcs-connector 3.x are unblocked. The default provider invokes the gcs-connector 2.x public constructor; 3.x users can override it with a provider that uses the Builder.
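To illustrate the provider idea described above, here is a minimal, self-contained sketch. It does not use the real gcs-connector or Beam classes: `Storage`, `StorageOptions`, `ConstructorStyleStorage`, `BuilderStyleStorage`, `StorageProvider`, and `createStorage` are all hypothetical stand-ins invented for this example, and the actual option name and interface added by this PR may look different. The point is only that the call site goes through an overridable provider instead of calling a constructor directly.

```java
public class StorageProviderSketch {

    // Hypothetical stand-in for the storage client interface.
    interface Storage {
        String describe();
    }

    // Hypothetical stand-in for the client configuration.
    static final class StorageOptions {
        final String appName;

        StorageOptions(String appName) {
            this.appName = appName;
        }
    }

    // Mimics the 2.x style: the implementation exposes a public constructor.
    static final class ConstructorStyleStorage implements Storage {
        private final StorageOptions options;

        public ConstructorStyleStorage(StorageOptions options) {
            this.options = options;
        }

        @Override
        public String describe() {
            return "constructor:" + options.appName;
        }
    }

    // Mimics the 3.x style: the constructor is private and a Builder is required.
    static final class BuilderStyleStorage implements Storage {
        private final StorageOptions options;

        private BuilderStyleStorage(StorageOptions options) {
            this.options = options;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private StorageOptions options;

            Builder setOptions(StorageOptions options) {
                this.options = options;
                return this;
            }

            BuilderStyleStorage build() {
                return new BuilderStyleStorage(options);
            }
        }

        @Override
        public String describe() {
            return "builder:" + options.appName;
        }
    }

    // Hypothetical provider hook: the only place that knows how to instantiate the client.
    @FunctionalInterface
    interface StorageProvider {
        Storage get(StorageOptions options);
    }

    // Default behavior: call the 2.x-style public constructor.
    static final StorageProvider DEFAULT_PROVIDER = ConstructorStyleStorage::new;

    // The call site uses the user-supplied provider if present, else the default.
    static Storage createStorage(StorageProvider override, StorageOptions options) {
        StorageProvider provider = (override != null) ? override : DEFAULT_PROVIDER;
        return provider.get(options);
    }

    public static void main(String[] args) {
        StorageOptions options = new StorageOptions("demo");

        // No override: falls back to the constructor-based default.
        System.out.println(createStorage(null, options).describe());

        // Override: a 3.x-style user supplies a Builder-based provider.
        StorageProvider builderProvider =
            opts -> BuilderStyleStorage.builder().setOptions(opts).build();
        System.out.println(createStorage(builderProvider, options).describe());
    }
}
```

Because the constructor call lives only inside the default provider, a user who overrides the provider never reaches the code path that references the removed constructor, which is how this kind of indirection would avoid the `NoSuchMethodError`.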
