clairemcginty opened a new pull request, #33368:
URL: https://github.com/apache/beam/pull/33368

   Rationale:
   
   I would like to use [gcs-connector](https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs)
   3.x, which supports the new Parquet VectorIO feature. However, gcs-connector
   3.x also drops Java 8 support and targets Java 11. That blocks us from
   upgrading it directly in Beam, because Beam still targets Java 8 (see
   https://github.com/apache/beam/pull/31898#issuecomment-2229819968).
   
   Additionally, as a Beam user, I can't simply upgrade gcs-connector on my end
   because of a breaking change in how `GoogleCloudStorageImpl` is instantiated.
   In 2.x it has [public constructors](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.26/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageImpl.java#L302-L394),
   but 3.x [drops the public constructors and enforces a Builder pattern](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v3.0.4/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageImpl.java#L2496-L2517).
   
   As a result, when running on gcs-connector 3.x, my pipeline throws a
   `NoSuchMethodError` from `GcsUtil` when it tries to invoke the 2.x constructor:
   https://github.com/apache/beam/blob/v2.61.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L727
   
   This PR adds a pipeline option for a GoogleCloudStorage provider so that
   users who want to use gcs-connector 3.x are no longer blocked. By default the
   option invokes the gcs-connector 2.x public constructor, but 3.x users can
   override it with a provider that uses the Builder instead.
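A minimal sketch of the provider pattern described above, written in plain Java so it runs without Beam or gcs-connector on the classpath. Every type here (`Storage`, `LegacyStorage`, `BuilderStorage`, `StorageProvider`, `SketchOptions`, `GcsProviderSketch`) is a hypothetical stand-in, not the option or class actually added by this PR. In Beam itself the option would presumably be declared on a `PipelineOptions` interface with its default supplied by a `DefaultValueFactory`; this sketch uses a plain mutable field instead.

```java
/** Hypothetical stand-in for gcs-connector's GoogleCloudStorage interface. */
interface Storage {
  String describe();
}

/** Hypothetical 2.x-style implementation that exposes a public constructor. */
final class LegacyStorage implements Storage {
  private final String projectId;

  LegacyStorage(String projectId) { this.projectId = projectId; }

  @Override public String describe() { return "constructor:" + projectId; }
}

/** Hypothetical 3.x-style implementation that can only be created via its Builder. */
final class BuilderStorage implements Storage {
  private final String projectId;

  // Private constructor: callers compiled against a public constructor would fail here.
  private BuilderStorage(String projectId) { this.projectId = projectId; }

  static Builder builder() { return new Builder(); }

  static final class Builder {
    private String projectId;

    Builder setProjectId(String projectId) { this.projectId = projectId; return this; }

    BuilderStorage build() { return new BuilderStorage(projectId); }
  }

  @Override public String describe() { return "builder:" + projectId; }
}

/** Hypothetical provider: the option holds a factory rather than a concrete instance. */
interface StorageProvider {
  Storage create(String projectId);
}

/** Hypothetical options holder mimicking a pipeline option with a default value. */
final class SketchOptions {
  // Default mirrors the behavior described in the PR: use the 2.x-style constructor.
  private StorageProvider storageProvider = LegacyStorage::new;

  StorageProvider getStorageProvider() { return storageProvider; }

  void setStorageProvider(StorageProvider provider) { this.storageProvider = provider; }
}

final class GcsProviderSketch {
  // Stand-in for the GcsUtil code path that previously hard-coded the constructor call.
  static Storage createStorage(SketchOptions options, String projectId) {
    return options.getStorageProvider().create(projectId);
  }

  public static void main(String[] args) {
    SketchOptions options = new SketchOptions();
    System.out.println(createStorage(options, "my-project").describe()); // prints constructor:my-project

    // A 3.x user overrides the default with a Builder-based provider.
    options.setStorageProvider(id -> BuilderStorage.builder().setProjectId(id).build());
    System.out.println(createStorage(options, "my-project").describe()); // prints builder:my-project
  }
}
```

Because the call site only depends on the provider interface, the choice between constructor and Builder moves out of the library and into user configuration, which is what lets 2.x and 3.x users share the same Beam code path.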
   
   

