MartijnVisser opened a new pull request, #28286:
URL: https://github.com/apache/flink/pull/28286

   ## What is the purpose of the change
   
   `flink-gs-fs-hadoop` bundled `google-cloud-storage` 2.29.1, which throws a
   `NullPointerException` instead of retrying certain GCS `503 Service Unavailable`
   errors during resumable uploads. This breaks checkpointing for jobs that write to
   `gs://` through a `RecoverableWriter` (e.g. the `FileSink`). The upstream fix is in
   [googleapis/java-storage#2987](https://github.com/googleapis/java-storage/pull/2987),
   which is included in newer releases of the library.
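
To make the failure mode concrete, here is a minimal sketch of the behaviour a caller would expect: a transient `503` is retried and only surfaces as an error once the attempts are exhausted. This is a generic illustration, not the `google-cloud-storage` retry implementation; `RetryOn503Sketch`, `HttpStatusException`, and `callWithRetry` are hypothetical names, and backoff between attempts is omitted for brevity.

```java
import java.util.function.Supplier;

/** Illustrative only: a generic retry loop, not the google-cloud-storage implementation. */
final class RetryOn503Sketch {

    /** Hypothetical exception type carrying an HTTP status code. */
    static final class HttpStatusException extends RuntimeException {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Runs the call, retrying only when it fails with HTTP 503, up to maxAttempts times. */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        HttpStatusException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (HttpStatusException e) {
                if (e.status != 503) {
                    throw e; // non-transient errors are not retried
                }
                last = e; // transient: remember it and try again
            }
        }
        throw last; // every attempt hit a 503
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new HttpStatusException(503); // first two attempts fail
            }
            return "uploaded";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the affected SDK version, the equivalent code path failed with a `NullPointerException` rather than re-attempting the upload, so the failure propagated into the checkpoint.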
   
   This PR takes over the stale #27679 (thanks @jonchase) and completes it by also
   regenerating the bundled-dependency `NOTICE`, which is required because this module
   shades all of its dependencies into the plugin jar.
   
   ## Brief change log
   
     - Bump `google-cloud-storage` 2.29.1 -> 2.68.0 (latest stable) and the matching
       grpc artifacts 1.59.1 -> 1.81.0 in `flink-gs-fs-hadoop/pom.xml`.
     - Regenerate `META-INF/NOTICE` to match the bundled dependency set produced by the
       shade plugin (verified to match exactly via `tools/ci/license_check.sh`).
     - Add the bundled license file for the newly bundled `org.codehaus.woodstox:stax2-api`.
     - Update the `google-cloud-storage` version link in the GCS filesystem docs
       (English + Chinese).
   
   ## Verifying this change
   
   This change is covered by the existing `flink-gs-fs-hadoop` tests (236 tests pass,
   including `GSRecoverableWriterTest`, the committer/serializer tests, and the
   `LocalStorageHelper`-based `GSBlobStorageImplTest`, confirming the pinned
   `google-cloud-nio` test dependency stays compatible).
   
   In addition, the license/NOTICE was validated locally with
   `tools/ci/license_check.sh` (no severe issues; the `NOTICE` matches the 98 bundled
   dependencies exactly) and the `dependency-convergence` and `ban-unsafe-jackson`
   enforcers pass under `-Pcheck-convergence`.
   
   Finally, the upgraded SDK was manually verified against a real GCS bucket: a
   `RecoverableWriter` wrote a multi-chunk object, took a checkpoint via `persist()`,
   recovered from that checkpoint via `recover()`, committed, and the committed object
   was read back and asserted byte-for-byte (exactly-once), with no NPE/503 failure.
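
The write, `persist()`, `recover()`, commit sequence described above can be pictured with a small in-memory model. This is a toy sketch under stated assumptions, not Flink's `RecoverableWriter` API and not the code used for the manual check: `RecoverableWriteSketch`, `ToyStream`, and `ResumePoint` are hypothetical names, and a byte buffer stands in for the GCS resumable upload.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Toy in-memory model of a recoverable stream; NOT Flink's RecoverableWriter API. */
final class RecoverableWriteSketch {

    /** Hypothetical resume point: a snapshot of the bytes that were durable at persist(). */
    static final class ResumePoint {
        private final byte[] durable;

        ResumePoint(byte[] durable) {
            this.durable = durable.clone();
        }
    }

    /** Hypothetical stream that can be checkpointed and later rebuilt from a resume point. */
    static final class ToyStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        void write(String chunk) {
            byte[] bytes = chunk.getBytes(StandardCharsets.UTF_8);
            buffer.write(bytes, 0, bytes.length);
        }

        /** Checkpoint: everything written so far becomes part of the resume point. */
        ResumePoint persist() {
            return new ResumePoint(buffer.toByteArray());
        }

        /** Recovery: start a fresh stream containing only the checkpointed bytes. */
        static ToyStream recover(ResumePoint point) {
            ToyStream stream = new ToyStream();
            stream.buffer.write(point.durable, 0, point.durable.length);
            return stream;
        }

        /** Commit: publish the final object contents. */
        byte[] commit() {
            return buffer.toByteArray();
        }
    }

    /** Writes two chunks, checkpoints, "fails", recovers, writes a third chunk, commits. */
    static byte[] roundTrip() {
        ToyStream first = new ToyStream();
        first.write("chunk-1");
        first.write("chunk-2");
        ResumePoint checkpoint = first.persist();
        first.write("lost-after-checkpoint"); // never checkpointed, so it must not survive

        // Simulated failure: the first stream is abandoned and rebuilt from the checkpoint.
        ToyStream recovered = ToyStream.recover(checkpoint);
        recovered.write("chunk-3");
        return recovered.commit();
    }

    public static void main(String[] args) {
        System.out.println(new String(roundTrip(), StandardCharsets.UTF_8));
    }
}
```

The point of the model is the byte-for-byte check: the committed object contains exactly the checkpointed chunks plus what was written after recovery, and nothing that was written between the checkpoint and the failure.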
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): **yes** (upgrades
       `google-cloud-storage` and grpc, with corresponding `NOTICE`/license updates)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components),
       Checkpointing, Kubernetes/Yarn, ZooKeeper: **yes** (improves reliability of GCS
       `RecoverableWriter` checkpoint/recovery by fixing 503 retry handling)
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (please specify the tool below)
   
   Generated-by: Claude Code (Opus 4.8)
   

