galenwarren commented on pull request #14875: URL: https://github.com/apache/flink/pull/14875#issuecomment-773500994
I have a few questions for whoever might work with me on this:

- GoogleHadoopFileSystem is implemented in [gcs-connector](https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector); this is the same connector that can already be used in Flink to enable the ```gs``` scheme for, say, checkpoints. The guidance I've seen elsewhere is that this jar file should be copied directly into the Flink lib folder and not bundled into a job jar. So I've currently added this dependency to ```flink-gs-fs-hadoop``` with ```provided``` scope. Does this make sense, or should I use ```compile``` scope?
- Google's WriteChannel doesn't support flushing or syncing, so the ```flush``` and ```sync``` methods of [RecoverableFsDataOutputStream](https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/core/fs/RecoverableFsDataOutputStream.html) are no-ops. This strikes me as OK because WriteChannel *does* properly support capture/restore of the channel state, with persistence guarantees, which is used to support persist/recover in the recoverable writer. But I thought I'd mention it in case I have that wrong ...
- I took a crack at updating the docs in file_sink.md and streaming_sink.md in /docs/dev/connectors to indicate Google Storage support. I see that both of those files have Chinese translations; I'd need some help there. :)
- I may need a brand-new docs page to describe the new Flink options (described above), but I wasn't sure where to put that?

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
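For reference, the ```provided```-scoped dependency described above would look roughly like this in the module pom (the version shown is illustrative, not necessarily the one pinned in the PR):

```xml
<dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>gcs-connector</artifactId>
    <!-- illustrative version; the actual pinned version may differ -->
    <version>hadoop2-2.2.0</version>
    <!-- provided: the jar is expected in Flink's lib folder at runtime,
         so the module compiles against it but does not bundle it -->
    <scope>provided</scope>
</dependency>
```

With ```compile``` scope the connector classes would instead be shaded into the filesystem plugin jar, which is the trade-off the question is about.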

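To make the flush/sync point above concrete, here is a minimal, self-contained Java sketch of the capture/restore contract the comment relies on. ```GcsWriteChannelStub``` is a hypothetical stand-in for Google's WriteChannel (it is NOT the real com.google.cloud API); the point is only to illustrate why no-op ```flush```/```sync``` can still be safe: durability comes from persisting a restorable snapshot at checkpoints, not from flushing.

```java
// Hypothetical stand-in for Google's WriteChannel, illustrating the
// capture/restore pattern: nothing written is durable until a snapshot
// is captured and persisted, and recovery restores exactly that snapshot.
class GcsWriteChannelStub {

    private final StringBuilder buffer = new StringBuilder();

    /** Buffer some bytes; nothing here is durable yet. */
    void write(String data) {
        buffer.append(data);
    }

    /**
     * Analogous to WriteChannel.capture(): return a snapshot from which
     * an equivalent channel can later be rebuilt. In the recoverable
     * writer, this snapshot is what persist() would store at a checkpoint.
     */
    String capture() {
        return buffer.toString();
    }

    /** Analogous to RestorableState.restore(): rebuild from a snapshot. */
    static GcsWriteChannelStub restore(String snapshot) {
        GcsWriteChannelStub channel = new GcsWriteChannelStub();
        channel.buffer.append(snapshot);
        return channel;
    }

    String contents() {
        return buffer.toString();
    }
}

class PersistRecoverDemo {
    public static void main(String[] args) {
        GcsWriteChannelStub channel = new GcsWriteChannelStub();
        channel.write("record-1;");

        // persist(): capture the channel state at a checkpoint.
        String snapshot = channel.capture();

        // Writes after the checkpoint are lost if the job fails...
        channel.write("record-2-lost;");

        // ...but recovery restores exactly the persisted state, and the
        // lost records are retried from there.
        GcsWriteChannelStub recovered = GcsWriteChannelStub.restore(snapshot);
        recovered.write("record-2-retried;");

        System.out.println(recovered.contents());
        // prints: record-1;record-2-retried;
    }
}
```

Under this contract, a no-op ```flush``` never weakens the guarantees the recoverable writer actually depends on, because persist/recover is built entirely on the snapshot, not on flushed bytes.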