galenwarren commented on pull request #14875: URL: https://github.com/apache/flink/pull/14875#issuecomment-773500994
I have a few questions for whoever might work with me on this:

- GoogleHadoopFileSystem is implemented in [gcs-connector](https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector); this is the same connector that can already be used in Flink to enable the ```gs``` scheme for, say, checkpoints. The guidance I've seen elsewhere is that this jar file should be copied directly into the Flink lib folder and not bundled into a job jar. So I've currently added this dependency to ```flink-gs-fs-hadoop``` with ```provided``` scope. Does this make sense, or should I use ```compile``` scope?
- Google's WriteChannel doesn't support flushing or syncing, so the ```flush``` and ```sync``` methods of [RecoverableFsDataOutputStream](https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/core/fs/RecoverableFsDataOutputStream.html) are no-ops. This strikes me as OK because WriteChannel *does* properly support capture/restore of the channel state, with persistence guarantees, which is used to support persist/recover in the recoverable writer. But I thought I'd mention it in case I have that wrong ...
- I took a crack at updating the docs in file_sink.md and streaming_sink.md in /docs/dev/connectors to indicate Google Storage support. I see that both of those files have Chinese translations; I'd need some help there. :)
- I may need a brand-new docs page to describe the new Flink options (described above), but I wasn't sure where to put that?

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
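For reference, the ```provided```-scoped dependency described above would look roughly like this in the module pom (the version shown is illustrative, not necessarily the one pinned in the PR):

```xml
<dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>gcs-connector</artifactId>
    <!-- illustrative version; the actual pinned version may differ -->
    <version>hadoop2-2.2.0</version>
    <!-- provided: the jar is expected in Flink's lib folder at runtime,
         so the module compiles against it but does not bundle it -->
    <scope>provided</scope>
</dependency>
```

With ```compile``` scope the connector classes would instead be shaded into the filesystem plugin jar, which is the trade-off the question is about.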

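To make the flush/sync point above concrete, here is a minimal, self-contained Java sketch of the capture/restore contract the comment relies on. ```GcsWriteChannelStub``` is a hypothetical stand-in for Google's WriteChannel (it is NOT the real com.google.cloud API); the point is only to illustrate why no-op ```flush```/```sync``` can still be safe: durability comes from persisting a restorable snapshot at checkpoints, not from flushing.

```java
// Hypothetical stand-in for Google's WriteChannel, illustrating the
// capture/restore pattern: nothing written is durable until a snapshot
// is captured and persisted, and recovery restores exactly that snapshot.
class GcsWriteChannelStub {

    private final StringBuilder buffer = new StringBuilder();

    /** Buffer some bytes; nothing here is durable yet. */
    void write(String data) {
        buffer.append(data);
    }

    /**
     * Analogous to WriteChannel.capture(): return a snapshot from which
     * an equivalent channel can later be rebuilt. In the recoverable
     * writer, this snapshot is what persist() would store at a checkpoint.
     */
    String capture() {
        return buffer.toString();
    }

    /** Analogous to RestorableState.restore(): rebuild from a snapshot. */
    static GcsWriteChannelStub restore(String snapshot) {
        GcsWriteChannelStub channel = new GcsWriteChannelStub();
        channel.buffer.append(snapshot);
        return channel;
    }

    String contents() {
        return buffer.toString();
    }
}

class PersistRecoverDemo {
    public static void main(String[] args) {
        GcsWriteChannelStub channel = new GcsWriteChannelStub();
        channel.write("record-1;");

        // persist(): capture the channel state at a checkpoint.
        String snapshot = channel.capture();

        // Writes after the checkpoint are lost if the job fails...
        channel.write("record-2-lost;");

        // ...but recovery restores exactly the persisted state, and the
        // lost records are retried from there.
        GcsWriteChannelStub recovered = GcsWriteChannelStub.restore(snapshot);
        recovered.write("record-2-retried;");

        System.out.println(recovered.contents());
        // prints: record-1;record-2-retried;
    }
}
```

Under this contract, a no-op ```flush``` never weakens the guarantees the recoverable writer actually depends on, because persist/recover is built entirely on the snapshot, not on flushed bytes.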