galenwarren commented on a change in pull request #15599: URL: https://github.com/apache/flink/pull/15599#discussion_r619678149
##########
File path: flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/storage/BlobStorage.java
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.fs.gs.storage;
+
+import com.google.cloud.storage.BlobId;

Review comment:
Good suggestions on naming, will do.

Regarding BlobId, I thought about this, too, while writing the code. You're right, the only thing truly specific to Google storage in this interface is BlobId, and the only parts of it we use are the bucket name and the object name, so an abstraction would certainly be possible.

I'm not sure if this is where you're going with this, but if we did that, in principle the BlobStorage abstraction could be used to implement a recoverable writer over any storage that could support BlobStorage. Looking at the interface, it seems plausible that other bucket-based storages could support that, though I think I'd need to rethink how options are supplied (e.g. `setChunkSize` is somewhat Google-specific), and I'd probably want to add a method to generate a checksum from a byte array, so that the specifics of how a checksum is generated could be provider-specific and not fixed (in ChecksumUtils).
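To make the checksum point concrete, here is a rough sketch of what a provider-specific checksum hook might look like. Everything in it is hypothetical and not taken from the PR: the `ChecksumGenerator` name, the hex-string return type, and the two implementations. CRC32 and MD5 are used only because they ship with the JDK; they stand in for whatever each storage provider actually requires.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

/** Hypothetical hook: each storage provider decides how a checksum is computed. */
interface ChecksumGenerator {
    /** Returns a lowercase hex checksum of the given bytes. */
    String checksum(byte[] data);
}

/** Stand-in for one provider: plain CRC32 from the JDK. */
final class Crc32ChecksumGenerator implements ChecksumGenerator {
    @Override
    public String checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        // CRC32 is a 32-bit value, so pad to eight hex digits.
        return String.format("%08x", crc.getValue());
    }
}

/** Stand-in for another provider: MD5 from the JDK. */
final class Md5ChecksumGenerator implements ChecksumGenerator {
    @Override
    public String checksum(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to avoid sign extension when formatting the byte.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With something like this, the writer code would depend only on `ChecksumGenerator`, and the Google-specific algorithm would live in one implementation rather than in a shared utility class.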
One option I considered was making BlobStorage a generic interface, with a BLOBID type parameter, constrained appropriately to extend an interface that exposes bucket and object name. That would be really easy to do, so maybe I should go ahead and do at least that part? That would get BlobId out of there.

Whether it makes sense to abstract away *everything* Google-specific (i.e. options, checksum) such that this could be used with non-Google storage, I'll leave up to you. That wouldn't be hard to do either, but it's probably not worth it unless it's realistic that this might be used elsewhere. Honestly, my primary reason for this interface was testability. But if you think it's worthwhile to make the interface more general, it would be pretty straightforward and I'd be happy to make that change.
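As a minimal sketch of the generic-interface option described above: the names `BlobIdentifier`, `GenericBlobStorage`, `SimpleBlobId`, and `InMemoryBlobStorage`, and the three-method surface, are all assumptions made for illustration. They are not the actual BlobStorage interface from the PR, which has its own method set. The point is only the shape of the type constraint, and how an in-memory implementation serves the testability goal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical: the only parts of a blob id the storage abstraction needs. */
interface BlobIdentifier {
    String getBucketName();
    String getObjectName();
}

/** Hypothetical generic storage interface; the methods are illustrative only. */
interface GenericBlobStorage<BLOBID extends BlobIdentifier> {
    void write(BLOBID id, byte[] content);
    Optional<byte[]> read(BLOBID id);
    boolean delete(BLOBID id);
}

/** Provider-neutral id used by the in-memory implementation below. */
final class SimpleBlobId implements BlobIdentifier {
    private final String bucketName;
    private final String objectName;

    SimpleBlobId(String bucketName, String objectName) {
        this.bucketName = bucketName;
        this.objectName = objectName;
    }

    @Override
    public String getBucketName() {
        return bucketName;
    }

    @Override
    public String getObjectName() {
        return objectName;
    }
}

/** In-memory implementation, the kind of stand-in a unit test could use. */
final class InMemoryBlobStorage implements GenericBlobStorage<SimpleBlobId> {
    private final Map<String, byte[]> blobs = new HashMap<>();

    // Assumes bucket names contain no '/', so the combined key is unambiguous.
    private static String key(BlobIdentifier id) {
        return id.getBucketName() + "/" + id.getObjectName();
    }

    @Override
    public void write(SimpleBlobId id, byte[] content) {
        // Copy so later changes to the caller's array do not affect stored data.
        blobs.put(key(id), content.clone());
    }

    @Override
    public Optional<byte[]> read(SimpleBlobId id) {
        byte[] content = blobs.get(key(id));
        return content == null ? Optional.empty() : Optional.of(content.clone());
    }

    @Override
    public boolean delete(SimpleBlobId id) {
        return blobs.remove(key(id)) != null;
    }
}
```

A Google-backed implementation would then bind BLOBID to a thin wrapper around `com.google.cloud.storage.BlobId`, keeping that import out of the interface itself.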
