Hi Mayur,

I know many object storage services allow communication through the Amazon S3 client by implementing the same protocol, such as Dell EMC ECS and Aliyun OSS. But ultimately there are functionality differences that a native FileIO could optimize for, and both of the examples I listed recently contributed their own FileIO implementations to Iceberg. I would imagine that some native S3 features like ACL or SSE do not work for GCS, and that some GCS features are not supported in S3FileIO, so I think a dedicated GCS FileIO would likely be better for GCS support in the long term.

Could you describe how you configure S3FileIO to talk to GCS? Do you need to override the S3 endpoint or set any other configurations?

And I am not an expert on GCS: do you see using S3FileIO for GCS as a feasible long-term solution? Are there any GCS-specific features that you might need that could not be supported through S3FileIO, and how widely used are those features?

Best,
Jack Ye

On Wed, Dec 1, 2021 at 8:50 AM Daniel Weeks <daniel.c.we...@gmail.com> wrote:

> The S3FileIO does use the AWS S3 V2 Client libraries and while there
> appears to be some level of compatibility, it's not clear to me how far
> that currently extends (some AWS features like encryption, IAM, etc. may
> not have full support).
>
> I think it's great that there may be a path for more native GCS FileIO
> support, but it might be a little early to rename the classes and expect
> that everything will work cleanly.
>
> Thanks for pointing this out, Mayur. It's really an interesting
> development.
>
> -Dan
>
> On Wed, Dec 1, 2021 at 8:12 AM Piotr Findeisen <pi...@starburstdata.com>
> wrote:
>
>> if S3FileIO is supposed to be used with other file systems, we should
>> consider proper class renames.
>> just my 2c
>>
>> On Wed, Dec 1, 2021 at 5:07 PM Mayur Srivastava <
>> mayur.srivast...@twosigma.com> wrote:
>>
>>> Hi,
>>>
>>> We are using S3FileIO to talk to the GCS backend. GCS URIs are
>>> compatible with the AWS S3 SDKs and if they are added to the list of
>>> supported prefixes, they work with S3FileIO.
>>>
>>> Thanks,
>>> Mayur
>>>
>>> *From:* Piotr Findeisen <pi...@starburstdata.com>
>>> *Sent:* Wednesday, December 1, 2021 10:58 AM
>>> *To:* Iceberg Dev List <dev@iceberg.apache.org>
>>> *Subject:* Re: Supporting gs:// prefix in S3URI for Google Cloud S3
>>> Storage
>>>
>>> Hi
>>>
>>> Just curious. S3URI seems aws s3-specific. What would be the goal of
>>> using S3URI with google cloud storage urls?
>>> what problem are we solving?
>>>
>>> PF
>>>
>>> On Wed, Dec 1, 2021 at 4:56 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>> Sounds reasonable to me if they are compatible
>>>
>>> On Wed, Dec 1, 2021 at 8:27 AM Mayur Srivastava <
>>> mayur.srivast...@twosigma.com> wrote:
>>>
>>> Hi,
>>>
>>> We have URIs starting with gs:// representing objects on GCS. Currently,
>>> S3URI doesn’t support gs:// prefix (see
>>> https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java#L41).
>>> Is there an existing JIRA for supporting this? Any objections to add “gs”
>>> to the list of S3 prefixes?
>>>
>>> Thanks,
>>> Mayur
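
For illustration, the proposal in the original message (adding "gs" to the list of prefixes the URI parser accepts) can be sketched roughly as below. This is a minimal Python sketch, not Iceberg's Java S3URI class: the names VALID_SCHEMES, ObjectLocation, and parse_object_uri are hypothetical, and the scheme list is assumed for the example.

```python
# Hypothetical illustration only; Iceberg's real S3URI is a Java class.
from typing import NamedTuple

# Assumed scheme list for illustration; "gs" is the addition proposed in the thread.
VALID_SCHEMES = {"s3", "s3a", "s3n", "gs"}


class ObjectLocation(NamedTuple):
    scheme: str
    bucket: str
    key: str


def parse_object_uri(location: str) -> ObjectLocation:
    """Split '<scheme>://<bucket>/<key>' and reject schemes not in VALID_SCHEMES."""
    scheme, sep, rest = location.partition("://")
    if not sep:
        raise ValueError(f"Invalid URI, missing scheme: {location}")
    if scheme.lower() not in VALID_SCHEMES:
        raise ValueError(f"Unsupported scheme: {scheme}")
    bucket, sep, key = rest.partition("/")
    if not bucket or not sep or not key:
        raise ValueError(f"Invalid URI, expected bucket and key: {location}")
    return ObjectLocation(scheme.lower(), bucket, key)
```

Accepting the prefix only affects how a location string is parsed. It says nothing about endpoint configuration or feature compatibility between S3 and GCS, which are the open questions raised in the replies.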