Hello all,
I was thinking that a filesystem with S3 support would be great to have
in the Python SDK. If I am not mistaken, it would mainly involve implementing
the filesystem classes
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystem.py>
for S3, right?
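
To make the idea a bit more concrete, here is a rough sketch of the path-handling part only. It is an illustration under assumptions, not a proposed implementation: the class name S3PathSketch and the helper parse_s3_path are hypothetical, and the class deliberately does not import Beam. A real version would subclass the FileSystem base class from the file linked above and would also have to implement the I/O methods (open, create, copy, rename, delete, and so on) on top of some AWS client library.

```python
class S3PathSketch(object):
    """Hypothetical stand-in that mirrors only the path methods
    (scheme, join, split) of Beam's FileSystem base class.
    It performs no I/O and does not talk to AWS."""

    S3_PREFIX = "s3://"

    @classmethod
    def scheme(cls):
        # URI scheme this filesystem would be registered under.
        return "s3"

    def join(self, basepath, *paths):
        # Join path components with "/" since S3 keys are not OS paths.
        if not basepath.startswith(self.S3_PREFIX):
            raise ValueError("Basepath %r must be an S3 path." % basepath)
        path = basepath
        for p in paths:
            path = path.rstrip("/") + "/" + p.lstrip("/")
        return path

    def split(self, path):
        # Split into (head, tail) at the last "/" after the bucket name.
        path = path.strip()
        if not path.startswith(self.S3_PREFIX):
            raise ValueError("Path %r must be an S3 path." % path)
        prefix_len = len(self.S3_PREFIX)
        last_sep = path[prefix_len:].rfind("/")
        if last_sep >= 0:
            last_sep += prefix_len
        if last_sep > 0:
            return (path[:last_sep], path[last_sep + 1:])
        return (path, "")


def parse_s3_path(path):
    """Hypothetical helper: split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not path.startswith(prefix):
        raise ValueError("Path %r must be an S3 path." % path)
    bucket, _, key = path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("Path %r has no bucket name." % path)
    return bucket, key
```

For example, S3PathSketch().join("s3://bucket/dir", "file.txt") builds a full object path, and parse_s3_path splits such a path into the bucket and key that an AWS client call would need.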
I am not very familiar with S3, with filesystems, or with AWS in
general, so I have a few open questions:
- Does this mean that we would probably need an extra [s3] target for
  installing apache_beam, like we do with [gcp]?
  - Not strictly necessary, but probably desirable...
- How do we handle KMS in the GCS filesystem?
  - Would the filesystem encapsulation make KMS support in an S3
    filesystem difficult?
  - Or, going further: is KMS support in AWS very different from KMS
    support in GCP?
  - I'd love comments from anyone informed about this : )
- Is this project of an appropriate size for a GSoC student?
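
On the first question above, an extra install target could look roughly like the sketch below. This is only an assumption about how it might be wired up: the names GCP_REQUIREMENTS, S3_REQUIREMENTS and EXTRAS_REQUIRE are illustrative, the package lists are placeholders rather than Beam's actual dependencies, and boto3 is just one possible choice of AWS client library.

```python
# Hypothetical sketch of the extras declaration in a setup.py.
# The requirement lists are placeholders, not Beam's real dependencies.
GCP_REQUIREMENTS = ["google-cloud-storage"]  # placeholder for the existing [gcp] extra
S3_REQUIREMENTS = ["boto3"]  # assumption: boto3 as the AWS client library

EXTRAS_REQUIRE = {
    "gcp": GCP_REQUIREMENTS,
    "s3": S3_REQUIREMENTS,
}

# In setup.py this dict would be passed along as:
#   setuptools.setup(..., extras_require=EXTRAS_REQUIRE)
```

Users who want the S3 filesystem would then run something like pip install apache_beam[s3], the same way [gcp] is installed today.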
Thoughts?
Best
-P.