samredai opened a new pull request #3691:
URL: https://github.com/apache/iceberg/pull/3691


   This brings over the `FileIO` abstraction and includes an `S3FileIO` 
implementation. Implementing `FileIO` requires overriding the `__enter__()` and 
`__exit__()` methods, where `__enter__()` assigns a byte stream to 
`self.byte_stream`.
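
   To make the contract concrete, here is a minimal sketch of what an implementation could look like. The base class shape, the `location` argument, and `InMemoryFileIO` are assumptions made for illustration; the actual signatures in this PR may differ.

```python
import io
from abc import ABC, abstractmethod


class FileIO(ABC):
    """Hypothetical base class; the real one in the PR may look different."""

    def __init__(self, location: str):
        self.location = location
        self.byte_stream = None  # populated by __enter__

    @abstractmethod
    def __enter__(self):
        """Open the underlying resource and assign it to self.byte_stream."""

    @abstractmethod
    def __exit__(self, exc_type, exc_value, traceback):
        """Release the underlying resource."""


class InMemoryFileIO(FileIO):
    """Illustrative implementation backed by a bytes buffer (not part of the PR)."""

    def __init__(self, location: str, data: bytes = b""):
        super().__init__(location)
        self._data = data

    def __enter__(self):
        self.byte_stream = io.BytesIO(self._data)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.byte_stream.close()
        return False  # do not suppress exceptions raised inside the with-block


with InMemoryFileIO("memory://example/metadata.json", b'{"format-version": 1}') as f:
    contents = f.byte_stream.read()
```

   Because the stream is only opened in `__enter__()` and closed in `__exit__()`, callers get deterministic cleanup from a plain `with` statement.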
   
   There have been a few discussions lately around file IO, and hopefully this PR 
helps continue them. I think we should maintain the `FileIO` abstraction for all 
file IO operations (metadata files, manifest lists, manifest files, and data 
files) and allow the flexibility to plug in either an implementation that's 
packaged with the library or a custom `FileIO` implementation that a user 
brings. An example of this pattern can be found in PR #3677 in the 
[from_file()](https://github.com/apache/iceberg/pull/3677/files#diff-0a2dcd19c1079e50703f2010f74a47309468503f3ddd753db3aca9951364810fR261)
 method.
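
   As a rough illustration of the plug-in pattern (not the actual `from_file()` code from PR #3677), a reader function could accept whichever `FileIO` class the caller supplies. `StubFileIO` and `load_json_metadata` are hypothetical names, and the stub returns canned bytes instead of reading real storage.

```python
import io
import json


class StubFileIO:
    """Hypothetical stand-in for any FileIO implementation, bundled or user-supplied."""

    def __init__(self, location: str):
        self.location = location
        self.byte_stream = None

    def __enter__(self):
        # A real implementation would open self.location; this stub fakes the content.
        self.byte_stream = io.BytesIO(b'{"table-uuid": "example"}')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.byte_stream.close()
        return False


def load_json_metadata(location: str, file_io=StubFileIO) -> dict:
    """Read a JSON document through whichever FileIO class the caller plugs in."""
    with file_io(location) as f:
        return json.loads(f.byte_stream.read())


metadata = load_json_metadata("s3://bucket/path/metadata.json")
```

   The reader depends only on the context-manager contract and the `byte_stream` attribute, so swapping in a different storage backend would not require changes to the reading code.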
   
   This still leaves an open question about how we manage dependencies for all 
of the implementations. For example, if a user does not plan on using 
`S3FileIO`, or has their own S3 file IO implementation that does not depend on 
boto3, then boto3 should not be forced on them as a hard dependency.
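
   One possible approach, sketched here as an assumption rather than a decision made in this PR, is to import boto3 lazily so that it is only needed when the S3 implementation is actually used. `load_optional_module`, `LazyS3FileIO`, and the `s3` extra name are hypothetical.

```python
import importlib


def load_optional_module(name: str, extra: str):
    """Import `name` on demand and raise a descriptive error if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"'{name}' is required for this feature; "
            f"consider installing the (hypothetical) '{extra}' extra"
        ) from exc


class LazyS3FileIO:
    """Illustrative only: boto3 is imported inside __enter__, not at module import time."""

    def __init__(self, location: str):
        self.location = location  # expected form: s3://bucket/key
        self.byte_stream = None

    def __enter__(self):
        boto3 = load_optional_module("boto3", extra="s3")
        bucket, _, key = self.location[len("s3://"):].partition("/")
        response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
        self.byte_stream = response["Body"]
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.byte_stream.close()
        return False
```

   With this layout, importing the module never touches boto3; users who bring their own implementation would only see the error if they tried to open an S3 location through `LazyS3FileIO`. On the packaging side, this could pair with an optional extra so that boto3 is installed only on request.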


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


