mccheah commented on a change in pull request #73: Allow data output streams to be generated via custom mechanisms when given partitioning and file name
URL: https://github.com/apache/incubator-iceberg/pull/73#discussion_r246999702
##########
File path: api/src/main/java/com/netflix/iceberg/io/FileIO.java
##########
@@ -47,4 +46,30 @@
    * Delete the file at the given path.
    */
   void deleteFile(String path);
+
+  /**
+   * Get an {@link InputFile} to get the bytes for this table's metadata file with the given name.
+   */
+  InputFile readMetadataFile(String fileName);
+
+  /**
+   * Get an {@link OutputFile} to write bytes for a new table metadata file with the given name.
+   */
+  OutputFile newMetadataFile(String fileName);
+
+  /**
+   * Get an {@link OutputFile} for writing bytes to a new data file for this table.
+   * <p>
+   * The partition values of the rows in this file may be used to derive the final location of
+   * the file.
+   */
+  OutputFile newPartitionedDataFile(

Review comment:
I was thinking about this a little more and, in contrast to what was discussed on the ticket, I propose creating the `OutputFile` instance directly instead of returning only the path.

This matters when the file doesn't exist yet, which is always the case for Spark (Spark always creates new data files). In that case, the plugin may need to do more than know the path: it may need to create the path itself, often by asking an external system for a new entry that references the file we're about to write. In other words, the semantics of this API are that we're asking not only for the location of the data file, but also for the data file to be created as part of this action.

In practice, most implementations should just derive this as `newOutputFile(getPath(args))`. Nevertheless, I think this version of the API is richer. Thoughts?
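To make the two cases concrete, here is a rough sketch rather than the actual Iceberg code. The hunk above cuts off before the parameter list of `newPartitionedDataFile`, so the `(partition, fileName)` signature, the `getPath` helper, and all `Sketch*`, `PathOnlyFileIO`, and `RegisteringFileIO` names are assumptions made up for illustration. The default method is the common `newOutputFile(getPath(args))` case; the override shows a plugin that also creates an entry in an external system (modeled as an in-memory list) before handing back the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for OutputFile; only the location is modeled.
interface SketchOutputFile {
  String location();
}

// Hypothetical, trimmed-down FileIO. The parameter list is an assumption.
interface SketchFileIO {
  SketchOutputFile newOutputFile(String path);

  // Assumed helper: derive a location from partition values and a file name.
  String getPath(Map<String, String> partition, String fileName);

  // Common case from the comment: newOutputFile(getPath(args)).
  default SketchOutputFile newPartitionedDataFile(
      Map<String, String> partition, String fileName) {
    return newOutputFile(getPath(partition, fileName));
  }
}

// Implementation that only needs to know the path.
class PathOnlyFileIO implements SketchFileIO {
  private final String root;

  PathOnlyFileIO(String root) {
    this.root = root;
  }

  @Override
  public SketchOutputFile newOutputFile(String path) {
    return () -> path;
  }

  @Override
  public String getPath(Map<String, String> partition, String fileName) {
    StringBuilder sb = new StringBuilder(root);
    // Sort keys so the derived location is deterministic.
    for (Map.Entry<String, String> e : new TreeMap<>(partition).entrySet()) {
      sb.append('/').append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.append('/').append(fileName).toString();
  }
}

// Implementation that also "creates" the file by registering it first.
class RegisteringFileIO extends PathOnlyFileIO {
  // Stands in for an external system that tracks files about to be written.
  final List<String> registered = new ArrayList<>();

  RegisteringFileIO(String root) {
    super(root);
  }

  @Override
  public SketchOutputFile newPartitionedDataFile(
      Map<String, String> partition, String fileName) {
    String path = getPath(partition, fileName);
    registered.add(path); // side effect a path-only API could not express
    return newOutputFile(path);
  }
}
```

With a path-only API, the registration step would have to happen somewhere outside the plugin, or the plugin would need a second hook; returning the `OutputFile` lets both happen in one call.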