mccheah opened a new issue #12: File I/O Submodule for TableOperations
URL: https://github.com/apache/incubator-iceberg/issues/12
 
 
   In https://github.com/Netflix/iceberg/issues/107 it was discussed that 
`InputFile` and `OutputFile` instances should be pluggable, and that the 
`TableOperations` API should be responsible for providing them. However, the 
Spark data source in particular only uses `HadoopInputFile#fromPath` for 
reading and `HadoopOutputFile#fromPath` for writing. Using 
`TableOperations#newInputFile` and `TableOperations#newOutputFile` would also 
be difficult, because calling these methods on the executors would require 
`TableOperations` instances to be `Serializable`.
   
   We propose having the `TableOperations` API provide a `FileIO` module that 
handles the narrow role of reading, creating / writing, and deleting files. 
The interface would look like this:
   
   ```
   interface FileIO {
     InputFile newInputFile(String path);
     OutputFile newOutputFile(String path);
     void deleteFile(String path);
   }
   ```
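
   To illustrate how a narrow `FileIO` could sidestep the serialization problem, here is a minimal sketch. It rests on several assumptions: `InputFile` and `OutputFile` are reduced to simplified stand-ins (the real interfaces have more methods), `FileIO` is made to extend `Serializable`, and `LocalFileIO` and `FileIOSerde` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Simplified stand-ins for the real InputFile / OutputFile interfaces (assumption).
interface InputFile {
  String location();
  InputStream newStream() throws IOException;
}

interface OutputFile {
  String location();
  OutputStream create() throws IOException;
}

// The proposed interface; extending Serializable is an assumption of this sketch,
// made so that an instance can be shipped to executors.
interface FileIO extends Serializable {
  InputFile newInputFile(String path);
  OutputFile newOutputFile(String path);
  void deleteFile(String path);
}

// Hypothetical local-filesystem implementation. It holds no state,
// so Java serialization works without any extra effort.
class LocalFileIO implements FileIO {
  private static final long serialVersionUID = 1L;

  @Override
  public InputFile newInputFile(String path) {
    return new InputFile() {
      @Override public String location() { return path; }
      @Override public InputStream newStream() throws IOException {
        return Files.newInputStream(Paths.get(path));
      }
    };
  }

  @Override
  public OutputFile newOutputFile(String path) {
    return new OutputFile() {
      @Override public String location() { return path; }
      @Override public OutputStream create() throws IOException {
        // CREATE_NEW makes create() fail if the file already exists.
        return Files.newOutputStream(Paths.get(path), StandardOpenOption.CREATE_NEW);
      }
    };
  }

  @Override
  public void deleteFile(String path) {
    try {
      Files.deleteIfExists(Paths.get(path));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

// Simulates sending a FileIO from the driver to an executor.
class FileIOSerde {
  static FileIO roundTrip(FileIO io) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(io);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (FileIO) in.readObject();
    }
  }
}
```

   With something along these lines, only the small `FileIO` object would need to travel to the executors, while `TableOperations` itself could stay on the driver.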
   
   Then the following method would be added to `TableOperations`, and we would 
remove `TableOperations#newInputFile` and `TableOperations#newMetadataFile`.
   
   ```
   interface TableOperations {
     FileIO fileIo();
     String resolveNewMetadataPath(String metadataFilename);
   }
   ```
   
   `resolveNewMetadataPath` is needed because the new `FileIO` abstraction 
treats every location as a full path, whereas the old method 
`TableOperations#newMetadataFile` assumes its argument is a file name, not a 
full path. Callers that used to call `TableOperations#newMetadataFile` should 
therefore first resolve the full path and then pass it to 
`FileIO#newOutputFile`. For convenience we could add a helper default method 
like so:
   
   ```
   interface TableOperations {
     FileIO fileIo();
     String resolveNewMetadataPath(String metadataFilename);
     default OutputFile newMetadataFile(String fileName) {
       return fileIo().newOutputFile(resolveNewMetadataPath(fileName));
     }
   }
   ```
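
   As a rough illustration of how the helper would behave, the sketch below wires the proposed interfaces to a toy implementation. `InputFile` and `OutputFile` are reduced to a single `location()` method, and `PathOnlyFileIO`, `DemoTableOperations`, and the directory layout are all hypothetical, chosen only to show the file name being resolved to a full path before it reaches `FileIO`.

```java
// Stand-ins reduced to one method each (assumption; the real interfaces are richer).
interface InputFile { String location(); }
interface OutputFile { String location(); }

interface FileIO {
  InputFile newInputFile(String path);
  OutputFile newOutputFile(String path);
  void deleteFile(String path);
}

interface TableOperations {
  FileIO fileIo();
  String resolveNewMetadataPath(String metadataFilename);

  // Convenience helper: resolve the file name to a full path, then delegate to FileIO.
  default OutputFile newMetadataFile(String fileName) {
    return fileIo().newOutputFile(resolveNewMetadataPath(fileName));
  }
}

// Hypothetical FileIO that only records the path it was given; it performs no I/O.
class PathOnlyFileIO implements FileIO {
  @Override public InputFile newInputFile(String path) { return () -> path; }
  @Override public OutputFile newOutputFile(String path) { return () -> path; }
  @Override public void deleteFile(String path) { /* nothing to delete in this sketch */ }
}

// Hypothetical TableOperations that places metadata files under a fixed directory.
class DemoTableOperations implements TableOperations {
  private final String metadataLocation;
  private final FileIO io = new PathOnlyFileIO();

  DemoTableOperations(String metadataLocation) {
    this.metadataLocation = metadataLocation;
  }

  @Override public FileIO fileIo() { return io; }

  @Override public String resolveNewMetadataPath(String metadataFilename) {
    return metadataLocation + "/" + metadataFilename;
  }
}
```

   Under these assumptions, `new DemoTableOperations("/warehouse/db/table/metadata").newMetadataFile("v2.metadata.json").location()` returns `/warehouse/db/table/metadata/v2.metadata.json`, so existing callers that pass a bare file name keep working.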
