Vivek, I think that it could make sense to have a FileIO implementation that can delegate depending on the URI scheme. Would you like to write up a proposal for that?
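
To make the idea concrete, here is a rough sketch of scheme-based delegation. It is only an illustration, not a proposal for the actual API: Python is used for brevity, the `FileIO` class below is a simplified stand-in for Iceberg's Java `org.apache.iceberg.io.FileIO` interface, and `NamedFileIO`, `DelegatingFileIO`, and the backend names are all hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


class FileIO:
    """Simplified stand-in for a FileIO interface (hypothetical, not Iceberg's API)."""

    def new_input_file(self, path: str) -> str:
        raise NotImplementedError


class NamedFileIO(FileIO):
    """Toy backend that only records which implementation handled a path."""

    def __init__(self, name: str):
        self.name = name

    def new_input_file(self, path: str) -> str:
        return f"{self.name}:{path}"


class DelegatingFileIO(FileIO):
    """Routes each call to the FileIO registered for the path's URI scheme."""

    def __init__(self, by_scheme: Dict[str, FileIO], default: Optional[FileIO] = None):
        # Keys are lower-cased because urlparse also lower-cases the scheme.
        self._by_scheme = {scheme.lower(): io for scheme, io in by_scheme.items()}
        self._default = default

    def _io_for(self, path: str) -> FileIO:
        scheme = urlparse(path).scheme  # empty string when the path has no scheme
        io = self._by_scheme.get(scheme, self._default)
        if io is None:
            raise ValueError(f"No FileIO registered for scheme {scheme!r}: {path}")
        return io

    def new_input_file(self, path: str) -> str:
        return self._io_for(path).new_input_file(path)


# Example wiring: a custom implementation for hdfs://, another for s3://,
# and a fallback for every other scheme (for example wasb://).
io = DelegatingFileIO(
    {"s3": NamedFileIO("s3-io"), "hdfs": NamedFileIO("custom-hdfs-io")},
    default=NamedFileIO("hadoop-io"),
)
```

With this wiring, an `hdfs://` path goes to the custom implementation while a `wasb://` path falls through to the default, so a single table can reference files handled by different implementations.
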
This would be useful for migrations because you could mix URIs that are handled by different FileIO implementations.

On Tue, May 18, 2021 at 10:23 AM Jack Ye <[email protected]> wrote:

> You can read
> https://iceberg.apache.org/custom-catalog/#custom-file-io-implementation
> for more details of loading your custom FileIO, and see
> http://iceberg.apache.org/aws/#s3-fileio as an example.
>
> -Jack
>
> On Tue, May 18, 2021 at 10:16 AM Vivekanand Vellanki <[email protected]>
> wrote:
>
>> Is it possible to make the FileIO implementation extensible per URI scheme?
>>
>> For example, for the scheme hdfs://, can I ensure that Iceberg uses my
>> custom implementation of FileIO at run time?
>>
>> On Tue, May 18, 2021 at 9:45 PM Daniel Weeks <[email protected]> wrote:
>>
>>> Hey Vivek,
>>>
>>> The file_path per spec is technically just a string, but the
>>> representation is expected to be a URI.
>>>
>>> How this URI is interpreted is really up to the FileIO implementation.
>>> So for example, the most common FileIO implementation is probably
>>> HadoopFileIO, which is going to use whatever file system scheme mapping
>>> you've defined in your configuration (typically via core-site.xml).
>>>
>>> For the Azure case (I'm not very familiar with this), it looks like
>>> AdlFileSystem is the Hadoop FileSystem implementation. So, if you map
>>> wasb -> AdlFileSystem, then you would want to use the URI format you
>>> described.
>>>
>>> There are more custom FileIO implementations (like S3FileIO) that are
>>> more specific about URI representations, but the HadoopFileIO approach
>>> is probably more common at this point and relies on how Hadoop will
>>> resolve the URI.
>>>
>>> The only other thing I would note is that at this point the paths still
>>> need to be fully qualified (though there are some discussions ongoing
>>> about relative paths).
>>>
>>> Hope that helps,
>>> -Dan
>>>
>>> On Thu, May 13, 2021 at 5:30 AM Vivekanand Vellanki <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are trying to create Iceberg tables on ADLS. What is the format for
>>>> referencing data files in ADLS from Manifest files?
>>>>
>>>> We are seeing Spark use something like:
>>>> wasb://<container>@account/<file path>
>>>>
>>>> Is there a standard for how data files should be referenced within
>>>> manifest files?
>>>>
>>>> Thanks
>>>> Vivek

--
Ryan Blue
