Is it possible to make the FileIO implementation pluggable per URI scheme? For example, for the hdfs:// scheme, can I ensure that Iceberg uses my custom implementation of FileIO at run time?
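
To make the question concrete, here is a minimal sketch of the kind of per-scheme dispatch I have in mind. It is not Iceberg code: SimpleFileIO is a hypothetical, simplified stand-in for Iceberg's FileIO interface (the real one returns InputFile/OutputFile objects), and SchemeRoutingFileIO and its register/resolve methods are names invented for illustration. As far as I know, a custom FileIO class can be named through the catalog's io-impl property, so a wrapper along these lines might be one way to get per-scheme behavior.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified stand-in for Iceberg's FileIO (assumption:
// the real interface exposes newInputFile/newOutputFile/deleteFile).
interface SimpleFileIO {
    String describe(String path);
}

// Routes each path to the implementation registered for its URI scheme,
// falling back to a default when the scheme is missing or unregistered.
final class SchemeRoutingFileIO implements SimpleFileIO {
    private final Map<String, SimpleFileIO> byScheme = new HashMap<>();
    private final SimpleFileIO fallback;

    SchemeRoutingFileIO(SimpleFileIO fallback) {
        this.fallback = fallback;
    }

    SchemeRoutingFileIO register(String scheme, SimpleFileIO io) {
        // URI schemes are case-insensitive, so normalize the key.
        byScheme.put(scheme.toLowerCase(Locale.ROOT), io);
        return this;
    }

    SimpleFileIO resolve(String path) {
        // URI.create throws IllegalArgumentException for strings that are
        // not valid URIs (for example, paths containing spaces).
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            return fallback;
        }
        return byScheme.getOrDefault(scheme.toLowerCase(Locale.ROOT), fallback);
    }

    @Override
    public String describe(String path) {
        return resolve(path).describe(path);
    }
}

class Demo {
    public static void main(String[] args) {
        SimpleFileIO hadoopLike = p -> "default handler: " + p;
        SimpleFileIO custom = p -> "custom hdfs handler: " + p;

        SimpleFileIO io = new SchemeRoutingFileIO(hadoopLike).register("hdfs", custom);

        System.out.println(io.describe("hdfs://namenode/warehouse/t/data/f.parquet"));
        System.out.println(io.describe("wasb://container@account/data/f.parquet"));
    }
}
```

In this sketch, hdfs:// paths go to the custom handler and everything else (including wasb://) falls through to the default, which is the behavior I am hoping to get from Iceberg.
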
On Tue, May 18, 2021 at 9:45 PM Daniel Weeks <[email protected]> wrote:

> Hey Vivek,
>
> The file_path per spec is technically just a string, but the
> representation is expected to be a URI.
>
> How this URI is interpreted is really up to the FileIO implementation. So
> for example, the most common FileIO implementation is probably
> HadoopFileIO, which is going to use whatever file system scheme mapping
> you've defined in your configuration (typically via core-site.xml).
>
> For the Azure case (I'm not very familiar with this), it looks like
> AdlFileSystem is the Hadoop FileSystem implementation. So, if you map wasb
> -> AdlFileSystem, then you would want to use the URI format you described.
>
> There are more custom FileIO implementations (like S3FileIO), that are
> more specific about URI representations, but HadoopFileIO approach is
> probably more common at this point and relies on how Hadoop will resolve
> the URI.
>
> The only other thing I would note is that at this point the paths still
> need to be fully qualified (though there are some discussions ongoing about
> relative paths).
>
> Hope that helps,
> -Dan
>
> On Thu, May 13, 2021 at 5:30 AM Vivekanand Vellanki <[email protected]>
> wrote:
>
>> Hi,
>>
>> We are trying to create Iceberg tables on ADLS. What is the format for
>> referencing data files in ADLS from Manifest files?
>>
>> We are seeing Spark use something like:
>> wasb://<container>@account/<file path>
>>
>> Is there a standard for how data files should be referenced within
>> manifest files?
>>
>> Thanks
>> Vivek
