Hey Vivek,

Per the spec, file_path is technically just a string, but its value is expected to be a URI.
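Because file_path is just a string holding a URI, a scheme-based FileIO first has to pull the scheme out of it and then find whatever FileSystem is registered for that scheme. Below is a rough, stdlib-only sketch of that lookup, assuming a Hadoop-style `fs.<scheme>.impl` mapping. The class `SchemeResolution`, its `resolve` method, and the hard-coded map are hypothetical stand-ins for what Hadoop actually reads from core-site.xml. The class names in the map are the ones the Hadoop HDFS and Azure modules ship, but your own configuration is what really decides the mapping.

```java
import java.net.URI;
import java.util.Map;

public class SchemeResolution {

    // Hypothetical stand-in for fs.<scheme>.impl entries in core-site.xml.
    // Illustrative only: the real mapping comes from your Hadoop configuration.
    static final Map<String, String> CONF = Map.of(
        "fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem",
        "fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
        "fs.abfs.impl", "org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem",
        "fs.adl.impl", "org.apache.hadoop.fs.adl.AdlFileSystem");

    // Returns the FileSystem class that would handle this file_path.
    static String resolve(String filePath) {
        URI uri = URI.create(filePath);
        String scheme = uri.getScheme();
        // A path without a scheme is not fully qualified, so there is nothing to look up.
        if (scheme == null) {
            throw new IllegalArgumentException("file_path is not fully qualified: " + filePath);
        }
        String impl = CONF.get("fs." + scheme.toLowerCase() + ".impl");
        if (impl == null) {
            throw new IllegalArgumentException("no FileSystem mapped for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        // prints "org.apache.hadoop.fs.azure.NativeAzureFileSystem"
        System.out.println(resolve(
            "wasb://container@account.blob.core.windows.net/db/tbl/data/00000-0.parquet"));
    }
}
```

The point of the sketch is that the manifest itself never says which FileSystem to use; the scheme in the stored URI plus the configuration at read time decide it, which is why the same wasb:// path can resolve differently under a different core-site.xml.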
How that URI is interpreted is really up to the FileIO implementation. For example, the most common FileIO implementation is probably HadoopFileIO, which uses whatever file system scheme mapping you've defined in your Hadoop configuration (typically via core-site.xml). For the Azure case (I'm not very familiar with this), it looks like AdlFileSystem is the Hadoop FileSystem implementation. So, if you map wasb -> AdlFileSystem, then you would want to use the URI format you described.

There are also more specialized FileIO implementations (like S3FileIO) that are stricter about URI representations, but the HadoopFileIO approach is probably more common at this point and relies on how Hadoop resolves the URI.

The only other thing I would note is that, for now, paths still need to be fully qualified (though there are ongoing discussions about relative paths).

Hope that helps,
-Dan

On Thu, May 13, 2021 at 5:30 AM Vivekanand Vellanki wrote:
> Hi,
>
> We are trying to create Iceberg tables on ADLS. What is the format for
> referencing data files in ADLS from Manifest files?
>
> We are seeing Spark use something like:
> wasb://<container>@account/<file path>
>
> Is there a standard for how data files should be referenced within
> manifest files?
>
> Thanks
> Vivek
