Hey Vivek,

Per the spec, file_path is technically just a string, but its value is
expected to be a URI.

How this URI is interpreted is really up to the FileIO implementation.  So
for example, the most common FileIO implementation is probably
HadoopFileIO, which is going to use whatever file system scheme mapping
you've defined in your configuration (typically via core-site.xml).
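
To make the scheme mapping concrete, here is a minimal Python sketch of how
a Hadoop-style lookup goes from a URI scheme to a file system class. This is
only an illustration: the `conf` dictionary stands in for core-site.xml, and
`resolve_fs_impl` plus the example container and account names are
hypothetical, not part of Hadoop or Iceberg. The `fs.<scheme>.impl` key
pattern and the two class names are the ones Hadoop uses for these schemes.

```python
from urllib.parse import urlparse

# Stand-in for core-site.xml (assumption: the real values live in your
# Hadoop configuration, not in a Python dict).
conf = {
    "fs.wasb.impl": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
    "fs.adl.impl": "org.apache.hadoop.fs.adl.AdlFileSystem",
}


def resolve_fs_impl(file_path: str, conf: dict) -> str:
    """Return the configured FileSystem class name for a URI's scheme."""
    scheme = urlparse(file_path).scheme
    if not scheme:
        # No scheme means the path is not a fully qualified URI.
        raise ValueError(f"no scheme in path: {file_path!r}")
    key = f"fs.{scheme}.impl"
    if key not in conf:
        raise KeyError(f"no file system configured for scheme {scheme!r}")
    return conf[key]


# Hypothetical container and account names, for illustration only.
print(resolve_fs_impl("wasb://mycontainer@myaccount/data/f.parquet", conf))
```

The point is that the same file_path string can resolve to a different
implementation (or fail) depending on what the reader's configuration maps
the scheme to.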

For the Azure case (I'm not very familiar with this), it looks like Hadoop
provides FileSystem implementations for Azure storage (for example,
AdlFileSystem for adl:// URIs).  So, if the wasb scheme is mapped to the
matching Hadoop FileSystem in your configuration, then you would want to use
the URI format you described.

There are also more specialized FileIO implementations (like S3FileIO) that
are more specific about URI representations, but the HadoopFileIO approach
is probably more common at this point and relies on how Hadoop resolves the
URI.

The only other thing I would note is that at this point the paths still
need to be fully qualified (though there are some discussions ongoing about
relative paths).
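
As a rough illustration of what "fully qualified" means here, the sketch
below checks that a path carries a URI scheme before it is written as a
file_path. The helper name `is_fully_qualified` and the example paths are
hypothetical; Iceberg itself does not expose this function.

```python
from urllib.parse import urlparse


def is_fully_qualified(file_path: str) -> bool:
    """True if the path has a URI scheme and a location after it."""
    parsed = urlparse(file_path)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


# Hypothetical example paths.
print(is_fully_qualified("wasb://mycontainer@myaccount/data/f.parquet"))
print(is_fully_qualified("data/f.parquet"))
```

A relative path such as `data/f.parquet` has no scheme, so a reader has no
way to tell which file system it belongs to.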

Hope that helps,
-Dan



On Thu, May 13, 2021 at 5:30 AM Vivekanand Vellanki <[email protected]>
wrote:

> Hi,
>
> We are trying to create Iceberg tables on ADLS. What is the format for
> referencing data files in ADLS from Manifest files?
>
> We are seeing Spark use something like:
> wasb://<container>@account/<file path>
>
> Is there a standard for how data files should be referenced within
> manifest files?
>
> Thanks
> Vivek
>
>
