Vivek, I think that it could make sense to have a FileIO implementation
that can delegate depending on the URI scheme. Would you like to write up a
proposal for that?

This would be useful for migrations because you could mix URIs that are
handled by different FileIO implementations.
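
To make the idea concrete, here is a rough sketch of what scheme-based delegation could look like. It is not an existing Iceberg class. SimpleFileIO is a hypothetical one-method stand-in for the real FileIO interface, and SchemeDelegatingFileIO, register, and ioFor are names invented for illustration. The scheme is taken as the text before "://" rather than parsed with java.net.URI, so paths containing spaces or other unescaped characters do not throw.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Iceberg's FileIO, reduced to a single method.
interface SimpleFileIO {
    String describe(String path);
}

// Hypothetical delegating FileIO: chooses a delegate based on the URI scheme.
class SchemeDelegatingFileIO implements SimpleFileIO {
    private final Map<String, SimpleFileIO> delegates = new HashMap<>();
    private final SimpleFileIO fallback;

    // The fallback handles paths with no scheme or an unregistered scheme.
    SchemeDelegatingFileIO(SimpleFileIO fallback) {
        this.fallback = fallback;
    }

    // Registers a delegate for a scheme such as "s3" or "hdfs".
    SchemeDelegatingFileIO register(String scheme, SimpleFileIO io) {
        delegates.put(scheme.toLowerCase(Locale.ROOT), io);
        return this;
    }

    // Returns the delegate for the path's scheme, or the fallback.
    SimpleFileIO ioFor(String path) {
        int idx = path.indexOf("://");
        if (idx <= 0) {
            return fallback;
        }
        String scheme = path.substring(0, idx).toLowerCase(Locale.ROOT);
        return delegates.getOrDefault(scheme, fallback);
    }

    @Override
    public String describe(String path) {
        return ioFor(path).describe(path);
    }
}
```

During a migration, a table whose older files live under hdfs:// and whose newer files live under s3:// could then be read through one instance, with each path routed to the delegate registered for its scheme.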

On Tue, May 18, 2021 at 10:23 AM Jack Ye <[email protected]> wrote:

> You can read
> https://iceberg.apache.org/custom-catalog/#custom-file-io-implementation
> for more details of loading your custom FileIO, and see
> http://iceberg.apache.org/aws/#s3-fileio as an example.
> -Jack
>
> On Tue, May 18, 2021 at 10:16 AM Vivekanand Vellanki <[email protected]>
> wrote:
>
>> Is it possible to make the FileIO implementation extensible per URI scheme?
>>
>> For example, for the scheme hdfs://, can I ensure that Iceberg uses my custom
>> implementation of FileIO at run time?
>>
>> On Tue, May 18, 2021 at 9:45 PM Daniel Weeks <[email protected]> wrote:
>>
>>> Hey Vivek,
>>>
>>> The file_path per spec is technically just a string, but the
>>> representation is expected to be a URI.
>>>
>>> How this URI is interpreted is really up to the FileIO implementation.
>>> So for example, the most common FileIO implementation is probably
>>> HadoopFileIO, which is going to use whatever file system scheme mapping
>>> you've defined in your configuration (typically via core-site.xml).
>>>
>>> For the Azure case (I'm not very familiar with this), it looks like
>>> AdlFileSystem is the Hadoop FileSystem implementation.  So, if you map wasb
>>> -> AdlFileSystem, then you would want to use the URI format you described.
>>>
>>> There are more custom FileIO implementations (like S3FileIO) that are
>>> more specific about URI representations, but the HadoopFileIO approach is
>>> probably more common at this point and relies on how Hadoop will resolve
>>> the URI.
>>>
>>> The only other thing I would note is that at this point the paths still
>>> need to be fully qualified (though there are some discussions ongoing about
>>> relative paths).
>>>
>>> Hope that helps,
>>> -Dan
>>>
>>>
>>>
>>> On Thu, May 13, 2021 at 5:30 AM Vivekanand Vellanki <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are trying to create Iceberg tables on ADLS. What is the format for
>>>> referencing data files in ADLS from Manifest files?
>>>>
>>>> We are seeing Spark use something like:
>>>> wasb://<container>@account/<file path>
>>>>
>>>> Is there a standard for how data files should be referenced within
>>>> manifest files?
>>>>
>>>> Thanks
>>>> Vivek
>>>>
>>>>

-- 
Ryan Blue
