See
https://iceberg.apache.org/custom-catalog/#custom-file-io-implementation
for details on loading your own custom FileIO, and
http://iceberg.apache.org/aws/#s3-fileio for an example.
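
On the per-scheme part of the question: the FileIO class is chosen per catalog (via the `io-impl` catalog property), so one way to treat hdfs:// specially would be a custom FileIO that looks at the URI scheme and delegates. Below is a rough sketch of that dispatch idea only, in plain Python rather than Iceberg code. All class and method names (`DefaultIO`, `CustomHdfsIO`, `SchemeDispatchingIO`, `io_for`) are hypothetical, and a real implementation would have to implement the Java FileIO interface described at the first link.

```python
from urllib.parse import urlparse


# Hypothetical stand-ins for FileIO implementations; these are NOT Iceberg classes.
class DefaultIO:
    def open(self, location: str) -> str:
        # A real implementation would return an input file handle.
        return f"default:{location}"


class CustomHdfsIO:
    def open(self, location: str) -> str:
        return f"custom-hdfs:{location}"


class SchemeDispatchingIO:
    """Delegates each location to an implementation chosen by its URI scheme."""

    def __init__(self, by_scheme, fallback):
        # Normalize keys so lookups are case-insensitive on the scheme.
        self._by_scheme = {scheme.lower(): impl for scheme, impl in by_scheme.items()}
        self._fallback = fallback

    def io_for(self, location: str):
        # file_path is expected to be a fully qualified URI, so the scheme is present.
        scheme = urlparse(location).scheme.lower()
        return self._by_scheme.get(scheme, self._fallback)

    def open(self, location: str) -> str:
        return self.io_for(location).open(location)


# hdfs:// goes to the custom implementation; everything else uses the default.
io = SchemeDispatchingIO({"hdfs": CustomHdfsIO()}, DefaultIO())
```

With this wiring, `io.io_for("hdfs://namenode:8020/warehouse/t/f.parquet")` picks `CustomHdfsIO`, while a wasb:// or s3:// location falls through to `DefaultIO`.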
-Jack

On Tue, May 18, 2021 at 10:16 AM Vivekanand Vellanki <[email protected]>
wrote:

> Is it possible to make the FileIO implementation extensible per URI scheme?
>
> For example, for the hdfs:// scheme, can I ensure that Iceberg uses my custom
> implementation of FileIO at run time?
>
> On Tue, May 18, 2021 at 9:45 PM Daniel Weeks <[email protected]> wrote:
>
>> Hey Vivek,
>>
>> The file_path per spec is technically just a string, but the
>> representation is expected to be a URI.
>>
>> How this URI is interpreted is really up to the FileIO implementation.
>> So for example, the most common FileIO implementation is probably
>> HadoopFileIO, which is going to use whatever file system scheme mapping
>> you've defined in your configuration (typically via core-site.xml).
>>
>> For the Azure case (I'm not very familiar with this), it looks like
>> AdlFileSystem is the Hadoop FileSystem implementation.  So, if you map wasb
>> -> AdlFileSystem, then you would want to use the URI format you described.
>>
>> There are more custom FileIO implementations (like S3FileIO) that are
>> more specific about URI representations, but the HadoopFileIO approach is
>> probably more common at this point and relies on how Hadoop resolves
>> the URI.
>>
>> The only other thing I would note is that at this point the paths still
>> need to be fully qualified (though there are some discussions ongoing about
>> relative paths).
>>
>> Hope that helps,
>> -Dan
>>
>>
>>
>> On Thu, May 13, 2021 at 5:30 AM Vivekanand Vellanki <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> We are trying to create Iceberg tables on ADLS. What is the format for
>>> referencing data files in ADLS from Manifest files?
>>>
>>> We are seeing Spark use something like:
>>> wasb://<container>@account/<file path>
>>>
>>> Is there a standard for how data files should be referenced within
>>> manifest files?
>>>
>>> Thanks
>>> Vivek
>>>
>>>
