metadata api

Zoltán Borók-Nagy Tue, 08 Jun 2021 03:51:05 -0700

Hey Yong,

I've created a design doc about write support:
https://docs.google.com/document/d/1_KL0YptDKwhiXvJyx4Vb-yZjggrPQAW2yjeGV4C0vMU/edit


We don't have an upstream release of Impala that supports Iceberg, but you
can check out and build Impala master:
https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala

The Iceberg support is still under development and the syntax may change,
see:
https://lists.apache.org/thread.html/re89c80c8218439a2a431fc4c0d2530522841c86858290a4bf36b9805%40%3Cdev.impala.apache.org%3E

Therefore we don't have user docs for it yet, but you can take a look at
our tests to see how you can create Iceberg tables with the current DDL:
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
Please note that there are other *.test files as well for Iceberg in the
QueryTest directory.

I hope this helps.

Cheers,
   Zoltan


On Tue, Jun 8, 2021 at 4:07 AM yong.sunny <yong.su...@163.com> wrote:

> Hi Zoltan,
>
> It looks like you have worked on the Iceberg integration with Impala. Is
> there any doc that introduces how to run Iceberg in Impala so that I can
> play with it? I am also wondering if there is a design doc.
>
> thanks and best regards
> Yong
>
> On 05/27/2021 16:54, Zoltán Borók-Nagy <borokna...@cloudera.com> wrote:
> Hi Yong Yang,
>
> It is supported by Iceberg, and this is exactly how Impala works:
> Impala's Parquet writer writes the data files, then we use Iceberg's
> API to append them to the table.
> You can find the relevant code here:
>
> https://github.com/apache/impala/blob/822e8373d1f1737865899b80862c2be7b07cc950/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java#L197-L271
>
> For inserting files we use Iceberg's AppendFiles class. For overwriting a
> table/partitions we use Iceberg's ReplacePartitions class.
>
> One important thing you need to do when writing the Parquet data
> files is to fill in the 'field_id
> <https://github.com/apache/parquet-format/blob/473a3a7710f992b01af79095757d71e1fc68ef62/src/main/thrift/parquet.thrift#L398>'
> member of each schema element with the corresponding Iceberg column ID.
>
> Cheers,
>     Zoltan
>
>
> On Thu, May 27, 2021 at 7:39 AM Peter Vary <pv...@cloudera.com.invalid>
> wrote:
>
>> Hi Yong Yang,
>>
>> Your message ended up in my spam folder, flagged because many messages
>> from @163.com are spam, but your question seems legitimate.
>>
>> With the Java API you can add Parquet files to Iceberg tables, as long
>> as the files conform to the specification.
>>
>> For Parquet, take a look here:
>> http://iceberg.apache.org/spec/#parquet
>>
>> For the Java API, take a look here: https://iceberg.apache.org/api/
>>
>> Thanks, Peter
>>
>> On Wed, 19 May 2021, 18:44 yong.sunny, <yong.su...@163.com> wrote:
>>
>>> Hi Iceberg Devs,
>>>
>>> I am new to Iceberg, and I have a question about the Iceberg
>>> manifest/manifest list/metadata API.
>>> I am wondering if the following is supported:
>>> 1. Parquet files are written by other apps.
>>> 2. The Iceberg APIs are used to create the manifest file/manifest
>>> list/metadata (snapshot). The application doing that would be an
>>> independent one, that is, not Flink or Spark.
>>>
>>> Could you please tell me if that is supported, and if so, which APIs
>>> I should use?
>>>
>>> If I posted the question in the wrong channel, please tell me which one
>>> I should use.
>>>
>>> Thanks and Best regards,
>>> Yong Yang
>>>
>>>
>>>
>>>
>>
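
The flow described in this thread (write Parquet files whose field_id
values match the Iceberg column IDs, then register them with AppendFiles or
ReplacePartitions) can be pictured with a small self-contained model. This
is only a sketch: ToyDataFile and ToyTable are hypothetical classes made up
for illustration and are not part of the Iceberg API. The field_id check at
commit time is also an assumption of the model; in practice the writer is
responsible for setting the IDs correctly. The replace operation is modeled
on the assumption that every partition touched by the new files has its old
files dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model only: these classes are hypothetical and are NOT Iceberg's API.
final class ToyDataFile {
    final String path;
    final String partition;              // e.g. "day=2021-05-27"
    final Map<String, Integer> fieldIds; // Parquet column name -> field_id in the file schema

    ToyDataFile(String path, String partition, Map<String, Integer> fieldIds) {
        this.path = path;
        this.partition = partition;
        this.fieldIds = fieldIds;
    }
}

final class ToyTable {
    private final Map<String, Integer> columnIds;            // column name -> Iceberg column ID
    private List<ToyDataFile> liveFiles = new ArrayList<>(); // files in the current snapshot

    ToyTable(Map<String, Integer> columnIds) {
        this.columnIds = columnIds;
    }

    // Each field listed in the file must carry the ID of the same-named table column.
    private void checkFieldIds(ToyDataFile file) {
        for (Map.Entry<String, Integer> e : file.fieldIds.entrySet()) {
            Integer expected = columnIds.get(e.getKey());
            if (!e.getValue().equals(expected)) {
                throw new IllegalArgumentException(
                    file.path + ": field_id mismatch for column " + e.getKey());
            }
        }
    }

    // Models an append: the new snapshot is the old file list plus the new files.
    void append(List<ToyDataFile> newFiles) {
        newFiles.forEach(this::checkFieldIds);
        List<ToyDataFile> next = new ArrayList<>(liveFiles);
        next.addAll(newFiles);
        liveFiles = next;
    }

    // Models a partition overwrite: old files in any touched partition are dropped.
    void replacePartitions(List<ToyDataFile> newFiles) {
        newFiles.forEach(this::checkFieldIds);
        Set<String> touched = newFiles.stream().map(f -> f.partition).collect(Collectors.toSet());
        List<ToyDataFile> next = liveFiles.stream()
            .filter(f -> !touched.contains(f.partition))
            .collect(Collectors.toCollection(ArrayList::new));
        next.addAll(newFiles);
        liveFiles = next;
    }

    List<String> livePaths() {
        return liveFiles.stream().map(f -> f.path).collect(Collectors.toList());
    }
}

final class ToyTableDemo {
    public static void main(String[] args) {
        Map<String, Integer> ids = Map.of("id", 1, "day", 2);
        ToyTable table = new ToyTable(ids);
        table.append(List.of(
            new ToyDataFile("a.parquet", "day=2021-05-26", ids),
            new ToyDataFile("b.parquet", "day=2021-05-27", ids)));
        table.replacePartitions(List.of(
            new ToyDataFile("c.parquet", "day=2021-05-27", ids)));
        System.out.println(table.livePaths()); // prints [a.parquet, c.parquet]
    }
}
```

In the model, each commit builds a new file list rather than mutating the
old one, which mirrors the idea that a commit produces a new snapshot. The
real calls and their exact behavior are in the IcebergCatalogOpExecutor code
and the Iceberg Java API docs linked above.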

Re: question about the iceberg manifest/manifest list/metadata api
