Thanks. Sorry for the brevity; I was on vacation and sending emails from my phone.
My main idea there was that since the proposed architecture is not tied to Parquet in any manner, we can go ahead and allow other file formats to hook in using an API. Can you please share the code and spec so that we can start thinking about extending it (specifically around deletes)?

Regards,

Atri

On Tue, Dec 12, 2017 at 12:55 AM, Ryan Blue <[email protected]> wrote:
> On Sat, Dec 9, 2017 at 3:38 AM, Atri Sharma <[email protected]> wrote:
>>
>> Thanks for the specification.
>>
>> A couple of questions:
>>
>> 1) What ties this to Parquet rather than to any underlying store?
>> 2) If the above does not hold, can we expose an interface to install any
>> underlying file format?
>> 3) If we are defining snapshots, can we allow MVCC on top of the
>> snapshots?
>>
>> To elaborate on 3), I would like to see a fully transactional file
>> format, which would allow us to be generic and performant.
>>
>> I would be interested in writing a specification for updates in this format.
>> Can you please share the repository link and some internal documents so I
>> can understand a bit more?
>
> The format uses a form of MVCC to ensure readers always use a consistent
> snapshot of the table without blocking writers. New versions show up
> atomically.
>
> However, this doesn't use a "transactional file format". It uses an atomic
> operation to replace a table's current metadata using immutable files. This
> is necessary for compatibility with file systems like HDFS and S3, where the
> files must be stored.
>
> I'm not sure what you mean by questions 1 or 2. Parquet is a data file
> format used in Iceberg tables. Avro is also allowed so that the tables
> support a write-optimized format (Avro) and a read-optimized format
> (Parquet). File formats are tracked on a per-file basis.
>
> rb

--
Regards,

Atri
l'apprenant
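[To make Ryan's point concrete, here is a minimal sketch of the atomic-swap idea: each table version is an immutable metadata file, and a commit atomically repoints the table at the new file, so readers always see a complete snapshot without blocking writers. The names (`commit_snapshot`, `version-hint.txt`, the metadata layout) are illustrative assumptions, not Iceberg's actual API.]

```python
import json
import os
import tempfile

# Illustrative sketch only; file names and metadata layout are assumptions.
VERSION_POINTER = "version-hint.txt"  # names the current metadata file


def current_metadata(table_dir):
    """Read the snapshot a reader would see right now."""
    with open(os.path.join(table_dir, VERSION_POINTER)) as f:
        current = f.read().strip()
    with open(os.path.join(table_dir, current)) as f:
        return json.load(f)


def commit_snapshot(table_dir, version, data_files):
    """Write an immutable metadata file, then atomically swap the pointer.

    A reader that already resolved the old pointer keeps a consistent old
    snapshot; a reader that resolves it after the swap sees the new one.
    Neither side blocks the other.
    """
    metadata_file = f"v{version}.metadata.json"
    with open(os.path.join(table_dir, metadata_file), "w") as f:
        json.dump({"version": version, "data-files": data_files}, f)

    # os.replace is atomic on POSIX, so the pointer never names a partial file.
    fd, tmp = tempfile.mkstemp(dir=table_dir)
    with os.fdopen(fd, "w") as f:
        f.write(metadata_file)
    os.replace(tmp, os.path.join(table_dir, VERSION_POINTER))


table = tempfile.mkdtemp()
# Per-file formats: a snapshot can mix Parquet and Avro data files.
commit_snapshot(table, 1, ["part-0.parquet"])
commit_snapshot(table, 2, ["part-0.parquet", "part-1.avro"])
print(current_metadata(table)["version"])
```

[Note that this sketch relies only on an atomic rename, which is why the approach works on plain file systems like HDFS without a transaction log in the storage layer itself.]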
