The reason this came up was the discussion on how to delete data. Delete markers don't really fit at the file format layer, but they need to live somewhere; I think it makes sense to handle them at the table level.
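To make the table-level idea concrete: instead of rewriting data files, the table's metadata could record which rows are deleted, and readers would filter them out at scan time. Here is a minimal sketch of that shape in Python — the `DeleteMarker` structure and names are hypothetical illustrations, not part of any spec:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeleteMarker:
    """Hypothetical table-level delete: row `position` in `data_file` is gone."""
    data_file: str
    position: int

def read_with_deletes(rows_by_file, markers):
    """Apply delete markers at read time; the data files are never rewritten."""
    deleted = {(m.data_file, m.position) for m in markers}
    for path, rows in rows_by_file.items():
        for pos, row in enumerate(rows):
            if (path, pos) not in deleted:
                yield row

# Example: drop the second row of one file without touching the file itself.
files = {"part-0.parquet": ["a", "b", "c"]}
live = list(read_with_deletes(files, [DeleteMarker("part-0.parquet", 1)]))
```

Because the markers live in table metadata rather than in the file format, committing a delete can piggyback on whatever atomic metadata swap the table already uses.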
Iceberg is a separate project, but one that a lot of people on this list are probably interested in. The table format uses Avro and Parquet as file formats to start with.

rb

On Fri, Dec 8, 2017 at 10:49 AM, rahul challapalli <[email protected]> wrote:

> Very interesting. I was wondering how this is related to Parquet. Are we
> talking about an abstraction on top of Parquet which allows capabilities
> like versioning, snapshotting, etc.? Or a completely different format
> (maybe API-compatible with Parquet)?
>
> - Rahul
>
> On Fri, Dec 8, 2017 at 8:42 AM, Ryan Blue <[email protected]> wrote:
>
>> I'm working on getting the code out to our open source GitHub org,
>> probably early next week. I'll set up a mailing list for it as well.
>>
>> rb
>>
>> On Thu, Dec 7, 2017 at 6:38 PM, Jacques Nadeau <[email protected]> wrote:
>>
>>> Sounds super interesting. Would love to collaborate on this. Do you
>>> have a repo or mailing list where you are working on this?
>>>
>>> On Wed, Dec 6, 2017 at 4:20 PM, Ryan Blue <[email protected]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I mentioned in the sync-up this morning that I'd send out an
>>>> introduction to the table format we're working on, which we're
>>>> calling Iceberg.
>>>>
>>>> For anyone that wasn't around, here's the background: there are
>>>> several problems with how we currently manage the data files that
>>>> make up a table in the Hadoop ecosystem. The one that came up today
>>>> was that you can't actually update a table atomically to, for
>>>> example, rewrite a file and safely delete records. That's because
>>>> Hive tables track what files are currently visible by listing
>>>> partition directories, and we don't have (or want) transactions for
>>>> changes in Hadoop file systems. This means that you can't actually
>>>> have isolated commits to a table, and the result is that *query
>>>> results from Hive tables can be wrong*, though rarely in practice.
>>>>
>>>> The problems with current tables are caused primarily by keeping
>>>> state about which files are or are not in a table in the file
>>>> system. As I said, one problem is that there are no transactions,
>>>> but you also have to list directories to plan jobs (bad on S3) and
>>>> rename files from a temporary location to a final location (really,
>>>> really bad on S3).
>>>>
>>>> To avoid these problems, we've been building the Iceberg format,
>>>> which tracks every file in a table instead of tracking directories.
>>>> Iceberg maintains snapshots of all the files in a dataset and
>>>> atomically swaps snapshots and other metadata to commit. There are
>>>> a few benefits to doing it this way:
>>>>
>>>> - *Snapshot isolation*: Readers always use a consistent snapshot of
>>>>   the table, without needing to hold a lock. All updates are atomic.
>>>> - *O(1) RPCs to plan*: Instead of listing O(n) directories in a
>>>>   table to plan a job, reading a snapshot requires O(1) RPC calls.
>>>> - *Distributed planning*: File pruning and predicate push-down are
>>>>   distributed to jobs, removing the metastore bottleneck.
>>>> - *Version history and rollback*: Table snapshots are kept around,
>>>>   and it is possible to roll back if a job has a bug and commits.
>>>> - *Finer-granularity partitioning*: Distributed planning and O(1)
>>>>   RPC calls remove the current barriers to finer-grained
>>>>   partitioning.
>>>>
>>>> We're also taking this opportunity to fix a few other problems:
>>>>
>>>> - Schema evolution: columns are tracked by ID to support
>>>>   add/drop/rename.
>>>> - Types: a core set of types, thoroughly tested to work consistently
>>>>   across all of the supported data formats.
>>>> - Metrics: cost-based optimization metrics are kept in the
>>>>   snapshots.
>>>> - Portable spec: tables should not be tied to Java and should have
>>>>   a simple and clear specification for other implementers.
>>>>
>>>> We have the core library to track files done, along with most of a
>>>> specification, and a Spark datasource (v2) that can read Iceberg
>>>> tables. I'll be working on the write path next, and we plan to
>>>> build a Presto implementation soon.
>>>>
>>>> I think this should be useful to others, and it would be great to
>>>> collaborate with anyone that is interested.
>>>>
>>>> rb
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix

--
Ryan Blue
Software Engineer
Netflix
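The snapshot-and-swap scheme described in the announcement can be sketched as a toy model: each snapshot is an immutable, complete list of the table's data files, and a commit is a single compare-and-swap of the current-snapshot pointer. This is an illustration of the general technique, not the Iceberg implementation or API:

```python
import itertools

class Table:
    """Toy model of snapshot-based table metadata (not the real Iceberg API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.snapshots = {0: frozenset()}   # snapshot id -> immutable file set
        self.current_id = 0                 # the single mutable pointer

    def commit(self, expected_id, files):
        """Atomically swap in a new snapshot; fail if another writer won."""
        if self.current_id != expected_id:   # compare...
            raise RuntimeError("concurrent commit; retry against new snapshot")
        new_id = next(self._ids)
        self.snapshots[new_id] = frozenset(files)
        self.current_id = new_id             # ...and swap
        return new_id

    def scan(self, snapshot_id=None):
        """Readers plan against one snapshot: no directory listing needed."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        return self.snapshots[sid]

    def rollback(self, snapshot_id):
        """Old snapshots are kept, so rollback just moves the pointer back."""
        self.current_id = snapshot_id

t = Table()
s1 = t.commit(0, {"a.parquet"})
s2 = t.commit(s1, {"a.parquet", "b.parquet"})
t.rollback(s1)   # bad commit? point back at the earlier snapshot
```

A reader that captured `s2` before the rollback still sees that full, consistent file set, which is the snapshot-isolation property; and because snapshots are retained, version history and rollback fall out of the same pointer swap.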

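To illustrate the "columns are tracked by ID" point from the announcement: if data files key values by a stable column ID rather than by name, then a rename is a metadata-only change, and dropping a name and later re-adding it cannot resurrect old data. A hedged sketch, with illustrative structures rather than the Iceberg spec's:

```python
# A schema maps stable column IDs to current names; data files key values
# by ID, so files never need rewriting when the schema changes.
schema = {1: "id", 2: "email"}
stored_row = {1: 42, 2: "a@example.com"}   # row as written under that schema

def project(row, schema):
    """Read a stored row through the current schema, matching by column ID."""
    return {name: row.get(col_id) for col_id, name in schema.items()}

schema = {1: "id", 2: "contact"}   # rename column 2: metadata only
renamed = project(stored_row, schema)

schema = {1: "id", 3: "email"}     # drop id 2, then add a *new* "email" (id 3)
readded = project(stored_row, schema)
```

The rename surfaces the old values under the new name, while the re-added `email` column reads as null for old files — matching by name instead of ID would wrongly return the dropped column's data.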