Very Interesting. I was wondering how this is related to parquet. Are we
talking about an abstraction on top of parquet which allows capabilities
like Versioning, Snapshotting etc? Or a completely different format (may be
API compatible with parquet)?

- Rahul

On Fri, Dec 8, 2017 at 8:42 AM, Ryan Blue <[email protected]> wrote:

> I'm working on getting the code out to our open source github org, probably
> early next week. I'll set up a mailing list for it as well.
>
> rb
>
> On Thu, Dec 7, 2017 at 6:38 PM, Jacques Nadeau <[email protected]> wrote:
>
> > Sounds super interesting. Would love to collaborate on this. Do you have
> a
> > repo or mailing list where you are working on this?
> >
> >
> >
> > On Wed, Dec 6, 2017 at 4:20 PM, Ryan Blue <[email protected]>
> > wrote:
> >
> >> Hi everyone,
> >>
> >> I mentioned in the sync-up this morning that I’d send out an
> introduction
> >> to the table format we’re working on, which we’re calling Iceberg.
> >>
> >> For anyone that wasn’t around here’s the background: there are several
> >> problems with how we currently manage data files to make up a table in
> the
> >> Hadoop ecosystem. The one that came up today was that you can’t actually
> >> update a table atomically to, for example, rewrite a file and safely
> >> delete
> >> records. That’s because Hive tables track what files are currently
> visible
> >> by listing partition directories, and we don’t have (or want)
> transactions
> >> for changes in Hadoop file systems. This means that you can’t actually
> >> have
> >> isolated commits to a table and the result is that *query results from
> >> Hive
> >> tables can be wrong*, though rarely in practice.
> >>
> >> The problems with current tables are caused primarily by keeping state
> >> about what files are in or not in a table in the file system. As I said,
> >> one problem is that there are no transactions but you also have to list
> >> directories to plan jobs (bad on S3) and rename files from a temporary
> >> location to a final location (really, really bad on S3).
> >>
> >> To avoid these problems we’ve been building the Iceberg format that
> tracks
> >> tracks every file in a table instead of tracking directories. Iceberg
> >> maintains snapshots of all the files in a dataset and atomically swaps
> >> snapshots and other metadata to commit. There are a few benefits to
> doing
> >> it this way:
> >>
> >>    - *Snapshot isolation*: Readers always use a consistent snapshot of
> the
> >>    table, without needing to hold a lock. All updates are atomic.
> >>    - *O(1) RPCs to plan*: Instead of listing O(n) directories in a table
> >> to
> >>    plan a job, reading a snapshot requires O(1) RPC calls
> >>    - *Distributed planning*: File pruning and predicate push-down is
> >>    distributed to jobs, removing the metastore bottleneck
> >>    - *Version history and rollback*: Table snapshots are kept around and
> >> it
> >>    is possible to roll back if a job has a bug and commits
> >>    - *Finer granularity partitioning*: Distributed planning and O(1) RPC
> >>    calls remove the current barriers to finer-grained partitioning
> >>
> >> We’re also taking this opportunity to fix a few other problems:
> >>
> >>    - Schema evolution: columns are tracked by ID to support
> >> add/drop/rename
> >>    - Types: a core set of types, thoroughly tested to work consistently
> >>    across all of the supported data formats
> >>    - Metrics: cost-based optimization metrics are kept in the snapshots
> >>    - Portable spec: tables should not be tied to Java and should have a
> >>    simple and clear specification for other implementers
> >>
> >> We have the core library to track files done, along with most of a
> >> specification, and a Spark datasource (v2) that can read Iceberg tables.
> >> I’ll be working on the write path next and we plan to build a Presto
> >> implementation soon.
> >>
> >> I think this should be useful to others and it would be great to
> >> collaborate with anyone that is interested.
> >>
> >> rb
> >> ​
> >> --
> >> Ryan Blue
> >> Software Engineer
> >> Netflix
> >>
> >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Reply via email to