Happy new year! I'm interested as well. Did you get to publish your code on github? Thanks
On Fri, Dec 8, 2017 at 8:42 AM, Ryan Blue <[email protected]> wrote: > I'm working on getting the code out to our open source github org, probably > early next week. I'll set up a mailing list for it as well. > > rb > > On Thu, Dec 7, 2017 at 6:38 PM, Jacques Nadeau <[email protected]> wrote: > > > Sounds super interesting. Would love to collaborate on this. Do you have > a > > repo or mailing list where you are working on this? > > > > > > > > On Wed, Dec 6, 2017 at 4:20 PM, Ryan Blue <[email protected]> > > wrote: > > > >> Hi everyone, > >> > >> I mentioned in the sync-up this morning that I’d send out an > introduction > >> to the table format we’re working on, which we’re calling Iceberg. > >> > >> For anyone that wasn’t around here’s the background: there are several > >> problems with how we currently manage data files to make up a table in > the > >> Hadoop ecosystem. The one that came up today was that you can’t actually > >> update a table atomically to, for example, rewrite a file and safely > >> delete > >> records. That’s because Hive tables track what files are currently > visible > >> by listing partition directories, and we don’t have (or want) > transactions > >> for changes in Hadoop file systems. This means that you can’t actually > >> have > >> isolated commits to a table and the result is that *query results from > >> Hive > >> tables can be wrong*, though rarely in practice. > >> > >> The problems with current tables are caused primarily by keeping state > >> about what files are in or not in a table in the file system. As I said, > >> one problem is that there are no transactions but you also have to list > >> directories to plan jobs (bad on S3) and rename files from a temporary > >> location to a final location (really, really bad on S3). > >> > >> To avoid these problems we’ve been building the Iceberg format that > tracks > >> tracks every file in a table instead of tracking directories. Iceberg > >> maintains snapshots of all the files in a dataset and atomically swaps > >> snapshots and other metadata to commit. There are a few benefits to > doing > >> it this way: > >> > >> - *Snapshot isolation*: Readers always use a consistent snapshot of > the > >> table, without needing to hold a lock. All updates are atomic. > >> - *O(1) RPCs to plan*: Instead of listing O(n) directories in a table > >> to > >> plan a job, reading a snapshot requires O(1) RPC calls > >> - *Distributed planning*: File pruning and predicate push-down is > >> distributed to jobs, removing the metastore bottleneck > >> - *Version history and rollback*: Table snapshots are kept around and > >> it > >> is possible to roll back if a job has a bug and commits > >> - *Finer granularity partitioning*: Distributed planning and O(1) RPC > >> calls remove the current barriers to finer-grained partitioning > >> > >> We’re also taking this opportunity to fix a few other problems: > >> > >> - Schema evolution: columns are tracked by ID to support > >> add/drop/rename > >> - Types: a core set of types, thoroughly tested to work consistently > >> across all of the supported data formats > >> - Metrics: cost-based optimization metrics are kept in the snapshots > >> - Portable spec: tables should not be tied to Java and should have a > >> simple and clear specification for other implementers > >> > >> We have the core library to track files done, along with most of a > >> specification, and a Spark datasource (v2) that can read Iceberg tables. > >> I’ll be working on the write path next and we plan to build a Presto > >> implementation soon. > >> > >> I think this should be useful to others and it would be great to > >> collaborate with anyone that is interested. > >> > >> rb > >> > >> -- > >> Ryan Blue > >> Software Engineer > >> Netflix > >> > > > > > > > -- > Ryan Blue > Software Engineer > Netflix >
