Replies inline.

On Sat, Dec 9, 2017 at 2:00 AM, Julian Hyde <[email protected]> wrote:

> Since you’re dealing with change (atomic updates and so forth) are we
> still just talking about a *format* or does the architecture also now
> include pieces that could be described as *servers* and *protocols*?
>

We're talking primarily about a format. Iceberg defines a way to maintain
table contents in metadata files. The scheme requires some atomic operation
to change the latest metadata file, which I've prototyped with an atomic
rename in HDFS. We are going to use an atomic swap (check-and-put) in our
metastore for the purpose. Otherwise, it's a format.
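To make the check-and-put idea concrete, here is a toy sketch (all names invented, not from the actual Iceberg library): the metastore holds one pointer per table, the location of the current metadata file, and a commit only succeeds if the pointer still matches what the committer last read.

```python
class Metastore:
    """Toy metastore holding one pointer per table: the location
    of the table's current metadata file."""

    def __init__(self):
        self._pointer = {}

    def check_and_put(self, table, expected, new):
        """Atomically swap the metadata pointer only if it still matches
        what the committer last read. Returns True on success."""
        if self._pointer.get(table) == expected:
            self._pointer[table] = new
            return True
        return False  # someone else committed first; caller must retry


store = Metastore()
store.check_and_put("db.events", None, "s3://bucket/db/events/v1.metadata.json")

# A writer that read v1 commits v2 successfully...
ok = store.check_and_put("db.events",
                         "s3://bucket/db/events/v1.metadata.json",
                         "s3://bucket/db/events/v2.metadata.json")
# ...but a concurrent writer that also read v1 loses the race and
# must re-read the table state and retry.
stale = store.check_and_put("db.events",
                            "s3://bucket/db/events/v1.metadata.json",
                            "s3://bucket/db/events/v3.metadata.json")
```

An atomic rename in HDFS plays the same role: the rename either installs the new metadata file or fails, so exactly one committer wins.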


> When I first spoke to the CarbonData folks I observed that they were
> maintaining a global index and therefore were something more than just a
> format. Are you taking parquet into the same territory? And if not, what
> assumptions did you make to keep things simple?
>

Although this is on the Parquet list, it is a higher-level project that
we're starting up. From our discussion on the sync-up, people on this list
are interested so I thought I'd throw it out to the community. I'll start a
list for this project and stop spamming the Parquet dev list soon.

I'm not sure how to answer your question about assumptions we've made.
Maybe it will be more clear when we get more docs out. Until then, I'm
happy to answer questions about how it works.


> I hope I’m not being too negative. If you’re solving distributed metastore
> problems that would be huge. But it sounds a bit too good to be true.
>

You're not at all too negative. I realize that this is new and we have yet
to prove out portions of it. What we have so far works.

My initial tests are on our job metrics table, which has 10 months of
task-level data from our production Hadoop clusters. Each day is about
400m rows in 30GB across 250 Parquet files. In all, the test table is about
65,000 files. The current Iceberg library takes about 800ms to select the
files for a query, using a single thread (the format supports breaking the
work across multiple threads). And because this is scanning a manifest of
all the files in a table, it can plan any query on that table in this
amount of time. Planning in 1s isn't incredible, but it is a major
improvement over the current Hive layout considering that some of our table
scans take 10 minutes to plan (using 8 threads) due to directory listing in
S3.
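The planning step described above, one pass over a manifest instead of listing directories, can be sketched like this (names and fields are hypothetical, chosen only for illustration):

```python
from collections import namedtuple

# A manifest entry: a data file plus its partition value and column stats.
DataFile = namedtuple("DataFile", ["path", "day", "min_id", "max_id"])

manifest = [
    DataFile("part-00.parquet", "2017-12-01", 0, 999),
    DataFile("part-01.parquet", "2017-12-01", 1000, 1999),
    DataFile("part-02.parquet", "2017-12-02", 0, 999),
]

def plan_files(manifest, day, id_value):
    """Select the files a scan must read: one linear pass over the
    manifest, no directory listing against the file system."""
    return [f.path for f in manifest
            if f.day == day and f.min_id <= id_value <= f.max_id]

print(plan_files(manifest, "2017-12-01", 1500))  # -> ['part-01.parquet']
```

Because every file in the table appears in the manifest, the same pass answers any query, which is why planning cost stays roughly constant regardless of the predicate.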

rb

-- 
Ryan Blue
Software Engineer
Netflix
