Replies inline. On Sat, Dec 9, 2017 at 2:00 AM, Julian Hyde <[email protected]> wrote:
> Since you’re dealing with change (atomic updates and so forth) are we
> still just talking about a *format* or does the architecture also now
> include pieces that could be described as *servers* and *protocols*?

We're talking primarily about a format. Iceberg defines a way to maintain table contents in metadata files. The scheme requires some atomic operation to change the latest metadata file, which I've prototyped with an atomic rename in HDFS. We are going to use an atomic swap (check-and-put) in our metastore for this purpose. Otherwise, it's a format.

> When I first spoke to the CarbonData folks I observed that they were
> maintaining a global index and therefore were something more than just a
> format. Are you taking parquet into the same territory? And if not, what
> assumptions did you make to keep things simple?

Although this is on the Parquet list, it is a higher-level project that we're starting up. From our discussion on the sync-up, people on this list are interested, so I thought I'd throw it out to the community. I'll start a list for this project and stop spamming the Parquet dev list soon.

I'm not sure how to answer your question about assumptions we've made. Maybe it will be clearer when we get more docs out. Until then, I'm happy to answer questions about how it works.

> I hope I’m not being too negative. If you’re solving distributed metastore
> problems that would be huge. But it sounds a bit too good to be true.

You're not being too negative at all. I realize that this is new and we have yet to prove out portions of it. What we have so far works. My initial tests are on our job metrics table, which has 10 months of task-level data from our production Hadoop clusters. Each day is about 400M rows in 30GB across 250 Parquet files. In all, the test table is about 65,000 files. The current Iceberg library takes about 800ms to select the files for a query, using a single thread (the format supports breaking the work across multiple threads).
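To make the commit scheme above concrete, here is a rough sketch of an atomic-rename commit, assuming a POSIX-style filesystem where rename atomically replaces the target (the same guarantee relied on with HDFS). The class and path names are illustrative, not Iceberg's actual API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch, not Iceberg's actual API: a writer prepares a new
// metadata file, then atomically renames it over the well-known "current"
// path. The filesystem serializes concurrent renames; a writer whose commit
// fails re-reads the latest metadata and retries.
public class MetadataCommit {
    public static boolean commit(Path newMetadata, Path currentPointer) {
        try {
            // ATOMIC_MOVE refuses to fall back to copy+delete, so the swap is
            // all-or-nothing; on POSIX filesystems rename() atomically
            // replaces an existing target.
            Files.move(newMetadata, currentPointer, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (IOException e) {
            return false; // commit failed; caller refreshes metadata and retries
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-demo");
        Path current = Files.write(dir.resolve("current.metadata.json"),
                                   "snapshot-1".getBytes());
        Path next = Files.write(dir.resolve("next.metadata.json"),
                                "snapshot-2".getBytes());
        System.out.println(commit(next, current));
        System.out.println(new String(Files.readAllBytes(current)));
    }
}
```

With a metastore instead of a filesystem, the same step becomes a check-and-put: replace the pointer only if it still names the metadata version the writer started from.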
And because this is scanning a manifest of all the files in a table, it can plan any query on that table in this amount of time. Planning in 1s isn't incredible, but it is a major improvement over the current Hive layout, considering that some of our table scans take 10 minutes to plan (using 8 threads) due to directory listing in S3.

rb

-- 
Ryan Blue
Software Engineer
Netflix
