Awesome, this is working for us, although we had to modify our code to also use 
the NameMapping when grabbing parquet file metrics. Thanks!
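
For anyone else making the same change, here's a minimal sketch of computing file metrics with the mapping applied. It assumes your Iceberg version has a ParquetUtil.fileMetrics overload that accepts a NameMapping (worth verifying against your release; that overload and the inputFile variable are assumptions on my part, not something from Ryan's example below):

// Load the mapping stored in the table property from Ryan's example below,
// then pass it when computing metrics for an imported Parquet file so that
// columns without field IDs still resolve by name.
// Assumes: inputFile is an org.apache.iceberg.io.InputFile for the data file.
NameMapping mapping = NameMappingParser.fromJson(
    table.properties().get("schema.name-mapping.default"));
MetricsConfig metricsConfig = MetricsConfig.fromProperties(table.properties());
Metrics metrics = ParquetUtil.fileMetrics(inputFile, metricsConfig, mapping);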

From: Ryan Blue <rb...@netflix.com.INVALID>
Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, 
"rb...@netflix.com" <rb...@netflix.com>
Date: Friday, October 30, 2020 at 5:55 PM
To: "sckru...@paypal.com.invalid" <sckru...@paypal.com.invalid>
Cc: "dev@iceberg.apache.org" <dev@iceberg.apache.org>
Subject: Re: Migrating plain parquet tables to iceberg

For existing tables that use name-based column resolution, you can add a 
name-to-id mapping that is applied when reading files with no field IDs. There 
is a utility to generate the name mapping from an existing schema (using the 
current names) and then you just need to store that in a table property.

import org.apache.iceberg.mapping.MappingUtil;
import org.apache.iceberg.mapping.NameMapping;
import org.apache.iceberg.mapping.NameMappingParser;

// Generate a name-to-ID mapping from the table's current schema, then
// store it in a table property so readers can resolve columns in files
// that have no field IDs.
NameMapping mapping = MappingUtil.create(table.schema());

table.updateProperties()
    .set("schema.name-mapping.default", NameMappingParser.toJson(mapping))
    .commit();

I think there is also an issue to add a name mapping by default when importing 
data.

On Fri, Oct 30, 2020 at 3:46 PM Kruger, Scott <sckru...@paypal.com.invalid> 
wrote:
I’m looking to migrate a partitioned Parquet table to Iceberg. The issue I’ve 
run into is that the column order of the data varies wildly, which normally 
isn’t a problem for us (we just set mergeSchema=true when reading), but it 
presents a problem with Iceberg because the iceberg.schema field isn’t set in 
the Parquet footer. Is there any way to migrate this data without rewriting 
the entire dataset?


--
Ryan Blue
Software Engineer
Netflix
