Awesome, this is working for us, although we had to modify our code to also use the NameMapping when grabbing parquet file metrics. Thanks!
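
A minimal sketch of what that metrics change might look like, assuming the name mapping property from the reply below has already been set on the table; ParquetUtil.fileMetrics with a NameMapping argument is from Iceberg's parquet module, and the file path and the in-scope `table` handle are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.Metrics;
    import org.apache.iceberg.MetricsConfig;
    import org.apache.iceberg.hadoop.HadoopInputFile;
    import org.apache.iceberg.mapping.NameMapping;
    import org.apache.iceberg.mapping.NameMappingParser;
    import org.apache.iceberg.parquet.ParquetUtil;

    // Load the mapping stored in the table property, then pass it when
    // collecting metrics from a raw parquet file so that columns are
    // matched by name rather than by the field IDs the file lacks.
    NameMapping mapping = NameMappingParser.fromJson(
        table.properties().get("schema.name-mapping.default"));
    Metrics metrics = ParquetUtil.fileMetrics(
        HadoopInputFile.fromLocation("/path/to/file.parquet", new Configuration()),
        MetricsConfig.forTable(table),
        mapping);
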
From: Ryan Blue <rb...@netflix.com.INVALID>
Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, "rb...@netflix.com" <rb...@netflix.com>
Date: Friday, October 30, 2020 at 5:55 PM
To: "sckru...@paypal.com.invalid" <sckru...@paypal.com.invalid>
Cc: "dev@iceberg.apache.org" <dev@iceberg.apache.org>
Subject: Re: Migrating plain parquet tables to iceberg

For existing tables that use name-based column resolution, you can add a name-to-id mapping that is applied when reading files with no field IDs. There is a utility to generate the name mapping from an existing schema (using the current names); then you just need to store it in a table property:

    NameMapping mapping = MappingUtil.create(table.schema());
    table.updateProperties()
        .set("schema.name-mapping.default", NameMappingParser.toJson(mapping))
        .commit();

I think there is also an issue to add a name mapping by default when importing data.

On Fri, Oct 30, 2020 at 3:46 PM Kruger, Scott <sckru...@paypal.com.invalid> wrote:

> I’m looking to migrate a partitioned parquet table to use iceberg. The
> issue I’ve run into is that the column order for the data varies wildly,
> which isn’t a problem for us normally (we just set mergeSchemas=true when
> reading), but presents a problem with iceberg because the iceberg.schema
> field isn’t set in the parquet footer. Is there any way to migrate this
> data over without rewriting the entire dataset?

--
Ryan Blue
Software Engineer
Netflix
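
To close the loop on the original question: once the mapping property is set and per-file metrics are collected with it (as sketched above), existing files can be committed to the Iceberg table without rewriting any data. A rough sketch using Iceberg's DataFiles builder and append API; the path, file size, and partition value are placeholders:

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.DataFiles;

    // Register an existing parquet file with the Iceberg table in place.
    DataFile dataFile = DataFiles.builder(table.spec())
        .withPath("/path/to/file.parquet")      // placeholder path
        .withFileSizeInBytes(fileSizeInBytes)   // actual size on disk
        .withMetrics(metrics)                   // metrics computed with the name mapping
        .withPartitionPath("dt=2020-10-30")     // placeholder partition value
        .build();

    table.newAppend()
        .appendFile(dataFile)
        .commit();
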