You can't change a Parquet schema without re-encoding the data, as you need to recalculate the footer index data. However, you can manually do today what SPARK-3851 <https://issues.apache.org/jira/browse/SPARK-3851> is going to do.
Consider two schemas:

Old Schema: (a: Int, b: String)
New Schema, where I've dropped and added a column: (a: Int, c: Long)

// oldPath and newPath are placeholders for the Parquet directories written
// with each schema; assumes `import sqlContext._` (Spark 1.x) is in scope.
parquetFile(oldPath).registerTempTable("old")
parquetFile(newPath).registerTempTable("new")

sql("""
  SELECT a, b, CAST(null AS LONG) AS c FROM old
  UNION ALL
  SELECT a, CAST(null AS STRING) AS b, c FROM new
""").registerTempTable("unifiedData")

Because filters and column pruning are pushed down past UNIONs, this should execute as desired even if you write more complicated queries on top of "unifiedData". It's a little onerous, but it should work for now. This approach can also support things like column renaming, which would be much harder to do automatically.

On Fri, Oct 31, 2014 at 1:49 PM, Gary Malouf <malouf.g...@gmail.com> wrote:

> Outside of what is discussed here
> <https://issues.apache.org/jira/browse/SPARK-3851> as a future solution, is
> there any path for being able to modify a Parquet schema once some data has
> been written? This seems like the kind of thing that should make people
> pause when considering whether or not to use Parquet+Spark...
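With more than two schema versions, writing the padded UNION ALL query by hand gets tedious. The Scala sketch below generates the same kind of query from a list of (table name, columns) pairs: it takes the union of all column names and fills each missing one with CAST(null AS <type>). The `Column` class and the `UnifySchemas.unionQuery` helper are hypothetical, not part of Spark, and the sketch assumes a given column name keeps the same type in every schema version.

```scala
// Hypothetical description of one column: its name and its SQL type name.
final case class Column(name: String, sqlType: String)

object UnifySchemas {
  /** Builds a UNION ALL query over tables whose schemas differ.
    * Every SELECT projects the same columns in the same order; a column
    * missing from a table is padded with CAST(null AS <type>).
    * Assumes each column name maps to a single type across all tables. */
  def unionQuery(tables: Seq[(String, Seq[Column])]): String = {
    // Union of all columns across tables, keeping first-seen order.
    val allColumns: Vector[Column] =
      tables.flatMap(_._2).foldLeft(Vector.empty[Column]) { (acc, c) =>
        if (acc.exists(_.name == c.name)) acc else acc :+ c
      }

    tables.map { case (table, cols) =>
      val present = cols.map(_.name).toSet
      val projections = allColumns.map { c =>
        if (present(c.name)) c.name
        else s"CAST(null AS ${c.sqlType}) AS ${c.name}"
      }
      s"SELECT ${projections.mkString(", ")} FROM $table"
    }.mkString("\nUNION ALL\n")
  }
}
```

For the two schemas above, `UnifySchemas.unionQuery(Seq("old" -> Seq(Column("a", "INT"), Column("b", "STRING")), "new" -> Seq(Column("a", "INT"), Column("c", "LONG"))))` yields the same query as the hand-written one, which can then be passed to `sql(...)` and registered as "unifiedData". The helper matches columns only by name, so a renamed column still needs a hand-written alias.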