Hi, All.

A data schema can evolve in several ways, and Apache Spark 2.3 already supports the following for file-based data sources like CSV/JSON/ORC/Parquet.
1. Add a column
2. Remove a column
3. Change a column position
4. Change a column type

Can we guarantee users some schema evolution coverage on file-based data sources by adding schema evolution test suites explicitly? So far, there are only a few test cases. For simplicity, I made several assumptions about schema evolution.

1. Only safe evolutions without data loss, e.g. from smaller types to larger types like int-to-long, not vice versa.
2. The final schema is given by users (or Hive).
3. Only simple Spark data types supported by Spark vectorized execution.

I made a test case PR to receive your opinions on this.

[SPARK-23007][SQL][TEST] Add schema evolution test suite for file-based data sources
- https://github.com/apache/spark/pull/20208

Could you take a look and give some opinions?

Bests,
Dongjoon.
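To make assumption 1 concrete, here is a minimal, hypothetical sketch (plain Python, not Spark code) of the "safe evolution" idea: the user supplies a final schema, and each per-file schema is checked so that column types may only widen, never narrow. The `SAFE_WIDENINGS` table and the function name `is_safe_evolution` are illustrative assumptions, not anything from the PR.

```python
# Hypothetical widening table: each source type may evolve only to the
# listed target types (int-to-long is safe; long-to-int is not).
SAFE_WIDENINGS = {
    "int": {"int", "long", "float", "double"},
    "long": {"long", "double"},
    "float": {"float", "double"},
    "double": {"double"},
    "string": {"string"},
}

def is_safe_evolution(file_schema, final_schema):
    """Return True if every column written in file_schema can be read
    under the user-given final_schema: same-or-widened type only.
    Columns dropped from final_schema are simply not projected, and
    columns added to final_schema are read as null for old files."""
    for name, file_type in file_schema.items():
        if name in final_schema:
            if final_schema[name] not in SAFE_WIDENINGS.get(file_type, set()):
                return False  # narrowing evolution: would lose data
    return True

# An old file wrote `id` as int; the final schema widened it to long
# and added a new column `note` (read as null from old files).
old_file = {"id": "int", "name": "string"}
final = {"id": "long", "name": "string", "note": "string"}
print(is_safe_evolution(old_file, final))           # int-to-long: safe
print(is_safe_evolution({"id": "double"}, final))   # double-to-long: unsafe
```

This is only a model of the test-suite assumptions; the actual checks in Spark happen inside each data source's reader.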