Yaohua Zhao created SPARK-38314:
-----------------------------------

             Summary: Fail to read parquet files after writing the hidden file metadata in
                 Key: SPARK-38314
                 URL: https://issues.apache.org/jira/browse/SPARK-38314
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
            Reporter: Yaohua Zhao

Selecting the hidden file metadata column `_metadata` and then writing the df to a file format such as `parquet` or `delta` keeps the internal `Attribute` metadata in the written schema. When those `parquet` or `delta` files are read back, the query breaks, because Spark wrongly treats the user data column named `_metadata` as a hidden file source metadata column.

Reproducible code:
```
// prepare a file source df
df.select("*", "_metadata")
  .write.format("parquet").save(path)

spark.read.format("parquet").load(path)
  .select("*").show()
```
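
One way to check whether the written files carry the leftover metadata is to print the per-field metadata of the schema Spark loads from them. The snippet below is only a sketch, not part of the original report: it assumes a spark-shell session where `spark` is already defined, and `path` is a hypothetical location that should point at the directory written by the reproduction above. No expected output is shown, since the exact metadata keys depend on the Spark build.

```scala
// Sketch for spark-shell, where `spark` (SparkSession) is predefined.
// `path` is a hypothetical location; replace it with the directory
// written by the reproduction code.
val path = "/tmp/spark-38314-repro"

// Load the files and take only the schema; no rows are selected here.
val schema = spark.read.format("parquet").load(path).schema

// Print each top-level field with its metadata as JSON.
// A plain user column normally prints "{}"; a non-empty entry on
// `_metadata` would indicate that internal metadata was persisted.
schema.fields.foreach { field =>
  println(s"${field.name}: ${field.metadata.json}")
}
```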