Hi Mark,

I added a comment to the Jira ticket to clarify the description:

When rows with mixed schemas are encountered, the current error message
"{actual} is not a valid external type for schema of {expected}" does not
identify the offending column. This ambiguity slows troubleshooting and
increases development time. To make the error clearer, we propose including
the source column name in the message. For example:

"Column 'my_column' has an actual type of {actual} which is not a valid
external type for the expected schema of {expected}."

With this additional context, developers can pinpoint and resolve schema
mismatches more quickly.

HTH

Mich Talebzadeh,
Architect | Data Engineer | Data Science | Financial Crime
PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
London, United Kingdom

view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".

On Tue, 20 Aug 2024 at 21:59, Mark Andreev <mark.andr...@gmail.com> wrote:

> Hi,
>
> Could you review my small PR [SPARK-49044][SQL] ValidateExternalType
> should return a child in error (
> https://github.com/apache/spark/pull/47522 )? Changes contain tests that
> verify results.
>
> TLDR: After fix error message will contain extra information: [B is not a
> valid external type for schema of string at
> getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row,
> true]), 1, f3)
>
> If you need more information, please let me know. If you're busy, please
> let me know the best time to reach you again.
>
> On Mon, 29 Jul 2024 at 18:15, Mark Andreev <mark.andr...@gmail.com> wrote:
>
>> Hi Spark Devs,
>>
>> Please review my PR [ https://github.com/apache/spark/pull/47522 ] that
>> relates to ticket [ https://issues.apache.org/jira/browse/SPARK-49044 ].
>>
>> Context: When we have mixed schema rows, the error message "{actual} is
>> not a valid external type for schema of {expected}" doesn't help to
>> understand the column with the problem. I suggest adding information about
>> the source column.
>>
>> Example:
>> https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala
>>
>> Before fix: [B is not a valid external type for schema of string
>> After fix: [B is not a valid external type for schema of string at
>> getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row,
>> true]), 1, f3)
>>
>> --
>> Best regards,
>> Mark Andreev
>>
>
>
> --
> Best regards,
> Mark Andreev
>
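
To make the proposal in this thread concrete, here is a minimal sketch in plain Python of a row check that names the offending column in its error. It is not Spark code and not what the PR implements: `validate_row`, the `(name, type)` schema representation, and the column names `f1`..`f3` are assumptions made only for illustration, and the message wording follows the suggested "Column 'my_column' ..." format rather than the PR's actual output.

```python
from typing import Any, Sequence, Tuple

# Hypothetical schema representation: (column name, expected Python type) pairs.
Schema = Sequence[Tuple[str, type]]


def validate_row(row: Sequence[Any], schema: Schema) -> None:
    """Raise TypeError naming the first column whose value has the wrong type."""
    if len(row) != len(schema):
        raise ValueError(
            f"Row has {len(row)} fields but schema has {len(schema)}"
        )
    for value, (name, expected) in zip(row, schema):
        # Assumption for this sketch: None is an acceptable null in any column.
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"Column '{name}' has an actual type of {type(value).__name__} "
                f"which is not a valid external type for the expected schema "
                f"of {expected.__name__}."
            )


# Illustrative usage: the third field holds bytes where a string is expected.
schema = [("f1", int), ("f2", str), ("f3", str)]
validate_row([1, "a", "b"], schema)  # matches the schema, no error
try:
    validate_row([1, "a", b"raw"], schema)
except TypeError as err:
    print(err)
```

Because the column name travels with the type information, the failing field (`f3` here) can be read straight from the message instead of being inferred from the row contents.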