Re: Please review (ValidateExternalType should return child in error)

Mark Andreev Sun, 25 Aug 2024 10:19:18 -0700

Hi Michael,

I would really appreciate it if you could review my PR [
https://github.com/apache/spark/pull/47522 ], as your expertise in Spark SQL
would be invaluable. Would you mind taking a look at my changes?




On Sun, 25 Aug 2024 at 18:15, Mark Andreev wrote:

> Thank you, Bjørn.
>
> I have updated my PR [ https://github.com/apache/spark/pull/47522 ] to
> follow the template:
>
> + What changes were proposed in this pull request?
> + Why are the changes needed?
> + Does this PR introduce any user-facing change?
> + How was this patch tested?
> + Was this patch authored or co-authored using generative AI tooling?
>
>
>
> On Sun, 25 Aug 2024 at 15:47, Bjørn Jørgensen wrote:
>
>> Apache Spark has a template for PRs:
>> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>>
>>
>> On Sun, 25 Aug 2024 at 13:41, Mich Talebzadeh wrote:
>>
>>> Unfortunately, it is not that straightforward:
>>>
>>>
>>>    1. Committer Votes: The PR needs a sufficient number of "+1" votes
>>>    from *committers*.
>>>    2. Review Process: Address feedback from the community and
>>>    committers to ensure the PR meets the necessary standards.
>>>    3. Approval: Once approved by committers, the PR can be merged into
>>>    the main codebase.
>>>
>>>
>>> HTH
>>>
>>>
>>>
>>> On Sun, 25 Aug 2024 at 08:17, Mark Andreev wrote:
>>>
>>>> Thank you for your review.
>>>>
>>>> Could you explain how to get this commit merged upstream? I don't
>>>> want this PR to be abandoned.
>>>>
>>>> Best regards,
>>>> Mark Andreev
>>>>
>>>>
>>>> On Wed, 21 Aug 2024 at 23:08, Mich Talebzadeh wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> You have already done that and have made the request for review.
>>>>>
>>>>> +1 from me
>>>>>
>>>>> Mich Talebzadeh,
>>>>>
>>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>> PhD, Imperial College London
>>>>>
>>>>> London, United Kingdom
>>>>>
>>>>>
>>>>>    view my LinkedIn profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>>
>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>> that, as with any advice, quote "one test result is worth one-thousand
>>>>> expert opinions" (Wernher von Braun
>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>
>>>>>
>>>>> On Wed, 21 Aug 2024 at 22:20, Mark Andreev wrote:
>>>>>
>>>>>> Thank you, Mich.
>>>>>>
>>>>>> What is the correct procedure to request a review?
>>>>>>
>>>>>> On Tue, 20 Aug 2024 at 22:57, Mich Talebzadeh wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>> I added a comment to the Jira ticket to clarify the description:
>>>>>>>
>>>>>>> When encountering mixed schema rows, the current error message
>>>>>>> "{actual} is not a valid external type for schema of {expected}" lacks
>>>>>>> sufficient detail to identify the problematic column. This ambiguity
>>>>>>> hinders troubleshooting and increases development time.
>>>>>>>
>>>>>>> To enhance error clarity, we propose incorporating the source column
>>>>>>> name into the error message. For example: "Column 'my_column' has an
>>>>>>> actual type of {actual} which is not a valid external type for the
>>>>>>> expected schema of {expected}."
>>>>>>>
>>>>>>> By providing this additional context, developers can more
>>>>>>> efficiently pinpoint and resolve schema mismatches.
>>>>>>>
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 20 Aug 2024 at 21:59, Mark Andreev wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Could you review my small PR [SPARK-49044][SQL]
>>>>>>>> ValidateExternalType should return a child in error (
>>>>>>>> https://github.com/apache/spark/pull/47522 )? The changes include
>>>>>>>> tests that verify the change.
>>>>>>>>
>>>>>>>> TL;DR: After the fix, the error message will contain extra information:
>>>>>>>> [B is not a valid external type for schema of string at
>>>>>>>> getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)
>>>>>>>>
>>>>>>>> If you need more information, please let me know. If you're busy,
>>>>>>>> please let me know the best time to reach you again.
>>>>>>>>
>>>>>>>> On Mon, 29 Jul 2024 at 18:15, Mark Andreev wrote:
>>>>>>>>
>>>>>>>>> Hi Spark Devs,
>>>>>>>>>
>>>>>>>>> Please review my PR [ https://github.com/apache/spark/pull/47522
>>>>>>>>> ] that relates to ticket [
>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-49044 ].
>>>>>>>>>
>>>>>>>>> Context: When we have mixed-schema rows, the error message
>>>>>>>>> "{actual} is not a valid external type for schema of {expected}"
>>>>>>>>> doesn't help identify the column that causes the problem. I suggest
>>>>>>>>> adding information about the source column.
>>>>>>>>>
>>>>>>>>> Example:
>>>>>>>>> https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala
>>>>>>>>>
>>>>>>>>> Before the fix: [B is not a valid external type for schema of string
>>>>>>>>> After the fix: [B is not a valid external type for schema of string
>>>>>>>>> at getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)
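For readers puzzled by the "[B" in these messages: it is the JVM's internal class name for a byte array (Array[Byte] in Scala), as returned by getClass.getName. The sketch below is a hypothetical, plain-Scala illustration of the two message shapes quoted above, assuming only the Scala standard library. It is not Spark's ValidateExternalType implementation; the names ExternalTypeMessageSketch, before and after are invented for this example, and the exact wording emitted by a given Spark release may differ.

```scala
// Hypothetical sketch: NOT Spark's ValidateExternalType code.
// It only reproduces the shape of the "before" and "after" messages quoted above.
object ExternalTypeMessageSketch {

  // JVM class name of the runtime value; for a byte array this is "[B".
  def externalTypeName(value: Any): String = value.getClass.getName

  // Message without any location information (the "before" shape).
  def before(value: Any, expectedType: String): String =
    s"${externalTypeName(value)} is not a valid external type for schema of $expectedType"

  // Same message with a description of the offending child expression appended
  // (the "after" shape). In this sketch the description is just a plain string.
  def after(value: Any, expectedType: String, childDescription: String): String =
    s"${before(value, expectedType)} at $childDescription"

  def main(args: Array[String]): Unit = {
    // A byte array placed where the schema expects a string.
    val bytes: Array[Byte] = "abc".getBytes("UTF-8")
    // Child description copied from the example in this thread.
    val child =
      "getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)"
    println(before(bytes, "string"))
    println(after(bytes, "string", child))
  }
}
```

The second printed line shows why the change helps: the "at ..." suffix appears to identify field f3 (position 1) of the input Row, so the offending column can be found without bisecting the schema.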
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Mark Andreev
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Mark Andreev
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Mark Andreev
>>>>>>
>>>>>
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
>>
>
>
> --
> Best regards,
> Mark Andreev
>


-- 
Best regards,
Mark Andreev
