Re: Please review (ValidateExternalType should return child in error)

Mark Andreev Wed, 28 Aug 2024 14:02:39 -0700

Hi Maksim,

I would really appreciate it if you could review my PR [
https://github.com/apache/spark/pull/47522 ]. Would you mind taking a look
at my changes? How can I improve my PR to make it easier to review?


On Sun, 25 Aug 2024 at 18:18, Mark Andreev <[email protected]> wrote:

> Hi Michael,
>
> I would really appreciate it if you could review my PR [
> https://github.com/apache/spark/pull/47522 ], as your expertise in the
> SQL part of Apache Spark is invaluable. Would you mind taking a look at my
> changes?
>
>
>
> On Sun, 25 Aug 2024 at 18:15, Mark Andreev <[email protected]> wrote:
>
>> Thank you Bjørn.
>>
>> My PR [ https://github.com/apache/spark/pull/47522 ] was updated to be
>> aligned with the guideline.
>>
>> + What changes were proposed in this pull request?
>> + Why are the changes needed?
>> + Does this PR introduce any user-facing change?
>> + How was this patch tested?
>> + Was this patch authored or co-authored using generative AI tooling?
>>
>>
>>
>> On Sun, 25 Aug 2024 at 15:47, Bjørn Jørgensen <[email protected]>
>> wrote:
>>
>>> Apache Spark does have a template for PRs:
>>> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>>>
>>>
>>> On Sun, 25 Aug 2024 at 13:41, Mich Talebzadeh <
>>> [email protected]> wrote:
>>>
>>>> Unfortunately, it is not that straightforward.
>>>>
>>>>
>>>>    1. Committer Votes: The PR needs a sufficient number of "+1" votes
>>>>    from *committers.*
>>>>    2. Review Process: Address feedback from the community and
>>>>    committers to ensure the PR meets the necessary standards.
>>>>    3. Approval: Once approved by committers, the PR can be merged into
>>>>    the main codebase.
>>>>
>>>>
>>>> HTH
>>>>
>>>>
>>>>
>>>> On Sun, 25 Aug 2024 at 08:17, Mark Andreev <[email protected]>
>>>> wrote:
>>>>
>>>>> Thank you for your review.
>>>>>
>>>>> Could you explain how to merge this commit into the upstream? I don't
>>>>> want this PR to be abandoned.
>>>>>
>>>>> Best regards,
>>>>> Mark Andreev
>>>>>
>>>>>
>>>>> On Wed, 21 Aug 2024 at 23:08, Mich Talebzadeh <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> You have already done that and have made the request for review.
>>>>>>
>>>>>> +1 for me
>>>>>>
>>>>>> Mich Talebzadeh,
>>>>>>
>>>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial
>>>>>> College London
>>>>>> <https://en.wikipedia.org/wiki/Imperial_College_London>
>>>>>> London, United Kingdom
>>>>>>
>>>>>>
>>>>>>    view my Linkedin profile
>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>
>>>>>>
>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>>> that, as with any advice, "one test result is worth one thousand
>>>>>> expert opinions" (Wernher von Braun
>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>>
>>>>>>
>>>>>> On Wed, 21 Aug 2024 at 22:20, Mark Andreev <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you, Mich.
>>>>>>>
>>>>>>> What is the correct procedure to request a review?
>>>>>>>
>>>>>>> On Tue, 20 Aug 2024 at 22:57, Mich Talebzadeh <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Mark,
>>>>>>>>
>>>>>>>> I added a comment to the Jira ticket to clarify the description:
>>>>>>>>
>>>>>>>> When encountering mixed schema rows, the current error message
>>>>>>>> "{actual} is not a valid external type for schema of {expected}" lacks
>>>>>>>> sufficient detail to identify the problematic column. This ambiguity
>>>>>>>> hinders troubleshooting and increases development time.
>>>>>>>>
>>>>>>>> To enhance error clarity, we propose incorporating the source
>>>>>>>> column name into the error message. For example: "Column 'my_column'
>>>>>>>> has an actual type of {actual} which is not a valid external type for
>>>>>>>> the expected schema of {expected}."
>>>>>>>>
>>>>>>>> By providing this additional context, developers can more
>>>>>>>> efficiently pinpoint and resolve schema mismatches.
>>>>>>>>
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>>
>>>>>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial
>>>>>>>> College London
>>>>>>>> <https://en.wikipedia.org/wiki/Imperial_College_London>
>>>>>>>> London, United Kingdom
>>>>>>>>
>>>>>>>>
>>>>>>>>    view my Linkedin profile
>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>>
>>>>>>>>
>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Disclaimer:* The information provided is correct to the best of
>>>>>>>> my knowledge but of course cannot be guaranteed. It is essential to
>>>>>>>> note that, as with any advice, "one test result is worth one thousand
>>>>>>>> expert opinions" (Wernher von Braun
>>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 20 Aug 2024 at 21:59, Mark Andreev <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Could you review my small PR [SPARK-49044][SQL]
>>>>>>>>> ValidateExternalType should return a child in error (
>>>>>>>>> https://github.com/apache/spark/pull/47522 )? The changes include
>>>>>>>>> tests that verify the results.
>>>>>>>>>
>>>>>>>>> TL;DR: After the fix, the error message will contain extra
>>>>>>>>> information: [B is not a valid external type for schema of string at
>>>>>>>>> getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row,
>>>>>>>>> true]), 1, f3)
>>>>>>>>>
>>>>>>>>> If you need more information, please let me know. If you're busy,
>>>>>>>>> please let me know the best time to reach you again.
>>>>>>>>>
>>>>>>>>> On Mon, 29 Jul 2024 at 18:15, Mark Andreev <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Spark Devs,
>>>>>>>>>>
>>>>>>>>>> Please review my PR [ https://github.com/apache/spark/pull/47522
>>>>>>>>>> ] that relates to ticket [
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-49044 ].
>>>>>>>>>>
>>>>>>>>>> Context: When we have mixed schema rows, the error message
>>>>>>>>>> "{actual} is not a valid external type for schema of {expected}"
>>>>>>>>>> doesn't help identify the column with the problem. I suggest adding
>>>>>>>>>> information about the source column.
>>>>>>>>>>
>>>>>>>>>> Example:
>>>>>>>>>> https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala
>>>>>>>>>>
>>>>>>>>>> Before fix: [B is not a valid external type for schema of string
>>>>>>>>>> After fix: [B is not a valid external type for schema of string
>>>>>>>>>> at getexternalrowfield(assertnotnull(input[0,
>>>>>>>>>> org.apache.spark.sql.Row, true]), 1, f3)
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> Mark Andreev
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Mark Andreev
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Mark Andreev
>>>>>>>
>>>>>>
>>>
>>> --
>>> Bjørn Jørgensen
>>> Vestre Aspehaug 4, 6010 Ålesund
>>> Norge
>>>
>>> +47 480 94 297
>>>
>>
>>
>> --
>> Best regards,
>> Mark Andreev
>>
>
>
> --
> Best regards,
> Mark Andreev
>


-- 
Best regards,
Mark Andreev
