Hi Maksim, I would really appreciate it if you could review my PR [ https://github.com/apache/spark/pull/47522 ]. Would you mind taking a look at my changes? Also, how can I improve my PR workflow to make the changes easier to review?
On Sun, 25 Aug 2024 at 18:18, Mark Andreev <mark.andr...@gmail.com> wrote:
> Hi Michael,
>
> I would really appreciate it if you could review my PR [
> https://github.com/apache/spark/pull/47522 ], as your expertise in the
> SQL part of Apache Spark is invaluable. Would you mind taking a look at my
> changes?
>
> On Sun, 25 Aug 2024 at 18:15, Mark Andreev <mark.andr...@gmail.com> wrote:
>
>> Thank you, Bjørn.
>>
>> My PR [ https://github.com/apache/spark/pull/47522 ] was updated to be
>> aligned with the guideline.
>>
>> + What changes were proposed in this pull request?
>> + Why are the changes needed?
>> + Does this PR introduce any user-facing change?
>> + How was this patch tested?
>> + Was this patch authored or co-authored using generative AI tooling?
>>
>> On Sun, 25 Aug 2024 at 15:47, Bjørn Jørgensen <bjornjorgen...@gmail.com>
>> wrote:
>>
>>> Apache Spark does have a template for PRs:
>>> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>>>
>>> On Sun, 25 Aug 2024 at 13:41, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Unfortunately, it is not that straightforward.
>>>>
>>>> 1. Committer Votes: The PR needs a sufficient number of "+1" votes
>>>>    from *committers*.
>>>> 2. Review Process: Address feedback from the community and
>>>>    committers to ensure the PR meets the necessary standards.
>>>> 3. Approval: Once approved by committers, the PR can be merged into
>>>>    the main codebase.
>>>>
>>>> HTH
>>>>
>>>> On Sun, 25 Aug 2024 at 08:17, Mark Andreev <mark.andr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you for your review.
>>>>>
>>>>> Could you explain how to get this commit merged upstream? I don't
>>>>> want this PR to be abandoned.
>>>>>
>>>>> Best regards,
>>>>> Mark Andreev
>>>>>
>>>>> On Wed, 21 Aug 2024 at 23:08, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> You have already done that and have made the request for review.
>>>>>>
>>>>>> +1 for me
>>>>>>
>>>>>> Mich Talebzadeh,
>>>>>>
>>>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial
>>>>>> College London
>>>>>> <https://en.wikipedia.org/wiki/Imperial_College_London>
>>>>>> London, United Kingdom
>>>>>>
>>>>>> view my Linkedin profile
>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>
>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>
>>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>>> that, as with any advice, quote "one test result is worth one-thousand
>>>>>> expert opinions (Wernher von Braun
>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>>>>>
>>>>>> On Wed, 21 Aug 2024 at 22:20, Mark Andreev <mark.andr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you, Mich.
>>>>>>>
>>>>>>> What is the correct procedure to request a review?
>>>>>>>
>>>>>>> On Tue, 20 Aug 2024 at 22:57, Mich Talebzadeh <
>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Mark,
>>>>>>>>
>>>>>>>> Added a comment to Jira to provide more clarity to Description
>>>>>>>>
>>>>>>>> When encountering mixed schema rows, the current error message
>>>>>>>> "{actual} is not a valid external type for schema of {expected}" lacks
>>>>>>>> sufficient detail to identify the problematic column. This ambiguity
>>>>>>>> hinders troubleshooting and increases development time.
>>>>>>>>
>>>>>>>> To enhance error clarity, we propose incorporating the source
>>>>>>>> column name into the error message.
>>>>>>>> For example: "Column 'my_column' has an actual type of {actual}
>>>>>>>> which is not a valid external type for the expected schema of
>>>>>>>> {expected}."
>>>>>>>>
>>>>>>>> By providing this additional context, developers can more
>>>>>>>> efficiently pinpoint and resolve schema mismatches.
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Mich Talebzadeh
>>>>>>>>
>>>>>>>> On Tue, 20 Aug 2024 at 21:59, Mark Andreev <mark.andr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Could you review my small PR [SPARK-49044][SQL]
>>>>>>>>> ValidateExternalType should return a child in error (
>>>>>>>>> https://github.com/apache/spark/pull/47522 )? The changes include
>>>>>>>>> tests that verify the results.
>>>>>>>>>
>>>>>>>>> TL;DR: After the fix, the error message will contain extra
>>>>>>>>> information: [B is not a valid external type for schema of string
>>>>>>>>> at getexternalrowfield(assertnotnull(input[0,
>>>>>>>>> org.apache.spark.sql.Row, true]), 1, f3)
>>>>>>>>>
>>>>>>>>> If you need more information, please let me know.
>>>>>>>>> If you're busy,
>>>>>>>>> please let me know the best time to reach you again.
>>>>>>>>>
>>>>>>>>> On Mon, 29 Jul 2024 at 18:15, Mark Andreev <mark.andr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Spark Devs,
>>>>>>>>>>
>>>>>>>>>> Please review my PR [ https://github.com/apache/spark/pull/47522
>>>>>>>>>> ] that relates to ticket [
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-49044 ].
>>>>>>>>>>
>>>>>>>>>> Context: When we have mixed schema rows, the error message
>>>>>>>>>> "{actual} is not a valid external type for schema of {expected}"
>>>>>>>>>> doesn't help identify the problematic column. I suggest adding
>>>>>>>>>> information about the source column.
>>>>>>>>>>
>>>>>>>>>> Example:
>>>>>>>>>> https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala
>>>>>>>>>>
>>>>>>>>>> Before fix: [B is not a valid external type for schema of string
>>>>>>>>>> After fix: [B is not a valid external type for schema of string
>>>>>>>>>> at getexternalrowfield(assertnotnull(input[0,
>>>>>>>>>> org.apache.spark.sql.Row, true]), 1, f3)
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> Mark Andreev
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Mark Andreev
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Mark Andreev
>>>
>>> --
>>> Bjørn Jørgensen
>>> Vestre Aspehaug 4, 6010 Ålesund
>>> Norge
>>>
>>> +47 480 94 297
>>
>> --
>> Best regards,
>> Mark Andreev
>
> --
> Best regards,
> Mark Andreev

--
Best regards,
Mark Andreev