Vlad, let's open a PR and discuss it there. We have many other committees to review / help with as well.
On Fri, Mar 28, 2025 at 6:28 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote: > Hi Hyukjin, > > I open https://issues.apache.org/jira/browse/SPARK-51643 and > https://issues.apache.org/jira/browse/SPARK-51644. Please add more > details to the first JIRA. As far as I can see > https://github.com/vrozov/spark/tree/spark-shell should fix both JIRAs > and if not I’d like to understand why the first one would not be fixed by > adding back relocation of guava to the connect client. > > Thank you, > > Vlad > > > > On Mar 26, 2025, at 6:15 PM, Hyukjin Kwon <gurwls...@apache.org> wrote: > > Nah, I wasn't clear. > > Maven and SBT builds are synced for this special code path, e.g., > https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0 > . > If you build Maven and SBT, the results are almost the same. > > Now, the fix you landed in Maven (and indeed it was a Maven specific fix). > However, it changes the syncing part between Maven and SBT, and it results > in a different output. > The CI in the PR passed because we run SBT in PR builders. > > My suggestion is to fix both at the same time as usual PRs have been so > also considering that the fix had an issue. > It would also need some closer looks who are used to SBT as well to make > sure the same issue does not happen. > > > > On Thu, 27 Mar 2025 at 10:06, Rozov, Vlad <vro...@amazon.com.invalid> > wrote: > >> Sorry, but I still don’t follow. My PR broke Maven and the fix I provided >> fixes Maven. SBT was never broken except there is inconsistency between SBT >> and Maven builds. Can the inconsistency be fixed in a follow up PR? >> >> Thank you, >> >> Vlad >> >> On Mar 26, 2025, at 5:57 PM, Hyukjin Kwon <gurwls...@apache.org> wrote: >> >> It is not broken ... because we run SBT in PR builders for ASF resource >> restrictions and faster build. We use Maven for release so it was found out >> now. >> >> CI did not test your change. The part you are fixing is a special path .. >> >> On Thu, Mar 27, 2025 at 9:53 AM Rozov, Vlad <vro...@amazon.com.invalid> >> wrote: >> >>> If it is not broken, can the sync between maven and SBT >>> dependencies/shadow be done in a follow up PR? >>> >>> Thank you, >>> >>> Vlad >>> >>> >>> On Mar 26, 2025, at 5:44 PM, Hyukjin Kwon <gurwls...@apache.org> wrote: >>> >>> It is not broken. The fix you applied would not be applied in SBT. For >>> example, the lines you changed (added in >>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0 >>> ): >>> >>> diff``` >>> - <relocation> >>> - <pattern>com.google.common</pattern> >>> - >>> <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern> >>> - <includes> >>> - <include>com.google.common.**</include> >>> - </includes> >>> - </relocation> >>> ``` >>> >>> The companion part of this has to be removed in SBT as well, for example >>> here >>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR674 >>> >>> Adding simple dependency changes usually are shared between SBT and >>> Maven so it is not a tricky part. However, some changes like this have to >>> be made in both places. >>> Both Maven and SBT builds are usually sync, and changes are mode >>> together. Let me know if I am underding your change wrongly. >>> >>> >>> On Thu, 27 Mar 2025 at 09:25, Rozov, Vlad <vro...@amazon.com.invalid> >>> wrote: >>> >>>> Hi Hyukjin, >>>> >>>> Can you clarify what is broken in SBT? As I already mentioned in my >>>> previous e-mail, I tested both maven and SBT builds and both works. Is >>>> there a test that I can run that shows what is broken? Please file JIRA, so >>>> we can communicate on the JIRA and document what is broken. >>>> >>>> Thank you, >>>> >>>> Vlad >>>> >>>> On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon <gurwls...@apache.org> wrote: >>>> >>>> That only fixes Maven. Both SBT build and Maven build should work in >>>> the same or similar wat. Let's make sure both work. >>>> >>>> On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad <vro...@amazon.com.invalid> >>>> wrote: >>>> >>>>> Please see https://github.com/vrozov/spark/tree/spark-shell. I tested >>>>> only spark-shell —remote local after building with maven and sbt. It may >>>>> not be a complete fix and there is no PR. I’ll look into SBT build issue >>>>> (assuming that there is still one after the fix) once you file JIRA. >>>>> >>>>> Thank you, >>>>> >>>>> Vlad >>>>> >>>>> >>>>> On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon <gurwls...@apache.org> >>>>> wrote: >>>>> >>>>> Vlad, >>>>> >>>>> - Please show me if there is a simple fix. If that's the case, yes, I >>>>> will revert this out from the master branch. That works for me. >>>>> - If not, let's make a new PR. >>>>> - If you feel this is an issue, let's start a vote. Let me know. >>>>> >>>>> >>>>> On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad <vro...@amazon.com.invalid> >>>>> wrote: >>>>> >>>>>> Every graduated from incubating Apache project has guards against >>>>>> what you name “chaotic” and what other name breaking best development >>>>>> practices. Such guards include JIRA, unit tests and PR review. Instead of >>>>>> reverting commit, I would expect you to open JIRA and outline what is >>>>>> broken. If you filed JIRA, please provide the link, if not, please open >>>>>> one. >>>>>> >>>>>> It took me 2 hours to fix Spark shells, so should you open JIRA >>>>>> instead of spending time to identify the commit and reverting it, you >>>>>> will >>>>>> save time as well. I’ll post fix once JIRA is open and I validate that my >>>>>> understanding of what is broken is the same as yours. >>>>>> >>>>>> Thank you, >>>>>> >>>>>> Vlad >>>>>> >>>>>> On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon <gurwls...@apache.org> >>>>>> wrote: >>>>>> >>>>>> Rozov, please test the patch, see if there is a relevant test or not, >>>>>> and add a test if not there. If it is difficult to add a test, describe >>>>>> it >>>>>> in the PR description, and how you manually tested. >>>>>> This is what I think you need to do instead of reverting the revert. >>>>>> Imagine that there are many of such PRs living in the master branch. >>>>>> Thankfully this time you are alone but imagine many people do the same >>>>>> things - it will be chaotic. To me, it took 4 hours to identify this >>>>>> single >>>>>> commit. >>>>>> >>>>>> About the change itself, you need to fix SBT together, and it will >>>>>> need the changes such as >>>>>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0 >>>>>> . >>>>>> Should also test Spark shells, and describe how you tested it as well. >>>>>> >>>>>> This is what I expect: >>>>>> - Please show me if there is a simple fix. If that's the case, yes, I >>>>>> will revert this out from the master branch. That works for me. >>>>>> - If not, let's make a new PR. >>>>>> - If you feel this is an issue, let's start a vote. Let me know. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, 26 Mar 2025 at 15:07, Reynold Xin <r...@databricks.com.invalid> >>>>>> wrote: >>>>>> >>>>>>> Sorry Vlad - I disagree. Where is the simple fix? As a new >>>>>>> contributor, you should not be coming in guns blazing blaming committers >>>>>>> who are trying to keep the master branch sane and clean. >>>>>>> >>>>>>> On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad >>>>>>> <vro...@amazon.com.invalid> wrote: >>>>>>> >>>>>>>> There is a simple fix. This is exactly what I outlined in the >>>>>>>> e-mail. Prior to reverting commit (on master) it was necessary to see >>>>>>>> if an >>>>>>>> easy fix exists. The PR that introduced the error was merged into >>>>>>>> master 3 weeks ago, so I still don’t get why it was reverted >>>>>>>> overnight. It >>>>>>>> was also necessary to open JIRA and provide reproducible steps. I >>>>>>>> assume >>>>>>>> that steps on the PR is what needs to be fixed, but it will be better >>>>>>>> to >>>>>>>> avoid guessing. >>>>>>>> >>>>>>>> Thank you, >>>>>>>> >>>>>>>> Vlad >>>>>>>> >>>>>>>> On Mar 25, 2025, at 10:05 PM, Reynold Xin >>>>>>>> <r...@databricks.com.INVALID> wrote: >>>>>>>> >>>>>>>> Is there a fix already available or a very simple fix a >>>>>>>> committer can create quickly? If yes, we can merge the fix. If there >>>>>>>> isn't, >>>>>>>> for major functionality breaking change, we should just revert. That's >>>>>>>> fairly basic software engineering practices. >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon <gurwls...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> With the change, the main entry points, Spark shalls, don't work >>>>>>>>> and developers cannot debug and test. The snapshots become uesless. >>>>>>>>> >>>>>>>>> The tests passed because you did not fix SBT. It needs a larger >>>>>>>>> change. >>>>>>>>> >>>>>>>>> Such change cannot be in the source. I can start a vote if you >>>>>>>>> think this is an issue. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Mar 26, 2025 at 12:32 PM Rozov, Vlad >>>>>>>>> <vro...@amazon.com.invalid> wrote: >>>>>>>>> >>>>>>>>>> This does not make any sense. >>>>>>>>>> >>>>>>>>>> 1. There are no broken tests introduced by >>>>>>>>>> https://github.com/apache/spark/pull/49971 >>>>>>>>>> 2. There are no JIRA filed for “the main entry point” >>>>>>>>>> 3. “The main entry point” that does not have any unit test >>>>>>>>>> suggests that it is not the main entry point. >>>>>>>>>> 4. It is not practical to revert every commit that introduced >>>>>>>>>> some functional issue or regression. If regression can’t be fixed, >>>>>>>>>> it is >>>>>>>>>> reverted, if it can be reasonably fixed, the fix is applied to the >>>>>>>>>> regression. Just as an example, >>>>>>>>>> https://github.com/apache/spark/pull/42823 introduced a >>>>>>>>>> regression. It was fixed by >>>>>>>>>> https://github.com/apache/spark/pull/50348. Nobody asked for the >>>>>>>>>> revert. >>>>>>>>>> 5. The way how revert was handled is not "Apache way”. >>>>>>>>>> >>>>>>>>>> It may be easier for you to handle it in a single PR, but it is >>>>>>>>>> easier for me to provide PR on top of existing changes and it will >>>>>>>>>> also >>>>>>>>>> provide JIRA and a commit that documents why there is an issue. >>>>>>>>>> Otherwise >>>>>>>>>> it will be easier to introduce the error back again. >>>>>>>>>> >>>>>>>>>> Thank you, >>>>>>>>>> >>>>>>>>>> Vlad >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mar 25, 2025, at 4:58 PM, Hyukjin Kwon <gurwls...@apache.org> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Rozov, this broke the main entry points of release, Spark shells. >>>>>>>>>> Even in the mast branch, you build a Spark, and cannot use Spark >>>>>>>>>> shells. >>>>>>>>>> Why don't you submit a PR that contains the proper fix? It is >>>>>>>>>> easier to have one PR that has no issue, e.g., reverting backporting >>>>>>>>>> etc. >>>>>>>>>> >>>>>>>>>> On Wed, 26 Mar 2025 at 00:17, Rozov, Vlad >>>>>>>>>> <vro...@amazon.com.invalid> wrote: >>>>>>>>>> >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> I kind of understand why >>>>>>>>>>> https://github.com/apache/spark/pull/49971 was reverted on the >>>>>>>>>>> branch-4.0 to allow testing of 4.0 release. Why was it also >>>>>>>>>>> reverted on the >>>>>>>>>>> master branch? I don’t see any JIRA being open for the failure. >>>>>>>>>>> AFAIK, the >>>>>>>>>>> proper way to handle the issue in Apache project: >>>>>>>>>>> >>>>>>>>>>> - revert https://github.com/apache/spark/pull/49971 on the >>>>>>>>>>> branch-4.0 (note that PR targeted master branch and SPARK-51229 >>>>>>>>>>> targets 4.1 >>>>>>>>>>> as affected version) >>>>>>>>>>> - open JIRA and provide details of how to reproduce the error on >>>>>>>>>>> the master branch >>>>>>>>>>> - wait for the author of >>>>>>>>>>> https://github.com/apache/spark/pull/49971 (in this case it was >>>>>>>>>>> me) to reply in a timely manner and only then revert the commit >>>>>>>>>>> >>>>>>>>>>> If you agree with the approach, please undo the revert and file >>>>>>>>>>> JIRA. There is an easy fix to the issue. >>>>>>>>>>> >>>>>>>>>>> Thank you, >>>>>>>>>>> >>>>>>>>>>> Vlad >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>> >> >