MonkeyCanCode commented on PR #4535: URL: https://github.com/apache/polaris/pull/4535#issuecomment-4554949316
> Thanks for tackling the Spark 4.0 changes. > > Most of the added files for Spark 4.0 are byte-identical to Spark 3.5 - both for the production and integration tests code. I think, the identical code should move to a common source directory and the files that differ might benefit from an abstract base class extended by a concrete Spark version specific implementation class. > > The regtests infrastructure code, which is more than simple wrapper/invoker scripts, is also mostly duplicated with a few version differences. > > Overall, this appears to increase the total CI duration by 10+ minutes. I think we should be cautious about increasing CI duration, because the resources (GH runners) are shared with all Apache projects using GH actions. > > Generally, not new to this PR, the container image build alone takes between 4 and 7 minutes and is performed by many jobs: for every Python version, both regression tests jobs, although we already have a pre-built Polaris server image available (for example, the `docker-image-scan` job uses that already). Hello @snazy, Thanks for review. For sure there are more refactor which can be done, however, I am not sure if we should as Iceberg is doing the same for different spark versions as there are some different across versions (but as you called out, we can overwrite those). If we want to proceed with this route, I can work on the refactor more by moving common classes into the common module as what we did with the rest one. Regarding regression test, the change between 3.5 and 4.0 is minimal, so I am not too worried with refactor this one out into a common location. Lastly, what is the guideline around the extra CI runtime? Those regression themself took quite a bit of time to run. @dimas-b @snazy should we proceed with this refactor? Thanks, Yong Zheng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
