MonkeyCanCode commented on PR #4535:
URL: https://github.com/apache/polaris/pull/4535#issuecomment-4554949316

   > Thanks for tackling the Spark 4.0 changes.
   > 
   > Most of the added files for Spark 4.0 are byte-identical to Spark 3.5 - 
both for the production and integration tests code. I think, the identical code 
should move to a common source directory and the files that differ might 
benefit from an abstract base class extended by a concrete Spark version 
specific implementation class.
   > 
   > The regtests infrastructure code, which is more than simple 
wrapper/invoker scripts, is also mostly duplicated with a few version 
differences.
   > 
   > Overall, this appears to increase the total CI duration by 10+ minutes. I 
think we should be cautious about increasing CI duration, because the resources 
(GH runners) are shared with all Apache projects using GH actions.
   > 
   > Generally, not new to this PR, the container image build alone takes 
between 4 and 7 minutes and is performed by many jobs: for every Python 
version, both regression tests jobs, although we already have a pre-built 
Polaris server image available (for example, the `docker-image-scan` job uses 
that already).
   
   Hello @snazy,
   
   Thanks for review. For sure there are more refactor which can be done, 
however, I am not sure if we should as Iceberg is doing the same for different 
spark versions as there are some different across versions (but as you called 
out, we can overwrite those). If we want to proceed with this route, I can work 
on the refactor more by moving common classes into the common module as what we 
did with the rest one.
   
   Regarding regression test, the change between 3.5 and 4.0 is minimal, so I 
am not too worried with refactor this one out into a common location.
   
   Lastly, what is the guideline around the extra CI runtime? Those regression 
themself took quite a bit of time to run.
   
   @dimas-b @snazy should we proceed with this refactor?
   
   Thanks,
   Yong Zheng


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to