I think Spark should start shading its problematic deps, similar to how it's done in Flink.
On Mon, 4 Dec 2023 at 2:57 Sean Owen <sro...@gmail.com> wrote:

> I am not sure we can control that - the Scala _x.y suffix has particular
> meaning in the Scala ecosystem for artifacts, and thus the naming of .jar
> files. And we need to work with the Scala ecosystem.
>
> What can't handle these files, Spring Boot? Does it somehow assume the
> .jar file name relates to Java modules?
>
> By the by, Spark 4 is already moving to the jakarta.* packages for similar
> reasons.
>
> I don't think Spark does or can really leverage Java modules. It started
> waaay before that, and I expect that it has some structural issues that are
> incompatible with Java modules, like multiple places declaring code in the
> same Java package.
>
> As in all things, if there's a change that doesn't harm anything else and
> helps support for Java modules, sure, suggest it. If it has the conflicts I
> think it will, it's probably not possible and not really a goal, I think.
>
>
> On Sun, Dec 3, 2023 at 11:30 AM Marc Le Bihan <mlebiha...@gmail.com>
> wrote:
>
>> Hello,
>>
>> Last month, I attempted the experiment of upgrading my Spring-Boot 2
>> Java project, which relies heavily on Spark 3.4.2, to Spring-Boot 3. It
>> hasn't succeeded yet, but it was informative.
>>
>> Spring-Boot 2 → 3 especially means javax.* becoming jakarta.*:
>> javax.activation, javax.ws.rs, javax.persistence, javax.validation,
>> javax.servlet... all of these have to change their packages and
>> dependencies.
>> Apart from that, there was some trouble with ANTLR 4 against ANTLR 3,
>> and a few things with SLF4J and Log4j.
>>
>> It was not easy, and I guessed that going into modules could be a
>> key. But when I get near the Spark submodules of my project, it fails with
>> messages such as:
>> package org.apache.spark.sql.types is declared in the unnamed
>> module, but module fr.ecoemploi.outbound.spark.core does not read it
>>
>> But I can't handle the Spark dependencies easily, because they have
>> an "invalid name" for Java.
>> The issue is that Java doesn't accept the "_" that is in the "_2.13"
>> suffix of the jars:
>> [WARNING] Can't extract module name from breeze-macros_2.13-2.1.0.jar:
>> breeze.macros.2.13: Invalid module name: '2' is not a Java identifier
>> [WARNING] Can't extract module name from spark-tags_2.13-3.4.2.jar:
>> spark.tags.2.13: Invalid module name: '2' is not a Java identifier
>> [WARNING] Can't extract module name from spark-unsafe_2.13-3.4.2.jar:
>> spark.unsafe.2.13: Invalid module name: '2' is not a Java identifier
>> [WARNING] Can't extract module name from spark-mllib_2.13-3.4.2.jar:
>> spark.mllib.2.13: Invalid module name: '2' is not a Java identifier
>> [... around 30 ...]
>>
>> I think that changing the naming pattern of the Spark jars for 4.x
>> could be a good idea, but beyond that, what about attempting to
>> integrate Spark into modules, with its submodules defining
>> module-info.java?
>>
>> Is it something that you think [must | should | might | should not |
>> must not] be done?
>>
>> Regards,
>>
>> Marc Le Bihan
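
The automatic-module-name failure Marc describes can be reproduced with the JDK's own `ModuleFinder`, outside of any build tool. A minimal sketch follows: the jar is empty and created on the fly, and the `Automatic-Module-Name` value `spark.tags` is purely hypothetical (Spark does not ship such an attribute today); it only illustrates that an explicit manifest attribute would bypass the file-name derivation without renaming the artifact.

```java
import java.io.IOException;
import java.lang.module.FindException;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AutomaticModuleNameDemo {

    // Create an empty jar with the given file name and an optional
    // Automatic-Module-Name manifest attribute (null = none).
    static Path createJar(Path dir, String fileName, String automaticModuleName)
            throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (automaticModuleName != null) {
            attrs.putValue("Automatic-Module-Name", automaticModuleName);
        }
        Path jar = dir.resolve(fileName);
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), manifest)) {
            // no entries needed: only the file name and manifest matter here
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        // Case 1: no manifest attribute. The JDK derives the automatic module
        // name from the file name: drop ".jar" and the trailing "-3.4.2"
        // version, map non-alphanumerics to '.', giving "spark.tags.2.13".
        // The segment "2" is not a Java identifier, so scanning fails.
        Path bad = Files.createTempDirectory("bad-jar");
        createJar(bad, "spark-tags_2.13-3.4.2.jar", null);
        try {
            ModuleFinder.of(bad).findAll();
            System.out.println("unexpected: no exception");
        } catch (FindException e) {
            System.out.println("FindException: " + e.getMessage());
        }

        // Case 2: an explicit Automatic-Module-Name overrides the file-name
        // derivation, so the same file name becomes usable as a module.
        Path good = Files.createTempDirectory("good-jar");
        createJar(good, "spark-tags_2.13-3.4.2.jar", "spark.tags");
        ModuleFinder.of(good).findAll()
                .forEach(ref -> System.out.println("module: " + ref.descriptor().name()));
    }
}
```

If something along these lines holds, adding `Automatic-Module-Name` entries to the Spark jars' manifests might address the warnings without touching the `_2.13` naming convention that the Scala ecosystem relies on.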