As for the following point:

> I believe that regardless of the method of including the Client into
> Spark runtime, the code has to be exactly the same... and I doubt it is
> the same now. WDYT?
The code included in the jar for the Spark Client is different now with the
change, because it now uses a class from a different package, even though the
two classes do the same thing. However, I think it is a good change: it
simplifies our dependencies and avoids potential compatibility issues due to
the shading of iceberg-spark-runtime. I definitely agree we should also
include this in 1.0.

Best Regards,
Yun

On Fri, Jun 20, 2025 at 9:47 AM yun zou <yunzou.colost...@gmail.com> wrote:

> *-- What is the maven artifact that Spark can automatically pull (via
> --packages)?*
>
> Our spark client pulls the following:
>
> org.apache.polaris#polaris-spark-3.5_2.12
> org.apache.polaris#polaris-core
> org.apache.polaris#polaris-api-management-model
> org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
>
> Prior to the change, it also pulled iceberg-core and avro 1.20.0.
>
> *-- Does that artifact use shaded dependencies?*
>
> Any usage of classes from iceberg-spark-runtime uses the shaded libraries
> shipped along with the artifact.
>
> *-- Does that artifact depend on the Iceberg Spark bundle?*
>
> If you are referring to our spark client, it depends on
> iceberg-spark-runtime, not other bundles.
>
> *-- Is the _code_ running in Spark the same when the Polaris Spark Client
> is pulled via --packages and via --jars?*
>
> Yes, the jar and the package use the same code; the jar simply packs
> everything for the user, so there is no need to download any other
> dependencies.
>
> Best Regards,
> Yun
>
> On Fri, Jun 20, 2025 at 9:18 AM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
>> Some questions for clarification:
>>
>> * What is the maven artifact that Spark can automatically pull (via
>> --packages)?
>> * Does that artifact use shaded dependencies?
>> * Does that artifact depend on the Iceberg Spark bundle?
>> * Is the _code_ running in Spark the same when the Polaris Spark Client
>> is pulled via --packages and via --jars?
>>
>> I know I could have figured that out from the code, but I'm asking here
>> because I think we may need to review our approach to publishing these
>> artifacts.
>>
>> I believe that regardless of the method of including the Client into
>> Spark runtime, the code has to be exactly the same... and I doubt it is
>> the same now. WDYT?
>>
>> Thanks,
>> Dmitri.
>>
>> On Fri, Jun 20, 2025 at 10:15 AM Dmitri Bourlatchkov <di...@apache.org>
>> wrote:
>>
>> > Hi All,
>> >
>> > Re: PR [1908] let's use this thread to clarify the problems we're
>> > trying to solve and the options for solutions.
>> >
>> > As for me, it looks like some refactoring in the way the Spark Client
>> > is built and published may be needed.
>> >
>> > I think it makes sense to clarify this before 1.0 to avoid changes to
>> > Maven coordinates right after 1.0.
>> >
>> > [1908] https://github.com/apache/polaris/pull/1908
>> >
>> > Thanks,
>> > Dmitri.
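To make the --packages / --jars comparison concrete, below is a minimal
PySpark sketch of pulling the client through the --packages mechanism
(spark.jars.packages, the programmatic equivalent of --packages). The
version, catalog name, and endpoint URI are placeholders, and the catalog
implementation class is an assumption about the Polaris Spark client, not
confirmed against a specific release; substitute the values for your
deployment.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("polaris-packages-demo")
        # Equivalent to `spark-shell --packages ...`: Spark resolves the
        # artifact and its transitive dependencies from Maven at startup.
        # "1.0.0" is a placeholder version.
        .config("spark.jars.packages",
                "org.apache.polaris:polaris-spark-3.5_2.12:1.0.0")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        # Assumed catalog class and example endpoint; "polaris" is an
        # arbitrary catalog name.
        .config("spark.sql.catalog.polaris",
                "org.apache.polaris.spark.SparkCatalog")
        .config("spark.sql.catalog.polaris.uri",
                "http://localhost:8181/api/catalog")
        .getOrCreate()
    )

With --jars, the same session config applies, but the single bundle jar has
to pack everything the --packages route would otherwise download, which is
exactly why the code on the two paths needs to be identical.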