A bit more context on what [1908] is trying to resolve: some Iceberg table operations may fail when the `--packages` config is used to pull the Polaris Spark client. IIRC, writes to an Iceberg table failed due to jar conflicts. The details are in the PR description: "the iceberg requires avro 1.12.0, but the one provided by spark is 1.11.4." The `--jars` option, however, works well. It'd be nice to fix the `--packages` option in 1.0, but I'm OK either way. Without [1908], we will need to clarify that the `--packages` option isn't recommended.
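For anyone following along, the two ways of loading the client differ only in the flag handed to Spark. Below is a minimal Python sketch that builds the arguments for each. The group and artifact IDs come from this thread; the version string and the bundle jar path are placeholders I made up for illustration, not confirmed values.

```python
# Sketch only: the version and the bundle jar path used below are
# placeholders, not values confirmed anywhere in this thread.
POLARIS_SPARK_COORD = "org.apache.polaris:polaris-spark-3.5_2.12"


def packages_args(version):
    """Arguments that let Spark resolve the client and its transitive deps."""
    return ["--packages", f"{POLARIS_SPARK_COORD}:{version}"]


def jars_args(bundle_jar_path):
    """Arguments that pass a prebuilt bundle jar directly to Spark."""
    return ["--jars", bundle_jar_path]


if __name__ == "__main__":
    # Print the two command lines side by side for comparison.
    print(" ".join(["spark-shell", *packages_args("<version>")]))
    print(" ".join(["spark-shell", *jars_args("/path/to/polaris-spark-bundle.jar")]))
```

With `--packages`, Spark resolves transitive dependencies itself, which is where a conflicting Avro version can end up on the classpath; with `--jars`, only what is packed into the given jar is added.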

[1908] https://github.com/apache/polaris/pull/1908

Yufei


On Fri, Jun 20, 2025 at 9:51 AM yun zou <yunzou.colost...@gmail.com> wrote:

> As for the following point
> I believe that regardless of the method of including the Client into Spark
> runtime, the code has to be exactly the same.... and I doubt it is the same
> now. WDYT?
>
> The code included in the jar for Spark Client is different now with the
> change, because it now uses a class in a different package, even though
> they do the same thing. However, I think it is a good change, it simplifies
> our dependency and avoids potential compatibility issue due to the shading
> of iceberg-spark-runtime. I definitely agree we should also include this
> also in 1.0.
>
> Best Regards,
> Yun
>
> On Fri, Jun 20, 2025 at 9:47 AM yun zou <yunzou.colost...@gmail.com> wrote:
>
> > *-- What is the maven artifact that Spark can automatically pull (via
> > --packages)*
> >
> > Our spark client pulls the following:
> >
> > org.apache.polaris#polaris-spark-3.5_2.12
> > org.apache.polaris#polaris-core
> > org.apache.polaris#polaris-api-management-model
> > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> >
> > Prior to the change, it also pulled iceberg-core and avro 1.20.0.
> >
> > *-- Does that artifact use shaded dependencies*
> >
> > Any usage of classes from iceberg-spark-runtime uses the shaded libraries
> > shipped along with the artifacts.
> >
> > *-- Does that artifact depend on the Iceberg Spark bundle?*
> >
> > If you are referring to our spark client, it depends on
> > iceberg-spark-runtime, not other bundles.
> >
> > *-- Is the _code_ running in Spark the same when the Polaris Spark Client
> > is pulled via --packages and via --jars?*
> >
> > yes, the jar and package will use the same code, where the jar simply
> > packs everything for the user and there is no need to download any other
> > dependency.
> >
> > Best Regards,
> > Yun
> >
> > On Fri, Jun 20, 2025 at 9:18 AM Dmitri Bourlatchkov <di...@apache.org>
> > wrote:
> >
> >> Some questions for clarification:
> >>
> >> * What is the maven artifact that Spark can automatically pull (via
> >> --packages)?
> >> * Does that artifact use shaded dependencies?
> >> * Does that artifact depend on the Iceberg Spark bundle?
> >> * Is the _code_ running in Spark the same when the Polaris Spark Client
> >> is pulled via --packages and via --jars?
> >>
> >> I know I could have figured that out from code, but I'm asking here
> >> because I think we may need to review our approach to publishing these
> >> artifacts.
> >>
> >> I believe that regardless of the method of including the Client into
> >> Spark runtime, the code has to be exactly the same.... and I doubt it is
> >> the same now. WDYT?
> >>
> >> Thanks,
> >> Dmitri.
> >>
> >> On Fri, Jun 20, 2025 at 10:15 AM Dmitri Bourlatchkov <di...@apache.org>
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > Re: PR [1908] let's use this thread to clarify the problems we're
> >> > trying to solve and options for solutions.
> >> >
> >> > As for me, it looks like some refactoring in the way the Spark Client
> >> > is built and published may be needed.
> >> >
> >> > I think it makes sense to clarify this before 1.0 to avoid changes to
> >> > Maven coordinates right after 1.0
> >> >
> >> > [1908] https://github.com/apache/polaris/pull/1908
> >> >
> >> > Thanks,
> >> > Dmitri.