I'd really prefer to get to the bottom of this issue and support --packages.
Using --jars is inconvenient in many cases as it requires manually changing
the Spark installation on disk.

Cheers,
Dmitri.

On Fri, Jun 20, 2025 at 1:07 PM Yufei Gu <flyrain...@gmail.com> wrote:

> A bit more context on what [1908] is trying to resolve: some Iceberg table
> operations may fail when the `--packages` option was used to pull the
> Polaris Spark client. IIRC, a write to an Iceberg table failed due to jar
> conflicts. The details are in the PR description: "the iceberg requires
> avro 1.12.0, but the one provided by spark is 1.11.4." However, the
> `--jars` option works well. It'd be nice to fix the `--packages` option in
> 1.0. However, I'm OK either way. Without [1908], we will need to clarify
> that the `--packages` option isn't recommended.
>
> [1908] https://github.com/apache/polaris/pull/1908
>
> Yufei
>
> On Fri, Jun 20, 2025 at 9:51 AM yun zou <yunzou.colost...@gmail.com>
> wrote:
>
> > As for the following point:
> >
> > > I believe that regardless of the method of including the Client into
> > > Spark runtime, the code has to be exactly the same... and I doubt it
> > > is the same now. WDYT?
> >
> > The code included in the jar for the Spark client is different now with
> > the change, because it now uses a class in a different package, even
> > though they do the same thing. However, I think it is a good change: it
> > simplifies our dependencies and avoids potential compatibility issues
> > due to the shading of iceberg-spark-runtime. I definitely agree we
> > should also include this in 1.0.
> >
> > Best Regards,
> > Yun
> >
> > On Fri, Jun 20, 2025 at 9:47 AM yun zou <yunzou.colost...@gmail.com>
> > wrote:
> >
> > > *-- What is the maven artifact that Spark can automatically pull (via
> > > --packages)?*
> > >
> > > Our Spark client pulls the following:
> > >
> > > org.apache.polaris#polaris-spark-3.5_2.12
> > > org.apache.polaris#polaris-core
> > > org.apache.polaris#polaris-api-management-model
> > > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> > >
> > > Prior to the change, it also pulled iceberg-core and avro 1.12.0.
> > >
> > > *-- Does that artifact use shaded dependencies?*
> > >
> > > Any usage of classes from iceberg-spark-runtime uses the shaded
> > > libraries shipped along with the artifact.
> > >
> > > *-- Does that artifact depend on the Iceberg Spark bundle?*
> > >
> > > If you are referring to our Spark client, it depends on
> > > iceberg-spark-runtime, not other bundles.
> > >
> > > *-- Is the _code_ running in Spark the same when the Polaris Spark
> > > Client is pulled via --packages and via --jars?*
> > >
> > > Yes, the jar and the package use the same code; the jar simply packs
> > > everything for the user, so there is no need to download any other
> > > dependencies.
> > >
> > > Best Regards,
> > >
> > > Yun
> > >
> > > On Fri, Jun 20, 2025 at 9:18 AM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > >> Some questions for clarification:
> > >>
> > >> * What is the maven artifact that Spark can automatically pull (via
> > >> --packages)?
> > >> * Does that artifact use shaded dependencies?
> > >> * Does that artifact depend on the Iceberg Spark bundle?
> > >> * Is the _code_ running in Spark the same when the Polaris Spark
> > >> Client is pulled via --packages and via --jars?
> > >>
> > >> I know I could have figured that out from the code, but I'm asking
> > >> here because I think we may need to review our approach to publishing
> > >> these artifacts.
> > >>
> > >> I believe that regardless of the method of including the Client into
> > >> Spark runtime, the code has to be exactly the same... and I doubt it
> > >> is the same now. WDYT?
> > >>
> > >> Thanks,
> > >> Dmitri.
> > >>
> > >> On Fri, Jun 20, 2025 at 10:15 AM Dmitri Bourlatchkov <di...@apache.org>
> > >> wrote:
> > >>
> > >> > Hi All,
> > >> >
> > >> > Re: PR [1908], let's use this thread to clarify the problems we're
> > >> > trying to solve and options for solutions.
> > >> >
> > >> > As for me, it looks like some refactoring in the way the Spark
> > >> > client is built and published may be needed.
> > >> >
> > >> > I think it makes sense to clarify this before 1.0 to avoid changes
> > >> > to Maven coordinates right after 1.0.
> > >> >
> > >> > [1908] https://github.com/apache/polaris/pull/1908
> > >> >
> > >> > Thanks,
> > >> > Dmitri.
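
For reference, the two inclusion modes compared throughout the thread can be sketched as `spark-sql` invocations. This is only a sketch: `<VERSION>` and the bundle-jar filename are placeholders, not confirmed release coordinates, and any catalog-specific `--conf` settings are omitted.

```shell
# Mode 1: --packages. Spark's built-in Ivy resolver downloads the client and
# its transitive dependencies from Maven repositories at launch time. Per the
# thread, this resolves polaris-spark-3.5_2.12, polaris-core,
# polaris-api-management-model, and iceberg-spark-runtime-3.5_2.12.
# Note: --packages uses Maven-style group:artifact:version coordinates.
spark-sql \
  --packages org.apache.polaris:polaris-spark-3.5_2.12:<VERSION>

# Mode 2: --jars. A single pre-built bundle jar packs everything, so nothing
# is downloaded at launch, but the jar must be placed on disk manually.
# The path and filename below are illustrative only.
spark-sql \
  --jars /path/to/polaris-spark-3.5_2.12-<VERSION>-bundle.jar
```

The conflict discussed above arises only in mode 1, where Ivy resolution can place a different Avro version on the classpath than the one bundled with the Spark distribution; mode 2 sidesteps resolution entirely.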