I'd really prefer to get to the bottom of this issue and support --packages.

Using --jars is inconvenient in many cases, as it requires manually placing
jars into the Spark installation on disk.
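
For context, a minimal sketch of the two loading modes (the version
placeholder and the jar path are illustrative, not real coordinates):

```shell
# --packages: Spark resolves the client and its transitive dependencies
# from Maven at startup; nothing is copied into the installation.
spark-shell \
  --packages org.apache.polaris:polaris-spark-3.5_2.12:<version>

# --jars: a pre-built jar must be obtained and placed on disk first,
# and transitive dependencies are not resolved automatically.
spark-shell \
  --jars /path/to/polaris-spark-client.jar
```

With --packages, the resolved classpath depends on what the resolver
pulls, which is why jar conflicts like the avro mismatch surface only
in that mode.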

Cheers,
Dmitri.

On Fri, Jun 20, 2025 at 1:07 PM Yufei Gu <flyrain...@gmail.com> wrote:

> A bit more context on what [1908] is trying to resolve: some Iceberg table
> operations may fail when the `--packages` option is used to pull the Polaris
> Spark client. IIRC, a write to an Iceberg table failed due to jar
> conflicts. The details are in the PR description: "the iceberg requires avro
> 1.12.0, but the one provided by spark is 1.11.4." However, the `--jars`
> option works well. It'd be nice to fix the `--packages` option in 1.0.
> That said, I'm OK either way. Without [1908], we will need to clarify that
> the `--packages` option isn't recommended.
>
> [1908] https://github.com/apache/polaris/pull/1908
> Yufei
>
>
> On Fri, Jun 20, 2025 at 9:51 AM yun zou <yunzou.colost...@gmail.com>
> wrote:
>
> > As for the following point:
> >
> > I believe that regardless of the method of including the Client into Spark
> > runtime, the code has to be exactly the same... and I doubt it is the same
> > now. WDYT?
> >
> > The code included in the jar for the Spark client is different now with
> > the change, because it uses a class in a different package, even though
> > the two do the same thing. However, I think it is a good change: it
> > simplifies our dependencies and avoids potential compatibility issues due
> > to the shading of iceberg-spark-runtime. I definitely agree we should
> > also include this in 1.0.
> >
> > Best Regards,
> > Yun
> >
> > On Fri, Jun 20, 2025 at 9:47 AM yun zou <yunzou.colost...@gmail.com>
> > wrote:
> >
> > >
> > > *-- What is the maven artifact that Spark can automatically pull
> > > (via --packages)?*
> > >
> > > Our spark client pulls the following:
> > >
> > > org.apache.polaris#polaris-spark-3.5_2.12
> > >
> > > org.apache.polaris#polaris-core
> > >
> > > org.apache.polaris#polaris-api-management-model
> > >
> > > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> > >
> > >
> > > Prior to the change, it also pulled iceberg-core and avro 1.12.0.
> > >
> > >
> > > *-- Does that artifact use shaded dependencies*
> > >
> > > Any usage of classes from iceberg-spark-runtime uses the shaded
> > > libraries shipped along with the artifacts.
> > >
> > >
> > >
> > > *-- Does that artifact depend on the Iceberg Spark bundle?*
> > >
> > > If you are referring to our spark client, it depends on
> > > iceberg-spark-runtime, not other bundles.
> > >
> > >
> > >
> > > *-- Is the _code_ running in Spark the same when the Polaris Spark
> > > Client is pulled via --packages and via --jars?*
> > >
> > >
> > > Yes, the jar and the package use the same code; the jar simply packs
> > > everything for the user, so there is no need to download any other
> > > dependencies.
> > >
> > >
> > > Best Regards,
> > >
> > > Yun
> > >
> > >
> > >
> > > On Fri, Jun 20, 2025 at 9:18 AM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > >> Some questions for clarification:
> > >>
> > >> * What is the maven artifact that Spark can automatically pull (via
> > >> --packages)?
> > >> * Does that artifact use shaded dependencies?
> > >> * Does that artifact depend on the Iceberg Spark bundle?
> > >> * Is the _code_ running in Spark the same when the Polaris Spark Client
> > >> is pulled via --packages and via --jars?
> > >>
> > >> I know I could have figured that out from code, but I'm asking here
> > >> because I think we may need to review our approach to publishing these
> > >> artifacts.
> > >>
> > >> I believe that regardless of the method of including the Client into
> > >> Spark runtime, the code has to be exactly the same... and I doubt it is
> > >> the same now. WDYT?
> > >>
> > >> Thanks,
> > >> Dmitri.
> > >>
> > >>
> > >> On Fri, Jun 20, 2025 at 10:15 AM Dmitri Bourlatchkov <
> di...@apache.org>
> > >> wrote:
> > >>
> > >> > Hi All,
> > >> >
> > >> > Re: PR [1908] let's use this thread to clarify the problems we're
> > >> > trying to solve and the options for solutions.
> > >> >
> > >> > As for me, it looks like some refactoring in the way the Spark Client
> > >> > is built and published may be needed.
> > >> >
> > >> > I think it makes sense to clarify this before 1.0 to avoid changes to
> > >> > Maven coordinates right after 1.0.
> > >> >
> > >> > [1908] https://github.com/apache/polaris/pull/1908
> > >> >
> > >> > Thanks,
> > >> > Dmitri.
> > >> >
> > >> >
> > >>
> > >
> >
>
