A bit more context on what [1908] is trying to resolve: some Iceberg table
operations may fail when the `--packages` option is used to pull the Polaris
Spark client. IIRC, writes to Iceberg tables failed due to jar conflicts. The
details are in the PR description: "the iceberg requires avro 1.12.0, but the
one provided by spark is 1.11.4." The `--jars` option, however, works well.
It'd be nice to fix the `--packages` option in 1.0, but I'm OK either way.
Without [1908], we will need to clarify that the `--packages` option isn't
recommended.
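The conflict described above can be pictured with a deliberately simplified model of class lookup, sketched here in Python. It assumes that the first jar on the classpath providing a fully qualified class name wins, and that Spark's own jars are consulted before anything pulled via `--packages`. The jar file names, the copy labels, and the `org.apache.iceberg.shaded.` relocation prefix are illustrative assumptions, not taken from the Polaris build.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Jar:
    """A toy jar: a file name plus the fully qualified classes it provides."""
    name: str
    # Maps class name -> label describing which copy of the library it is.
    classes: dict[str, str] = field(default_factory=dict)


def resolve(classpath: list[Jar], class_name: str) -> Jar | None:
    """Return the first jar on the classpath that provides class_name.

    Later jars providing the same name are shadowed, never merged.
    """
    for jar in classpath:
        if class_name in jar.classes:
            return jar
    return None


AVRO_SCHEMA = "org.apache.avro.Schema"
# Assumed relocation prefix for the Avro copy shaded into the Iceberg runtime.
SHADED_AVRO_SCHEMA = "org.apache.iceberg.shaded." + AVRO_SCHEMA

# Hypothetical classpath order: Spark's own jars first, then pulled dependencies.
spark_avro = Jar("avro-1.11.4.jar", {AVRO_SCHEMA: "avro 1.11.4"})
pulled_avro = Jar("avro-1.12.0.jar", {AVRO_SCHEMA: "avro 1.12.0"})
iceberg_runtime = Jar(
    "iceberg-spark-runtime-3.5_2.12.jar",
    {SHADED_AVRO_SCHEMA: "shaded avro"},
)
classpath = [spark_avro, pulled_avro, iceberg_runtime]

# Unshaded name: Spark's 1.11.4 copy wins even though 1.12.0 is also present.
print(resolve(classpath, AVRO_SCHEMA).name)  # avro-1.11.4.jar
# Shaded name: only the runtime jar provides it, so nothing collides.
print(resolve(classpath, SHADED_AVRO_SCHEMA).name)  # iceberg-spark-runtime-3.5_2.12.jar
```

Under these assumptions, two unshaded copies of Avro collide and the older one silently shadows the newer one, while a relocated (shaded) copy lives under a different name and cannot be shadowed. That is one plausible reading of why the bundled jar passed via `--jars` works while `--packages` did not.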

[1908] https://github.com/apache/polaris/pull/1908
Yufei


On Fri, Jun 20, 2025 at 9:51 AM yun zou <yunzou.colost...@gmail.com> wrote:

> As for the following point:
>
> > I believe that regardless of the method of including the Client into
> > Spark runtime, the code has to be exactly the same... and I doubt it is
> > the same now. WDYT?
>
> With the change, the code included in the Spark Client jar is now
> different, because it uses a class from a different package, even though
> both classes do the same thing. However, I think it is a good change: it
> simplifies our dependencies and avoids potential compatibility issues due
> to the shading of iceberg-spark-runtime. I definitely agree we should
> include this in 1.0 as well.
>
> Best Regards,
> Yun
>
> On Fri, Jun 20, 2025 at 9:47 AM yun zou <yunzou.colost...@gmail.com> wrote:
>
> >
> > *-- What is the Maven artifact that Spark can automatically pull
> > (via --packages)?*
> >
> > Our Spark client pulls the following:
> >
> > org.apache.polaris#polaris-spark-3.5_2.12
> > org.apache.polaris#polaris-core
> > org.apache.polaris#polaris-api-management-model
> > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> >
> > Prior to the change, it also pulled iceberg-core and Avro 1.12.0.
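For readers less familiar with the two options, here is a small Python sketch of how the artifacts listed above map onto `spark-submit`/`spark-shell` arguments. The `org#module` form in the list appears to be Ivy's notation; the sketch relies on Spark's documented `groupId:artifactId:version` coordinate format for `--packages`. The helper names, the `x.y.z` version, and the bundle jar path are hypothetical placeholders.

```python
# Hypothetical placeholder: substitute the actual released Polaris version.
POLARIS_VERSION = "x.y.z"


def ivy_to_coordinate(ivy_id: str, version: str) -> str:
    """Turn an Ivy-style 'org#module' id into the 'group:artifact:version' form."""
    group, sep, artifact = ivy_id.partition("#")
    if not sep or not group or not artifact:
        raise ValueError(f"expected 'org#module', got {ivy_id!r}")
    return f"{group}:{artifact}:{version}"


def packages_args(top_level: list[str], version: str) -> list[str]:
    """Arguments for --packages: Spark resolves transitive dependencies itself."""
    return ["--packages", ",".join(ivy_to_coordinate(i, version) for i in top_level)]


def jars_args(jar_paths: list[str]) -> list[str]:
    """Arguments for --jars: only the listed jars are added; nothing is resolved."""
    return ["--jars", ",".join(jar_paths)]


# Only the top-level client artifact is listed; the other artifacts in the
# list above would be pulled in transitively.
print(packages_args(["org.apache.polaris#polaris-spark-3.5_2.12"], POLARIS_VERSION))
# Hypothetical path to a self-contained bundle jar.
print(jars_args(["/path/to/polaris-spark-bundle.jar"]))
```

The difference matters for the thread: with `--packages`, whatever the published POM declares is downloaded and added next to Spark's own jars, whereas with `--jars` the user gets exactly the bytes in the bundle.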
> >
> >
> > *-- Does that artifact use shaded dependencies?*
> >
> > Any usage of classes from iceberg-spark-runtime uses the shaded libraries
> > shipped along with the artifacts.
> >
> >
> >
> > *-- Does that artifact depend on the Iceberg Spark bundle?*
> >
> > If you are referring to our Spark client, it depends on
> > iceberg-spark-runtime, not on other bundles.
> >
> >
> >
> > *-- Is the _code_ running in Spark the same when the Polaris Spark Client
> > is pulled via --packages and via --jars?*
> >
> >
> > Yes, the jar and the package use the same code. The jar simply bundles
> > everything for the user, so there is no need to download any other
> > dependency.
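Whether the code is really identical can be checked mechanically rather than argued. A minimal Python sketch, assuming the bundle jar passed via `--jars` and the jars Spark resolves for `--packages` are available locally: it hashes every `.class` entry and reports entries missing on either side or differing in content. The helper names are hypothetical, and relocated (shaded) classes will show up as missing on one side by design.

```python
import hashlib
import io
import zipfile


def class_digests(jars):
    """Map each .class entry in the given jars to the SHA-256 of its bytes.

    `jars` may be file paths or binary file objects. When an entry appears in
    several jars, the first occurrence wins, like a first-match classpath.
    """
    digests = {}
    for jar_source in jars:
        with zipfile.ZipFile(jar_source) as jar:
            for name in jar.namelist():
                if name.endswith(".class") and name not in digests:
                    digests[name] = hashlib.sha256(jar.read(name)).hexdigest()
    return digests


def compare(bundle_jars, packages_jars):
    """Return (only_in_bundle, only_in_packages, differing) as sorted name lists."""
    bundle = class_digests(bundle_jars)
    packages = class_digests(packages_jars)
    only_bundle = sorted(bundle.keys() - packages.keys())
    only_packages = sorted(packages.keys() - bundle.keys())
    differing = sorted(
        name for name in bundle.keys() & packages.keys()
        if bundle[name] != packages[name]
    )
    return only_bundle, only_packages, differing


def make_jar(entries):
    """Build an in-memory jar from {entry_name: bytes}; for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    buf.seek(0)
    return buf


# Toy example: class B differs and class C exists only on the --packages side.
result = compare(
    [make_jar({"org/example/A.class": b"v1", "org/example/B.class": b"v1"})],
    [make_jar({"org/example/A.class": b"v1"}),
     make_jar({"org/example/B.class": b"v2", "org/example/C.class": b"v1"})],
)
print(result)  # ([], ['org/example/C.class'], ['org/example/B.class'])
```

Any non-empty result suggests the two delivery methods do not run exactly the same bytecode. Comparing raw bytes is strict: compiling identical sources with a different compiler can also produce differences, so a mismatch is a prompt to investigate rather than proof of a behavioral difference.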
> >
> >
> > Best Regards,
> >
> > Yun
> >
> >
> >
> > On Fri, Jun 20, 2025 at 9:18 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
> >
> >> Some questions for clarification:
> >>
> >> * What is the maven artifact that Spark can automatically pull (via
> >> --packages)?
> >> * Does that artifact use shaded dependencies?
> >> * Does that artifact depend on the Iceberg Spark bundle?
> >> * Is the _code_ running in Spark the same when the Polaris Spark Client is
> >> pulled via --packages and via --jars?
> >>
> >> I know I could have figured that out from the code, but I'm asking here
> >> because I think we may need to review our approach to publishing these
> >> artifacts.
> >>
> >> I believe that regardless of the method of including the Client into the
> >> Spark runtime, the code has to be exactly the same... and I doubt it is
> >> the same now. WDYT?
> >>
> >> Thanks,
> >> Dmitri.
> >>
> >>
> >> On Fri, Jun 20, 2025 at 10:15 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
> >>
> >> > Hi All,
> >> >
> >> > Re: PR [1908], let's use this thread to clarify the problems we're
> >> > trying to solve and the options for solutions.
> >> >
> >> > To me, it looks like some refactoring of the way the Spark Client is
> >> > built and published may be needed.
> >> >
> >> > I think it makes sense to clarify this before 1.0 to avoid changes to
> >> > Maven coordinates right after 1.0.
> >> >
> >> > [1908] https://github.com/apache/polaris/pull/1908
> >> >
> >> > Thanks,
> >> > Dmitri.
> >> >
> >> >
> >>
> >
>
