Hi Yun,

I'm thinking about this approach:

* We compile against unallocated jars.
* We relocate Jackson the same way as Iceberg in our build.
* We publish the relocated Spark Client jar (thin), without any bundled
classes.
* The relocated Spark Client jar depends on Iceberg Spark runtime.
* The relocated Spark Client jar does _not_ have a classifier.

The exact version of Jackson in our build and Iceberg build may differer,
but Jackson is pretty good at keeping backward compatibility, so this
should not be a problem.

WDYT?

Thanks,
Dmitri.

On Fri, Jun 20, 2025 at 8:01 PM yun zou <yunzou.colost...@gmail.com> wrote:

> Hi Dmitri,
>
> Thanks a lot for the information! So it seems after my previous PR [1857]
> that reuses the current shadowJar
> publish, it just publishes the shadow jar, which is included in the module
> files.
>
> It turns out that the POM file we generated have following like once shadow
> plugins is used
>
> <!-- do_not_remove: published-with-gradle-metadata -->
>
>
> which indicates prefer to resolve from .module file, which directly
> includes our bundle jar.
>
>
> This should also work, but may not be the exact behavior we want to be. To
> be safe I
>
> can put on a different PR to revert the previous PR 1857 and see if there
> is a better way
>
> to reuse the shadowJar plugin later.
>
>
> WDYT?
>
>
> Best Regards,
>
> Yun
>
> On Fri, Jun 20, 2025 at 3:56 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> > Hi Yun,
> >
> > I do not see a non-bundle jar published to my local Maven
> > repo
> >
> .m2/repository/org/apache/polaris/polaris-spark-3.5_2.12/1.1.0-incubating-SNAPSHOT
> >
> > maven-metadata-local.xml
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.jar
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.module
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.pom
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-sources.jar
> >
> > ... but Spark still works with --packages
> > org.apache.polaris:polaris-spark-3.5_2.12:1.1.0-incubating-SNAPSHOT
> >
> > Cheers,
> > Dmitri.
> >
> > On Fri, Jun 20, 2025 at 6:42 PM yun zou <yunzou.colost...@gmail.com>
> > wrote:
> >
> > > Hi Dmitri,
> > >
> > > I think there might be a misunderstanding about how jars and packages
> are
> > > published, the shadowJar
> > > job is used to publish the bundle jar for the jar use cases, where all
> > > dependency are packed and users uses
> > > with spark like following:
> > > --jar  polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
> > >
> > > This is different from when user uses --packages, it uses the regular
> > > project jar without classifier, and all dependency
> > > are resolved and downloaded on installation time. Once formally
> released,
> > > spark users can use it like following:
> > > --package org.apache.polaris:polaris-spark-3.5_2.12:1.1.0
> > >
> > > Note that the regular project jar can not be directly used as --jar
> > without
> > > manually adding other dependency jars, because
> > > it doesn't pack other necessary dependencies. That is why we are
> pushing
> > > the bundle jar also, which is used to help
> > > the direct jar use cases.
> > >
> > > You might be confused by my previous PR
> > > <https://github.com/apache/polaris/pull/1857> where I thought I needed
> > to
> > > remove the classifier to make the package use case
> > > work, i believe I later clarified that it was a false alarm, where we
> do
> > > not need the bundle jar for the `--package` use case.
> > >
> > > I have manually verified both use cases, and we have test automation
> for
> > > the jar use case, and I have followed up to investigate
> > > how to add a regression test for the package use case also.
> > >
> > > Best Regards,
> > > Yun
> > >
> > >
> > > On Fri, Jun 20, 2025 at 3:23 PM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > > > Hi Yun,
> > > >
> > > > Re: --packages, what I meant to say is that even with PR 1908, the
> > > > published version has the "bundle" classifier.
> > > > <metadata modelVersion="1.1.0">
> > > >   <groupId>org.apache.polaris</groupId>
> > > >   <artifactId>polaris-spark-3.5_2.12</artifactId>
> > > >   <versioning>
> > > >     <lastUpdated>20250620185923</lastUpdated>
> > > >     <snapshot>
> > > >       <localCopy>true</localCopy>
> > > >     </snapshot>
> > > >     <snapshotVersions>
> > > >       <snapshotVersion>
> > > >         <classifier>bundle</classifier>
> > > >         <extension>jar</extension>
> > > >         <value>1.1.0-incubating-SNAPSHOT</value>
> > > >         <updated>20250620185923</updated>
> > > >       </snapshotVersion>
> > > >
> > > > I manually tested with Spark locally and it seems to work. However,
> I
> > > > thought that caused issues before. WDYT?
> > > >
> > > > Re: compiling against shaded packages - I still believe that it is
> not
> > > nice
> > > > from the maintenance POV. Yet, I do not insist on reworking this.
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > >
> > > > On Fri, Jun 20, 2025 at 5:09 PM yun zou <yunzou.colost...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Dmitri,
> > > > >
> > > > > Regarding to this question:
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > *Current docs [1] suggest using
> > > > > `--packagesorg.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR
> > 1908
> > > > >
> produces`polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar`
> > > > > (note:bundle, disregard version).*
> > > > >
> > > > > The version number used in the bundle jar is produced with the
> > version
> > > > > number in the
> > > > > current version file in the repo, therefore the one you see is
> > > > > xxx-incubating-SNAPSHOT-bundle.jar.
> > > > > Furthermore, the bundle jar is published for the jar use case, not
> > for
> > > > the
> > > > > package use case. There are
> > > > > two ways to use the Spark Client with Spark:
> > > > > 1) use --packages, where the dependencies are downloaded
> > automatically
> > > > > 2) use --jar, the bundle jar will contain everything user needed
> > > without
> > > > > doing extra dependency download
> > > > >
> > > > > When the user uses packages, it is using the package we formally
> > > publish
> > > > to
> > > > > maven, which I
> > > > > believe will not have "incubating-SNAPSHOT" in the version anymore,
> > so
> > > > > 1.0.0 will be the right version for
> > > > > actual use when we release 1.0.0. Furthermore, what we give in the
> > doc
> > > is
> > > > > always just an example, where we phase it like
> > > > > "
> > > > > Assume the released Polaris Spark client you want to use is
> > > > > `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`
> > > > > "
> > > > > So it is up to the user to pick up the version they want to use
> among
> > > the
> > > > > published versions, which will only be
> > > > > 1.0.0 now, but later we might publish 1.1.0, 1,2,0 etc.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > *Instead of compiling against relocated classes, why don't we
> > > > > compileagainst the original Jackson jar, and later relocate the
> Spark
> > > > > Client to"org.apache.iceberg.shaded.com.fasterxml.jackson.*" ?*
> > > > >
> > > > > Regarding to this, i think it is correct for the Spark Client to
> use
> > > > shaded
> > > > > jar in iceberg spark client, because our Spark Client
> > > > > is suppose to be fully depend and compatible with the
> > > > > iceberg-spark-runtime, where we intended to use all libraries
> > directly
> > > > > shipped from iceberg-spark-runtime to avoid any potential
> > > > compatibilities,
> > > > > includes RESTClient, Iceberg RestRequest etc.
> > > > > If we are using our own jackson library and relocate it to
> > > > > org.apache.iceberg, first of all, i don't know if it will work or
> > not,
> > > > > other
> > > > > than this, it also potentially end with two different jackson
> > version,
> > > > > which might potentially introduce compatibility issues,
> > > > > especially we use the RESTClient shipped along with the
> > > > > iceberg-spark-runtime. Furthermore, it is very confusing that
> > > > > we are relocating it to namespace org.apache.iceberg*, to me, that
> is
> > > > even
> > > > > worse than skipping the shaded check.
> > > > > In my point of view, it is correct for the spark client to use the
> > > shaded
> > > > > library from iceberg-spark-client, we should not be so
> > > > > concerned about skipping the import check for the spark client
> > project
> > > as
> > > > > far as we are clear about the goal we are trying to achieve.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Best Regards,
> > > > > Yun
> > > > >
> > > > >
> > > > > On Fri, Jun 20, 2025 at 12:58 PM Yufei Gu <flyrain...@gmail.com>
> > > wrote:
> > > > >
> > > > > > It's simpler to maintain one version for the same dependency
> > instead
> > > of
> > > > > > two. There is no confusion for developers -- I can foresee anyone
> > > > looking
> > > > > > at the build script will ask what the Jackson Spark client
> > eventually
> > > > > > shipped. Upgrading the version is straightforward. But I'd like
> to
> > > know
> > > > > > more details why compiling against a shaded package is preferable
> > > here.
> > > > > > Would you mind providing these details?
> > > > > >
> > > > > > Yufei
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 20, 2025 at 12:32 PM Dmitri Bourlatchkov <
> > > di...@apache.org
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > In any case, IMHO, even updating jackson version numbers in two
> > > > places
> > > > > is
> > > > > > > preferable to compiling against shaded packages.
> > > > > > >
> > > > > > > On Fri, Jun 20, 2025 at 3:25 PM Dmitri Bourlatchkov <
> > > > di...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I suppose we should be able to get the version of Jackson
> used
> > by
> > > > > > Iceberg
> > > > > > > > from Iceberg POM information, right?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Fri, Jun 20, 2025 at 3:08 PM Yufei Gu <
> flyrain...@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > >> That's an interesting idea. But it requires us to maintain
> the
> > > > > > > consistency
> > > > > > > >> of the Jackson version in two places instead of one. The
> > > original
> > > > > > > Jackson
> > > > > > > >> version has to match with the one shaded in Iceberg spark
> > > runtime.
> > > > > > Every
> > > > > > > >> time we update one, we have to remember to update another.
> I'm
> > > not
> > > > > > sure
> > > > > > > if
> > > > > > > >> it improves the situation.
> > > > > > > >>
> > > > > > > >> Yufei
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Fri, Jun 20, 2025 at 11:43 AM Dmitri Bourlatchkov <
> > > > > > di...@apache.org>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Yun and Yufei,
> > > > > > > >> >
> > > > > > > >> > > Specifically, why does CreateGenericTableRESTRequest use
> > the
> > > > > > shaded
> > > > > > > >> > Jackson?
> > > > > > > >> >
> > > > > > > >> > As discussed off list, request / response payload classes
> > have
> > > > to
> > > > > > work
> > > > > > > >> with
> > > > > > > >> > the version of Jackson included with the Iceberg Spark
> jars
> > > > > (because
> > > > > > > >> they
> > > > > > > >> > own the RESTClient).
> > > > > > > >> >
> > > > > > > >> > That in itself is fine.
> > > > > > > >> >
> > > > > > > >> > I'd like to propose a different approach to implementing
> > that
> > > in
> > > > > > > >> Polaris,
> > > > > > > >> > though.
> > > > > > > >> >
> > > > > > > >> > Instead of compiling against relocated classes, why don't
> we
> > > > > compile
> > > > > > > >> > against the original Jackson jar, and later relocate the
> > Spark
> > > > > > Client
> > > > > > > to
> > > > > > > >> > "org.apache.iceberg.shaded.com.fasterxml.jackson.*" ?
> > > > > > > >> >
> > > > > > > >> > I believe Jackson is the only relocation concern.
> > > > > > > >> >
> > > > > > > >> > After relocation we can publish both the "thin" client for
> > use
> > > > > with
> > > > > > > >> > --package in Spark, and the "fat" jar for use with --jar.
> > Both
> > > > > > > artifacts
> > > > > > > >> > will depend on the relocated Iceberg artifacts.
> > > > > > > >> >
> > > > > > > >> > WDYT?
> > > > > > > >> >
> > > > > > > >> > Cheers,
> > > > > > > >> > Dmitri.
> > > > > > > >> >
> > > > > > > >> > On Fri, Jun 20, 2025 at 1:05 PM Dmitri Bourlatchkov <
> > > > > > di...@apache.org
> > > > > > > >
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Thanks for the quick response, Yun!
> > > > > > > >> > >
> > > > > > > >> > > > org.apache.polaris#polaris-core
> > > > > > > >> > > > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> > > > > > > >> > >
> > > > > > > >> > > IIRC, polaris-core uses Jackson. iceberg-spark-runtime
> > also
> > > > uses
> > > > > > > >> Jackson,
> > > > > > > >> > > but it shades it.
> > > > > > > >> > >
> > > > > > > >> > > I believe I saw issues with using both shaded and
> > non-shaded
> > > > > > Jackson
> > > > > > > >> in
> > > > > > > >> > > the same Spark env. with Iceberg.
> > > > > > > >> > >
> > > > > > > >> > > This may or may not be a concern for our Spark Client.
> > What
> > > I
> > > > > mean
> > > > > > > is
> > > > > > > >> > that
> > > > > > > >> > > it may need some more consideration to be sure.
> > > > > > > >> > >
> > > > > > > >> > > Specifically, why does CreateGenericTableRESTRequest use
> > the
> > > > > > shaded
> > > > > > > >> > > Jackson?
> > > > > > > >> > >
> > > > > > > >> > > WDYT?
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Dmitri.
> > > > > > > >> > >
> > > > > > > >> > > On Fri, Jun 20, 2025 at 12:47 PM yun zou <
> > > > > > > yunzou.colost...@gmail.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > >> *-- What is the maven artifact that Spark can
> > automatically
> > > > > pull
> > > > > > > >> > >> (via--packages)*
> > > > > > > >> > >>
> > > > > > > >> > >> Our spark client pulls the following:
> > > > > > > >> > >>
> > > > > > > >> > >> org.apache.polaris#polaris-spark-3.5_2.12
> > > > > > > >> > >>
> > > > > > > >> > >> org.apache.polaris#polaris-core
> > > > > > > >> > >>
> > > > > > > >> > >> org.apache.polaris#polaris-api-management-model
> > > > > > > >> > >>
> > > > > > > >> > >> org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >> Prior to the change, it also pulled iceberg-core and
> avro
> > > > > 1.20.0.
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >> *-- Does that artifact use shaded dependencies*
> > > > > > > >> > >>
> > > > > > > >> > >> Any usage of classes from iceberg-spark-runtime uses
> the
> > > > shaded
> > > > > > > >> > libraries
> > > > > > > >> > >> shipped along with the artifacts.
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >> *-- Does that artifact depend on the Iceberg Spark
> > bundle?*
> > > > > > > >> > >>
> > > > > > > >> > >> If you are referring to our spark client, it depends on
> > > > > > > >> > >> iceberg-spark-runtime,
> > > > > > > >> > >> not other bundles.
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >> *-- Is the _code_ running in Spark the same when the
> > > Polaris
> > > > > > Spark
> > > > > > > >> > Client
> > > > > > > >> > >> ispulled via --packages and via --jars?*
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >> yes, the jar and package will use the same code, where
> > the
> > > > jar
> > > > > > > simply
> > > > > > > >> > >> packs
> > > > > > > >> > >> everything
> > > > > > > >> > >>
> > > > > > > >> > >> for the user and there is no need to download any other
> > > > > > dependency.
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >> Best Regards,
> > > > > > > >> > >>
> > > > > > > >> > >> Yun
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >> On Fri, Jun 20, 2025 at 9:18 AM Dmitri Bourlatchkov <
> > > > > > > >> di...@apache.org>
> > > > > > > >> > >> wrote:
> > > > > > > >> > >>
> > > > > > > >> > >> > Some questions for clarification:
> > > > > > > >> > >> >
> > > > > > > >> > >> > * What is the maven artifact that Spark can
> > automatically
> > > > > pull
> > > > > > > (via
> > > > > > > >> > >> > --packages)?
> > > > > > > >> > >> > * Does that artifact use shaded dependencies?
> > > > > > > >> > >> > * Does that artifact depend on the Iceberg Spark
> > bundle?
> > > > > > > >> > >> > * Is the _code_ running in Spark the same when the
> > > Polaris
> > > > > > Spark
> > > > > > > >> > Client
> > > > > > > >> > >> is
> > > > > > > >> > >> > pulled via --packages and via --jars?
> > > > > > > >> > >> >
> > > > > > > >> > >> > I know I could have figured that out from code, but
> I'm
> > > > > asking
> > > > > > > here
> > > > > > > >> > >> because
> > > > > > > >> > >> > I think we may need to review our approach to
> > publishing
> > > > > these
> > > > > > > >> > >> artifacts.
> > > > > > > >> > >> >
> > > > > > > >> > >> > I believe that regardless of the method of including
> > the
> > > > > Client
> > > > > > > >> into
> > > > > > > >> > >> Spark
> > > > > > > >> > >> > runtime, the code has to be exactly the same.... and
> I
> > > > doubt
> > > > > it
> > > > > > > is
> > > > > > > >> the
> > > > > > > >> > >> same
> > > > > > > >> > >> > now. WDYT?
> > > > > > > >> > >> >
> > > > > > > >> > >> > Thanks,
> > > > > > > >> > >> > Dmitri.
> > > > > > > >> > >> >
> > > > > > > >> > >> >
> > > > > > > >> > >> > On Fri, Jun 20, 2025 at 10:15 AM Dmitri Bourlatchkov
> <
> > > > > > > >> > di...@apache.org>
> > > > > > > >> > >> > wrote:
> > > > > > > >> > >> >
> > > > > > > >> > >> > > Hi All,
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > Re: PR [1908] let's use this thread to clarify the
> > > > problems
> > > > > > > we're
> > > > > > > >> > >> trying
> > > > > > > >> > >> > > to solve and options for solutions.
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > As for me, it looks like some refactoring in the
> way
> > > the
> > > > > > Spark
> > > > > > > >> > Client
> > > > > > > >> > >> is
> > > > > > > >> > >> > > built and published may be needed.
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > I think it makes sense to clarify this before 1.0
> to
> > > > avoid
> > > > > > > >> changes
> > > > > > > >> > to
> > > > > > > >> > >> > > Maven coordinates right after 1.0
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > [1908] https://github.com/apache/polaris/pull/1908
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > Thanks,
> > > > > > > >> > >> > > Dmitri.
> > > > > > > >> > >> > >
> > > > > > > >> > >> > >
> > > > > > > >> > >> >
> > > > > > > >> > >>
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to