Hi Dmitri,

Thanks a lot for the information! It seems that after my previous PR
[1857], which reuses the current shadowJar publish, only the shadow jar
is published, and that is the jar referenced by the Gradle module file.

It turns out that once the shadow plugin is used, the POM file we
generate contains the following line:

<!-- do_not_remove: published-with-gradle-metadata -->

This marker tells Gradle to prefer resolving from the .module file,
which directly references our bundle jar.
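
If we instead want consumers to resolve from the POM (and a plain jar),
I believe the module metadata publication can be disabled altogether,
roughly like this in the client's build.gradle.kts (an untested sketch):

    // Stop publishing .module files; consumers then fall back to the POM.
    tasks.withType<GenerateModuleMetadata> {
        enabled = false
    }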


This should also work, but it may not be the exact behavior we want. To
be safe, I can put up a separate PR to revert the previous PR 1857, and
we can look for a better way to reuse the shadowJar plugin later.


WDYT?


Best Regards,

Yun

On Fri, Jun 20, 2025 at 3:56 PM Dmitri Bourlatchkov <di...@apache.org>
wrote:

> Hi Yun,
>
> I do not see a non-bundle jar published to my local Maven repo:
> .m2/repository/org/apache/polaris/polaris-spark-3.5_2.12/1.1.0-incubating-SNAPSHOT
>
> maven-metadata-local.xml
> polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
> polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.jar
> polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.module
> polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.pom
> polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-sources.jar
>
> ... but Spark still works with --packages
> org.apache.polaris:polaris-spark-3.5_2.12:1.1.0-incubating-SNAPSHOT
>
> Cheers,
> Dmitri.
>
> On Fri, Jun 20, 2025 at 6:42 PM yun zou <yunzou.colost...@gmail.com>
> wrote:
>
> > Hi Dmitri,
> >
> > I think there might be a misunderstanding about how the jars and
> > packages are published. The shadowJar task is used to publish the
> > bundle jar for the --jars use case, where all dependencies are packed
> > and users run Spark like the following:
> > --jars polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
> >
> > This is different from the --packages use case, which uses the
> > regular project jar without a classifier; all dependencies are
> > resolved and downloaded at installation time. Once formally released,
> > Spark users can use it like the following:
> > --packages org.apache.polaris:polaris-spark-3.5_2.12:1.1.0
> >
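> > For illustration, the two modes are launched roughly like the
> > following (the local path is a made-up example):
> >
> >   spark-shell --jars /path/to/polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
> >
> > versus
> >
> >   spark-shell --packages org.apache.polaris:polaris-spark-3.5_2.12:1.1.0-incubating-SNAPSHOT
> >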
> > Note that the regular project jar cannot be used directly with --jars
> > without manually adding the other dependency jars, because it does
> > not pack the necessary dependencies. That is why we also publish the
> > bundle jar, which supports the direct --jars use case.
> >
> > You might be confused by my previous PR
> > <https://github.com/apache/polaris/pull/1857>, where I thought I
> > needed to remove the classifier to make the --packages use case work.
> > I believe I later clarified that it was a false alarm: we do not need
> > the bundle jar for the --packages use case.
> >
> > I have manually verified both use cases. We have test automation for
> > the --jars use case, and I have followed up to investigate how to add
> > a regression test for the --packages use case as well.
> >
> > Best Regards,
> > Yun
> >
> >
> > On Fri, Jun 20, 2025 at 3:23 PM Dmitri Bourlatchkov <di...@apache.org>
> > wrote:
> >
> > > Hi Yun,
> > >
> > > Re: --packages, what I meant to say is that even with PR 1908, the
> > > published version has the "bundle" classifier:
> > > <metadata modelVersion="1.1.0">
> > >   <groupId>org.apache.polaris</groupId>
> > >   <artifactId>polaris-spark-3.5_2.12</artifactId>
> > >   <versioning>
> > >     <lastUpdated>20250620185923</lastUpdated>
> > >     <snapshot>
> > >       <localCopy>true</localCopy>
> > >     </snapshot>
> > >     <snapshotVersions>
> > >       <snapshotVersion>
> > >         <classifier>bundle</classifier>
> > >         <extension>jar</extension>
> > >         <value>1.1.0-incubating-SNAPSHOT</value>
> > >         <updated>20250620185923</updated>
> > >       </snapshotVersion>
> > >
> > > I manually tested with Spark locally and it seems to work. However,
> > > I thought that caused issues before. WDYT?
> > >
> > > Re: compiling against shaded packages - I still believe that it is
> > > not nice from the maintenance POV. Yet, I do not insist on reworking
> > > this.
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > >
> > > On Fri, Jun 20, 2025 at 5:09 PM yun zou <yunzou.colost...@gmail.com>
> > > wrote:
> > >
> > > > Hi Dmitri,
> > > >
> > > > Regarding this question:
> > > >
> > > > *Current docs [1] suggest using `--packages
> > > > org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR 1908
> > > > produces `polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar`
> > > > (note: bundle, disregard version).*
> > > >
> > > > The version number used in the bundle jar comes from the version
> > > > file currently in the repo, which is why the one you see is
> > > > xxx-incubating-SNAPSHOT-bundle.jar.
> > > > Furthermore, the bundle jar is published for the --jars use case,
> > > > not for the --packages use case. There are two ways to use the
> > > > Spark Client with Spark:
> > > > 1) use --packages, where the dependencies are downloaded
> > > > automatically
> > > > 2) use --jars, where the bundle jar contains everything the user
> > > > needs, with no extra dependency downloads
> > > >
> > > > When the user uses --packages, they are using the package we
> > > > formally publish to Maven, which I believe will no longer have
> > > > "incubating-SNAPSHOT" in the version, so 1.0.0 will be the right
> > > > version for actual use once we release 1.0.0. Furthermore, what we
> > > > give in the doc is always just an example, where we phrase it like:
> > > > "Assume the released Polaris Spark client you want to use is
> > > > `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`"
> > > > So it is up to the user to pick the version they want among the
> > > > published versions, which will only be 1.0.0 for now, but later we
> > > > might publish 1.1.0, 1.2.0, etc.
> > > >
> > > > *Instead of compiling against relocated classes, why don't we
> > > > compile against the original Jackson jar, and later relocate the
> > > > Spark Client to "org.apache.iceberg.shaded.com.fasterxml.jackson.*" ?*
> > > >
> > > > Regarding this, I think it is correct for the Spark Client to use
> > > > the shaded jar inside the Iceberg Spark client, because our Spark
> > > > Client is supposed to fully depend on, and be compatible with,
> > > > iceberg-spark-runtime. We intend to use all the libraries shipped
> > > > directly with iceberg-spark-runtime, including RESTClient, the
> > > > Iceberg REST request classes, etc., to avoid any potential
> > > > incompatibilities.
> > > > If we used our own Jackson library and relocated it to
> > > > org.apache.iceberg, first of all, I don't know whether it would
> > > > work; beyond that, it could also end up with two different Jackson
> > > > versions, which might introduce compatibility issues, especially
> > > > since we use the RESTClient shipped along with
> > > > iceberg-spark-runtime. Furthermore, relocating our copy into the
> > > > org.apache.iceberg.* namespace is very confusing; to me, that is
> > > > even worse than skipping the shaded-import check.
> > > > In my view, it is correct for the Spark Client to use the shaded
> > > > library from iceberg-spark-runtime, and we should not be too
> > > > concerned about skipping the import check for the Spark Client
> > > > project as long as we are clear about the goal we are trying to
> > > > achieve.
> > > >
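> > > > For context, our build depends on the runtime artifact directly,
> > > > roughly like this (a sketch; the version shown is just an example):
> > > >
> > > >   dependencies {
> > > >     // all Jackson usage then comes from the shaded classes under
> > > >     // org.apache.iceberg.shaded.com.fasterxml.jackson.*
> > > >     implementation("org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0")
> > > >   }
> > > >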
> > > > WDYT?
> > > >
> > > > Best Regards,
> > > > Yun
> > > >
> > > >
> > > > On Fri, Jun 20, 2025 at 12:58 PM Yufei Gu <flyrain...@gmail.com>
> > wrote:
> > > >
> > > > > It's simpler to maintain one version of the same dependency
> > > > > instead of two, and there is no confusion for developers --
> > > > > otherwise I can foresee anyone looking at the build script asking
> > > > > which Jackson the Spark client eventually ships. Upgrading the
> > > > > version is also straightforward. But I'd like to know more about
> > > > > why compiling against a shaded package is preferable here. Would
> > > > > you mind providing the details?
> > > > >
> > > > > Yufei
> > > > >
> > > > >
> > > > > On Fri, Jun 20, 2025 at 12:32 PM Dmitri Bourlatchkov <
> > di...@apache.org
> > > >
> > > > > wrote:
> > > > >
> > > > > > In any case, IMHO, even updating Jackson version numbers in
> > > > > > two places is preferable to compiling against shaded packages.
> > > > > >
> > > > > > On Fri, Jun 20, 2025 at 3:25 PM Dmitri Bourlatchkov <
> > > di...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I suppose we should be able to get the version of Jackson
> > > > > > > used by Iceberg from Iceberg POM information, right?
> > > > > > >
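> > > > > > > For example (an untested sketch; this assumes iceberg-core
> > > > > > > declares Jackson in its published POM, and the Iceberg version
> > > > > > > shown is just a placeholder), Gradle could inherit the version
> > > > > > > by omitting it from our own declaration:
> > > > > > >
> > > > > > >   dependencies {
> > > > > > >     implementation("org.apache.iceberg:iceberg-core:1.9.0")
> > > > > > >     // no version here: resolves to whatever iceberg-core requires
> > > > > > >     implementation("com.fasterxml.jackson.core:jackson-databind")
> > > > > > >   }
> > > > > > >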
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Fri, Jun 20, 2025 at 3:08 PM Yufei Gu <flyrain...@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > >> That's an interesting idea. But it requires us to maintain
> > > > > > >> the consistency of the Jackson version in two places instead
> > > > > > >> of one: the original Jackson version has to match the one
> > > > > > >> shaded in the Iceberg Spark runtime. Every time we update one,
> > > > > > >> we have to remember to update the other. I'm not sure it
> > > > > > >> improves the situation.
> > > > > > >>
> > > > > > >> Yufei
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Jun 20, 2025 at 11:43 AM Dmitri Bourlatchkov <
> > > > > di...@apache.org>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi Yun and Yufei,
> > > > > > >> >
> > > > > > >> > > Specifically, why does CreateGenericTableRESTRequest use
> > > > > > >> > > the shaded Jackson?
> > > > > > >> >
> > > > > > >> > As discussed off list, request / response payload classes
> > > > > > >> > have to work with the version of Jackson included with the
> > > > > > >> > Iceberg Spark jars (because they own the RESTClient).
> > > > > > >> >
> > > > > > >> > That in itself is fine.
> > > > > > >> >
> > > > > > >> > I'd like to propose a different approach to implementing
> > > > > > >> > that in Polaris, though.
> > > > > > >> >
> > > > > > >> > Instead of compiling against relocated classes, why don't
> > > > > > >> > we compile against the original Jackson jar, and later
> > > > > > >> > relocate the Spark Client to
> > > > > > >> > "org.apache.iceberg.shaded.com.fasterxml.jackson.*" ?
> > > > > > >> >
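> > > > > > >> > With the Shadow plugin that would be roughly (an untested
> > > > > > >> > sketch; the exact task wiring in our build is an assumption):
> > > > > > >> >
> > > > > > >> >   tasks.shadowJar {
> > > > > > >> >     relocate(
> > > > > > >> >       "com.fasterxml.jackson",
> > > > > > >> >       "org.apache.iceberg.shaded.com.fasterxml.jackson")
> > > > > > >> >   }
> > > > > > >> >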
> > > > > > >> > I believe Jackson is the only relocation concern.
> > > > > > >> >
> > > > > > >> > After relocation we can publish both the "thin" client for
> > > > > > >> > use with --packages in Spark, and the "fat" jar for use with
> > > > > > >> > --jars. Both artifacts will depend on the relocated Iceberg
> > > > > > >> > artifacts.
> > > > > > >> >
> > > > > > >> > WDYT?
> > > > > > >> >
> > > > > > >> > Cheers,
> > > > > > >> > Dmitri.
> > > > > > >> >
> > > > > > >> > On Fri, Jun 20, 2025 at 1:05 PM Dmitri Bourlatchkov <
> > > > > di...@apache.org
> > > > > > >
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Thanks for the quick response, Yun!
> > > > > > >> > >
> > > > > > >> > > > org.apache.polaris#polaris-core
> > > > > > >> > > > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> > > > > > >> > >
> > > > > > >> > > IIRC, polaris-core uses Jackson. iceberg-spark-runtime
> > > > > > >> > > also uses Jackson, but it shades it.
> > > > > > >> > >
> > > > > > >> > > I believe I saw issues with using both shaded and
> > > > > > >> > > non-shaded Jackson in the same Spark env. with Iceberg.
> > > > > > >> > >
> > > > > > >> > > This may or may not be a concern for our Spark Client.
> > > > > > >> > > What I mean is that it may need some more consideration
> > > > > > >> > > to be sure.
> > > > > > >> > >
> > > > > > >> > > Specifically, why does CreateGenericTableRESTRequest use
> > > > > > >> > > the shaded Jackson?
> > > > > > >> > >
> > > > > > >> > > WDYT?
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Dmitri.
> > > > > > >> > >
> > > > > > >> > > On Fri, Jun 20, 2025 at 12:47 PM yun zou <
> > > > > > yunzou.colost...@gmail.com>
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > >> *-- What is the maven artifact that Spark can
> > > > > > >> > >> automatically pull (via --packages)?*
> > > > > > >> > >>
> > > > > > >> > >> Our spark client pulls the following:
> > > > > > >> > >>
> > > > > > >> > >> org.apache.polaris#polaris-spark-3.5_2.12
> > > > > > >> > >>
> > > > > > >> > >> org.apache.polaris#polaris-core
> > > > > > >> > >>
> > > > > > >> > >> org.apache.polaris#polaris-api-management-model
> > > > > > >> > >>
> > > > > > >> > >> org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >> Prior to the change, it also pulled iceberg-core and
> > > > > > >> > >> avro 1.20.0.
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >> *-- Does that artifact use shaded dependencies?*
> > > > > > >> > >>
> > > > > > >> > >> Any usage of classes from iceberg-spark-runtime uses the
> > > > > > >> > >> shaded libraries shipped along with the artifacts.
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >> *-- Does that artifact depend on the Iceberg Spark
> > > > > > >> > >> bundle?*
> > > > > > >> > >>
> > > > > > >> > >> If you are referring to our spark client, it depends on
> > > > > > >> > >> iceberg-spark-runtime, not other bundles.
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >> *-- Is the _code_ running in Spark the same when the
> > > > > > >> > >> Polaris Spark Client is pulled via --packages and via
> > > > > > >> > >> --jars?*
> > > > > > >> > >>
> > > > > > >> > >> Yes, the jar and the package use the same code; the jar
> > > > > > >> > >> simply packs everything for the user, so there is no need
> > > > > > >> > >> to download any other dependencies.
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >> Best Regards,
> > > > > > >> > >>
> > > > > > >> > >> Yun
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >> On Fri, Jun 20, 2025 at 9:18 AM Dmitri Bourlatchkov <
> > > > > > >> di...@apache.org>
> > > > > > >> > >> wrote:
> > > > > > >> > >>
> > > > > > >> > >> > Some questions for clarification:
> > > > > > >> > >> >
> > > > > > >> > >> > * What is the maven artifact that Spark can
> > > > > > >> > >> > automatically pull (via --packages)?
> > > > > > >> > >> > * Does that artifact use shaded dependencies?
> > > > > > >> > >> > * Does that artifact depend on the Iceberg Spark
> > > > > > >> > >> > bundle?
> > > > > > >> > >> > * Is the _code_ running in Spark the same when the
> > > > > > >> > >> > Polaris Spark Client is pulled via --packages and via
> > > > > > >> > >> > --jars?
> > > > > > >> > >> >
> > > > > > >> > >> > I know I could have figured that out from code, but
> > > > > > >> > >> > I'm asking here because I think we may need to review
> > > > > > >> > >> > our approach to publishing these artifacts.
> > > > > > >> > >> >
> > > > > > >> > >> > I believe that regardless of the method of including
> > > > > > >> > >> > the Client into Spark runtime, the code has to be
> > > > > > >> > >> > exactly the same... and I doubt it is the same now.
> > > > > > >> > >> > WDYT?
> > > > > > >> > >> >
> > > > > > >> > >> > Thanks,
> > > > > > >> > >> > Dmitri.
> > > > > > >> > >> >
> > > > > > >> > >> >
> > > > > > >> > >> > On Fri, Jun 20, 2025 at 10:15 AM Dmitri Bourlatchkov <
> > > > > > >> > di...@apache.org>
> > > > > > >> > >> > wrote:
> > > > > > >> > >> >
> > > > > > >> > >> > > Hi All,
> > > > > > >> > >> > >
> > > > > > >> > >> > > Re: PR [1908] let's use this thread to clarify the
> > > > > > >> > >> > > problems we're trying to solve and options for
> > > > > > >> > >> > > solutions.
> > > > > > >> > >> > >
> > > > > > >> > >> > > As for me, it looks like some refactoring in the
> > > > > > >> > >> > > way the Spark Client is built and published may be
> > > > > > >> > >> > > needed.
> > > > > > >> > >> > >
> > > > > > >> > >> > > I think it makes sense to clarify this before 1.0
> > > > > > >> > >> > > to avoid changes to Maven coordinates right after
> > > > > > >> > >> > > 1.0.
> > > > > > >> > >> > >
> > > > > > >> > >> > > [1908] https://github.com/apache/polaris/pull/1908
> > > > > > >> > >> > >
> > > > > > >> > >> > > Thanks,
> > > > > > >> > >> > > Dmitri.
> > > > > > >> > >> > >
> > > > > > >> > >> > >
> > > > > > >> > >> >
> > > > > > >> > >>
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
