Hi Yun, I'm thinking about this approach:
* We compile against unrelocated jars.
* We relocate Jackson the same way as Iceberg does in our build.
* We publish the relocated Spark Client jar (thin), without any bundled classes.
* The relocated Spark Client jar depends on the Iceberg Spark runtime.
* The relocated Spark Client jar does _not_ have a classifier.

The exact version of Jackson in our build and in the Iceberg build may differ, but Jackson is pretty good at keeping backward compatibility, so this should not be a problem.

WDYT?

Thanks,
Dmitri.

On Fri, Jun 20, 2025 at 8:01 PM yun zou <yunzou.colost...@gmail.com> wrote:

> Hi Dmitri,
>
> Thanks a lot for the information! So it seems that after my previous PR [1857], which reuses the current shadowJar publish, it just publishes the shadow jar, which is included in the module files.
>
> It turns out that the POM file we generate has the following line once the shadow plugin is used:
>
> <!-- do_not_remove: published-with-gradle-metadata -->
>
> which indicates a preference to resolve from the .module file, which directly includes our bundle jar.
>
> This should also work, but may not be the exact behavior we want. To be safe, I can put up a different PR to revert the previous PR 1857 and see if there is a better way to reuse the shadowJar plugin later.
>
> WDYT?
>
> Best Regards,
> Yun
>
> On Fri, Jun 20, 2025 at 3:56 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
>
> > Hi Yun,
> >
> > I do not see a non-bundle jar published to my local Maven repo
> > .m2/repository/org/apache/polaris/polaris-spark-3.5_2.12/1.1.0-incubating-SNAPSHOT
> >
> > maven-metadata-local.xml
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.jar
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.module
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.pom
> > polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-sources.jar
> >
> > ... but Spark still works with --packages org.apache.polaris:polaris-spark-3.5_2.12:1.1.0-incubating-SNAPSHOT
> >
> > Cheers,
> > Dmitri.
> >
> > On Fri, Jun 20, 2025 at 6:42 PM yun zou <yunzou.colost...@gmail.com> wrote:
> >
> > > Hi Dmitri,
> > >
> > > I think there might be a misunderstanding about how jars and packages are published. The shadowJar job is used to publish the bundle jar for the jar use case, where all dependencies are packed and users use it with Spark like the following:
> > > --jars polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
> > >
> > > This is different from when a user uses --packages, which uses the regular project jar without a classifier, and all dependencies are resolved and downloaded at installation time. Once formally released, Spark users can use it like the following:
> > > --packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0
> > >
> > > Note that the regular project jar cannot be used directly with --jars without manually adding the other dependency jars, because it does not pack the other necessary dependencies. That is why we also publish the bundle jar, which supports the direct jar use case.
> > >
> > > You might be confused by my previous PR <https://github.com/apache/polaris/pull/1857>, where I thought I needed to remove the classifier to make the package use case work. I believe I later clarified that it was a false alarm, and we do not need the bundle jar for the `--packages` use case.
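For illustration, here is a rough Gradle (Kotlin DSL) sketch of the publishing layout discussed above: a thin jar with no classifier that --packages resolves via the POM, plus a shadow jar attached under the "bundle" classifier for --jars. The plugin id, version, and publication wiring are assumptions for this sketch, not the actual Polaris build script.

  plugins {
      `java-library`
      `maven-publish`
      // assumption: the Shadow plugin builds the bundle jar
      id("com.github.johnrengelman.shadow") version "8.1.1"
  }

  // fat jar for the --jars use case
  tasks.shadowJar {
      archiveClassifier.set("bundle")
  }

  publishing {
      publications {
          create<MavenPublication>("maven") {
              // thin jar + POM; dependencies resolve transitively via --packages
              from(components["java"])
              // attach the bundle jar under the "bundle" classifier
              artifact(tasks.shadowJar)
          }
      }
  }

As noted above, the Gradle module metadata published alongside the POM (the "published-with-gradle-metadata" marker) can steer resolution toward the bundle jar, so that metadata may need extra attention for the --packages case.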
> > > I have manually verified both use cases, and we have test automation for the jar use case. I have followed up to investigate how to add a regression test for the package use case as well.
> > >
> > > Best Regards,
> > > Yun
> > >
> > > On Fri, Jun 20, 2025 at 3:23 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > >
> > > > Hi Yun,
> > > >
> > > > Re: --packages, what I meant to say is that even with PR 1908, the published version has the "bundle" classifier.
> > > >
> > > > <metadata modelVersion="1.1.0">
> > > >   <groupId>org.apache.polaris</groupId>
> > > >   <artifactId>polaris-spark-3.5_2.12</artifactId>
> > > >   <versioning>
> > > >     <lastUpdated>20250620185923</lastUpdated>
> > > >     <snapshot>
> > > >       <localCopy>true</localCopy>
> > > >     </snapshot>
> > > >     <snapshotVersions>
> > > >       <snapshotVersion>
> > > >         <classifier>bundle</classifier>
> > > >         <extension>jar</extension>
> > > >         <value>1.1.0-incubating-SNAPSHOT</value>
> > > >         <updated>20250620185923</updated>
> > > >       </snapshotVersion>
> > > >
> > > > I manually tested with Spark locally and it seems to work. However, I thought that caused issues before. WDYT?
> > > >
> > > > Re: compiling against shaded packages - I still believe that it is not nice from the maintenance POV. Yet, I do not insist on reworking this.
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Fri, Jun 20, 2025 at 5:09 PM yun zou <yunzou.colost...@gmail.com> wrote:
> > > >
> > > > > Hi Dmitri,
> > > > >
> > > > > Regarding this question:
> > > > >
> > > > > *Current docs [1] suggest using `--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR 1908 produces `polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar` (note: bundle, disregard version).*
> > > > >
> > > > > The version number used in the bundle jar comes from the version file currently in the repo, which is why the one you see is xxx-incubating-SNAPSHOT-bundle.jar. Furthermore, the bundle jar is published for the jar use case, not for the package use case. There are two ways to use the Spark Client with Spark:
> > > > > 1) use --packages, where the dependencies are downloaded automatically
> > > > > 2) use --jars, where the bundle jar contains everything the user needs without downloading extra dependencies
> > > > >
> > > > > When the user uses packages, they are using the package we formally publish to Maven, which I believe will no longer have "incubating-SNAPSHOT" in the version, so 1.0.0 will be the right version for actual use once we release 1.0.0. Furthermore, what we give in the doc is always just an example, where we phrase it like:
> > > > > "Assume the released Polaris Spark client you want to use is `org.apache.polaris:polaris-spark-3.5_2.12:1.0.0`"
> > > > > So it is up to the user to pick the version they want among the published versions, which will only be 1.0.0 now, but later we might publish 1.1.0, 1.2.0, etc.
> > > > >
> > > > > *Instead of compiling against relocated classes, why don't we compile against the original Jackson jar, and later relocate the Spark Client to "org.apache.iceberg.shaded.com.fasterxml.jackson.*" ?*
> > > > >
> > > > > Regarding this, I think it is correct for the Spark Client to use the shaded jar in the Iceberg Spark client, because our Spark Client is supposed to fully depend on, and be compatible with, the iceberg-spark-runtime, where we intend to use all libraries shipped directly with iceberg-spark-runtime to avoid any potential incompatibilities; this includes the RESTClient, Iceberg RESTRequest, etc.
> > > > > If we use our own Jackson library and relocate it to org.apache.iceberg, first of all I don't know whether it will work; beyond that, it could also end up with two different Jackson versions, which might introduce compatibility issues, especially since we use the RESTClient shipped along with the iceberg-spark-runtime. Furthermore, it is very confusing to relocate it to the org.apache.iceberg* namespace; to me, that is even worse than skipping the shaded-import check.
> > > > > In my point of view, it is correct for the Spark client to use the shaded library from iceberg-spark-runtime, and we should not be too concerned about skipping the import check for the Spark client project as long as we are clear about the goal we are trying to achieve.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Best Regards,
> > > > > Yun
> > > > >
> > > > > On Fri, Jun 20, 2025 at 12:58 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > > >
> > > > > > It's simpler to maintain one version for the same dependency instead of two, and there is no confusion for developers -- I can foresee anyone looking at the build script asking which Jackson the Spark client eventually ships. Upgrading the version is straightforward. But I'd like to know more details about why compiling against a shaded package is preferable here. Would you mind providing these details?
> > > > > >
> > > > > > Yufei
> > > > > >
> > > > > > On Fri, Jun 20, 2025 at 12:32 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > >
> > > > > > > In any case, IMHO, even updating Jackson version numbers in two places is preferable to compiling against shaded packages.
> > > > > > >
> > > > > > > On Fri, Jun 20, 2025 at 3:25 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > > >
> > > > > > > > I suppose we should be able to get the version of Jackson used by Iceberg from Iceberg POM information, right?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Fri, Jun 20, 2025 at 3:08 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > > > > > >
> > > > > > > >> That's an interesting idea. But it requires us to maintain the consistency of the Jackson version in two places instead of one. The original Jackson version has to match the one shaded in the Iceberg Spark runtime. Every time we update one, we have to remember to update the other. I'm not sure it improves the situation.
> > > > > > > >>
> > > > > > > >> Yufei
> > > > > > > >>
> > > > > > > >> On Fri, Jun 20, 2025 at 11:43 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Yun and Yufei,
> > > > > > > >> >
> > > > > > > >> > > Specifically, why does CreateGenericTableRESTRequest use the shaded Jackson?
> > > > > > > >> >
> > > > > > > >> > As discussed off list, request / response payload classes have to work with the version of Jackson included with the Iceberg Spark jars (because they own the RESTClient).
> > > > > > > >> >
> > > > > > > >> > That in itself is fine.
> > > > > > > >> >
> > > > > > > >> > I'd like to propose a different approach to implementing that in Polaris, though.
> > > > > > > >> >
> > > > > > > >> > Instead of compiling against relocated classes, why don't we compile against the original Jackson jar, and later relocate the Spark Client to "org.apache.iceberg.shaded.com.fasterxml.jackson.*" ?
> > > > > > > >> >
> > > > > > > >> > I believe Jackson is the only relocation concern.
> > > > > > > >> >
> > > > > > > >> > After relocation we can publish both the "thin" client for use with --packages in Spark, and the "fat" jar for use with --jars. Both artifacts will depend on the relocated Iceberg artifacts.
> > > > > > > >> >
> > > > > > > >> > WDYT?
> > > > > > > >> >
> > > > > > > >> > Cheers,
> > > > > > > >> > Dmitri.
> > > > > > > >> >
> > > > > > > >> > On Fri, Jun 20, 2025 at 1:05 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > > > >> >
> > > > > > > >> > > Thanks for the quick response, Yun!
> > > > > > > >> > >
> > > > > > > >> > > > org.apache.polaris#polaris-core
> > > > > > > >> > > > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> > > > > > > >> > >
> > > > > > > >> > > IIRC, polaris-core uses Jackson. iceberg-spark-runtime also uses Jackson, but it shades it.
> > > > > > > >> > >
> > > > > > > >> > > I believe I saw issues with using both shaded and non-shaded Jackson in the same Spark env. with Iceberg.
> > > > > > > >> > >
> > > > > > > >> > > This may or may not be a concern for our Spark Client. What I mean is that it may need some more consideration to be sure.
> > > > > > > >> > >
> > > > > > > >> > > Specifically, why does CreateGenericTableRESTRequest use the shaded Jackson?
> > > > > > > >> > >
> > > > > > > >> > > WDYT?
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Dmitri.
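For illustration, here is a rough Gradle (Kotlin DSL) sketch of the relocation approach proposed above: compile against the original Jackson, then relocate it at packaging time into the namespace used by the shaded Jackson inside iceberg-spark-runtime. The Shadow plugin id and the version numbers are assumptions for this sketch, not the actual Polaris build.

  plugins {
      `java-library`
      // assumption: the Shadow plugin performs the relocation
      id("com.github.johnrengelman.shadow") version "8.1.1"
  }

  dependencies {
      // compile against the original, unrelocated Jackson; version is illustrative
      implementation("com.fasterxml.jackson.core:jackson-databind:2.17.1")
  }

  tasks.shadowJar {
      // relocate our Jackson usage into the namespace of the shaded Jackson
      // inside iceberg-spark-runtime, so payload classes work with its RESTClient
      relocate("com.fasterxml.jackson", "org.apache.iceberg.shaded.com.fasterxml.jackson")
  }

The idea, as described at the top of this thread, is that the published thin jar would itself be the relocated jar, so --packages and --jars users run the same relocated code.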
> > > > > > > >> > > On Fri, Jun 20, 2025 at 12:47 PM yun zou <yunzou.colost...@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > >> *-- What is the maven artifact that Spark can automatically pull (via --packages)*
> > > > > > > >> > >>
> > > > > > > >> > >> Our spark client pulls the following:
> > > > > > > >> > >> org.apache.polaris#polaris-spark-3.5_2.12
> > > > > > > >> > >> org.apache.polaris#polaris-core
> > > > > > > >> > >> org.apache.polaris#polaris-api-management-model
> > > > > > > >> > >> org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
> > > > > > > >> > >>
> > > > > > > >> > >> Prior to the change, it also pulled iceberg-core and avro 1.20.0.
> > > > > > > >> > >>
> > > > > > > >> > >> *-- Does that artifact use shaded dependencies*
> > > > > > > >> > >>
> > > > > > > >> > >> Any usage of classes from iceberg-spark-runtime uses the shaded libraries shipped along with the artifacts.
> > > > > > > >> > >>
> > > > > > > >> > >> *-- Does that artifact depend on the Iceberg Spark bundle?*
> > > > > > > >> > >>
> > > > > > > >> > >> If you are referring to our spark client, it depends on iceberg-spark-runtime, not other bundles.
> > > > > > > >> > >>
> > > > > > > >> > >> *-- Is the _code_ running in Spark the same when the Polaris Spark Client is pulled via --packages and via --jars?*
> > > > > > > >> > >>
> > > > > > > >> > >> Yes, the jar and the package use the same code; the jar simply packs everything for the user so there is no need to download any other dependencies.
> > > > > > > >> > >>
> > > > > > > >> > >> Best Regards,
> > > > > > > >> > >> Yun
> > > > > > > >> > >>
> > > > > > > >> > >> On Fri, Jun 20, 2025 at 9:18 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > > > >> > >>
> > > > > > > >> > >> > Some questions for clarification:
> > > > > > > >> > >> >
> > > > > > > >> > >> > * What is the maven artifact that Spark can automatically pull (via --packages)?
> > > > > > > >> > >> > * Does that artifact use shaded dependencies?
> > > > > > > >> > >> > * Does that artifact depend on the Iceberg Spark bundle?
> > > > > > > >> > >> > * Is the _code_ running in Spark the same when the Polaris Spark Client is pulled via --packages and via --jars?
> > > > > > > >> > >> >
> > > > > > > >> > >> > I know I could have figured that out from code, but I'm asking here because I think we may need to review our approach to publishing these artifacts.
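For reference, a rough sketch of how the thin client's dependencies might be declared so that --packages pulls the artifacts Yun lists above. The coordinates mirror that list; the configuration names and version numbers are illustrative assumptions, not taken from the actual Polaris build.

  plugins {
      `java-library`
  }

  dependencies {
      // assumption: these mirror the artifacts resolved transitively via --packages
      implementation("org.apache.polaris:polaris-core:1.1.0-incubating-SNAPSHOT")
      implementation("org.apache.polaris:polaris-api-management-model:1.1.0-incubating-SNAPSHOT")
      implementation("org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0")
  }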
> > > > > > > >> > >> > I believe that regardless of the method of including the Client into the Spark runtime, the code has to be exactly the same... and I doubt it is the same now. WDYT?
> > > > > > > >> > >> >
> > > > > > > >> > >> > Thanks,
> > > > > > > >> > >> > Dmitri.
> > > > > > > >> > >> >
> > > > > > > >> > >> > On Fri, Jun 20, 2025 at 10:15 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > > > >> > >> >
> > > > > > > >> > >> > > Hi All,
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > Re: PR [1908] let's use this thread to clarify the problems we're trying to solve and options for solutions.
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > As for me, it looks like some refactoring in the way the Spark Client is built and published may be needed.
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > I think it makes sense to clarify this before 1.0 to avoid changes to Maven coordinates right after 1.0.
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > [1908] https://github.com/apache/polaris/pull/1908
> > > > > > > >> > >> > >
> > > > > > > >> > >> > > Thanks,
> > > > > > > >> > >> > > Dmitri.