@Cheng Pan https://issues.apache.org/jira/browse/HIVE-22126
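For readers following the IsolatedClientLoader discussion quoted below: a simplified, hypothetical sketch of how a shared-class predicate in an isolated classloader typically works. The real logic lives at the link Cheng Pan cites; the prefixes and names here are illustrative, not Spark's actual list.

```scala
// Hypothetical sketch of a "shared class" predicate, as used by an isolated
// classloader such as Spark's IsolatedClientLoader. Classes matching a shared
// prefix are loaded from the application classloader rather than the isolated
// one, so both sides must agree on a single version of those classes. This is
// why marking "com.google" (Guava) as shared pins Spark's Guava version.
val sharedPrefixes: Seq[String] = Seq(
  "scala.",
  "org.slf4j",
  "org.apache.log4j",
  "com.google" // illustrative: Guava shared => one Guava version for both sides
)

def isSharedClass(name: String): Boolean =
  sharedPrefixes.exists(p => name.startsWith(p))
```

A class for which `isSharedClass` returns true is delegated to the parent loader; everything else resolves inside the isolated Hive client, which is what normally lets the two sides carry different dependency versions.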
On Wed, 31 May 2023 at 03:58, Cheng Pan <pan3...@gmail.com> wrote:
> @Bjørn Jørgensen
>
> I did some investigation into upgrading Guava after Spark dropped Hadoop 2
> support, but unfortunately Hive still depends on it. Worse, Guava's classes
> are marked as shared in IsolatedClientLoader[1], which means Spark cannot
> upgrade Guava even after upgrading the built-in Hive from the current 2.3.9
> to a newer version that no longer pins an old Guava, without breaking older
> versions of the Hive Metastore client.
>
> I can't find any clues as to why the Guava classes need to be marked as
> shared; can anyone provide some background?
>
> [1] https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L215
>
> Thanks,
> Cheng Pan
>
> > On May 31, 2023, at 03:49, Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
> >
> > @Dongjoon Hyun Thank you.
> >
> > I have two points to discuss.
> > First, we are currently running tests with Python versions 3.8 and 3.9.
> > Should we consider replacing 3.9 with 3.11?
> >
> > Secondly, I'd like to know the status of Google Guava.
> > With Hadoop 2 no longer being used, is there any other factor blocking this?
> >
> > On Tue, 30 May 2023 at 10:39, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> > I don't know whether it is related, but Scala 2.12.17 is fine for the
> > Spark 3 family (compile and run). I spent a day compiling the Spark 3.4.0
> > code against Scala 2.13.8 with Maven and got all sorts of weird and
> > wonderful errors at runtime.
> >
> > HTH
> >
> > Mich Talebzadeh,
> > Lead Solutions Architect/Engineering Lead
> > Palantir Technologies Limited
> > London
> > United Kingdom
> >
> > view my LinkedIn profile
> >
> > https://en.everybodywiki.com/Mich_Talebzadeh
> >
> > Disclaimer: Use it at your own risk.
> > Any and all responsibility for any loss, damage or destruction of data
> > or any other property which may arise from relying on this email's
> > technical content is explicitly disclaimed. The author will in no case
> > be liable for any monetary damages arising from such loss, damage or
> > destruction.
> >
> > On Tue, 30 May 2023 at 01:59, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> > Shall we initiate a new discussion thread for Scala 2.13 by default?
> > While I'm not an expert in this area, it sounds like the change is major
> > and (probably) breaking. It seems worth having a separate discussion
> > thread rather than treating it as just one of 25 items.
> >
> > On Tue, May 30, 2023 at 9:54 AM Sean Owen <sro...@gmail.com> wrote:
> > It does seem risky; there are still likely libs out there that don't
> > cross-compile for 2.13. I would make it the default at 4.0, myself.
> >
> > On Mon, May 29, 2023 at 7:16 PM Hyukjin Kwon <gurwls...@apache.org> wrote:
> > While I support going forward with a higher version, actually using
> > Scala 2.13 by default is a big deal, especially in that:
> > • Users would likely download the built-in version assuming that it's
> >   backward binary compatible.
> > • PyPI doesn't allow specifying the Scala version, meaning that users
> >   wouldn't have a way to 'pip install pyspark' based on Scala 2.12.
> > I wonder if it's safer to do it in Spark 4 (which I believe will be
> > discussed soon).
> >
> > On Mon, 29 May 2023 at 13:21, Jia Fan <fan...@apache.org> wrote:
> > Thanks Dongjoon!
> > There are some tickets I want to share.
> > SPARK-39420 Support ANALYZE TABLE on v2 tables
> > SPARK-42750 Support INSERT INTO by name
> > SPARK-43521 Support CREATE TABLE LIKE FILE
> >
> > On Mon, 29 May 2023 at 08:42, Dongjoon Hyun <dongj...@apache.org> wrote:
> > Hi, All.
> >
> > Apache Spark 3.5.0 is scheduled for August (1st Release Candidate) and
> > currently a few notable things are under discussion on the mailing list.
> > I believe it's a good time to share a short summary list (containing
> > both completed and in-progress items) to give a highlight in advance
> > and to collect your targets too.
> >
> > Please share your expectations or working items if you want to
> > prioritize them more in the community in the Apache Spark 3.5.0
> > timeframe.
> >
> > (Sorted by ID)
> > SPARK-40497 Upgrade Scala to 2.13.11
> > SPARK-42452 Remove hadoop-2 profile from Apache Spark 3.5.0
> > SPARK-42913 Upgrade to Hadoop 3.3.5 (aws-java-sdk-bundle: 1.12.262 -> 1.12.316)
> > SPARK-43024 Upgrade Pandas to 2.0.0
> > SPARK-43200 Remove Hadoop 2 reference in docs
> > SPARK-43347 Remove Python 3.7 support
> > SPARK-43348 Support Python 3.8 in PyPy3
> > SPARK-43351 Add Spark Connect Go prototype code and example
> > SPARK-43379 Deprecate old Java 8 versions prior to 8u371
> > SPARK-43394 Upgrade to Maven 3.8.8
> > SPARK-43436 Upgrade to rocksdbjni 8.1.1.1
> > SPARK-43446 Upgrade to Apache Arrow 12.0.0
> > SPARK-43447 Support R 4.3.0
> > SPARK-43489 Remove protobuf 2.5.0
> > SPARK-43519 Bump Parquet to 1.13.1
> > SPARK-43581 Upgrade kubernetes-client to 6.6.2
> > SPARK-43588 Upgrade to ASM 9.5
> > SPARK-43600 Update K8s doc to recommend K8s 1.24+
> > SPARK-43738 Upgrade to DropWizard Metrics 4.2.18
> > SPARK-43831 Build and Run Spark on Java 21
> > SPARK-43832 Upgrade to Scala 2.12.18
> > SPARK-43836 Make Scala 2.13 as default in Spark 3.5
> > SPARK-43842 Upgrade gcs-connector to 2.2.14
> > SPARK-43844 Update to ORC 1.9.0
> > UMBRELLA: Add SQL functions into Scala, Python and R API
> >
> > Thanks,
> > Dongjoon.
> >
> > PS. The above is not a list of release blockers. Instead, it could be
> > a nice-to-have from someone's perspective.
> >
> > --
> > Bjørn Jørgensen
> > Vestre Aspehaug 4, 6010 Ålesund
> > Norge
> >
> > +47 480 94 297

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297
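[Editor's note] As a footnote to the Scala 2.13 discussion in this thread: a minimal, hypothetical illustration (not taken from the thread; the function and names are made up) of the kind of source incompatibility that makes switching the default Scala version a breaking change. In 2.13, `scala.Seq` became an alias for `scala.collection.immutable.Seq`, so code that passes mutable collections where `Seq` is expected compiles on 2.12 but fails on 2.13 without an explicit conversion.

```scala
import scala.collection.mutable.ArrayBuffer

// A library-style API taking the default Seq type.
def firstOrZero(xs: Seq[Int]): Int = xs.headOption.getOrElse(0)

val buf = ArrayBuffer(1, 2, 3)

// On Scala 2.12, firstOrZero(buf) compiles because scala.Seq is
// collection.Seq, a supertype of ArrayBuffer. On 2.13, scala.Seq is
// immutable.Seq, so the call needs an explicit copy to compile:
val n = firstOrZero(buf.toSeq)
```

This is only one of several 2.12/2.13 differences, but it illustrates why libraries must be explicitly cross-compiled for each Scala binary version rather than assumed source or binary compatible, which is the risk Sean Owen and Hyukjin Kwon raise above.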