Hello,

This explanation is splendidly detailed and merits further study. However, as a first thought with regard to the point raised below, I quote:

"... There is a company claiming something non-Apache like 'Apache Spark 3.4.0 minus SPARK-40436' with the name 'Apache Spark 3.4.0.'"

There is a potential risk for the consumers of this offering, which can be justified as follows: to maintain the integrity of the Apache Spark project and to ensure reliable and secure software, it is common practice to use the official releases from the ASF. If a third-party company claims to provide a modified version of Apache Spark (in the form of software as a service), it is strongly recommended for consumers to carefully review the modifications involved, understand the reasoning behind these modifications and/or omissions, and evaluate the potential implications before using and maintaining this offering in production environments.
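For example, a consumer can cross-check such a claim directly. The following is a minimal sketch, assuming it is run in spark-shell (or a Scala notebook) on the offering in question, where `spark` is the SparkSession the shell provides:

    // Compare the advertised Spark version against the Scala version
    // actually present on the classpath.
    println(s"Spark version: ${spark.version}")
    println(s"Scala version: ${scala.util.Properties.versionNumberString}")
    // An official Apache Spark 3.4.0 build ships with Scala 2.12.17
    // (SPARK-40436), so an output of 2.12.15 indicates a modified,
    // non-official build.

A mismatch between these two outputs is exactly the kind of modification the vendor should be documenting.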
"... There is a company claiming something non-Apache like "Apache Spark 3.4.0 minus SPARK-40436" with the name "Apache Spark 3.4.0." There is a potential risk for the consumers of this product offered that can be justified as below: To maintain the integrity of the Apache Spark project and ensure reliable and secure software, it is a common practice to use official releases from the ASF. If a third party company is claiming to provide a modified version of Apache Spark (in the form of software as a service), it is strongly recommended " for consumers" to carefully review the modifications involved, understand the reasoning behind these modifications and/or omissions, and evaluate the potential implications before using and maintaining this offering in production environments. The third party company has to clearly state and advertise the reasoning behind this so-called hacking, specifically with reference to "### 3. Action Items ### --We should communicate and help the company to fix the misleading messages and remove Scala-version segmentation situations per Spark version". HTH Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Mon, 5 Jun 2023 at 08:46, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote: > Hi, All and Matei (as the Chair of Apache Spark PMC). > > Sorry for a long email, I want to share two topics and corresponding > action items. > You can go to "Section 3: Action Items" directly for the conclusion. > > > ### 1. ASF Policy Violation ### > > ASF has a rule for "MAY I CALL MY MODIFIED CODE 'APACHE'?" > > https://www.apache.org/foundation/license-faq.html#Name-changes > > For example, when we call `Apache Spark 3.4.0`, it's supposed to be the > same with one of our official distributions. > > https://downloads.apache.org/spark/spark-3.4.0/ > > Specifically, in terms of the Scala version, we believe it should have > Scala 2.12.17 because of 'SPARK-40436 Upgrade Scala to 2.12.17'. > > There is a company claiming something non-Apache like "Apache Spark 3.4.0 > minus SPARK-40436" with the name "Apache Spark 3.4.0." > > - The company website shows "X.Y (includes Apache Spark 3.4.0, Scala > 2.12)" > - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark > version 3.4.0" > - UI shows Apache Spark logo and `3.4.0`. > - However, Scala Version is '2.12.15' > > [image: Screenshot 2023-06-04 at 9.37.16 PM.png][image: Screenshot > 2023-06-04 at 10.14.45 PM.png] > > Lastly, this is not a single instance. For example, the same company also > claims "Apache Spark 3.3.2" with a mismatched Scala version. > > > ### 2. Scala Issues ### > > In addition to (1), although we proceeded with good intentions and great > care > including dev mailing list discussion, there are several concerning areas > which > need more attention and our love. > > a) Scala Spark users will experience UX inconvenience from Spark 3.5. > > SPARK-42493 Make Python the first tab for code examples > > For the record, we discussed it here. 
>    - https://lists.apache.org/thread/1p8s09ysrh4jqsfd47qdtrl7rm4rrs05
>      "[DISCUSS] Show Python code examples first in Spark documentation"
>
> b) The Scala version upgrade is currently blocked by the Ammonite
>    library's dev cycle.
>
>    Although we discussed it here and it had good intentions,
>    the current master branch cannot use the latest Scala.
>
>    - https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
>      "Ammonite as REPL for Spark Connect"
>    - SPARK-42884 Add Ammonite REPL integration
>
>    Specifically, the following are blocked, and I'm monitoring the
>    Ammonite repository:
>    - SPARK-40497 Upgrade Scala to 2.13.11
>    - SPARK-43832 Upgrade Scala to 2.12.18
>    - According to https://github.com/com-lihaoyi/Ammonite/issues ,
>      Scala 3.3.0 LTS support also looks infeasible.
>
>    Although we may be able to wait for a while, there are two fundamental
>    solutions to unblock this situation from a long-term maintenance
>    perspective:
>    - Replace it with a Scala-shell based implementation.
>    - Move `connector/connect/client/jvm/pom.xml` out of the Spark repo.
>      Maybe we can put it into a new repo like the Rust and Go clients.
>
> c) Scala 2.13 and above require Apache Spark 4.0.
>
>    In the "Apache Spark 3.5.0 Expectations?" and "Apache Spark 4.0
>    Timeframe?" threads, we discussed the Spark 3.5.0 scope and decided
>    to revert "SPARK-43836 Make Scala 2.13 as default in Spark 3.5".
>    Apache Spark 4.0.0 is the only way to support Scala 2.13 or higher.
>
>    - https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
>      ("Apache Spark 3.5.0 Expectations?")
>    - https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>      ("Apache Spark 4.0 Timeframe?")
>
>    A candidate (or mentioned) timeframe was "Spark 4.0.0: 2024.06", with
>    Scala 3.3.0 LTS.
>    - https://scala-lang.org/blog/2023/05/30/scala-3.3.0-released.html
>
> d) Java 21 LTS is Apache Spark 3.5.0's stretch goal.
>
>    SPARK-43831 Build and Run Spark on Java 21
>
>    However, this needs SPARK-40497 (Scala 2.13.11) and SPARK-43832
>    (Scala 2.12.18), which are blocked by the Ammonite library as
>    mentioned in (b).
>
>
> ### 3. Action Items ###
>
> To provide clarity to the Apache Spark Scala community:
>
> - We should communicate with and help the company to fix the misleading
>   messages and remove the Scala-version segmentation per Spark version.
>
> - The Apache Spark PMC should include this incident report and the result
>   in the next Apache Spark Quarterly Report (August).
>
> - I will start a vote on the Apache Spark 4.0.0 timeframe next week after
>   receiving more feedback.
>   Since 4.0.0 is not limited to the Scala issues, we will vote on the
>   timeline only.
>
> - Lastly, we need to re-evaluate the risk of the `Ammonite` library before
>   the Apache Spark 3.5.0 release.
>   If it blocks the Scala upgrade and Java 21 support, we had better avoid
>   it at all costs.
>
>
> WDYT?
>
> Thanks,
> Dongjoon.