Thank you, Sean.

Let me first reply with comments on a few areas.

>  I believe releasing "Apache Foo X.Y + patches" is acceptable,
> if it is substantially Apache Foo X.Y.

1. For the naming, yes, but the company should use different version
numbers instead of the exact "3.4.0". As I showed in the screenshot in my
previous email, the company exposes exactly "Apache Spark 3.4.0" because
they build their distribution without changing the version number at all
(a short spark-shell check is sketched below).
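
For context, the version string a distribution reports in its logs and UI is
simply the value stamped into the build, so a fork that leaves the upstream
version untouched shows exactly "3.4.0". A minimal spark-shell sketch
(illustrative only, not tied to any particular vendor image):

    // The reported version is the build-time stamp, not proof of an unmodified release.
    println(spark.version)                   // e.g. "3.4.0" on such a fork
    println(org.apache.spark.SPARK_VERSION)  // the same build-time constant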


>  I'm sure this one is about Databricks but I'm also sure Cloudera,
> Hortonworks, etc had Spark releases with patches, too.

2. According to
https://mvnrepository.com/artifact/org.apache.spark/spark-core,
all the other companies followed "Semantic Versioning" or appended additional
version qualifiers to their distributions, didn't they? AFAIK, nobody else
claims the exact "3.4.0" version string at the source-code level like
this company does.


> The principle here is consumer confusion.
> Is anyone substantially misled?
> Here I don't think so.

3. This company not only creates a subtle 'Scala version segmentation'
situation, but also misrepresents Apache Spark 3.4.0 by dropping the many
bug fixes delivered by SPARK-40436 (Upgrade Scala to 2.12.17) and
SPARK-39414 (Upgrade Scala to 2.12.16) for some unknown reason. Clearly,
this is not a superior version of Apache Spark 3.4.0; to me, it is an
inferior one. If a company disagrees with Scala 2.12.17 for some internal
reason, they can stick to 2.12.15, of course. However, the Apache Spark PMC
should not allow them to tell customers, falsely, that "Apache Spark
3.4.0" uses Scala 2.12.15 by default. That's why I initiated this email
thread: I consider this a serious blocker for Apache Spark's Scala
improvements. (A quick runtime check is sketched after the links below.)
    - https://github.com/scala/scala/releases/tag/v2.12.17 (21 Merged PRs)
    - https://github.com/scala/scala/releases/tag/v2.12.16 (68 Merged PRs)
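
For reference, anyone can verify the mismatch from a running session. A
minimal spark-shell sketch (the 2.12.15 value below is what the vendor
screenshots show; an official Apache Spark 3.4.0 build prints 2.12.17 per
SPARK-40436):

    // Compare the advertised Spark version with the Scala runtime actually bundled.
    println(spark.version)                              // advertised as "3.4.0"
    println(scala.util.Properties.versionNumberString)  // official 3.4.0: "2.12.17";
                                                        // the distribution above: "2.12.15"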


> 2b/ If a single dependency blocks important updates, yeah it's fair to
> remove it, IMHO. I wouldn't remove in 3.5 unless the other updates are
> critical, and it's not clear they are. In 4.0 yes.

4. Apache Spark 3.5 is not released yet and we haven't cut the feature
branch. So, it's the opposite: I believe now is the best time to remove it,
before cutting branch-3.5.


Dongjoon



On Mon, Jun 5, 2023 at 5:58 AM Sean Owen <sro...@gmail.com> wrote:

> 1/ Regarding naming - I believe releasing "Apache Foo X.Y + patches" is
> acceptable, if it is substantially Apache Foo X.Y. This is common practice
> for downstream vendors. It's fair nominative use. The principle here is
> consumer confusion. Is anyone substantially misled? Here I don't think so.
> I know that we have in the past decided it would not be OK, for example, to
> release a product with "Apache Spark 4.0" now as there is no such release,
> even building from master. A vendor should elaborate the changes
> somewhere, ideally. I'm sure this one is about Databricks but I'm also sure
> Cloudera, Hortonworks, etc had Spark releases with patches, too.
>
> 2a/ That issue seems to be about just flipping which code sample is shown
> by default. It seemed widely agreed that this would slightly help more users
> than it harms. I agree with the change and don't see a need to escalate.
> The question of further Python parity is a big one but is separate.
>
> 2b/ If a single dependency blocks important updates, yeah it's fair to
> remove it, IMHO. I wouldn't remove in 3.5 unless the other updates are
> critical, and it's not clear they are. In 4.0 yes.
>
> 2c/ Scala 2.13 is already supported in 3.x, and does not require 4.0. This
> was about what the default non-Scala release convenience binaries use.
> Sticking to 2.12 in 3.x doesn't seem like an issue, even desirable.
>
> 2d/ Same as 2b
>
> 3/ I don't think 1/ is an incident. Yes to moving towards 4.0 after 3.5,
> IMHO, and to removing Ammonite in 4.0 if there is no resolution forthcoming.
>
> On Mon, Jun 5, 2023 at 2:46 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Hi, All and Matei (as the Chair of Apache Spark PMC).
>>
>> Sorry for the long email; I want to share two topics and corresponding
>> action items.
>> You can go to "Section 3: Action Items" directly for the conclusion.
>>
>>
>> ### 1. ASF Policy Violation ###
>>
>> ASF has a rule for "MAY I CALL MY MODIFIED CODE 'APACHE'?"
>>
>>     https://www.apache.org/foundation/license-faq.html#Name-changes
>>
>> For example, when we call `Apache Spark 3.4.0`, it's supposed to be the
>> same as one of our official distributions.
>>
>>     https://downloads.apache.org/spark/spark-3.4.0/
>>
>> Specifically, in terms of the Scala version, we believe it should have
>> Scala 2.12.17 because of 'SPARK-40436 Upgrade Scala to 2.12.17'.
>>
>> There is a company claiming something non-Apache like "Apache Spark 3.4.0
>> minus SPARK-40436" with the name "Apache Spark 3.4.0."
>>
>>     - The company website shows "X.Y (includes Apache Spark 3.4.0, Scala
>> 2.12)"
>>     - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running
>> Spark version 3.4.0"
>>     - UI shows Apache Spark logo and `3.4.0`.
>>     - However, Scala Version is '2.12.15'
>>
>> [image: Screenshot 2023-06-04 at 9.37.16 PM.png]
>> [image: Screenshot 2023-06-04 at 10.14.45 PM.png]
>>
>> Lastly, this is not a single instance. For example, the same company also
>> claims "Apache Spark 3.3.2" with a mismatched Scala version.
>>
>>
>> ### 2. Scala Issues ###
>>
>> In addition to (1), although we proceeded with good intentions and great
>> care
>> including dev mailing list discussion, there are several concerning areas
>> which
>> need more attention and our love.
>>
>> a) Scala Spark users will experience UX inconvenience from Spark 3.5.
>>
>>     SPARK-42493 Make Python the first tab for code examples
>>
>>     For the record, we discussed it here.
>>     - https://lists.apache.org/thread/1p8s09ysrh4jqsfd47qdtrl7rm4rrs05
>>       "[DISCUSS] Show Python code examples first in Spark documentation"
>>
>> b) Scala version upgrades are currently blocked by the Ammonite library's
>> dev cycle.
>>
>>     Although we discussed it here and it had good intentions,
>>     the current master branch cannot use the latest Scala.
>>
>>     - https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
>>     "Ammonite as REPL for Spark Connect"
>>      SPARK-42884 Add Ammonite REPL integration
>>
>>     Specifically, the following are blocked and I'm monitoring the
>> Ammonite repository.
>>     - SPARK-40497 Upgrade Scala to 2.13.11
>>     - SPARK-43832 Upgrade Scala to 2.12.18
>>     - According to https://github.com/com-lihaoyi/Ammonite/issues ,
>>       Scala 3.3.0 LTS support also looks infeasible.
>>
>>     Although we may be able to wait for a while, there are two
>> fundamental solutions
>>     to unblock this situation from a long-term maintenance perspective.
>>     - Replace it with a Scala-shell based implementation
>>     - Move `connector/connect/client/jvm/pom.xml` outside the Spark repo.
>>        Maybe we can put it into a new repo, like the Rust and Go clients.
>>
>> c) Scala 2.13 and above needs Apache Spark 4.0.
>>
>>     In "Apache Spark 3.5.0 Expectations?" and "Apache Spark 4.0
>> Timeframe?" threads,
>>     we discussed Spark 3.5.0 scope and decided to revert
>>     "SPARK-43836 Make Scala 2.13 as default in Spark 3.5".
>>     Apache Spark 4.0.0 is the only way to support Scala 2.13 or higher.
>>
>>     - https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
>> ("Apache Spark 3.5.0 Expectations?")
>>     - https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>> ("Apache Spark 4.0 Timeframe?")
>>
>>      A candidate(or mentioned) timeframe was "Spark 4.0.0: 2024.06" and
>> Scala 3.3.0 LTS.
>>      - https://scala-lang.org/blog/2023/05/30/scala-3.3.0-released.html
>>
>> d) Java 21 LTS is Apache Spark 3.5.0's stretch goal
>>
>>     SPARK-43831 Build and Run Spark on Java 21
>>
>>     However, this needs SPARK-40497 (Scala 2.13.11) and SPARK-43832
>> (Scala 2.12.18)
>>     which are blocked by the Ammonite library, as mentioned in (b).
>>
>>
>> ### 3. Action Items ###
>>
>> To provide clarity to the Apache Spark Scala community,
>>
>> - We should communicate with the company and help them fix the misleading
>> messages and
>>   remove the Scala-version segmentation per Spark version.
>>
>> - Apache Spark PMC should include this incident report and the result
>>   in the next Apache Spark Quarterly Report (August).
>>
>> - I will start a vote for the Apache Spark 4.0.0 timeframe next week after
>> receiving more feedback.
>>   Since 4.0.0 is not limited to the Scala issues, we will vote on the
>> timeline only.
>>
>> - Lastly, we need to re-evaluate the risk of the `Ammonite` library before
>> the Apache Spark 3.5.0 release.
>>   If it blocks the Scala upgrades and Java 21 support, we had better avoid it
>> at all costs.
>>
>>
>> WDYT?
>>
>> Thanks,
>> Dongjoon.
>>
>
