Hi, Sean.

"+ patches" or "powered by Apache Spark 3.4.0" is not a problem as you
mentioned. For the record, I also didn't bring up any old story here.

> "Apache Spark 3.4.0 + patches"

However, "including Apache Spark 3.4.0" still causes confusion even in a
different way because of those missing patches, SPARK-40436 (Upgrade Scala
to 2.12.17) and SPARK-39414 (Upgrade Scala to 2.12.16). Technically,
Databricks Runtime doesn't include Apache Spark 3.4.0 while it claims it to
the users.

[image: image.png]

It's a sad story from the Apache Spark Scala perspective because users
cannot even try the correct Scala 2.12.17 version in that runtime.
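
To make this concrete, here is a minimal sketch (mine, not something from
any vendor documentation) of how a user could check which Scala release a
runtime actually ships, e.g. by pasting it into spark-shell. Apache Spark
3.4.0 itself is built with Scala 2.12.17 after SPARK-40436.

    // Minimal sketch: report the Scala library version on the runtime's classpath.
    // Paste into spark-shell or any Scala REPL started against the runtime.
    import scala.util.Properties

    // Prints e.g. "2.12.17" on a stock Apache Spark 3.4.0 distribution, but an
    // older 2.12.x on a build that reverted SPARK-39414 / SPARK-40436.
    println(s"Scala library version: ${Properties.versionNumberString}")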

All the items I've shared are connected by a single theme, hurting Apache
Spark Scala users: from (1) building Spark, to (2) creating a fragmented
Scala Spark runtime environment, and (3) hiding user-facing documentation.

Of course, I don't think these were designed in a coordinated way
intentionally. They just happen to occur at the same time.

Based on your comments, let me ask you two questions. (1) When Databricks
builds its internal Spark from its private code repository, is it company
policy to always expose "Apache Spark 3.4.0" to the users, as in the
following, regardless of whatever changes were made? And (2) do you insist
that this is normative and clear to the users and the community?

> - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark
version 3.4.0"
> - UI shows Apache Spark logo and `3.4.0`.
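
As far as I can tell, both of the quoted strings come from the same
build-time constant, exposed as org.apache.spark.SPARK_VERSION and
SparkSession.version, so a patched distribution that keeps the upstream
version string keeps reporting "3.4.0" regardless of what was changed. A
minimal sketch of how one might surface it, assuming spark-sql is on the
classpath:

    // Minimal sketch: print the Spark version string that the quoted log
    // line and the UI report. It is fixed when the distribution is built,
    // independent of any vendor patches applied on top.
    import org.apache.spark.sql.SparkSession

    object ReportedVersion {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("reported-version")
          .getOrCreate()
        // Both values come from the version baked in at build time.
        println(s"spark.version = ${spark.version}")
        println(s"SPARK_VERSION = ${org.apache.spark.SPARK_VERSION}")
        spark.stop()
      }
    }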


Dongjoon.


On Mon, Jun 5, 2023 at 10:40 AM Sean Owen <sro...@gmail.com> wrote:

> On Mon, Jun 5, 2023 at 12:01 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> 1. For the naming, yes, but the company should use different version
>> numbers instead of the exact "3.4.0". As I shared the screenshot in my
>> previous email, the company exposes exactly "Apache Spark 3.4.0" because
>> they build their distribution without changing the version number at all.
>>
>
> I don't believe this is supported by guidance on the underlying issue
> here, which is trademark. There is nothing wrong with nominative use, and I
> think that's what this is. A thing can be "Apache Spark 3.4.0 + patches"
> and be described that way.
> Calling it "Apache Spark 3.4.0.vendor123" is arguably more confusing IMHO,
> as there is no such Apache Spark version.
>
>
>
>> 2. According to
>> https://mvnrepository.com/artifact/org.apache.spark/spark-core,
>> all the other companies followed "Semantic Versioning" or added
>> additional version numbers to their distributions, didn't they? AFAIK,
>> nobody claims to take over the exact "3.4.0" version string at the
>> source-code level like this company does.
>>
>
> Here you're talking about software artifact numbering, for companies that
> were also releasing their own maintenance branch of OSS. That pretty much
> requires some sub-versioning scheme. I think that's fine too, although as
> above I think this is arguably _worse_ w.r.t. reuse of the Apache name and
> namespace.
> I'm not aware of any policy on this, and don't find this problematic
> myself. Doesn't mean it's right, but does mean implicitly this has never
> before been viewed as an issue?
>
> The one I'm aware of was releasing a product "including Apache Spark 2.0"
> before it existed, which does seem to potentially cause confusion, and that
> was addressed.
>
> Can you describe what policy is violated? We can disagree about what we'd
> prefer or not, but the question is, what if anything is disallowed? I'm not
> seeing that.
>
>
>> 3. This company not only creates a 'Scala Version Segmentation'
>> environment in a subtle way, but also defames Apache Spark 3.4.0 by
>> removing the many bug fixes of SPARK-40436 (Upgrade Scala to 2.12.17) and
>> SPARK-39414 (Upgrade Scala to 2.12.16) for some unknown reason.
>> Apparently, this is not a superior version of Apache Spark 3.4.0; for me,
>> it's an inferior one. If a company disagrees with Scala 2.12.17 for some
>> internal reason, they can stick to 2.12.15, of course. However, the
>> Apache Spark PMC should not allow them to lie to the customers that
>> "Apache Spark 3.4.0" uses Scala 2.12.15 by default. That's why I
>> initiated this email: I consider this a serious blocker to making Apache
>> Spark Scala improvements.
>>     - https://github.com/scala/scala/releases/tag/v2.12.17 (21 Merged
>> PRs)
>>     - https://github.com/scala/scala/releases/tag/v2.12.16 (68 Merged
>> PRs)
>>
>
> To be clear, this seems unrelated to your first two points above?
>
> I'm having trouble following what you are arguing here. You are saying a
> vendor release based on "Apache Spark 3.4.0" differs from it in some
> material way that you don't like. That's a fine position to take, but I
> think the product is still substantially describable as "Apache Spark
> 3.4.0 + patches". You can take up the issue with the vendor.
>
> But more importantly, I am not seeing how that constrains anything in
> Apache Spark? Those updates were merged to OSS. But even taking up the
> point you describe, why is the Scala maintenance version such a material
> issue, so severe that it warrants PMC action?
>
> Could you connect the dots a little more?
>
>
>>
