So I think if the Spark PMC wants to ask Databricks something, that could be
reasonable (although I'm a little fuzzy as to what the ask is), but that
conversation might belong on private@ (I could be wrong, of course).

On Tue, Jun 6, 2023 at 3:29 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> I concur with you Sean.
>
> If I understand correctly the point raised by the thread owner, in the
> heterogeneous environments we work in, it is up to the practitioner to
> ensure version compatibility among the OS version, the Spark version,
> and the target artefact under consideration. For example, if I try to
> connect to Google BigQuery from Spark 3.4.0, my OS or, for that matter, the
> Docker container needs to run Java 8 regardless of Spark's Java version;
> otherwise it will fail.
>
> I think these details should be left to the trenches, because these
> arguments about versioning become tangential in the big picture. Case in
> point: my current OS Scala version is 2.13.8, but it works fine with Spark
> built on 2.12.17.
>
> HTH
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
> London
> United Kingdom
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 6 Jun 2023 at 01:37, Sean Owen <sro...@gmail.com> wrote:
>
>> I think the issue is whether a distribution of Spark is so materially
>> different from OSS that it causes problems for the larger community of
>> users. There's a legitimate question of whether such a thing can be called
>> "Apache Spark + changes", as describing it that way becomes meaningfully
>> inaccurate. And if it's inaccurate, then it's a trademark usage issue, and
>> a matter for the PMC to act on. I certainly recall this type of problem
>> from the early days of Hadoop - the project itself had 2 or 3 live branches
>> in development (was it 0.20.x vs 0.23.x vs 1.x? YARN vs no YARN?) picked up
>> by different vendors and it was unclear what "Apache Hadoop" meant in a
>> vendor distro. Or frankly, upstream.
>>
>> In comparison, variation in the Scala maintenance release seems trivial. I'm
>> not clear from the thread what actual issue this causes for users. Is there
>> more to it - does this go hand in hand with JDK version and Ammonite, or
>> are those separate? What's an example of the practical user issue? Like, I
>> compile vs Spark 3.4.0 and because of Scala version differences it doesn't
>> run on some vendor distro? That's not great, but seems like a vendor
>> problem. Unless you tell me we are getting tons of bug reports to OSS Spark
>> as a result or something.
>>
>> Is the implication that something in OSS Spark is being blocked to prefer
>> some set of vendor choices? Because the changes you're pointing to seem to
>> be going into Apache Spark, actually. It'd be more useful to be specific
>> and name names at this point; that seems fine.
>>
>> The rest of this is just a discussion about Databricks choices. (If it's
>> not clear, I'm at Databricks but do not work on the Spark distro). We can
>> discuss but it seems off-topic _if_ it can't be connected to a problem for
>> OSS Spark. Anyway:
>>
>> If it helps, _some_ important patches are described at
>> https://docs.databricks.com/release-notes/runtime/maintenance-updates.html
>> ; I don't think this is exactly hidden.
>>
>> Out of curiosity, how would you describe this software in the UI instead?
>> "3.4.0" is shorthand, because this is a little dropdown menu; the terminal
>> output is likewise not a place to list all patches. Would you propose
>> calling this "3.4.0 + patches"? That's the best I can think of, but I
>> don't think it addresses what you're getting at anyway. I think you'd
>> just prefer Databricks make a different choice, which is legitimate, but
>> is an issue to take up with Databricks, not here.
>>
>>
>> On Mon, Jun 5, 2023 at 6:58 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, Sean.
>>>
>>> "+ patches" or "powered by Apache Spark 3.4.0" is not a problem as you
>>> mentioned. For the record, I also didn't bring up any old story here.
>>>
>>> > "Apache Spark 3.4.0 + patches"
>>>
>>> However, "including Apache Spark 3.4.0" still causes confusion, even in a
>>> different way, because of those missing patches, SPARK-40436 (Upgrade Scala
>>> to 2.12.17) and SPARK-39414 (Upgrade Scala to 2.12.16). Technically,
>>> Databricks Runtime doesn't include Apache Spark 3.4.0, even though it
>>> tells users that it does.
>>>
>>> [image: image.png]
>>>
>>> It's a sad story from the Apache Spark Scala perspective because the
>>> users cannot even try to use the correct Scala 2.12.17 version in the
>>> runtime.
>>>
>>> All items I've shared are connected via a single theme, hurting Apache
>>> Spark Scala users: (1) building Spark, (2) creating a fragmented Scala
>>> Spark runtime environment, and (3) hidden user-facing documentation.
>>>
>>> Of course, I don't think those are designed in an organized way
>>> intentionally. It just happens at the same time.
>>>
>>> Based on your comments, let me ask you two questions. (1) When
>>> Databricks builds its internal Spark from its private code repository, is
>>> it a company policy to always expose "Apache 3.4.0" to the users like the
>>> following, ignoring all changes (whatever they are)? And (2) do you
>>> insist that it is normative and clear to the users and the community?
>>>
>>> > - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark
>>> >   version 3.4.0"
>>> > - UI shows Apache Spark logo and `3.4.0`.
>>>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
