I think if the Spark PMC wants to ask Databricks something, that could be reasonable (although I'm a little fuzzy on what the ask is), but that conversation might belong on private@ (I could be wrong, of course).

On Tue, Jun 6, 2023 at 3:29 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I concur with you Sean.
>
> If I understand correctly the point raised by the thread owner, in
> heterogeneous environments that we work, it is up to the practitioner to
> ensure that there is version compatibility among OS versions, spark version
> and the target artefact in consideration. For example if I try to connect
> to Google BigQuery from spark 3.4.0, my OS or for that matter, the docker
> needs to run Java 8 regardless of spark Java version, otherwise it will
> fail.
>
> I think these details should be left to the trenches, because these
> arguments about versioning become tangential in the big picture. Case in
> point, my current OS scala version is 2.13.8 but works fine with Spark
> built on 2.12.17.
>
> HTH
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
> London
> United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Tue, 6 Jun 2023 at 01:37, Sean Owen <sro...@gmail.com> wrote:
>
>> I think the issue is whether a distribution of Spark is so materially
>> different from OSS that it causes problems for the larger community of
>> users. There's a legitimate question of whether such a thing can be called
>> "Apache Spark + changes", as describing it that way becomes meaningfully
>> inaccurate. And if it's inaccurate, then it's a trademark usage issue, and
>> a matter for the PMC to act on. I certainly recall this type of problem
>> from the early days of Hadoop - the project itself had 2 or 3 live branches
>> in development (was it 0.20.x vs 0.23.x vs 1.x? YARN vs no YARN?) picked up
>> by different vendors and it was unclear what "Apache Hadoop" meant in a
>> vendor distro. Or frankly, upstream.
>>
>> In comparison, variation in Scala maintenance release seems trivial. I'm
>> not clear from the thread what actual issue this causes to users. Is there
>> more to it - does this go hand in hand with JDK version and Ammonite, or
>> are those separate? What's an example of the practical user issue. Like, I
>> compile vs Spark 3.4.0 and because of Scala version differences it doesn't
>> run on some vendor distro? That's not great, but seems like a vendor
>> problem. Unless you tell me we are getting tons of bug reports to OSS Spark
>> as a result or something.
>>
>> Is the implication that something in OSS Spark is being blocked to prefer
>> some set of vendor choices? because the changes you're pointing to seem to
>> be going into Apache Spark, actually. It'd be more useful to be specific
>> and name names at this point, seems fine.
>>
>> The rest of this is just a discussion about Databricks choices. (If it's
>> not clear, I'm at Databricks but do not work on the Spark distro). We can
>> discuss but it seems off-topic _if_ it can't be connected to a problem for
>> OSS Spark. Anyway:
>>
>> If it helps, _some_ important patches are described at
>> https://docs.databricks.com/release-notes/runtime/maintenance-updates.html
>> ; I don't think this is exactly hidden.
>>
>> Out of curiosity, how would you describe this software in the UI instead?
>> "3.4.0" is shorthand, because this is a little dropdown menu; the terminal
>> output is likewise not a place to list all patches. You would propose
>> requesting calling this "3.4.0 + patches"? That's the best I can think of,
>> but I don't think it addresses what you're getting at anyway. I think you'd
>> just prefer Databricks make a different choice, which is legitimate, but,
>> an issue to take up with Databricks, not here.
>>
>> On Mon, Jun 5, 2023 at 6:58 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, Sean.
>>>
>>> "+ patches" or "powered by Apache Spark 3.4.0" is not a problem as you
>>> mentioned. For the record, I also didn't bring up any old story here.
>>>
>>> > "Apache Spark 3.4.0 + patches"
>>>
>>> However, "including Apache Spark 3.4.0" still causes confusion even in a
>>> different way because of those missing patches, SPARK-40436 (Upgrade Scala
>>> to 2.12.17) and SPARK-39414 (Upgrade Scala to 2.12.16). Technically,
>>> Databricks Runtime doesn't include Apache Spark 3.4.0 while it claims it to
>>> the users.
>>>
>>> [image: image.png]
>>>
>>> It's a sad story from the Apache Spark Scala perspective because the
>>> users cannot even try to use the correct Scala 2.12.17 version in the
>>> runtime.
>>>
>>> All items I've shared are connected via a single theme, hurting Apache
>>> Spark Scala users.
>>> From (1) building Spark, (2) creating a fragmented Scala Spark runtime
>>> environment and (3) hidden user-facing documentation.
>>>
>>> Of course, I don't think those are designed in an organized way
>>> intentionally. It just happens at the same time.
>>>
>>> Based on your comments, let me ask you two questions. (1) When
>>> Databricks builds its internal Spark from its private code repository, is
>>> it a company policy to always expose "Apache 3.4.0" to the users like the
>>> following by ignoring all changes (whatever they are). And, (2) Do you
>>> insist that it is normative and clear to the users and the community?
>>>
>>> > - The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark
>>> version 3.4.0"
>>> > - UI shows Apache Spark logo and `3.4.0`.

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau