Thanks a lot for the questions and feedback! To address your questions, Dongjoon: I do not intend for these documentation updates to be tied to the potential changes or suggestions you ask about.
In other words, this proposal is only about adjusting the documentation to target the majority of people reading it - namely the large and growing number of Python users - and new users in particular as they are often already familiar with and have a preference for Python when evaluating or starting to use Spark.

While we may want to strengthen support for Python in other ways, I think such efforts should be tracked separately from this.

Allan

On Thu, Feb 23, 2023 at 1:44 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> If this is not just flip flopping the document pages and involves other
> changes, then a proper impact analysis needs to be done to assess the
> efforts involved. Personally I don't think it really matters.
>
> HTH
>
> On Thu, 23 Feb 2023 at 01:40, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> > 1. Does this suggestion imply Python API implementation will be the new
>> blocker in the future in terms of feature parity among languages? Until
>> now, Python API feature parity was one of the audit items because it's not
>> enforced. In other words, Scala and Java have been the full feature because
>> they are the underlying main developer languages while Python/R/SQL
>> environments were the nice-to-have.
>>
>> I think it wouldn't be treated as a blocker .. but I do believe we have
>> added all new features into the Python side for the last couple of
>> releases.
>> So, I wouldn't worry about this at this moment - we have been
>> doing fine in terms of feature parity.
>>
>> > 2. Does this suggestion assume that the Python environment is easier
>> for users than Scala/Java always? Given that we support Python 3.8 to 3.11,
>> the support matrix for Python library dependency is a problem for the
>> Apache Spark community to solve in order to claim that. As we say
>> at SPARK-41454, Python language also introduces breaking changes to us
>> historically and we have many `Pinned` python libraries issues.
>>
>> Yes. In fact, regardless of this change, I do believe we should test more
>> versions, etc. At least scheduled jobs like we're doing JDK and Scala
>> versions.
>>
>> FWIW, my take about this change is: people use Python and PySpark more
>> (according to the chart and stats provided) so let's put those examples
>> first :-).
>>
>> On Thu, 23 Feb 2023 at 10:27, Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> I have two questions to clarify the scope and boundaries.
>>>
>>> 1. Does this suggestion imply Python API implementation will be the new
>>> blocker in the future in terms of feature parity among languages? Until
>>> now, Python API feature parity was one of the audit items because it's not
>>> enforced. In other words, Scala and Java have been the full feature because
>>> they are the underlying main developer languages while Python/R/SQL
>>> environments were the nice-to-have.
>>>
>>> 2. Does this suggestion assume that the Python environment is easier for
>>> users than Scala/Java always? Given that we support Python 3.8 to 3.11, the
>>> support matrix for Python library dependency is a problem for the Apache
>>> Spark community to solve in order to claim that. As we say at SPARK-41454,
>>> Python language also introduces breaking changes to us historically and we
>>> have many `Pinned` python libraries issues.
>>>
>>> Changing documentation is easy, but I hope we can give clear
>>> communication and direction in this effort because this is one of the most
>>> user-facing changes.
>>>
>>> Dongjoon.
>>>
>>> On Wed, Feb 22, 2023 at 5:26 PM 416161...@qq.com <ruife...@foxmail.com>
>>> wrote:
>>>
>>>> +1 LGTM
>>>>
>>>> ------------------------------
>>>> Ruifeng Zheng
>>>> ruife...@foxmail.com
>>>>
>>>> ------------------ Original ------------------
>>>> *From:* "Xinrong Meng" <xinrong.apa...@gmail.com>;
>>>> *Date:* Thu, Feb 23, 2023 09:17 AM
>>>> *To:* "Allan Folting"<afolting...@gmail.com>;
>>>> *Cc:* "dev"<dev@spark.apache.org>;
>>>> *Subject:* Re: [DISCUSS] Show Python code examples first in Spark
>>>> documentation
>>>>
>>>> +1 Good idea!
>>>>
>>>> On Thu, Feb 23, 2023 at 7:41 AM Jack Goodson <jackagood...@gmail.com>
>>>> wrote:
>>>>
>>>>> Good idea, at the company I work at we discussed using Scala as our
>>>>> primary language because technically it is slightly stronger than python
>>>>> but ultimately chose python in the end as it’s easier for other devs to be
>>>>> on boarded to our platform and future hiring for the team etc would be
>>>>> easier
>>>>>
>>>>> On Thu, 23 Feb 2023 at 12:20 PM, Hyukjin Kwon <gurwls...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1 I like this idea too.
>>>>>>
>>>>>> On Thu, Feb 23, 2023 at 6:00 AM Allan Folting <afolting...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I would like to propose that we show Python code examples first in
>>>>>>> the Spark documentation where we have multiple programming language
>>>>>>> examples.
>>>>>>> An example is on the Quick Start page:
>>>>>>> https://spark.apache.org/docs/latest/quick-start.html
>>>>>>>
>>>>>>> I propose this change because Python has become more popular than
>>>>>>> the other languages supported in Apache Spark. There are a lot more
>>>>>>> users of Spark in Python than Scala today and Python attracts a
>>>>>>> broader set of new users.
>>>>>>> For Python usage data, see https://www.tiobe.com/tiobe-index/ and
>>>>>>> https://insights.stackoverflow.com/trends?tags=r%2Cscala%2Cpython%2Cjava
>>>>>>> .
>>>>>>>
>>>>>>> Also, this change aligns with Python already being the first tab on
>>>>>>> our home page:
>>>>>>> https://spark.apache.org/
>>>>>>>
>>>>>>> Anyone who wants to use another language can still just click on the
>>>>>>> other tabs.
>>>>>>>
>>>>>>> I created a draft PR for the Spark SQL, DataFrames and Datasets
>>>>>>> Guide page as a first step:
>>>>>>> https://github.com/apache/spark/pull/40087
>>>>>>>
>>>>>>> I would appreciate it if you could share your thoughts on this
>>>>>>> proposal.
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>> Allan Folting