Thanks a lot for the questions and comments/feedback!

To address your questions, Dongjoon: I do not intend for these updates to
the documentation to be tied to the potential changes you ask about.

In other words, this proposal is only about adjusting the documentation to
target the majority of people reading it - namely the large and growing
number of Python users - and new users in particular as they are often
already familiar with and have a preference for Python when evaluating or
starting to use Spark.

While we may want to strengthen support for Python in other ways, I think
such efforts should be tracked separately from this.

Allan

On Thu, Feb 23, 2023 at 1:44 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> If this is not just flip flopping the document pages and involves other
> changes, then a proper impact analysis needs to be done to assess the
> efforts involved. Personally I don't think it really matters.
>
> HTH
>
> On Thu, 23 Feb 2023 at 01:40, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> > 1. Does this suggestion imply Python API implementation will be the new
>> > blocker in the future in terms of feature parity among languages? Until
>> > now, Python API feature parity was one of the audit items because it's not
>> > enforced. In other words, Scala and Java have been the full feature because
>> > they are the underlying main developer languages while Python/R/SQL
>> > environments were the nice-to-have.
>>
>> I think it wouldn't be treated as a blocker .. but I do believe we have
>> added all new features into the Python side for the last couple of
>> releases. So, I wouldn't worry about this at this moment - we have been
>> doing fine in terms of feature parity.
>>
>> > 2. Does this suggestion assume that the Python environment is easier
>> > for users than Scala/Java always? Given that we support Python 3.8 to 3.11,
>> > the support matrix for Python library dependency is a problem for the
>> > Apache Spark community to solve in order to claim that. As we say
>> > at SPARK-41454, Python language also introduces breaking changes to us
>> > historically and we have many `Pinned` python libraries issues.
>>
>> Yes. In fact, regardless of this change, I do believe we should test more
>> versions, etc., at least in scheduled jobs like the ones we run for JDK and
>> Scala versions.
>>
>>
>> FWIW, my take on this change is: people use Python and PySpark more
>> (according to the chart and stats provided) so let's put those examples
>> first :-).
>>
>>
>> On Thu, 23 Feb 2023 at 10:27, Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> I have two questions to clarify the scope and boundaries.
>>>
>>> 1. Does this suggestion imply Python API implementation will be the new
>>> blocker in the future in terms of feature parity among languages? Until
>>> now, Python API feature parity was one of the audit items because it's not
>>> enforced. In other words, Scala and Java have been the full feature because
>>> they are the underlying main developer languages while Python/R/SQL
>>> environments were the nice-to-have.
>>>
>>> 2. Does this suggestion assume that the Python environment is easier for
>>> users than Scala/Java always? Given that we support Python 3.8 to 3.11, the
>>> support matrix for Python library dependency is a problem for the Apache
>>> Spark community to solve in order to claim that. As we say at SPARK-41454,
>>> Python language also introduces breaking changes to us historically and we
>>> have many `Pinned` python libraries issues.
>>>
>>> Changing documentation is easy, but I hope we can give clear
>>> communication and direction in this effort because this is one of the most
>>> user-facing changes.
>>>
>>> Dongjoon.
>>>
>>> On Wed, Feb 22, 2023 at 5:26 PM 416161...@qq.com <ruife...@foxmail.com>
>>> wrote:
>>>
>>>> +1 LGTM
>>>>
>>>> ------------------------------
>>>> Ruifeng Zheng
>>>> ruife...@foxmail.com
>>>>
>>>> ------------------ Original ------------------
>>>> *From:* "Xinrong Meng" <xinrong.apa...@gmail.com>;
>>>> *Date:* Thu, Feb 23, 2023 09:17 AM
>>>> *To:* "Allan Folting"<afolting...@gmail.com>;
>>>> *Cc:* "dev"<dev@spark.apache.org>;
>>>> *Subject:* Re: [DISCUSS] Show Python code examples first in Spark
>>>> documentation
>>>>
>>>> +1 Good idea!
>>>>
>>>> On Thu, Feb 23, 2023 at 7:41 AM Jack Goodson <jackagood...@gmail.com>
>>>> wrote:
>>>>
>>>>> Good idea. At the company I work at, we discussed using Scala as our
>>>>> primary language because technically it is slightly stronger than Python,
>>>>> but we ultimately chose Python because it's easier to onboard other devs
>>>>> to our platform, and future hiring for the team would be easier.
>>>>>
>>>>> On Thu, 23 Feb 2023 at 12:20 PM, Hyukjin Kwon <gurwls...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1 I like this idea too.
>>>>>>
>>>>>> On Thu, Feb 23, 2023 at 6:00 AM Allan Folting <afolting...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I would like to propose that we show Python code examples first in
>>>>>>> the Spark documentation where we have multiple programming language
>>>>>>> examples.
>>>>>>> An example is on the Quick Start page:
>>>>>>> https://spark.apache.org/docs/latest/quick-start.html
>>>>>>>
>>>>>>> I propose this change because Python has become more popular than
>>>>>>> the other languages supported in Apache Spark. There are a lot more
>>>>>>> users of Spark in Python than Scala today and Python attracts a broader
>>>>>>> set of new users.
>>>>>>> For Python usage data, see https://www.tiobe.com/tiobe-index/ and
>>>>>>> https://insights.stackoverflow.com/trends?tags=r%2Cscala%2Cpython%2Cjava
>>>>>>>
>>>>>>> Also, this change aligns with Python already being the first tab on
>>>>>>> our home page:
>>>>>>> https://spark.apache.org/
>>>>>>>
>>>>>>> Anyone who wants to use another language can still just click on the
>>>>>>> other tabs.
>>>>>>>
>>>>>>> I created a draft PR for the Spark SQL, DataFrames and Datasets
>>>>>>> Guide page as a first step:
>>>>>>> https://github.com/apache/spark/pull/40087
>>>>>>>
>>>>>>>
>>>>>>> I would appreciate it if you could share your thoughts on this
>>>>>>> proposal.
>>>>>>>
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>> Allan Folting
>>>>>>>
>>>>>>
