> 1. Does this suggestion imply Python API implementation will be the new
> blocker in the future in terms of feature parity among languages? Until
> now, Python API feature parity was one of the audit items because it's not
> enforced. In other words, Scala and Java have been the full feature because
> they are the underlying main developer languages while Python/R/SQL
> environments were the nice-to-have.

I don't think it would be treated as a blocker, but I do believe we have
added all new features to the Python side over the last couple of
releases. So I wouldn't worry about this at the moment; we have been
doing fine in terms of feature parity.

> 2. Does this suggestion assume that the Python environment is easier for
> users than Scala/Java always? Given that we support Python 3.8 to 3.11, the
> support matrix for Python library dependency is a problem for the Apache
> Spark community to solve in order to claim that. As we say at SPARK-41454,
> Python language also introduces breaking changes to us historically and we
> have many `Pinned` python libraries issues.

Yes. In fact, regardless of this change, I do believe we should test more
versions, etc., at least with scheduled jobs like the ones we already run
for JDK and Scala versions.


FWIW, my take on this change is: people use Python and PySpark more
(according to the chart and stats provided), so let's put those examples
first :-).


On Thu, 23 Feb 2023 at 10:27, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> I have two questions to clarify the scope and boundaries.
>
> 1. Does this suggestion imply Python API implementation will be the new
> blocker in the future in terms of feature parity among languages? Until
> now, Python API feature parity was one of the audit items because it's not
> enforced. In other words, Scala and Java have been the full feature because
> they are the underlying main developer languages while Python/R/SQL
> environments were the nice-to-have.
>
> 2. Does this suggestion assume that the Python environment is easier for
> users than Scala/Java always? Given that we support Python 3.8 to 3.11, the
> support matrix for Python library dependency is a problem for the Apache
> Spark community to solve in order to claim that. As we say at SPARK-41454,
> Python language also introduces breaking changes to us historically and we
> have many `Pinned` python libraries issues.
>
> Changing documentation is easy, but I hope we can give clear
> communication and direction in this effort because this is one of the most
> user-facing changes.
>
> Dongjoon.
>
> On Wed, Feb 22, 2023 at 5:26 PM 416161...@qq.com <ruife...@foxmail.com>
> wrote:
>
>> +1 LGTM
>>
>> ------------------------------
>> Ruifeng Zheng
>> ruife...@foxmail.com
>>
>> ------------------ Original ------------------
>> *From:* "Xinrong Meng" <xinrong.apa...@gmail.com>;
>> *Date:* Thu, Feb 23, 2023 09:17 AM
>> *To:* "Allan Folting"<afolting...@gmail.com>;
>> *Cc:* "dev"<dev@spark.apache.org>;
>> *Subject:* Re: [DISCUSS] Show Python code examples first in Spark
>> documentation
>>
>> +1 Good idea!
>>
>> On Thu, Feb 23, 2023 at 7:41 AM Jack Goodson <jackagood...@gmail.com>
>> wrote:
>>
>>> Good idea. At the company I work at, we discussed using Scala as our
>>> primary language because technically it is slightly stronger than Python,
>>> but we ultimately chose Python because it's easier to onboard other devs
>>> to our platform, and future hiring for the team, etc., would be
>>> easier.
>>>
>>> On Thu, 23 Feb 2023 at 12:20 PM, Hyukjin Kwon <gurwls...@gmail.com>
>>> wrote:
>>>
>>>> +1 I like this idea too.
>>>>
>>>> On Thu, Feb 23, 2023 at 6:00 AM Allan Folting <afolting...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to propose that we show Python code examples first in the
>>>>> Spark documentation where we have multiple programming language examples.
>>>>> An example is on the Quick Start page:
>>>>> https://spark.apache.org/docs/latest/quick-start.html
>>>>>
>>>>> I propose this change because Python has become more popular than the
>>>>> other languages supported in Apache Spark. There are a lot more users of
>>>>> Spark in Python than Scala today and Python attracts a broader set of new
>>>>> users.
>>>>> For Python usage data, see https://www.tiobe.com/tiobe-index/ and
>>>>> https://insights.stackoverflow.com/trends?tags=r%2Cscala%2Cpython%2Cjava
>>>>> .
>>>>>
>>>>> Also, this change aligns with Python already being the first tab on
>>>>> our home page:
>>>>> https://spark.apache.org/
>>>>>
>>>>> Anyone who wants to use another language can still just click on the
>>>>> other tabs.
>>>>>
>>>>> I created a draft PR for the Spark SQL, DataFrames and Datasets Guide
>>>>> page as a first step:
>>>>> https://github.com/apache/spark/pull/40087
>>>>>
>>>>>
>>>>> I would appreciate it if you could share your thoughts on this
>>>>> proposal.
>>>>>
>>>>>
>>>>> Thanks a lot,
>>>>> Allan Folting
>>>>>
>>>>
