I think all the slaves need the same (or a compatible) version of Python
installed, since PySpark jobs run Python code natively on each worker.
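
For instance, here is a minimal sketch (assuming a working PySpark setup)
that asks each executor which interpreter it actually runs:

    # Report the driver's Python version and the distinct versions seen on
    # the executors; a mismatch here is exactly what breaks PySpark jobs.
    import sys
    from pyspark import SparkContext

    def worker_version(_):
        import sys  # resolved on the worker, not shipped from the driver
        return sys.version

    sc = SparkContext(appName="python-version-check")
    versions = sc.parallelize(range(100), 10).map(worker_version) \
                 .distinct().collect()
    print("driver:  %s" % sys.version)
    print("workers: %s" % versions)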

On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:

> Interesting, I didn't know that!
>
> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Even if Python 2.7 were needed only on this one machine that launches the
>> app, we could not ship it with our software because it's GPL-licensed
>>
>> Not to nitpick, but maybe this is important. The Python license is 
>> GPL-compatible
>> but not GPL <https://docs.python.org/3/license.html>:
>>
>> Note GPL-compatible doesn’t mean that we’re distributing Python under the
>> GPL. All Python licenses, unlike the GPL, let you distribute a modified
>> version without making your changes open source. The GPL-compatible
>> licenses make it possible to combine Python with other software that is
>> released under the GPL; the others don’t.
>>
>> Nick
>>
>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> I do not think so.
>>>
>>> Does Python 2.7 need to be installed on all slaves? If so, we do not
>>> have direct access to those.
>>>
>>> Also, Spark is easy for us to ship with our software since it's Apache 2
>>> licensed, and it only needs to be present on the machine that launches the
>>> app (thanks to YARN).
>>> Even if Python 2.7 were needed only on this one machine that launches the
>>> app, we could not ship it with our software because it's GPL-licensed. The
>>> client would have to download and install it themselves, which would mean
>>> an independent install that has to be audited and approved, and now you
>>> are in for a lot of fun. Basically, it will never happen.
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <joshro...@databricks.com>
>>> wrote:
>>>
>>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>>> imagine that they're also capable of installing a standalone Python
>>>> alongside that Spark version (without changing Python system-wide). For
>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>> 2.7.x/3.x without impacting or changing the system Python, and they don't
>>>> require any special permissions to install (you don't need root / sudo
>>>> access). Does this address the Python versioning concerns for RHEL users?
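>>>>
>>>> For what it's worth, a minimal sketch of that approach (the Miniconda
>>>> path is a hypothetical example, and this covers local mode; on a cluster
>>>> you would export the variable wherever the executors launch):
>>>>
>>>>     # Point PySpark at a user-local Miniconda interpreter instead of
>>>>     # the system Python. No root access is needed for the install.
>>>>     import os
>>>>     os.environ["PYSPARK_PYTHON"] = os.path.expanduser(
>>>>         "~/miniconda/bin/python")  # hypothetical install location
>>>>
>>>>     from pyspark import SparkContext
>>>>     sc = SparkContext(appName="standalone-python-demo")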
>>>>
>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>> wrote:
>>>>
>>>>> Yeah, the practical concern is that we have no control over the Java or
>>>>> Python version on large company clusters. Our current reality for the
>>>>> vast majority of them is Java 7 and Python 2.6, no matter how outdated
>>>>> that is.
>>>>>
>>>>> I don't like it either, but I cannot change it.
>>>>>
>>>>> We currently don't use PySpark, so I have no stake in this, but if we
>>>>> did, I can assure you we would not upgrade to Spark 2.x if Python 2.6
>>>>> were dropped. There's no point in developing something that doesn't run
>>>>> for the majority of customers.
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>> until 2020. So I'm assuming these large companies will have the option of
>>>>>> riding out Python 2.6 until then.
>>>>>>
>>>>>> Are we seriously saying that Spark should likewise support Python 2.6
>>>>>> for the next several years? Even though the core Python devs stopped
>>>>>> supporting it in 2013?
>>>>>>
>>>>>> If that's not what we're suggesting, then when, roughly, can we drop
>>>>>> support? What are the criteria?
>>>>>>
>>>>>> I understand the practical concern here. If companies are stuck using
>>>>>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>>>>>> concern against the maintenance burden on this project, I would say that
>>>>>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position 
>>>>>> to
>>>>>> take. There are many tiny annoyances one has to put up with to support 
>>>>>> 2.6.
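>>>>>>
>>>>>> To make "tiny annoyances" concrete, a few illustrative 2.7-isms that
>>>>>> fail on 2.6:
>>>>>>
>>>>>>     squares = {n: n * n for n in range(5)}  # dict comprehension: SyntaxError on 2.6
>>>>>>     odds = {n for n in range(10) if n % 2}  # set comprehension: SyntaxError on 2.6
>>>>>>     msg = "{} vs {}".format("2.6", "2.7")   # auto-numbered fields: ValueError on 2.6
>>>>>>     from collections import OrderedDict     # new in 2.7: ImportError on 2.6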
>>>>>>
>>>>>> I suppose if our main PySpark contributors are fine putting up with
>>>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>>>
>>>>>> Nick
>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>> ju...@esbet.es> wrote:
>>>>>>
>>>>>>> Unfortunately, Koert is right.
>>>>>>>
>>>>>>> I've been on a couple of projects using Spark (banking industry)
>>>>>>> where CentOS + Python 2.6 was the toolbox available.
>>>>>>>
>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>> 2.6 is old and busted, which is the total opposite of the Spark
>>>>>>> philosophy, IMO.
>>>>>>>
>>>>>>>
>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>> escribió:
>>>>>>>
>>>>>>> RHEL/CentOS 6 ships with Python 2.6, doesn't it?
>>>>>>>
>>>>>>> If so, I still know plenty of large companies where Python 2.6 is
>>>>>>> the only option. Asking them for Python 2.7 is not going to work.
>>>>>>>
>>>>>>> So I think it's a bad idea.
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>> juliet.hougl...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>>>>>> this point, Python 3 should be the default that is encouraged.
>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>> behind the version they should theoretically use. Dropping Python 2.6
>>>>>>>> support sounds very reasonable to me.
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Red Hat supports Python 2.6 on RHEL 6 until 2020
>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>> developers stopped supporting it in 2013. RHEL 6 is not a good enough
>>>>>>>>> reason to continue support for Python 2.6, IMO.
>>>>>>>>>
>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>> believe we currently do).
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <allenzhang...@126.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Plus 1.
>>>>>>>>>>
>>>>>>>>>> We are currently using Python 2.7.2 in our production environment.
>>>>>>>>>>
>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <meethu.mat...@flytxt.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>> We use Python 2.7
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Meethu Mathew
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <r...@databricks.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Does anybody here care about us dropping support for Python 2.6
>>>>>>>>>>> in Spark 2.0?
>>>>>>>>>>>
>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>>>>> JSON parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>> Spark depends on have stopped supporting 2.6. We can still try to
>>>>>>>>>>> convince the library maintainers to support 2.6, but it will be
>>>>>>>>>>> extra work. I'm curious whether anybody still uses Python 2.6 to
>>>>>>>>>>> run Spark.
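>>>>>>>>>>>
>>>>>>>>>>> (A rough micro-benchmark sketch, if anyone wants to check the JSON
>>>>>>>>>>> claim locally: run the same snippet under each interpreter and
>>>>>>>>>>> compare the timings.)
>>>>>>>>>>>
>>>>>>>>>>>     # Time stdlib json parsing; 2.7 shipped C speedups for json
>>>>>>>>>>>     # that 2.6 lacks, so 2.7 should come out well ahead.
>>>>>>>>>>>     import json, timeit
>>>>>>>>>>>     doc = json.dumps(dict(("key%d" % i, range(50)) for i in range(200)))
>>>>>>>>>>>     print(timeit.timeit(lambda: json.loads(doc), number=2000))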
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>
