Yes, a plain Python script also works, without the notebook.

In the Python script (.py) scenario, no SparkContext is created by
default, so the script has to create one itself. The script can be run as
follows:
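
For illustration, here is a minimal sketch of such a script. It is not the
exact command from this mail. The master URL, core limit, jar path, and
the helper names build_spark_settings, create_context, and main are all
hypothetical. spark.cores.max and spark.executor.extraClassPath are
standard Spark properties, but using them for the core limit and the
classpath mentioned in the quoted mail below is an assumption. The sketch
creates the SparkContext explicitly, since a .py script does not get one
by default:

```python
# Sketch only: every concrete value below is a hypothetical placeholder.
import os


def build_spark_settings(master_url, max_cores, jar_paths):
    """Collect the Spark properties for a standalone pyspark script."""
    return {
        "spark.master": master_url,
        "spark.app.name": "pyspark-script-sample",
        # Cap the cores so this app can run next to the DAS Spark app.
        "spark.cores.max": str(max_cores),
        # Assumes the jars exist at the same paths on the worker nodes.
        "spark.executor.extraClassPath": os.pathsep.join(jar_paths),
    }


def create_context(settings):
    """Create the SparkContext that the pyspark shell would otherwise provide."""
    # Imported lazily so the settings helper works without pyspark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAll(list(settings.items()))
    return SparkContext(conf=conf)


def main():
    settings = build_spark_settings(
        master_url="spark://localhost:7077",         # hypothetical master URL
        max_cores=2,                                 # hypothetical core limit
        jar_paths=["/path/to/das/lib/example.jar"],  # hypothetical jar path
    )
    sc = create_context(settings)
    try:
        # Trivial job to confirm the context can reach the cluster.
        print(sc.parallelize(range(10)).sum())
    finally:
        sc.stop()
```

Adding a call to main() at the end of the file and launching it with
spark-submit from the client-side Spark distribution (or with plain
python, if pyspark is importable) should start a separate Spark
application on the cluster.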


On Thu, Sep 15, 2016 at 9:55 AM, Nirmal Fernando <[email protected]> wrote:

> This is great! Yes, @Supun let's try a python script as well.
>
> On Thu, Sep 15, 2016 at 7:46 AM, Srinath Perera <[email protected]> wrote:
>
>> Thanks Supun! I believe this would work.
>>
>> I assume the same will work with Python scripts (without notebooks) as well.
>>
>> --Srinath
>>
>>
>>
>> On Wed, Sep 14, 2016 at 6:47 PM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi Srinath/Nirmal
>>>
>>> I managed to get the $subject working. Here I connected iPython/Jupyter
>>> Notebook to pyspark, and pyspark submits the job to the remote spark
>>> cluster (created by DAS). One advantage of using a Notebook is that a
>>> user can load the data in DAS tables as a Spark DataFrame and work on it
>>> interactively.
>>>
>>> But it also has the following limitations:
>>>
>>>    - The client side needs a Spark distribution (to use pyspark).
>>>    - The cores allocated to the Spark app used by DAS (CarbonAnalytics)
>>>    have to be limited, so that the Spark app created by pyspark can run
>>>    in parallel.
>>>    - The Spark classpath has to be set at the client side with the jars
>>>    used by DAS, so that once the job is submitted, the Spark executor
>>>    knows where to look for the classes.
>>>
>>>
>>> *Training Models:*
>>>
>>> As we discussed offline, for large datasets we can directly use the
>>> algorithms in Spark's mllib and ml packages. This is very
>>> straightforward: the data we get from DAS is a Spark DataFrame, so we
>>> can train models directly on top of it (or convert it to an RDD first).
>>> For small and medium datasets, we can convert the Spark DataFrame to a
>>> pandas DataFrame using df.toPandas(), which loads all the data into
>>> memory, and then train sklearn algorithms on top of that.
>>>
>>> A sample Python script can be found at [1].
>>>
>>> [1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb
>>>
>>> --
>>> *Supun Sethunga*
>>> Senior Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>> Blog: http://supunsetunga.blogspot.com
>>>
>>
>>
>>
>> --
>> ============================
>> Srinath Perera, Ph.D.
>>    http://people.apache.org/~hemapani/
>>    http://srinathsview.blogspot.com/
>>
>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>
>


-- 
*Supun Sethunga*
Senior Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
Blog: http://supunsetunga.blogspot.com
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
