Sorry, the previous reply was incomplete. :)

In the Python script (.py) scenario, there is no SparkContext created by
default, so the sc.stop() we used in [1] won't be needed. That's the
only change to the script.

We can run it as follows:

 <SPARK_HOME>/bin/spark-submit --master spark://<spark-master-ip>:7077 \
   --conf "spark.driver.extraJavaOptions=-Dwso2_custom_conf_dir=/home/supun/Downloads/wso2das-3.1.0/repository/conf" \
   /home/supun/Supun/MachineLearning/python/PySpark-Sample.py

[1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb

Regards,
Supun

On Thu, Sep 15, 2016 at 11:23 AM, Supun Sethunga <[email protected]> wrote:

> Yes, a Python script also works, without the notebook.
>
> In the Python script (.py) scenario, there is no SparkContext created by
> default. The script can be run as follows:
>
>
> On Thu, Sep 15, 2016 at 9:55 AM, Nirmal Fernando <[email protected]> wrote:
>
>> This is great! Yes, @Supun let's try a python script as well.
>>
>> On Thu, Sep 15, 2016 at 7:46 AM, Srinath Perera <[email protected]> wrote:
>>
>>> Thanks Supun! I believe this would work.
>>>
>>> I assume the same will work with Python scripts (without notebooks as
>>> well).
>>>
>>> --Srinath
>>>
>>>
>>>
>>> On Wed, Sep 14, 2016 at 6:47 PM, Supun Sethunga <[email protected]> wrote:
>>>
>>>> Hi Srinath/Nirmal
>>>>
>>>> I managed to get the $subject working. Here I connected iPython/Jupyter
>>>> Notebook to pyspark, and pyspark submits the job to the remote spark
>>>> cluster (created by DAS). One of the advantages of using the Notebook is
>>>> that a user can load the data in DAS tables as a spark dataframe, and can
>>>> interactively work on it.
>>>>
>>>> But it also has the following limitations:
>>>>
>>>>    - The client side needs a Spark distribution (to use pyspark).
>>>>    - Have to limit the cores allocated to the Spark App used by DAS
>>>>    (CarbonAnalytics), so that the Spark App created by pySpark can run in
>>>>    parallel.
>>>>    - Have to set the spark-classpath at the client side, with the jars
>>>>    used by DAS, so that once the job is submitted, the spark executor
>>>>    knows where to look for the classes.
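To expand on the last two items inline: both boil down to Spark configuration. As a rough sketch (the property names are standard Spark settings; the path and the core count are hypothetical), the DAS-side Spark config would cap the cores of the CarbonAnalytics app, and the client-side conf/spark-defaults.conf would point executors at the DAS jars:

```properties
# DAS side: cap the CarbonAnalytics app so the pyspark app can run in parallel
spark.cores.max                2

# Client side: let the spark executor resolve the DAS classes after submit
spark.executor.extraClassPath  /home/supun/Downloads/wso2das-3.1.0/repository/components/plugins/*
```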
>>>>
>>>>
>>>> *Training Models:*
>>>>
>>>> As we discussed offline, for large datasets, we can directly use the
>>>> algorithms in spark's mllib and ml. This is very straightforward, as the
>>>> data we get from DAS is a spark-dataframe, and hence we can train models
>>>> on top of the dataframe (or convert it to an rdd).
>>>> And for small and medium datasets, we can convert the spark-dataframe
>>>> to a pandas-dataframe using df.toPandas(), which will load all the data
>>>> into memory, and then train sklearn algorithms on top of that.
>>>>
>>>> A sample Python script can be found at [1].
>>>>
>>>> [1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Senior Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>> Blog: http://supunsetunga.blogspot.com
>>>>
>>>
>>>
>>>
>>> --
>>> ============================
>>> Srinath Perera, Ph.D.
>>>    http://people.apache.org/~hemapani/
>>>    http://srinathsview.blogspot.com/
>>>
>>
>>
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>> Team Lead - WSO2 Machine Learner
>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>> Mobile: +94715779733
>> Blog: http://nirmalfdo.blogspot.com/
>>
>>
>>
>
>



_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
