Sorry, my previous reply was incomplete. :) In the python script (.py) scenario, there will be no sparkContext created by default, so the sc.stop() we used in [1] won't be needed. That's the only change to the script.
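For illustration, here is a minimal sketch of how the script version could start. It assumes the sc.stop() in [1] was only there to discard the context that the pyspark shell/notebook pre-creates. The app name, the `spark_settings` helper and the placeholder job are hypothetical and not taken from the actual script; the master URL is left to the --master option of spark-submit.

```python
def spark_settings(app_name):
    """Hypothetical helper: Spark settings as a list of (key, value) pairs."""
    return [("spark.app.name", app_name)]


def main():
    # Imported lazily so the helper above can be used without pyspark installed.
    from pyspark import SparkConf, SparkContext

    # No sc.stop() is needed before this point: a plain .py script starts
    # without any SparkContext, so we simply create our own.
    conf = SparkConf().setAll(spark_settings("PySpark-Sample"))
    sc = SparkContext(conf=conf)
    try:
        # Placeholder job (an assumption, not part of the original script).
        total = sc.parallelize(range(10)).sum()
        print(total)
    finally:
        # Release the cluster resources once the script is done.
        sc.stop()
```

To submit this, add a `main()` call at the bottom of the file.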
We can run it as follows:

    <SPARK_HOME>/bin/spark-submit --master spark://<spark-master-ip>:7077 \
        --conf "spark.driver.extraJavaOptions=-Dwso2_custom_conf_dir=/home/supun/Downloads/wso2das-3.1.0/repository/conf" \
        /home/supun/Supun/MachineLearning/python/PySpark-Sample.py

[1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb

Regards,
Supun

On Thu, Sep 15, 2016 at 11:23 AM, Supun Sethunga <[email protected]> wrote:

> Yes, python script also works, without the notebook.
>
> In python script (.py) scenario, there will be no sparkContext created by
> default. Can run the script as follows:
>
>
> On Thu, Sep 15, 2016 at 9:55 AM, Nirmal Fernando <[email protected]> wrote:
>
>> This is great! Yes, @Supun let's try a python script as well.
>>
>> On Thu, Sep 15, 2016 at 7:46 AM, Srinath Perera <[email protected]> wrote:
>>
>>> Thanks Supun! I believe this would work.
>>>
>>> I assume same will work with python scripts (without notebooks as
>>> well).
>>>
>>> --Srinath
>>>
>>>
>>> On Wed, Sep 14, 2016 at 6:47 PM, Supun Sethunga <[email protected]> wrote:
>>>
>>>> Hi Srinath/Nirmal
>>>>
>>>> I managed to get the $subject working. Here I connected iPython/Jupyter
>>>> Notebook to pyspark, and pyspark submits the job to the remote spark
>>>> cluster (created by DAS). One of the advantages of using Notebook is that
>>>> a user can load the data in DAS tables as spark dataframe, and can
>>>> interactively work on it.
>>>>
>>>> But it also has the following limitations:
>>>>
>>>> - Client side needs a spark distribution. (To use pyspark)
>>>> - Have to limit the cores allocated to the Spark App used by DAS
>>>> (CarbonAnalytics), so that the Spark App created by pySpark can run in
>>>> parallel.
>>>> - Have to set the spark-classpath at the client side, with the jars
>>>> used by DAS, so that once the job is submitted, spark-executor knows
>>>> where to look for the classes.
>>>>
>>>>
>>>> *Training Models:*
>>>>
>>>> As we discussed offline, for large datasets, we can directly use
>>>> algorithms in spark's mllib and ml. This is very straightforward, as the
>>>> data we get from DAS is a spark-dataframe, and hence we can train models on
>>>> top of the dataframe (or can convert it to rdd).
>>>> And for small and medium datasets, we can convert the spark-dataframe
>>>> to pandas-dataframe using df.toPandas(), which will load all the data into
>>>> memory, and then train sklearn algorithms on top of that.
>>>>
>>>> A sample python script can be found at [1].
>>>>
>>>> [1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Senior Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>> Blog: http://supunsetunga.blogspot.com
>>>>
>>>
>>>
>>> --
>>> ============================
>>> Srinath Perera, Ph.D.
>>> http://people.apache.org/~hemapani/
>>> http://srinathsview.blogspot.com/
>>>
>>
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>> Team Lead - WSO2 Machine Learner
>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>> Mobile: +94715779733
>> Blog: http://nirmalfdo.blogspot.com/
>>
>
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
